Cap Question


lucky5115


The Flawed Idea Behind Any Form of AB Testing:

 

The premise behind all forms of audio AB testing is that memory is the measuring instrument that provides the output data of the experiment. You listen to one sample, then another, and you must compare your memory of the earlier sample to the current experience, repeating this serially. That is using human memory as a measuring instrument to provide output data. It is a flawed premise for the entire experiment, and one that is rarely debated.

When DBT is used in, say, pharmaceutical trials, most of the output data is provided by specific, calibrated, reliable instruments: X-ray machines, MRI, blood gas analysis, chemical assays, and so on. This provides repeatability - any other researcher should be able to confirm or duplicate the results, because the instruments and measures are standards.

 

In audio, the instrument is human memory. Here are a few very significant factors to consider about memory:

 

1. Memories are constructions made in accordance with present needs, desires, influences, etc.
2. Memories are often accompanied by feelings and emotions.
3. Memory usually involves awareness of the memory.

 

In this model, any memory has a dynamic, instantaneous value in relation to the sum total of all sensory inputs, current emotions, and state of being of the "machine" (person) storing the memory. Strike a bell at this moment, and the memory will differ from the memory of striking the same bell at some other moment. Same bell, different circumstances, and therefore a different memory. That's roughly equivalent in reliability to a voltmeter which zeroes itself to some random voltage before each measurement.

 

What we know about AB testing is that when A is grossly different from B, people easily pass the test. As you shrink the difference between A and B, more and more people fail. When you reach the useful "resolution" of an AB memory test, most people fail. This is where the technicians trip over their feet. They assume the failure means: "people have reached the limit of the differences they can hear." That's not the meaning of the results at all. The meaning is: "We have reached the limits of resolution for using human memory as the measuring instrument in an AB test."

 

You can measure a lot of things with a wooden yardstick - a very useful tool. You can't reliably measure the thickness of a sheet of paper, though. The tool doesn't have the necessary resolution at that small dimension. For that you need a caliper or something similar.

 

Referring to items 1, 2 and 3 above, you simply cannot extract the intention of the subject from the act of listening. Every listening experience contains its intent, the subject, and the object as an indivisible whole. This is why bias can never be eliminated: even the intent of the subject is a bias.

 

The problem is the reliance on memory, period. I fail to see how DBT solves the "unreliable memory" problem at all. To the contrary, the entire test is wholly based on memory.

 

I think the people promoting DBT show a lot of misunderstanding of neuroscience, memory, and perception. Short-term or long-term makes no difference. Memory is not an instrument for measurement and comparison at such fine resolution. Sure, if A is an apple and B is a banana, it works fine. When A is a Pioneer and B is a Mark Levinson, it's not going to work the same way.

 

1. We DO NOT RECORD all input! That fact alone should toss the use of memory out the window. We store only essential patterns - partial inputs, enough to make the recall. If we stored ALL sensory input at full bandwidth, our brains would have to be the size of Texas. (Example: if you drove an hour on the freeway tonight to get home, can you recall the license # of each car that passed you? The year, make, and model of each car? A description of all the occupants? Obviously not - but you "saw" it all with your eyes. If you tried to store all the sound you heard in a few minutes at something like CD resolution, your brain would explode.)

 

2. A "memory" of a sound is not a recording of the sound waves! Nor is it a digital representation, nor is it necessarily even stored in a contiguous brain space. Electrical impulses from the ear system are "mixed" (yeah, like a 16-track mixer) with visual inputs, other sense inputs, emotional content, similar patterns from previous events, and the body's state of wellness. The point being, there is no discrete memory object of "just the sound" that was heard when A was played or when B was played. Proponents are pretending that the brain functions like a tape recorder. It ain't so.

 

3. DBT depends on this real-time analysis: a past memory must be compared to a current stream of conscious perception, and a determination of the differential must be made. Well, hang on - these are two vastly different brain processes. You might as well try to compare voltmeter readings to the direction a weathervane is pointing. It's nonsense. That's why it only works with "apple and banana" level comparisons.

There's just absolutely no science behind using DBT in audio. All the assumptions are flawed.

 

The Null Hypothesis or How I Batted .650 in The Audio League:

 

ABX testing sets out with a common goal: to disprove the null hypothesis. At the outset, looking at A and B, it is hypothesized that no difference exists between them. This is the "null hypothesis," and it is the starting point of all ABX testing in audio: "A and B have no sonic difference."

 

The object for the Subject of the test is to disprove the null hypothesis. This is done by accepting a number of challenges to measure an unknown (X) against the known A and B, and correctly declare X to be A or B. The results are tallied, and statistical techniques determine whether the Subject "disproved" the null hypothesis or not. If he did not disprove it, he failed. Then all the subjects and all the results are tallied to determine whether the null hypothesis was disproved for the test as a whole.

 

Let's assume each Subject is given 20 challenges. To disprove the null hypothesis, he must be correct on more than half of them. How many more than half is set by the desired confidence level. The confidence level describes how certain you want to be that the right answers were not caused by luck or accident: the higher the confidence level, the more right answers are required. To achieve the typical 95% confidence level with 20 challenges, the Subject must be right at least 15 times (14 correct corresponds to a chance probability of about 5.8%, which just misses the 5% cutoff).
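The binomial arithmetic behind that cutoff is easy to verify. A minimal Python sketch (added here for illustration; it is not part of the original post or of any test protocol) computes the exact chance of scoring k or more out of 20 by pure coin-flip guessing:

```python
from math import comb

def p_at_least(k, n=20):
    """Probability of k or more correct out of n trials by coin-flip guessing."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

for k in (13, 14, 15):
    print(f"P(>= {k} correct of 20 by chance) = {p_at_least(k):.4f}")
# P(>= 13) = 0.1316
# P(>= 14) = 0.0577  (just above the 5% cutoff)
# P(>= 15) = 0.0207  (clears it)
```

So the 95% confidence threshold for a 20-trial run falls between 14 and 15 correct: only a score of 15 or more has less than a 1-in-20 chance of being pure luck.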

 

Subjects with 13 correct identifications therefore "fail" the test. And of course, subjects with 10 correct have matched the average score of anyone simply "guessing" their way through by flipping a coin. They are dubbed "fail" also - even though half the time they were (potentially) right. (And conversely, you could simply guess your way to 15 right answers and pass the test.)

So, if you hear that 60 people took the test and they all failed, it may still be that more than half of all the challenges were met with the RIGHT ANSWER - meaning the Subject correctly identified X as being either A or B.

 

Now, this ABX testing is a statistical exercise, not an empirical one. Its conclusions are drawn through computation, not observation and direct experience. People who made an honest effort and got 13 right, only to fail, are computed in with those who guessed and got 15 right, which is a PASS. There is no discrimination as to how the test was actually experienced by each subject. A field might consist of all guessers or all experts, and the results might look the same.
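The "field of all guessers" scenario can be made concrete with a quick simulation. The sketch below is illustrative only; the seed, the 60-subject field, and the 15-of-20 pass mark are assumptions for the example, not part of any standard protocol:

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def guessing_subject(n_trials=20):
    """Score of a subject who flips a coin on every challenge."""
    return sum(random.random() < 0.5 for _ in range(n_trials))

scores = [guessing_subject() for _ in range(60)]
passes = sum(s >= 15 for s in scores)  # 15/20 clears the 5% cutoff

print("mean score:", sum(scores) / len(scores))  # hovers near 10
print("lucky passes:", passes)                   # usually a subject or two
```

A field of pure guessers averages around 10 of 20, yet the odd subject "passes" by luck alone - and from the tallied results you cannot tell that subject from a genuine expert.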

 

Statistics, in order to compute its significance, has to treat 13 right answers as the possible result of accident or pure guessing, and thus a "failure to disprove the null hypothesis." But empirically, we could ask the Subject if he was guessing, and he might say no. In other words, his 13 right answers were direct, empirical observations he made, along with his 7 wrong ones, and on this basis you could say, "The Subject had 13 hits in 20 at-bats and is batting .650!" Not bad for such a difficult task.
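For the record, a score of 13 out of 20 is not rare under pure guessing, which is exactly why the statistics must treat it as a "fail." A short, illustrative Python calculation (not from the original post):

```python
from math import comb

# Chance that a pure guesser scores 13 or better on 20 challenges:
p = sum(comb(20, k) for k in range(13, 21)) / 2 ** 20
print(f"{p:.3f}")  # prints 0.132 -- roughly one guessing run in eight
```

About one run in eight of pure coin-flipping reaches 13 or better, so the computation alone cannot distinguish the honest .650 hitter from the lucky guesser.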

 

I'll take a .650 batting average to the stock market, or to Vegas, or to the racetrack any day.

Now, it could be that he was guessing and lying, and that's what the statistical model seeks to neutralize. But to be sure, that is a computation, not direct experience.

So all of this is to clarify that claims from ABX tests, such as "everyone failed to distinguish the Pioneer from the Mark Levinson," leave a lot of information off the table.

 

If you are going to gamble $200M of your company's capital on a new drug, the statistical virtues of DBT and the null hypothesis make a lot of sense. You can run many tests and invest the money when you reach the highest statistical confidence level, right? That makes perfect business sense, and DBT is a valuable tool in the toolbox. But is that how you do your hobby? You might be perfectly happy to pick the right amp for you using a .650 batting average! Is that a "FAIL"?

 

Imagine trying to judge camera lenses the same way we do AB testing in audio. Here's an experiment to consider the problem.

Using two makes of high quality 50mm lenses, two photographs are taken of the same scene filled with a lot of detail and contrast. The two nominally identical photos are given to a subject who is asked to find the differences that exist, if any.

 

Now, to be clear, the lenses are physically very different in construction: number of lens elements, arrangement of elements, glass composition, and so on. They have in common only their specifications as to aperture and focal length, so that the pictures will be the same in terms of exposure, field of view, and content. Light passing through a lens and being refocused onto a plane behind it is one of the more complex feats of optical engineering. It involves many compromises, even in the finest lenses ever made. The two lenses differ because their designs invoke different sets of compromises.

 

Back to our subject. The subject will usually lay both photos side by side and study them together as two whole, inclusive entities. Using full parallel perceptual processing, the photos are examined in detail. A casual review may reveal "no difference" for some subjects, but a more studied review, particularly by someone with expertise and/or training, will reveal small differences in sharpness, spherical distortion near the edges, coma, color aberrations, and so on. For those subjects who see the differences, some judgment can be made about which lens is better. OK - easy enough.

 

But now suppose the subjects weren't given both photos simultaneously to compare side by side. Suppose it worked like this: one photo is labeled "A" and the other "B." They will be viewed using a special technique that creates a serial memory presentation. The "A" photo is loaded into a "roller box" with a horizontal viewing slit measuring perhaps 1/4 inch high by 8 inches wide (the width of the photo). The photo is rolled past the slit at 1 inch per second; in 12 seconds the entire photo is rolled through the box, with the subject viewing it as it moves past the slit. The entire photo is viewed, but never all at once.

 

Now, the "B" photo is loaded and rolled past the slit in the same 12-seconds.

 

The subject is asked to identify whether there is a difference between "A" and "B" by guessing each time which photo went past the slit. How well would even the best photography experts do on this test?

 

That is essentially what an AB test is like in audio. A song or musical piece is a serial stream of aural sensations, just like the photo is a stream of light sensations rolling past the slit. You can't ever hear the audio stream as a "whole" the way you can view a whole picture all at once. And you sure can't hear TWO streams of audio simultaneously the way you can examine two photographs at the same time.

 

In the photo AB testing, it is doubtful that many subjects would correctly identify "A" from "B." And the testers would exclaim, "See, there IS NO DIFFERENCE, and we scientifically proved it!" And yet a person could be handed the two lenses and, assuming he knew anything about optics, would understand immediately that light will pass differently through these two very different constructs. There's no deep paradox here: it is obvious the contrived photographic "slit test" was meaningless as a method of discriminating differences between lenses.

 

Likewise with audio: AB/X testing is the bulwark used by all those who like to prove "no differences" in wire, cable, amplifiers, CD players, and so on. They make a false assumption from the start - that the test is valid - when it's not, because human memory is not a scientifically valid instrument for such a comparison. The principle of taking my memory of event "A" and comparing it for differences against my memory of event "B" is flawed from the start. Yes, gross differences can be remembered, but for extremely subtle differences between "A" and "B," memory isn't useful.

 

-- Mark Deneen. Used with Permission.


OK Dean, going by what you've posted, you could never tell a sound that you like from a sound you don't like.  If you can tell the difference 100% of the time between sample A and sample B, then DB testing should be valid.  Derrick recently posted a guitar playing notes through a SS amp, then a tube amp.  The purpose of the exercise was to try to tell which was which.  It was evident to me, along with others.  The point being that you could hear and identify the differences.  If there's an obvious difference and you can consistently identify it, then it's a valid test.  If you can't, then you've spent money needlessly on boutique caps.


2 minutes ago, CECAA850 said:

OK Dean, going by what you've posted, you could never tell a sound that you like from a sound you don't like.  If you can tell the difference 100% of the time between sample A and sample B, then DB testing should be valid.  Derrick recently posted a guitar playing notes through a SS amp, then a tube amp.  The purpose of the exercise was to try to tell which was which.  It was evident to me, along with others.  The point being that you could hear and identify the differences.  If there's an obvious difference and you can consistently identify it, then it's a valid test.  If you can't, then you've spent money needlessly on boutique caps.

If you ask me, with reference to what Derrick posted, it was more of an apples-to-bananas comparison. You can definitely tell one from the other.


Well, this is going nowhere, especially after you brought Deneenthedickbag into it.

 

If you have never done any direct comparison testing, then please don't tell me that A is better than B or C.  

 

Another cap discussion gone to hell.  So, what else is new?


19 minutes ago, mike stehr said:

You can definitely tell one from the other.

OK, we've proved that your brain can distinguish different amplifiers playing the same thing.  Logic says we should be able to do the same thing with caps.  If better, more expensive caps sound better, you should be able to distinguish the two.  Otherwise, why spend the extra money if you can't hear the difference?


2 minutes ago, jimjimbo said:

If you have never done any direct comparison testing, then please don't tell me that A is better than B or C.  

Dean is on record as having used different caps and preferring the sound of one over the other.  I'm not quite sure why he's trying to say that a listening test isn't a valid tool for discerning the difference. 

 

Gotta go to the casa.  Jim, behave.

 

Peace out.


3 hours ago, CECAA850 said:

I'm not quite sure why he's trying to say that a listening test isn't a valid tool for discerning the difference. 

 

I'd like to know how someone would go about swapping out caps directly on the fly and try to discern the difference. The Flash with clip leads?

One way would be to rig up two different networks with different brands/types of caps and use a DPDT switch to swap between them, using just one side of an amplifier, or a mono amp.

You could quickly swap between networks, to hear differences. A friend and I did this once with a 2A3 SET, and some large Mitsubishi amplifier...DX some such...

We used one speaker with a switch to swap between amps. Once we had the volume levels matched between both amps, switching back and forth quickly revealed no differences - if there were any, they were really subtle. One's audible memory (at least mine) isn't quick enough to note subtle differences. YMMV.

 

But when you take the 2A3 amplifier or the Mitsubishi and listen to one of them in your system for a week, and then switch to the other, one will, or should, notice the differences.

 

I would guess that it's the same for capacitor networks.


15 hours ago, Deang said:

Well, he already admitted he was messing with me ... so I guess I assumed correctly.

 

He's still half deaf. Yeesh - look at his rig. You think he's doing the low level listening thing?

 

We are all half deaf by now.  Besides my active setups, I have a Peach/VRD/La Scala setup and a Scott LK-72/Cornwall setup, both for near-field listening, and both sound great.  I use the La Scalas all the time.  They have Dave's horns and stock drivers with replaced diaphragms, and I recapped with Daytons... enjoyed them for a while... then changed the AAs to ALK Jr. per Al's schematic on his website... Now I have the red caps.  Audyn?  I don't even remember.  Those sound fine to me as well.

 

Remember I listen to Grateful Dead recordings.


14 hours ago, jimjimbo said:

Well, this is going nowhere, especially after you brought Deneenthedickbag into it.

 

If you have never done any direct comparison testing, then please don't tell me that A is better than B or C.  

 

Another cap discussion gone to hell.  So, what else is new?

 

For this discussion, the point of argument is of more interest than who wrote it.

 

The argument isn't against comparing; it's a case for how not to do it, and why the most accepted way doesn't work well.

 

You can build two pairs of networks with different parts, put a pair in, find a fixed position on the attenuator, and listen to a couple of recordings that you know well. When you're done, load up the other pair. Sometimes the difference is surprising, sometimes not so much.

 

In DBT, amplifiers typically sound the same. So I guess that means all amplifiers sound the same? Even the simple A/B comparison is flawed, because the ears aren't the only thing involved here -- so is the brain, and we need time to process what we're hearing.

 

Somewhat related: I've learned that some people just don't pay that close attention to what they're hearing. As long as it's not distorted and sounds clean, they're happy.

 

Finally, there is the question of build quality. This is my primary problem with inexpensive metalized types. The cap is unprotected, uses recycled film (which has flaws in it), and has lousy lead terminations. So you have this pretty important thing that sits between all of your downstream gear and your drivers -- and God forbid anyone spends more than $20 on capacitors.


22 minutes ago, mark1101 said:

 

We are all half deaf by now.  Besides my active setups, I have a Peach/VRD/La Scala setup and a Scott LK-72/Cornwall setup, both for near-field listening, and both sound great.  I use the La Scalas all the time.  They have Dave's horns and stock drivers with replaced diaphragms, and I recapped with Daytons... enjoyed them for a while... then changed the AAs to ALK Jr. per Al's schematic on his website... Now I have the red caps.  Audyn?  I don't even remember.  Those sound fine to me as well.

 

Remember I listen to Grateful Dead recordings.

 

Mark, I was just messing with you, lol.


2 hours ago, Deang said:

Finally, there is the question of build quality. This is my primary problem with inexpensive metalized types. The cap is unprotected, uses recycled film (which has flaws in it), and has lousy lead terminations. So you have this pretty important thing that sits between all of your downstream gear and your drivers -- and God forbid anyone spends more than $20 on capacitors.

I can get on board with this (no pun intended).

 

What about the caps in the H2s?  Any idea on shelf life?


20 hours ago, Deang said:

Yes, gross differences can be remembered, but in this area of extremely subtle differences between "A" and "B" memory isn't useful.

 

-- Mark Deneen. Used with Permission.

Wow Dean, I was really impressed with your writing on ABX testing and the technical correctness of your statistical explanation, until the very last line, which identified Mark Deneen as the author!  :lol:

 

Seriously, that was an excellent observation using statistics, and spot on in its accuracy.

 

So my fellow Klipschites, in your own words, explain what the null hypothesis is.  There will be a 10 point quiz at the beginning of the next class.  B)


Right, I don't have command of the language like Mark does.

 

Whatever criticisms some might have regarding Mark, try not to forget that he has a long list of impressive credentials, including starting two audio companies. Paragon was a big deal back in the day, and the Blueberry is a Stereophile Class B rated component. And oh yeah, the NBS is basically a point-to-point wired version of it.

 

7 hours ago, CECAA850 said:

What about the caps in the H2s?  Any idea on shelf life?

 

Epoxy-coated oval Mylars -- bottom-of-the-barrel stuff. Bob has measured these ad nauseam, and most measure as bad as or worse than anything he's ever come across. They have all reached the end of their useful life.

