
Double Blind testing is like fishing without knowing what kind of fish you're after!


russ69


During the first pilgrimage to Indy the good people there had an ABX on speaker wire. IIRC this was zip cord versus home made woven CAT-5. Also IIRC this was originally planned to be Monster Wire versus something else. But this was not done.

There was a selection of CDs. I chose pink noise. I might have brought that CD with me.

My thought was and is that ABX testing is far too tough in the usual setting with music. You are switching successively between different times in the recording. Say: time span 1 versus time span 2, then 3 versus 4, then 5 versus 6, etc. In music notation, those would be different bars. In view of the constantly shifting target, I don't find it odd that so many people / systems flunk the ABX test.

That could be solved by using the same 3 second sample restarted with every button push. That is easier to do these days through the magic of computers, but I have not read of it being done.

The alternative is just pink noise. That way the program is the same.
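
The fixed-sample idea above could be sketched roughly like this. It is only an illustration of the bookkeeping, not any real ABX box or software: the clip start time, the function names and the seed are all made up for the example, and no audio is actually played.

```python
import random

# Hypothetical excerpt: the same 3 seconds are replayed on every button press.
CLIP_START_S = 42.0   # assumed start time, chosen only for illustration
CLIP_LENGTH_S = 3.0   # the "3 second sample" from the post


def segment_for_press(button):
    """Return the (start, end) times to play for a press of 'A', 'B' or 'X'.

    The segment is identical for every button, so the listener always
    compares the same passage instead of different bars of the music.
    """
    return (CLIP_START_S, CLIP_START_S + CLIP_LENGTH_S)


def make_abx_trials(n_trials, seed=None):
    """Secretly assign X to 'A' or 'B' for each trial."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials)]


def score(trials, answers):
    """Count how many of the listener's answers match the hidden assignment."""
    return sum(hidden == said for hidden, said in zip(trials, answers))


trials = make_abx_trials(16, seed=1)
print(len(trials), "trials prepared; every press replays", segment_for_press("X"))
```

With pink noise the segment choice stops mattering, which is the point of the alternative above.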

Wm McD

There is also another aspect to this. Those of us familiar with statistics know what reversion to the mean means (that is, mathematically, not just the "concept") .

I took the ABX speaker wire test at the first Pilgrimage. It was also done with a recording that I was very familiar with and that I considered to be of very high quality, and recorded with methods I'm familiar with & use in my own live recordings.

As I recall, Trey told me afterwards that initially I had actually been quite correct on most of my choices, but as the listening test wore on, the positive results started to decline, and in the end came out not much better than average. There actually is some math to this; however, let's not consider the math for a moment. What happens the longer we listen, and the more closely we listen, concentrating ever more, for longer and longer periods of time? We get fatigued. Our ear/brain hearing system gets tired, just as a well trained athlete would be tired at the end of a long run, certainly more tired than when they started. And this doesn't even take into consideration the unfamiliar components in the rest of the system, nor the room/acoustics. Do these things help to isolate the one thing being evaluated? Or do they make it more difficult?
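
For anyone who does want the math, here is a rough sketch of the reversion-to-the-mean part only. It simulates listeners who are purely guessing (an assumption for illustration, not a claim about anyone at the Pilgrimage), picks out the ones who happened to start strong, and looks at how the same listeners do afterwards. Fatigue is not modeled at all, and the trial counts and seed are made up.

```python
import random
from statistics import mean


def simulate_guessers(n_listeners=20000, early=5, late=15, seed=0):
    """Average hit rates of pure guessers who happened to start strong.

    Returns (early_rate, late_rate) for listeners with at most one miss
    in their first `early` trials.
    """
    rng = random.Random(seed)
    early_rates, late_rates = [], []
    for _ in range(n_listeners):
        # Each answer is right with probability 0.5: no real audible difference.
        answers = [rng.random() < 0.5 for _ in range(early + late)]
        early_hits = sum(answers[:early])
        if early_hits >= early - 1:
            early_rates.append(early_hits / early)
            late_rates.append(sum(answers[early:]) / late)
    return mean(early_rates), mean(late_rates)


early_rate, late_rate = simulate_guessers()
print(f"strong starters, first trials: {early_rate:.2f}")
print(f"same listeners, later trials:  {late_rate:.2f}")
```

In this toy setup the later scores drift back toward chance with no fatigue involved, so a declining score by itself does not tell you which explanation applies.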

ABX, double blind, long term, short term, whatever: all have their inherent strengths and weaknesses. If you don't have an "original" live source for comparison, or at least familiarity (from memory and extensive experience - you are a concert pianist for instance), then you have no basis to form anything other than a subjective opinion. A sounds different from B, and B is playing. I don't like the sound of B. Who cares.


What you're suggesting (and it's something I agree with) is that not all of the differences between pieces of gear are always audible. You have to have source material that brings out that difference...and sometimes it may only be a few notes in a piece that will demonstrate the difference. Heck, sometimes you gotta be in the right listening mood to hear it too.....most audiophiles get so uptight when doing blind listening because they're too worried about their ego to enjoy the music.

With that in mind, there is no time limit on a DBT. If you think you would have more success flipping between A and B once every week, then by all means conduct the test that way. Another thing you might try is to find that short segment of a piece that brings out the difference and then use that in your 'quick' AB. This is why I keep suggesting to have someone change your gear on you without you knowing. If you can identify without your eyes that something changed, even if it took a few weeks, then you have demonstrated a perceptible sonic difference. And heck, we're only looking for a difference....identifying which is better is a whole different issue. The only way acoustic memory matters is in being able to identify the model #'s of the different gear.

Hi Mike -- I actually agree with most of what you said. I'm not opposed to the idea of DBT, but think the process is flawed. When 60 people can't tell the difference between a $200 Pioneer receiver and a pair of $4000 Mark Levinson monoblocks -- something's seriously wrong. I once spent some time comparing an Aragon 4004 MK II and a Bryston 3BST. They sounded pretty different to me, but something tells me that in a DBT, they would sound "identical".


>The DBT process is based on the premise that auditory memory is
unreliable, and was designed to minimize or mitigate the effects.

IF I get into DBT for some reason, the above will not apply as I'd have no idea what it was SUPPOSED to sound like. My auditory memory of audio space/time events is rather excellent. I wish the rest of it worked so well. Whether it's my playback chain or someone else's, I cannot make a judgement past "I like it" or "I don't like it" without a complete understanding of what the recording sounded like before it was a recording. Without a reference point of known accuracy, I go with the "fishing" analogy.

There are a few exceptions, such as certain albums or CD's I've heard hundreds of times on my system, but even my judgement of them is referenced to my own location work.

Dave

Hi Mike -- I actually agree with most of what you said. I'm not opposed to the idea of DBT, but think the process is flawed. When 60 people can't tell the difference between a $200 Pioneer receiver and a pair of $4000 Mark Levinson monoblocks -- something's seriously wrong. I once spent some time comparing an Aragon 4004 MK II and a Bryston 3BST. They sounded pretty different to me, but something tells me that in a DBT, they would sound "identical".

The problem I have with your example is that you're focusing on the price. For example, I just designed a $0.40 circuit at work that outperforms the noise performance of a $5 solution by about 10dB. That's about the same price differential between your Pioneer and your Mark Levinson [:o] Now I'm not trying to say the cheapy Pioneer is better, but the price points alone don't tell you anything other than you're gonna have a hard time selling the point that the cheaper solution is better [;)]

I have every reason to believe the Mark Levinson is a better amplifier, but in what ways? If you're buying extra performance that already was good enough, then I wouldn't expect it to show up in a DBT. I think a perfect example would be tire traction on a 1/4 mile drag. If your engine can't overcome the traction of your tires, then installing tires with better traction isn't going to get you down the track faster. You need to upgrade the engine first.

I would be very surprised if you couldn't identify a difference between the Aragon 4004 and the Bryston 3B, but at the same time I'm pretty confident that I would be able to configure the test so that you didn't hear a difference either. It's all about knowing where your amplifiers start to lose traction. A great place to start would be different speaker loads and then different preamps....basically where the amplifiers directly interact with the rest of the system. But even then, items further away like the source material and room acoustics can play a huge role too.

To be honest, I'm not at all troubled by the fact that there was a scenario where the Pioneer and the Mark Levinson sounded the same. However, I wouldn't use that test to encourage my Mark Levinson owning friends to buy the Pioneer without listening to it in their own system first. I bet if we dove into the designs that we would learn why they were able to sound so similar in that application. They didn't have all 60 in the same room at the same time did they?


That sounds reminiscent of the ol' Bob Carver challenge, which he won against Stereophile by doing exactly what you describe. He made his solid state amp sound similar enough to a Conrad Johnson tube amp to fool the editors at Stereophile in a single blind test. He later produced that amp, but it sounded nothing like the Conrad Johnson then, because the variables he controlled in the test no longer applied in buyers' listening rooms.

The problem is the reliance on memory, period. I fail to see how DBT solves the "unreliable memory" problem in any way at all. To the contrary, it is wholly based on memory for the entire test.

The DBT doesn't need to be any more accurate than a person's acoustic memory.

I think it's hypocritical to claim that acoustic memory is good enough for "....a SUBJECTIVE approach [that doesn't] rely on specs and DBT, but rather on subjective intuitive resources", but not good enough for DBT. In fact, the reliance on acoustic memory is no different for either situation.


If you want to see something really depressing (for me), check out the beating I took on the AVS forum this week.

.... They wanted a lot of measurements and DBT under my belt to support the subjective improvement I was claiming -- and I couldn't provide either. I didn't do well. ...

I read most of that thread and thought you did well. It seemed like some of the posters on that thread just wanted to get you to jump through some hoops for their own entertainment and they had no real interest in Klipsch speakers or improvements to the XOs. It means more to me to have an honest testimonial from someone who can give a before and after evaluation of an upgrade. Converting a room full of skeptics during a ten minute listening session says a lot more than a stack of charts to me.

Hey Fastlayne -- meant to thank you for the good words and forgot. Thanks for that!


Memory is unreliable as a measuring instrument. It can't be calibrated.

It doesn't need to be reliable or calibrated for a DBT.

The "subjective, intuitive approach" is NOT an experiment, does not rely on making measurements, and has no requirements for calibration.

Sure, but the "subjective" approach requires memory....the very same memory capacity that a DBT requires.


Then the results of your DBT are therefore uncalibrated and unreliable also.

Quite the opposite actually.

Since when is a DBT a measurement? There are no units of measure. It is a test of the listener, which includes the listener's own memory. Thus why the memory doesn't need to be calibrated.


...For me, the fun of DBT is the debate itself. I could care nothing about the implementation or use of it. I relish my bias, my whole subjective approach.

I heard of the DBT years ago and never really paid that much attention to it. Some forums won't allow discussions about the results of them, at least, but this thread has been helpful for me to understand why the ABX comparison is fundamentally unreliable.


Then the results of your DBT are therefore uncalibrated and unreliable also.

Quite the opposite actually.

Since when is a DBT a measurement? There are no units of measure. It is a test of the listener, which includes the listener's own memory. Thus why the memory doesn't need to be calibrated.

DEFINE: MEASUREMENT

"Measurement is the process of estimating the magnitude of some attribute of an object, such as its length or weight, relative to some standard."

In an ABX test, the question asked is X=B, or is X=A? To answer requires a comparison of the sound of X relative to previously heard sample A and previously heard sample B. A direct comparison of any object to a standard object is called a "measurement."

What is being measured? Sample X is being measured. 

Who is doing the measuring? The subject is doing the measuring.

What is being deduced from the analysis of the measurements? The subject's ability to discriminate A from B.  

What is the instrument the subject uses to do the measuring? He/she uses a past memory of A and B to compare to a real time event called X.

What is the Standard for the measurement? A memory of A and a memory of B

What are the units of measurement? Equality and Not Equality.

How is the standard calibrated? It isn't.

SUMMARY: Subject is asked to make a series of measurements using faulty, uncalibrated and unreliable instruments in order to determine how accurately he can make said measurements.

Yeah, that makes a lot of sense to me. 

Summary: Mark doesn't understand ABX testing. I agree that the above is invalid, but it's not what the real engineers are testing with an ABX. Making incorrect assumptions and conclusions is a problem with the one making them, not the process/test itself.


The statistical analysis has nothing to do with a DBT directly, it is a tool for interpreting data. In an ABX, it is merely a convenient way to draw a line in the sand for quick comparisons - a snapshot if you will. You don't have to use statistical analysis or even ABX for evaluation. Heck, you can even use subjective analysis if you want. None of the interpretations change the raw data itself.

Btw, you can't just assume that a person that fails a 90% confidence level has an average greater than 50%.
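
To put rough numbers on that "line in the sand", here is a minimal sketch using a plain binomial model. The 16-trial session and the listed true hit rates are assumptions picked for illustration, and the function names are made up; this is not a description of how any particular ABX comparator scores a session.

```python
from math import comb


def tail_probability(n, k_min, p):
    """Chance of at least k_min correct out of n when each trial is right with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def min_correct_to_pass(n, confidence=0.90):
    """Smallest score that pure guessing reaches with probability at most 1 - confidence."""
    alpha = 1 - confidence
    for k in range(n + 1):
        if tail_probability(n, k, 0.5) <= alpha:
            return k
    return None  # no score can pass with this few trials


n = 16  # assumed session length
needed = min_correct_to_pass(n)
print(f"{needed} of {n} correct needed at the 90% level")
for true_rate in (0.5, 0.6, 0.7):
    chance = tail_probability(n, needed, true_rate)
    print(f"listener who is right {true_rate:.0%} of the time passes with probability {chance:.2f}")
```

Under these assumptions a listener who really is right 60% of the time still fails more often than he passes, and a pure guesser occasionally passes. Failing the line says little about the listener's true average in either direction, which is why it is only a snapshot.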


I fail to see what point you are trying to make here.

The point is that a test is not flawed simply because you can find shortcomings with one method of analysis. I might also suggest that understanding the possible shortcomings of an analysis method is an integral part of the analysis process.

Heck, you can just sit around and plug stuff in and out like most audiophiles do, if you want to!

Except the blind part removes the possibility for bias unrelated to the sound. For what it's worth, there is absolutely nothing wrong with buying something for reasons other than the perceived sonic performance, but those reasons should be stated as such...


Except the blind part removes the possibility for bias unrelated to the sound. For what it's worth, there is absolutely nothing wrong with buying something for reasons other than the perceived sonic performance, but those reasons should be stated as such...

Mike and Mark, forgive me for interrupting this interesting and informative debate. But the above quote Mike made is something I've always wondered about DBT. It seems audio is the only hobby or profession that is constantly weighed down by those who favor DBT. If you purchase a new pair of skis, nobody asks or challenges another person to tell the difference between two visually identical although different skis after they've reached the bottom of the hill. Perhaps a better example would be this: as a guitarist I've read many guitar magazines and chatted with many fellow musicians and have never heard DBT mentioned. An original vintage guitar amplifier sells for more, or in some cases about the same, as a new clone of that same amplifier. Never has anyone suggested that visual or other bias plays a part in determining which sounds better, or more authentic. The reviewer or musician might state their preference for one's aesthetics over the other's, or the vintage appeal of stains and smoke residue versus new chrome and tweed. But I've yet to hear anyone challenge another to determine the sonic differences between the two in a DBT. I'm really not sure why this seems to only plague audio. Anybody?

Just to be clear, if the subject's task is to choose which of A and B is the same as X, it seems there are three possible results but only two allowable responses:

1] Subject hears that A is the same as X and that B is not the same as X; selects A.

2] Subject hears that B is the same as X and that A is not the same as X; selects B.

3] Subject cannot distinguish a difference, wants to select missing choice of "Can't tell a difference", but that choice is missing, so he has to guess (and will be correct about half the time).

If the test is for discrimination ability, the assumption is that when the answers become 50/50 the limit of discrimination has been reached. But why not include a third choice of "can't tell"?

Why is there no third choice to indicate no difference is heard? Is there an assumption that the subject might hear a difference that they can't detect? Does that even make sense? Does the forced guess past the limit of subjective differentiation assume there are subliminal aspects to the choice (or guess)? If so, is it a given that a system that sounds subliminally better is desired? Something about the forced guessing and no third answer bugs me about this... it seems one would want to know at what point the discrimination reaches its limit and the guessing begins. That real limit is fuzzed up when the guessing is required.
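
One way to think about the forced guess is a simple all-or-nothing model: on some fraction of trials the subject really hears which one X is, and on the rest he guesses and is right half the time. That model is an assumption made here for illustration (it rules out the subliminal case entirely), and the function names are made up. Under it, the guessing can be backed out of the score afterwards:

```python
def expected_correct(heard_fraction):
    """Expected proportion correct if the subject truly hears the difference on
    `heard_fraction` of trials and guesses (50/50) on the remainder."""
    return heard_fraction + (1 - heard_fraction) * 0.5


def estimated_heard_fraction(proportion_correct):
    """Invert the model: estimate the fraction of trials that were truly heard.

    Scores at or below chance are treated as 'heard nothing'.
    """
    return max(0.0, 2 * proportion_correct - 1)


for score in (0.5, 0.75, 1.0):
    print(f"{score:.0%} correct -> about {estimated_heard_fraction(score):.0%} of trials actually heard")
```

So in this simplified picture the limit is not lost when guessing is required: 50% correct corresponds to hearing nothing, and scores in between can be read as a mix of hearing and guessing.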

There are those that test the frequency response of an amp by providing it an impulse signal. For some reason that method fails when used on listeners...

This thing reminds me of the problems with testing for paranormal powers. If ESP really existed, all the experiments would be confounded by the ESP powers. Guessing the future (a throw of the dice) would be blamed on telekinesis (making the dice move as "predicted"), reading someone's mind would be blamed on forcing that thought into their mind in the first place - for every test of an ESP power, there is another ESP power (either of the experimenter or the subject) that confounds any result (with ESP there is no "double blind"...)

For a different reason, all the placebo experiments have a fatal flaw. The fundamental assumption is that the placebo has no effect whatsoever, yet it does, thereby confounding the fundamental assumption of the experimental design.
