
Double Blind testing is like fishing without knowing what kind of fish you're after!


russ69


...

Heck, you can just sit around and plug stuff in and out like most audiophiles do, if you want to!

Except the blind part removes the possibility for bias unrelated to the sound. For what it's worth, there is absolutely nothing wrong with buying something for reasons other than the perceived sonic performance, but those reasons should be stated as such...

That sounds a bit elitist to me. To whom am I supposed to state my reasons for buying gear? Correct me if I'm wrong, but does that mean unless I can pick out a piece of equipment in a DBT, I cannot make the claim I chose it for perceived sonic performance?



Just to be clear, if the subject's task is to choose which of A and B is the same as X, it seems there are three possible results but only two allowable responses:

1] Subject hears that A is the same as X and that B is not the same as X; selects A.

2] Subject hears that B is the same as X and that A is not the same as X; selects B.

3] Subject cannot distinguish a difference, wants to select missing choice of "Can't tell a difference", but that choice is missing, so he has to guess (and will be correct about half the time).

If the test is for discrimination ability, the assumption is that when the answers become 50/50 the limit of discrimination has been reached. But why not include a third choice of "can't tell"?

...

If the bias is that there is no sonic difference between A and B, not allowing choice 3 will favor that bias in the final results. There should be a third choice, and those answers should be considered separately. The number of "3" answers could say more about the participants' ability to discern differences in sound than about actual sonic differences (or the lack of them) between A and B.
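
The forced-guess problem in response 3 can be put in numbers. The sketch below assumes a deliberately simple model that is not stated in the thread: on some fraction of trials the listener genuinely hears which of A or B matches X and answers correctly, and on every other trial flips a coin. The function names are illustrative only.

```python
def expected_correct_rate(detect_rate: float) -> float:
    """Forced-choice ABX under the assumed model:
    trials that are truly heard are answered correctly,
    all remaining trials are 50/50 guesses."""
    return detect_rate + (1.0 - detect_rate) * 0.5


def implied_detect_rate(correct_rate: float) -> float:
    """Invert the model above; clamp at 0 for at-or-below-chance scores."""
    return max(0.0, 2.0 * correct_rate - 1.0)


for d in (0.0, 0.2, 0.5, 1.0):
    score = expected_correct_rate(d)
    print(f"hears the difference on {d:.0%} of trials -> scores {score:.0%}")
```

Under this assumed model, a listener who truly hears the difference on 20% of trials still scores 60% in a forced-choice test, so a raw score near 50/50 cannot by itself separate "can't tell" from "guessed".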


That sounds a bit elitist to me. To whom am I supposed to state my reasons for buying gear? Correct me if I'm wrong, but does that mean unless I can pick out a piece of equipment in a DBT, I cannot make the claim I chose it for perceived sonic performance?

Actually the DBTers are more egalitarian as they often think they can get as good sound from cheap gear as from expensive gear.

You don't have to justify why you buy something to anyone. And you can make any claims you want.

The other day I bought two Rado watches. They don't keep better time than a cheap watch but I don't care, I bought them because they look cool. That's it.


I'm really not sure why this seems to only plague audio. Anybody?

Actually it is used often in food testing, normally for new or "improved" flavors.

I did see a report about instrument makers who are trying to improve the sound of the violin, The Mystery of the Stradivarius. One in the group, Jacques Fustier, uses a very scientific way to build his violins, including computer diagrams and such as well as traditional methods. They did a blind test between four separate violins: two not named, one from Jacques Fustier, and a Stradivarius. Everyone thought the Strad would win hands down because of the supreme craftsmanship (the aged wood, etc.) and were very snobby about it.

The winner was the Jacques Fustier, and the looks on the faces of the judges were a riot. The whole series is interesting and worth a watch.

http://www.youtube.com/watch?v=Nphg4YVm37I


The purpose of conducting a test is to have an outcome. That outcome is the result of analysing the results. If the analysis is flawed and produces meaningless outcomes, I feel pretty safe in declaring the test "flawed."

There are many ways to analyze the same set of data. I personally don't subscribe to the snapshot method you take issue with.


That sounds a bit elitist to me. To whom am I supposed to state my reasons for buying gear? Correct me if I'm wrong, but does that mean unless I can pick out a piece of equipment in a DBT, I cannot make the claim I chose it for perceived sonic performance?

Ummmm, who said you have to state reasons for buying gear?

As per your second point, if you can't identify a difference without knowing what you're listening to, then how can sonic performance be the ONLY criterion for the choice? How does knowing what you're listening to make it sound different?


The number of "3" answers could say more about the participants' ability to discern differences in sound than about actual sonic differences (or the lack of them) between A and B.

The purpose of the ABX is only to verify the "participants' ability to discern differences"... the very fact that it is ABX and not AAX or BBX states up front that A and B are not at all the same.

The only thing that can be concluded from an ABX is that the listener did in fact hear a difference. An ABX will not tell you that the listener could never hear a difference, nor will it tell you that A is similar to B or whatever.

I seriously feel like I'm watching people claim the car is broken because turning the key in the exhaust pipe doesn't turn the engine on....
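
The asymmetry described here (a pass shows a difference was heard, a fail shows very little) can be illustrated with a small power calculation. This is only a sketch: the 16-trial session, the 5% significance threshold and the 70% "true" hit rate are assumptions chosen for illustration, not figures from the thread.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def pass_mark(n: int, alpha: float = 0.05) -> int:
    """Smallest score that pure guessing reaches with probability <= alpha."""
    for k in range(n + 1):
        if binom_tail(n, k, 0.5) <= alpha:
            return k
    raise ValueError("no score reaches the requested significance level")


def chance_of_passing(n: int, true_rate: float, alpha: float = 0.05) -> float:
    """Probability that a listener with the given true hit rate passes."""
    return binom_tail(n, pass_mark(n, alpha), true_rate)


n_trials = 16  # assumed session length
print("score needed to pass:", pass_mark(n_trials))
print("chance a 70% listener passes:", round(chance_of_passing(n_trials, 0.7), 2))
```

With these assumed numbers, a listener who really does pick correctly 70% of the time passes roughly 45% of the time, so a failed session on its own says little about whether a difference is audible.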


That sounds a bit elitist to me. To whom am I supposed to state my reasons for buying gear? Correct me if I'm wrong, but does that mean unless I can pick out a piece of equipment in a DBT, I cannot make the claim I chose it for perceived sonic performance?

Ummmm, who said you have to state reasons for buying gear?

For what it's worth, there is absolutely nothing wrong with buying something for reasons other than the perceived sonic performance, but those reasons should be stated as such...

Ummmm, you did?


I'm really not sure why this seems to only plague audio. Anybody?

Actually it is used often in food testing, normally for new or "improved" flavors.

I did see a report about instrument makers who are trying to improve the sound of the violin, The Mystery of the Stradivarius. One in the group, Jacques Fustier, uses a very scientific way to build his violins, including computer diagrams and such as well as traditional methods. They did a blind test between four separate violins: two not named, one from Jacques Fustier, and a Stradivarius. Everyone thought the Strad would win hands down because of the supreme craftsmanship (the aged wood, etc.) and were very snobby about it.

The winner was the Jacques Fustier, and the looks on the faces of the judges were a riot. The whole series is interesting and worth a watch.

http://www.youtube.com/watch?v=Nphg4YVm37I

Very cool. I would say Jacques Fustier has demonstrated a great ability to use science in the service of art. There was a comment made in that video about cultural shackles being the ultimate conservatism (or something to that effect), which I thought really resonates with a lot of the viewpoints I hold.


That sounds a bit elitist to me. To whom am I supposed to state my reasons for buying gear? Correct me if I'm wrong, but does that mean unless I can pick out a piece of equipment in a DBT, I cannot make the claim I chose it for perceived sonic performance?

....As per your second point, if you can't identify a difference without knowing what you're listening to, then how can sonic performance be the ONLY criterion for the choice? How does knowing what you're listening to make it sound different?

So you are saying I, and others, misunderstood your post about stating reasons for buying equipment? I already said I have pretty much ignored DBTs over the years and now I reject them as unreliable for audio. I base all audio purchases on the hopes of improved sonic performance, but often that's not the only reason. I recently bought an integrated amp that had to have certain features, including a cinema bypass to use with my HT receiver. Unless I could expect improved sonic performance, I'd have continued to use the main speakers connected to the receiver.


For what it's worth, there is absolutely nothing wrong with buying something for reasons other than the perceived sonic performance, but those reasons should be stated as such...

Ummmm, you did?

You must be taking that the wrong way then... I was referring to Mark's list of non-audio reasons why people buy gear, and I am trying to imply that there are people who claim a sonic difference because of the non-audio reasons... they of course don't think that's the reason, so it's certainly not intentionally deceptive.

I very much despise the snoot appeal of an audiophile who hears an awesome-sounding system but won't make any comment on the sound until they verify what kind of cables are being used. I once witnessed a situation where the system owner lied about owning "great cables", which unleashed a huge slew of praise upon the system... and then when he corrected the dude about the cables, the dude became very indignant and started trying to backtrack on his comments.

It happens all the time (not usually that extreme), and some of the people most vocal about not being that way I've found to be the worst. I simply can't fathom it, except to call it sheer arrogance.

I'm not trying to imply anyone in particular behaves like this in this thread, but I see it all the time and it's incredibly depressing. In fact, this kind of craziness is why many of my non-audio friends won't ever consider touching the audio hobby.

Blind listening is just a great way to call these people out....like the example above.


....The winner was the Jacques Fustier, and the looks on the faces of the judges were a riot. The whole series is interesting and worth a watch.

http://www.youtube.com/watch?v=Nphg4YVm37I

Thanks for posting the link; I did enjoy watching the video. A couple of questions came up as I was watching. Does a Stradivarius sound the same today as it did when it was new 300 years ago? I wonder how the tone of a wooden instrument could fail to change over that much time, regardless of the superiority of the design. Also, I think the judges should not be asked their opinions in a group. Maybe they should not even be seated together or be able to see each other during the performances. Why blindfold the violinists? Do you think they did not know what violins they were playing anyway? That seemed a bit gimmicky.


You must be taking that the wrong way then... I was referring to Mark's list of non-audio reasons why people buy gear and I am trying to imply that there are people that claim a sonic difference because of the non-audio reasons.....they of course don't think that's the reason, so it's certainly not intentionally deceptive.

I very much despise the snoot appeal of an audiophile that hears an awesome sounding system, ...

...Blind listening is just a great way to call these people out....like the example above.

Thanks for the correction. I should have taken into consideration your need to be concise, since you had stated you were replying from a cell phone, and not reacted to that post at all.

I also don't care much for snobbery and find most of the people who enjoy the audio hobby are very down to earth. They are also having a great time experimenting with different audio equipment and tweaks.

But, the case where the guy led the listener to believe he had other cables could be seen as the listener being polite with the heaps of praise. I wasn't there, but I'm sure he wasn't happy about being duped either.


The purpose of the ABX is only to verify the "participants' ability to discern differences"... the very fact that it is ABX and not AAX or BBX states up front that A and B are not at all the same.

I think the BBX sounds best. ;)


The purpose of the ABX is only to verify the "participants' ability to discern differences"... the very fact that it is ABX and not AAX or BBX states up front that A and B are not at all the same.

I think the BBX sounds best. ;)

I believe BBC is best, as it appears to be somewhat more objective and refined sounding. 8-|

Dave


Just to be clear, if the subject's task is to choose which of A and B is the same as X, it seems there are three possible results but only two allowable responses:

1] Subject hears that A is the same as X and that B is not the same as X; selects A.

2] Subject hears that B is the same as X and that A is not the same as X; selects B.

3] Subject cannot distinguish a difference, wants to select missing choice of "Can't tell a difference", but that choice is missing, so he has to guess (and will be correct about half the time).

If the test is for discrimination ability, the assumption is that when the answers become 50/50 the limit of discrimination has been reached. But why not include a third choice of "can't tell"?

...

If the bias is that there is no sonic difference between A and B, not allowing choice 3 will favor that bias in the final results. There should be a third choice, and those answers should be considered separately. The number of "3" answers could say more about the participants' ability to discern differences in sound than about actual sonic differences (or the lack of them) between A and B.

Ten people participate in an audio ABX DBT.

A = X, C = "sound the same"

4 select A

1 selects B

5 select C

The most reliable result would come from discarding those who answered C. If those five participants were forced to select only A or B, the result could end up anywhere between 90% for A (if all five guessed A) and 60% for B (if all five guessed B), which is unreliable.
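
The arithmetic behind that range can be checked with a few lines. This is a minimal sketch of the ten-person example only; the function name is made up for illustration, and the "worst case" assumption is that all undecided listeners happen to guess the same way.

```python
def forced_choice_extremes(chose_a: int, chose_b: int, cant_tell: int):
    """Return (max share for A, max share for B) if every "can't tell"
    listener were forced to guess and all guessed the same way."""
    total = chose_a + chose_b + cant_tell
    max_share_a = (chose_a + cant_tell) / total
    max_share_b = (chose_b + cant_tell) / total
    return max_share_a, max_share_b


max_a, max_b = forced_choice_extremes(chose_a=4, chose_b=1, cant_tell=5)
print(f"all five guess A: A gets {max_a:.0%}")
print(f"all five guess B: B gets {max_b:.0%}")

# Share for A among only the listeners who reported hearing a difference.
decided_share_a = 4 / (4 + 1)
print(f"A among decided listeners: {decided_share_a:.0%}")
```

Among the five listeners who did report a difference, A was chosen 80% of the time; the forced guesses could move the headline figure to anywhere between 90% for A and 60% for B.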


I'm really not sure why this seems to only plague audio. Anybody?

 

Actually it is used often in food testing, normally for new or "improved" flavors.

I did see a report about instrument makers who are trying to improve the sound of the violin, The Mystery of the Stradivarius. One in the group, Jacques Fustier, uses a very scientific way to build his violins, including computer diagrams and such as well as traditional methods. They did a blind test between four separate violins: two not named, one from Jacques Fustier, and a Stradivarius. Everyone thought the Strad would win hands down because of the supreme craftsmanship (the aged wood, etc.) and were very snobby about it.

The winner was the Jacques Fustier, and the looks on the faces of the judges were a riot. The whole series is interesting and worth a watch.

http://www.youtube.com/watch?v=Nphg4YVm37I

 

Very cool. I would say Jacques Fustier has demonstrated a great ability to use science in the service of art. There was a comment made in that video about cultural shackles being the ultimate conservatism (or something to that effect), which I thought really resonates with a lot of the viewpoints I hold.

Blind testing in the food industry I was aware of, and it certainly has its purposes there. I was referring more to everyday items and hobbies that we all enjoy. Consumer Reports will tell you which toaster makes the best toast, but nobody accuses them of bias for not doing it blind. I think Mark's answer earlier probably has some merit for the specific reasons he cited.

The video is interesting, but there are differences between that test and, say, a DBT between amps or CD players. Changing the violin in a live setting is akin to changing the source, amp and speaker in an audio DBT. I have no doubt that most would be able to differentiate between what is a completely different sound system. The nuance of changing amps or cables would be more similar to changing the violin strings or the bow used. These are more subtle differences that are harder to notice in a short test and, because of their subtle nature, they are even more of a test of auditory memory.

Mike, you say "cultural shackles being the ultimate conservatism", and I understand your point of view, but I guess my point of view is more like "scientific knowledge is the ultimate conservatism". Science, which is always progressing, is currently limited in its ability to measure sonic differences between, say, cables, or to fully understand how humans process sound. Is adhering to that restricted viewpoint not the "ultimate conservatism"? Just a thought.

The following is a post by Mark from another thread.

Imagine trying to judge camera lenses the same way we do AB testing in audio. Here's an experiment to consider the problem.

Using two makes of high quality 50mm lenses, two photographs are taken of the same scene filled with a lot of detail and contrast. The two nominally identical photos are given to a subject who is asked to find the differences that exist, if any.

Now, to be clear, the lenses are physically very different in construction. Number of lens elements, arrangement of elements, glass composition, and so on. They only have in common their specifications as to aperture and focal length so that the picture will be the same in terms of exposure, field of view and content. Light passing through a lens and being refocused into a plane behind is one of the most complex engineering feats. It involves many compromises even in the finest lenses ever made. The two lenses are different because their design invokes different sets of compromise.

Back to our subject. The subject will usually take both photos and lay them side to side and study them together as two whole, inclusive entities. Using full parallel perceptual processing the photos are examined in detail. Casual review may reveal "no difference" for some subjects but, a more studied review, particularly by someone with expertise and or training, will reveal small differences in sharpness, spherical distortions near the edges, coma, color aberrations, and so on. For those subjects who see the difference, some judgment can be made about which is better. Ok---easy enough.

But now suppose the subjects weren't given both photos simultaneously to compare side by side? Suppose it worked like this: One photo is labeled "A" and the other is "B". They will be viewed using a special technique that creates a serial memory presentation. The "A" photo is loaded into a "roller box" with a horizontal slit measuring perhaps 1/4" high by 8 inches wide - (the width of the photo). The photo is rolled past the viewing slit at the rate of 1-inch per second. In 12 seconds the entire photo is rolled through the box with the subject viewing it as it moves past the slit. The entire photo is viewed, but not all at once.

Now, the "B" photo is loaded and rolled past the slit in the same 12-seconds.

The subject is asked to identify whether there is a difference between "A" and "B" by guessing each time which photo went past the slit. How well would even the best photography experts do on this test?

That is essentially what an AB test is like in audio. A song or musical piece is a serial stream of aural sensations, just like the photo is a stream of light sensations rolling past the slit. You can't ever hear the audio stream as a "whole" the way you can view a whole picture all at once. And you sure can't hear TWO streams of audio simultaneously in the way you can examine two photographs at the same time.

In the photo AB testing, it would be doubtful that many subjects correctly identify "A" from "B". And testers would exclaim, "See, there IS NO DIFFERENCE, and we scientifically proved it!" And yet, a person could be handed the two lenses, and assuming they knew anything about optics they would understand immediately that light will pass differently through these two very different constructs. Well, there's no deep paradox here - - it is obvious the contrived photographic "slit test" was meaningless as a method of discriminating differences between lenses.

Likewise with audio, AB/X testing is the bulwark used by all those who like to prove "no differences" in wire, cable, amplifiers, CD players and so on. They make a false assumption from the start that the test is valid, when it's obviously not. It's not, because human memory is not a scientifically valid instrument for such a comparison. The principle of taking my memory of event "A" and comparing it for differences to my other memory of event "B" is flawed from the start. Yes, gross differences can be remembered, but in this area of extremely subtle differences between "A" and "B", memory isn't useful.


I disagree. The general purpose is to determine if there is a perceptible difference between A and B, and that is done by using the participant as the perceiving machinery to examine A and B. It is not per se a test of the participant's capability. You have it backwards.

Yet the general purpose you claim is something that you find fault with...I believe the scientific approach would say that you need to change the conclusion. There is always truth to be found in any test - whether or not that truth is significant is a whole different issue.

Btw, engineers developing anything are trying to optimize for the most perceived performance. Improving the performance in ways that aren't perceptible often results in compromises to other aspects of the design and is just bad engineering in my mind. The engineer must learn where the customers can definitely perceive a difference....and the only way to do that is to test the customers with devices of a known performance. It doesn't make sense to release something that isn't undeniably better, let alone at least different.

Whenever I perform my own ABX testing on myself, I am learning where I can undeniably perceive a difference between two things that I already know to be different. If I can't prove that I heard a difference, then I need to analyze why I can't hear the difference.....if the conclusion is that it's beyond the limits of my perception, then I can be fairly certain that I have enough traction and then focus my concentration elsewhere. It's quite possible that I might have to revisit a spec after cleaning up other aspects of performance....like the car analogy, that would be like improving the engine to the point that I need more traction again. It's a progressive process and one that will never prove a difference doesn't exist (in fact it's absurd to think that a difference can be proven to not exist when it's already known that there are differences). However, the confidence level is usually a good indication of relative magnitude....something that I can identify 90% of the time is going to be a lot more important than something I can only identify 60% of the time.
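
How much weight a 90% or 60% score can carry depends on how many trials sit behind it. The sketch below uses the Wilson score interval, a standard approximate confidence interval for a proportion; the session length of 20 trials is an assumption for illustration, since the post gives only the percentages.

```python
from math import sqrt


def wilson_interval(correct: int, trials: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for the true hit rate
    (z = 1.96 corresponds to a two-sided 95% level)."""
    p = correct / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


trials = 20  # assumed number of ABX trials per comparison
for correct in (18, 12):  # 90% and 60% of 20
    low, high = wilson_interval(correct, trials)
    verdict = "excludes" if low > 0.5 else "includes"
    print(f"{correct}/{trials}: {low:.2f} to {high:.2f} ({verdict} chance at 0.50)")
```

With 20 assumed trials, the interval for the 90% score sits entirely above chance, while the interval for the 60% score still includes 0.50, so the weaker result would need more trials before it supports any conclusion.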

Btw, ABX is just one type of a DBT. There are many other blind listening tests too. Shortcomings to ABX don't necessarily apply to the other methods. I think one of my favorite methods of blind listening is where you have a known reference and then a series of samples that you rate subjectively against the reference. Mixed within the samples is the reference, and then often a known bad performer, which should ideally get rated as 100 and 0 respectively. I've also seen it performed where after the blind part, the test is taken again where you get to know what you're listening to. I believe this was pretty popular with Toole.
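
The reference-plus-samples layout described above resembles what ITU-R BS.1534 calls MUSHRA (hidden reference and anchor), although the post does not name it. A minimal sketch of the sanity check it enables follows; the dictionary keys and the thresholds of 90 and 30 are hypothetical values picked for illustration, not taken from the post or from any standard.

```python
def passes_sanity_checks(ratings: dict, reference_min: int = 90, anchor_max: int = 30) -> bool:
    """ratings maps a sample label to a 0-100 score.
    The hidden reference should score near 100 and the known bad
    performer (the anchor) near 0; both thresholds are assumptions."""
    return (
        ratings["hidden_reference"] >= reference_min
        and ratings["anchor"] <= anchor_max
    )


# Two made-up listeners rating the same four blind samples.
careful = {"hidden_reference": 98, "anchor": 10, "sample_1": 70, "sample_2": 85}
careless = {"hidden_reference": 60, "anchor": 55, "sample_1": 65, "sample_2": 62}

print("careful listener passes:", passes_sanity_checks(careful))
print("careless listener passes:", passes_sanity_checks(careless))
```

A listener who cannot place the hidden reference near the top and the anchor near the bottom gives little reason to trust their ratings of the other samples, which is the built-in check this kind of test offers over a plain ABX.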
