The skills of the recording engineers and producers generally determine what you'll get out of a recording (performer talent is an underlying or 'background' assumption).
Ideally, a great recording should be able to give a listener the experience of sitting in the 'sweet spot' of an excellent performance hall. Since most people don't listen to music in anything like a great concert hall, and probably don't get many opportunities to experience live music at the optimum points in those halls even if they go there, the potential for a recording to meet or exceed the quality of live performances likely to be experienced by a typical listener is huge.
In an audiophile context, being able to leverage a well-engineered, well-produced recording to maximize the quality of the listening experience (getting as close as possible to the ideal spot in a great concert hall) is probably the answer to "What, then, is it that we are trying to recreate with two speakers?"
That being said ... most of the recording engineers and producers today are producing and engineering for the (largely 'tin-eared') iPod/MP3 market -- with almost zero dB of dynamic range, heavily reverbed mid-range, boomy bass, etc. (none of which may matter much, since most of the 'music' is electronically generated by computers, and many of the 'vocals' are so heavily 'processed' that they might as well be computer generated [coming in 2012: will HAL-9000 and the 'Lost In Space' Robot team up for Beatlemania?]). The results are predictable: overall music sales have dropped by 9 - 10 percent annually in each of the past 6 years, led by a 17 - 22 percent drop in new music sales.