How digital audio ate itself and the music industry
Part One: The birth of a new science
Special Report Digital audio began life with high ideals and worthy engineering feats; with its extended dynamic range came the promise of noise-free recording. This is a story of how it first charmed and then choked the industry it was designed to enhance.
It's a long and complicated story, full of challenges and unforeseen consequences. In the first part, I'll explain how some of the strange choices were made by the earliest digital audio engineers.
It's 1987, and London Zoo is playing host to a conference called the Digital Information Exchange (DIE) run by HHB – a professional audio hire and sales company. Theoretical papers are being delivered and real products are being demonstrated from the leading lights of the era, the likes of AMS and Fairlight.
But there's an excitement to the event. Something new is emerging, finally. The talk of the show is a brand new digital stereo recording format and milling around me are audio engineers of a quite different nature to the ones sat in studios. These guys are behind the designs of digital recorders, including the multitrack machines that would become the mainstay of well-heeled recording studios for the next 10 years. The fruits of their labour are on display and the future looks bright.
Mitsubishi's X-880 ProDigi 32-track recorder
Looking back, 10 years doesn't seem long enough to recoup all that technological effort, but it was a necessary step that would eventually take us to desktop music-making and, with it, a mass culling of professional recording studios. Song demos and remixes that would once keep a studio busy as a 24-hour concern were lost to the bard in the bedroom and, inevitably, whole albums would be made by musos sitting around in their underpants.
Digital audio itself wasn't new, of course. The compact disc was already done and dusted as a consumer product by the early 1980s, along with digital recording in stereo in the studio to facilitate CD masters. Yet on show at the DIE was the means to bring CD-quality audio beyond stereo mastering and use it for the actual track laying in the studio. Admittedly, the phenomenally expensive 3M systems had already captured performances of jazz and classical works along with Donald Fagen’s The Nightfly, years earlier, but here in the flesh was a new generation of digital multitrack recorder, not some technological rarity.
Readily available for hire or sale, these Japanese reel-to-reel digital tape machines were capable of recording 24 tracks (Sony DASH) or 32 tracks (Mitsubishi ProDigi) on one-inch tape. They were still phenomenally expensive but they were impressive – Sony would later double the track count to 48 and license its technology to analogue multitrack stalwart Studer, once considered the Rolls-Royce of reel-to-reel recorders.
Two Sony PCM-3324 DASH machines with mixer and digital mastering gear package from 1987
Although these two tape standards, DASH and ProDigi, were incompatible, they were nonetheless remarkable examples of data storage and precision timing. With analogue-to-digital (A/D) and digital-to-analogue (D/A) converters for each track, these multitrack recorders also featured digital outputs that conformed to one protocol, the AES/EBU standard. And this was crucial: digital interfacing was the conduit that enabled these recorders to speak unto other digital recorders and mixers. What we take for granted with a multichannel Toslink optical cable on the back of an AV receiver today was in embryonic form here on XLR jacks and, in the case of MADI (multichannel audio digital interface), BNC connectors.
Here comes the scums
To give studios what they wanted, HHB modified Sony's DTC-1000ES and made a killing...
Also on show at DIE was a brand new tape-recording system from Sony called RDAT, or rotary-head digital audio tape. Primarily used as a two-channel mastering recorder – theoretically it could manage four tracks – RDAT was an innovative remodelling of earlier digital recording methods. Its reliability, convenience and relatively low cost made it an overnight sensation. And while its potential excited recording engineers and producers, it made the record companies very nervous.
Record company lawyer contemplates new tape format?
Caption of the time for this DIE promo pic
The music industry was already nervous. It fretted that digital's ability to make perfect, generation-loss-free copies of a recording would allow free duplication of its catalogue, and that the genie would slip out of the bottle.
With digital interfacing on CD players – originally intended for fancy amps and outboard D/A converters – now appearing on digital tape recorders too, the music industry feared that the whole world was about to start pirating CDs on RDAT machines. The fact that DAT recorders at the time were extremely expensive was rather overlooked.
So spoiler tactics were introduced, by manufacturers themselves, in the hope of pacifying the music industry, and its fear of the digital audio genie duplicating itself at will. DAT recorders were restricted to recording at 48kHz through the digital interface – the professional audio standard sample rate. Surely, that would scupper the CD pirates, with their consumer digital audio recordings at 44.1kHz?
Among the more affordable DAT recorders
was the portable Casio DA-2
In addition, RDAT models were designed to record analogue sources at 48kHz only. Playback was also intended to be 48kHz-only, except for the 44.1kHz pre-recorded tapes that never gained popularity – if you understand Japanese, you can hear Ryuichi Sakamoto describe his Beauty album remix for RDAT duplication here. And if those hoops weren't enough to jump through, just for good measure, SCMS (serial copy management system), known as "scums", was tacked on to the digital audio signal: a flagging mechanism that would prevent further digital copying from a clone recording.
The sample rate switch was needed for
mainstream studio acceptance
So if you think DRM is a relatively new evil, we’ve been here before.
Despite these obstacles, HHB was happy to sell you a modded Sony DTC-1000ES that could switch between sample rates when recording analogue sources. Inevitably, professional RDAT machines appeared that did the same, while the cheaper models from the likes of Casio, and Sony's own TCD-D3 DAT Walkman, remained stuck on 48kHz recording.
Politics aside, just what was digital audio?
For all the issues in the analogue world that it was supposed to resolve, digital audio managed to create its own shortcomings along the way. Fear not, I'm not going to reopen the vinyl versus CD debate here. But perhaps the simplest way of looking at it is that digital audio imposes certain restrictions that are different to analogue. With analogue, if the recording was too quiet, it would get lost in tape hiss, and if it was too loud, then you’d get distortion. The latter might even sound good, if it was not too overcooked.
The numbers game
U-matic companion: Sony's PCM-1630 adapter
With digital, go too loud and you've run out of road. Excessive peaks can't be digitally encoded accurately because the A/D converter simply hasn't enough numbers to describe them, so the signal just flatlines at the top level. There are no harmonic pleasantries to be had up there, just a disturbing clipping effect. Yet go too quiet and you get granulation noise: the A/D conversion can't detect significant changes in level, so the binary signal stays at a high or low state for "unnatural" durations, delivering erroneous tones to the quiet passages. Sounds like a problem or two there.
Well, the first one is fixed easily: don't record too loud, or keep a limiter (dynamics processor) in tow to flatten any wayward peaks. Going from 16-bit to 24-bit converters also allows significantly more headroom. The second issue, with low-level recording, is fixed by introducing a randomising element, so those quiet passages don't sound weird. And you know what that randomising element is? Noise. Low-level noise, called dither, fixes the problem at the other end of the dynamic range. So don't ever let anyone tell you digital systems aren't noisy – digital audio depends on analogue noise patterns to mask its own artefact: granulation noise.
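Both failure modes can be seen in a few lines of Python. This is a minimal sketch, not any real converter's design: the 8-bit depth, the signal level and the TPDF (triangular) dither amplitude are all illustrative assumptions. A sine sitting below one quantisation step vanishes completely without dither, but survives – buried in noise – with it.

```python
import math
import random

random.seed(1)  # reproducible illustration

def quantize(x, bits):
    # Round to the nearest of 2**bits levels spanning [-1, 1)
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def encode(samples, bits, dither=False):
    out = []
    step = 2.0 / (2 ** bits)
    for s in samples:
        if dither:
            # TPDF dither: sum of two uniform randoms, peaking at +/- one step
            s += (random.random() - random.random()) * step
        out.append(quantize(s, bits))
    return out

# A very quiet sine, far below the 8-bit quantisation step of ~0.0078
sine = [0.001 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(4410)]

plain = encode(sine, 8)           # granulation: every sample rounds to zero
dithered = encode(sine, 8, True)  # dither randomises the rounding decision
```

Averaging the dithered output over many cycles would recover the original sine: the dither averages away while the signal does not, which is exactly the trade digital audio makes.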
As for the sample rates, have you ever wondered why the seemingly arbitrary 44.1kHz and 48kHz were chosen? Let's start off with the upper limit of human hearing, say 20kHz. Now let's double it, because sampling theory dictates that A/D conversion must sample at more than twice the highest frequency it's to capture – otherwise the peaks and troughs of those sine waves get missed – and let's add a bit more on top for good measure.
Now, what could record all this data with the technology available back in the late 1970s? Enter Sony's trusty U-matic video recorder.
Before DAT, Sony migrated its U-matic-based
digital recording to Betamax.
Rather than have tape spinning at enormous speeds, you could use a rotary-head machine and spray the stereo digital audio data diagonally onto tape as a multiplexed video signal. The 44.1kHz sampling frequency was chosen because it was mathematically convenient for both PAL and NTSC U-matic recorders: three samples stored per video line, multiplied by the usable lines per field and the field rate, gives exactly 44,100 samples per second on either television standard. For early CD mastering, these video tape recorders were fitted with PCM-1610/1630 (pulse code modulation) adaptors, and the fact remains that the 44.1kHz sample rate of most of the music we listen to today has its origins in a video recording system that went into production exactly 40 years ago.
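The video-line arithmetic behind 44.1kHz can be checked directly. The figures below are the commonly cited ones (assumptions here, not spelled out in the text): three samples stored per video line, 245 usable lines per field at NTSC's 60 fields a second, and 294 usable lines per field at PAL's 50.

```python
# NTSC: 60 fields/s x 245 usable lines/field x 3 samples/line
ntsc_rate = 60 * 245 * 3

# PAL: 50 fields/s x 294 usable lines/field x 3 samples/line
pal_rate = 50 * 294 * 3

print(ntsc_rate, pal_rate)  # 44100 44100 - identical on both TV standards
```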
Still, the ideas behind the U-matic (and later, Betamax variants) as the CD mastering machine were not without merit and RDAT, in essence, was just a miniaturised version. And whereas the analogue VCR had led the way to provide a storage mechanism for digital recordings, the RDAT recorder evolved into a convenient storage system for back-up and archiving computer data. Sony only discontinued RDAT in 2005. Eighteen years is a pretty good innings for a format.
Perhaps more interesting still are the parallels between the first analogue noise reduction systems, from the likes of Dolby and dbx, and the later compression techniques of the MP3 era that remain in use today.
Dolby B-type Signetics IC from 1973
Back in the day, just about every pre-recorded cassette tape had Dolby B stamped on the cover. The Dolby NR (noise reduction) technology owed a lot to a process called "compansion" – compression on recording and expansion on playback – with a few clever tweaks. Compression meant attenuating the loud bits and leaving the quiet passages unaffected. It has many creative uses, and crucially with analogue recording it enabled quieter sections to be recorded at higher levels (minimising the noise) because the louder sounds were being levelled out automatically, thus avoiding overloading.
1968 KLH Model Forty reel-to-reel recorder with Dolby B NR.
Without going into all the details of compression ratios, sensitivity thresholds, sliding frequency bands and attack and release times, Dolby's trick with domestic noise reduction was a rethink of compansion. His tech paper is here, but put simply: rather than attenuate the loud sounds, his circuits detected lower-level signals – typically in the mid- and high-frequency range – and boosted them during recording. It had a similar effect to compression; the signal that went down on tape was ironed out a bit, so you could increase the recording level slightly too. Hence the tape was more evenly saturated.
Indeed, magnetic tape can only 'hold' so much of a signal – magnetic flux is measured in nanowebers per metre (nWb/m) – and the skill is in getting the level right. If you overload (oversaturate), you get full-on clipping, but there is a twilight zone around the outer limits of tape saturation that produces low-level distortion that is harmonically rich and pleasant to the ear.
The Fisher RC 80 cassette recorder from 1970 with Dolby B NR
Conventionally, using an expander will make the loud bits louder and the quiet bits quieter. Again, Dolby achieved a similar but more refined effect. On playback, the low-level signals amplified during recording had the reverse treatment applied to restore the original dynamics. What was typically a treble boost during recording, was now filtered off during playback. In the process, the hiss that's inherent in the analogue tape-recording medium was attenuated as a consequence. And that really was noise reduction. Re-sult!
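The compansion principle itself is easy to sketch. The 2:1 dB-law compander below is a deliberately crude stand-in for Dolby's frequency-selective circuits – the ratio, the signal levels and the hiss level are all invented for illustration – but it shows why the round trip attenuates tape hiss: whatever noise the tape adds between the two stages gets pushed down by the expander along with everything else quiet.

```python
import math

def compress(x, ratio=2.0):
    """Record side: halve the level in dB terms (2:1), preserving sign."""
    if x == 0:
        return 0.0
    level_db = 20 * math.log10(abs(x))
    return math.copysign(10 ** ((level_db / ratio) / 20), x)

def expand(x, ratio=2.0):
    """Playback side: exact inverse of compress, restoring the dynamics."""
    if x == 0:
        return 0.0
    level_db = 20 * math.log10(abs(x))
    return math.copysign(10 ** ((level_db * ratio) / 20), x)

signal = 0.1                  # a quietish passage at -20dB
hiss = 0.003                  # tape hiss at roughly -50dB

on_tape = compress(signal)    # recorded hotter, at -10dB: clear of the hiss
off_tape = expand(on_tape)    # dynamics restored: back to -20dB

residual_hiss = expand(hiss)  # hiss alone gets pushed down to around -100dB
```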
Dolby B tapes were also playable on equipment that didn't feature it. Here, the playback just sounded brighter. The trouble is, a brighter recording is perceived as louder, and many people chose not to engage the noise reduction because it appeared to stifle the audio. Really, all they needed to do was turn it up a bit, and they'd have heard it better – with improved dynamic range and a less noisy output.
OK, so how does this tie in with digital recording?
Apple's Waveburner utility accommodates legacy mastering
For a start, emphasis and de-emphasis equalisation circuitry was often applied when capturing mixes on digital recorders such as RDAT. Before oversampling became the norm, 14-bit and some 16-bit A/D converters had a few shortcomings encoding higher frequencies, so a simple high-frequency boost was applied on recording – to be removed at the D/A output stage – bolstering the high-frequency content and, on playback, suppressing the quantisation noise artefacts produced by the converters.
And just like Dolby B, people didn’t follow the rules. Not everyone used pre-emphasis when recording digital mixes and yet CDs would get mastered and pressed with the de-emphasis flag active, which in turn would have CD players filtering digital content that didn’t need it. In effect, this would muffle the mix, much like playing a cassette with Dolby NR that wasn’t recorded with it.
Moreover, digital audio on CD has its own legacy issues. If you rip tracks from disc for use on a computer or an MP3 player, then depending on the software you use, you could lose any reference to emphasis flags. Consequently, with some older CDs, early classical masters in particular, you could end up with brighter audio than you bargained for. Apparently, iTunes looks for pre-emphasis and makes the relevant equalisation compensations when ripping a CD, so it's not all madness.
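The record/playback symmetry is the same trick in digital form. The one-pole filter pair below is not the actual 50/15µs CD emphasis curve – the 0.95 coefficient is an arbitrary illustration – but it shows the mechanism: a first-order high boost on the way in, its exact inverse on the way out. Apply only the de-emphasis half to material that was never emphasised, and you dull the top end, just as described above.

```python
import math

def pre_emphasis(samples, a=0.95):
    # FIR high-frequency boost: y[n] = x[n] - a*x[n-1]
    out, prev = [], 0.0
    for x in samples:
        out.append(x - a * prev)
        prev = x
    return out

def de_emphasis(samples, a=0.95):
    # IIR inverse: y[n] = x[n] + a*y[n-1], exactly undoing pre_emphasis
    out, prev = [], 0.0
    for x in samples:
        prev = x + a * prev
        out.append(prev)
    return out

# A high test tone at a quarter of the sample rate
tone = [math.sin(2 * math.pi * 0.25 * n) for n in range(64)]
round_trip = de_emphasis(pre_emphasis(tone))  # recovers the original tone
```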
Besides compression, analogue recording had other methods of trying to squeeze more into less. Instead of striving for cleaner recordings, alternatives that innately muddied the waters were used for reasons of cost and convenience. Narrower tape tracks and slower recording speeds meant you used less media and enabled cassettes to oust reel-to-reel for domestic use.
U-matic, S-VHS ADAT, Betamax PCM-F1, Hi-8 DTRS, DCC and DAT
Just a few of the digital tape formats that have come and gone
Likewise, digital audio could attempt equivalent shortcuts by halving both the sample rate and the resolution. A sampling rate of 22.05kHz at 8-bit resolution was on a par with a ferric oxide cassette recording. And instead of one minute of 16-bit, 44.1kHz CD-quality PCM stereo audio taking up over 10MB of storage, doing it by halves would only take up around 2.5MB. But just as with analogue, halved sample rates did for the high frequencies much the same as half-speed tape recording.
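The storage arithmetic is easy to verify – seconds × sample rate × bytes per sample × channels – and it shows why halving both the rate and the word length quarters the data:

```python
def pcm_bytes(seconds, rate_hz, bits, channels):
    # Uncompressed PCM: every second costs rate * (bits/8) * channels bytes
    return seconds * rate_hz * (bits // 8) * channels

cd_minute = pcm_bytes(60, 44100, 16, 2)  # 10,584,000 bytes: just over 10MB
halved = pcm_bytes(60, 22050, 8, 2)      # 2,646,000 bytes: roughly 2.5MB
```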
As digital audio matured, more adventurous processing techniques were employed to deliver more from less. In the next instalment we'll pick up the story of how artists began to use these new recording tools creatively, and the domestication of the art of digital audio. ®