Transporting audio into the ether

Given the advancements that have been made in both digital audio and IT networking technologies, we now have more ways than ever to make interconnections and create increasingly complex systems.

More and more capacity is on offer for system designers, installers and operators with progressively smaller and smaller footprints, producing systems which belie their true power and, of course, offer endless flexibility.

Gone are the days of cumbersome analogue audio snakes. Gone are the days of miles and miles of analogue cable. Gone are the days of actual, physical patching, for these are the days of the digital audio snake and network audio, configurable via software and iPads, and inter-connected via CAT5 cable. Audio transport hasn’t been the same since.

But where does one start? After all, there are myriad Audio over Ethernet (AoE) protocols out there, and many are proprietary to specific licensed manufacturers, so knowing which protocol suits your needs is key. Knowing each protocol’s features, compatibility with other equipment, latency, upgradability and market longevity, to name a few important considerations, matters just as much.

In short, the brand or protocol you choose is often the one you are stuck with, and there are no guarantees as to how “future-proof” one protocol is over another. In light of all this, let’s take a look at some of the current frontrunners in the AoE arena, their current features and the manufacturers that support them. But first, let’s introduce Audio over Ethernet.

Robust

To fully get your head around network audio, it is worth getting acquainted with how it actually works: how audio is integrated into a standard 802.3x network (a wireless network is 802.11), the limitations standard network gear poses and how protocol developers have got around them.
At first glance AoE might look similar to Voice over IP (VoIP). However, because AoE systems are designed to deliver high-fidelity, low-latency professional audio, they do not generally incorporate any sort of audio data compression and therefore require a robust, high-throughput network. Typically, an AoE network requires at least 1 Mbit/sec per channel and less than 10ms of latency, although, as we will see, a latency of 10ms is quite high by current standards.
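As a rough check on the 1 Mbit/sec figure, the raw bit rate of one uncompressed PCM channel is simply bit depth multiplied by sample rate. The sketch below works that out. It deliberately ignores packet headers and any other protocol overhead, which differ from protocol to protocol, and the function names are illustrative only.

```python
def pcm_bitrate_mbps(bit_depth: int, sample_rate_hz: int) -> float:
    """Raw bit rate of one uncompressed PCM channel, in Mbit/sec."""
    return bit_depth * sample_rate_hz / 1_000_000


def payload_mbps(channels: int, bit_depth: int, sample_rate_hz: int) -> float:
    """Raw audio payload for several channels in one direction.

    Assumption: no headers, clocking or control data are counted.
    """
    return channels * pcm_bitrate_mbps(bit_depth, sample_rate_hz)


if __name__ == "__main__":
    print(pcm_bitrate_mbps(24, 48_000))   # one 24-bit/48kHz channel
    print(payload_mbps(64, 24, 48_000))   # 64 such channels
```

One 24-bit/48kHz channel works out to 1.152 Mbit/sec, and 64 of them to roughly 73.7 Mbit/sec in one direction, which gives a feel for why 64 channels is a common ceiling on a 100Mbit/sec link.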

The first thing to know is that not all AoE protocols are compatible. Often manufacturers have had to make adjustments to standardised network processes to accommodate the need for high channel counts and low latency, therefore rendering the technology proprietary in most cases. For example, because computer networking standards such as IP (Internet Protocol) have been adopted in most AoE systems, so have their limitations, such as the fact that they are “packet-based”.
A “packet” is exactly what it sounds like: a small collection of data that is part of a larger set. In the case of audio, these packets contain header information that specifies parameters such as source and destination, and they involve an encoding/decoding process.

Once a packet of audio data has been encoded, it can then be transmitted over the network where it is decoded by the receiver. However, in a standard IP-based system, there is no guarantee that the data packets will arrive in the order that they were sent. But it’s not all bad news. In this approach the upside is that such systems are compatible with off-the-shelf IT equipment such as standard routers and switches, which can be helpful. On the other hand, because of the unpredictable nature of packet delivery and to ensure smooth, glitch-free operation, a buffer must be implemented, therefore increasing system latency.
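To make the buffering trade-off concrete, here is a minimal sketch of a reorder buffer. It is not taken from any particular AoE protocol: the packet layout (a sequence number plus a payload) and the `depth` parameter are assumptions made purely for illustration. The receiver holds back a fixed number of packets so that late arrivals can be slotted into place, and that hold-back is exactly the added latency described above.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical packet layout: (sequence number, audio payload)
Packet = Tuple[int, bytes]


def reorder(packets: Iterable[Packet], depth: int) -> Iterator[Packet]:
    """Yield packets in sequence order, holding back up to `depth` packets.

    A larger depth tolerates more disorder on the network, but every
    packet then waits longer before it is released for playback.
    """
    heap: List[Packet] = []
    for packet in packets:
        heapq.heappush(heap, packet)
        if len(heap) > depth:
            # The buffer is full: release the oldest sequence number held.
            yield heapq.heappop(heap)
    # The stream has ended: drain whatever is still buffered.
    while heap:
        yield heapq.heappop(heap)


if __name__ == "__main__":
    arrivals = [(0, b"a"), (2, b"c"), (1, b"b"), (3, b"d"), (5, b"f"), (4, b"e")]
    print([seq for seq, _ in reorder(arrivals, depth=2)])
    print([seq for seq, _ in reorder(arrivals, depth=0)])
```

With a depth of two the swapped packets come out in order; with a depth of zero they pass straight through in arrival order, glitches and all.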

Generally this is not acceptable for applications such as in-ear monitoring, where ultra-low latency is required. A solution to this, such as is found in AES50-based systems, is to use “frame-based audio transmission”. This transmission style uses only the physical layer of the network, meaning the physical cables and the transceivers at each end. In this system there is no need for an encoding/decoding process, as audio data is sent from point to point instead of as generic data across a computer network. To put it another way, audio samples are streamed continuously in Ethernet frames from transmission to reception, which makes far better use of the available throughput and ensures smooth, low-latency, high-bandwidth operation.

Ethersound by Digigram

One of the more established of the bunch, Ethersound by network audio masters Digigram, is a 64-channel (24-bit/48kHz PCM), low-latency, bi-directional audio networking solution over Ethernet with full compliance to the 802.3x standard.
As we see with many other AoE protocols, there are two variants: a “high capacity” and a “standard” version, namely the ES-Giga and the ES-100, respectively. In addition, in 2008, Digigram released a unidirectional version of the protocol entitled ES-100/spkr, which enables manufacturers to implement the ES-100 protocol where its full features are not required, such as in loudspeaker systems with Ethersound integration where a return stream is not needed.

Basically, the difference between the two is that ES-100 utilises a 100Mbit/sec network and ES-Giga uses a Gigabit network, which has ten times the bandwidth. Obviously, the higher the bandwidth the higher the capacity of the network, so the ES-100 system can handle 64 bidirectional channels at 48kHz and the ES-Giga system 256 channels. At 96kHz all channel counts halve, and at 192kHz they halve again.
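The halving rule is easy to express in a few lines. The sketch below assumes, as the paragraph above implies, that channel count scales inversely with sample rate from a 48kHz baseline; the function name is made up for illustration.

```python
def channels_at(base_channels: int, sample_rate_hz: int,
                base_rate_hz: int = 48_000) -> int:
    """Channel count at a sample rate that is a whole multiple of the base rate."""
    if sample_rate_hz % base_rate_hz != 0:
        raise ValueError("sample rate must be a whole multiple of the base rate")
    # Doubling the sample rate doubles the data per channel,
    # so the same link carries half as many channels.
    return base_channels // (sample_rate_hz // base_rate_hz)


if __name__ == "__main__":
    for rate in (48_000, 96_000, 192_000):
        print(rate, channels_at(64, rate), channels_at(256, rate))
```

For the figures quoted here, that gives 64, 32 and 16 channels for ES-100 and 256, 128 and 64 for ES-Giga at 48kHz, 96kHz and 192kHz respectively.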

Interestingly, in a single ES-Giga system, channel count can actually exceed 512 channels by “overwriting” existing channels in parts of the network. In terms of latency, the end-to-end transmission time of an Ethersound audio network is six samples, which equates to 125 microseconds at 48kHz. A further 1.5 microseconds (0.5 microseconds in an ES-Giga system) is picked up for each slave module added in a daisy-chain configuration. These figures are extremely low and acceptable for almost any pro-audio scenario. But how low should they be?
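The latency arithmetic can be sketched as follows. The six-sample base figure and the per-module additions are the ones quoted above; treating every added module as contributing the same fixed delay is an assumption of this sketch, and the function names are illustrative.

```python
def samples_to_us(samples: int, sample_rate_hz: int = 48_000) -> float:
    """Convert a delay expressed in samples to microseconds."""
    return samples / sample_rate_hz * 1_000_000


def chain_latency_us(added_modules: int, per_module_us: float = 1.5,
                     base_samples: int = 6,
                     sample_rate_hz: int = 48_000) -> float:
    """Base network latency plus a fixed delay per added daisy-chained module.

    Assumption: each added module contributes the same delay
    (1.5 microseconds by default, 0.5 for an ES-Giga system).
    """
    return samples_to_us(base_samples, sample_rate_hz) + added_modules * per_module_us


if __name__ == "__main__":
    print(samples_to_us(6))        # base latency at 48kHz
    print(chain_latency_us(10))    # ten added modules
```

Six samples at 48kHz comes to 125 microseconds, and under these assumptions ten added modules bring the total to about 140 microseconds, still a small fraction of a millisecond.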

One of the central considerations when selecting an AoE protocol is, of course, latency. These figures can be a huge selling point for manufacturers, especially with the growing popularity of in-ear monitoring systems. Generally, anything over 3ms of latency is considered unacceptable these days, even if it cannot be immediately perceived by the performer, and the rule of thumb is that total system latency cannot be greater than the time it takes sound to reach a vocalist’s ears from his or her mouth. Now that’s a pretty small margin because it can take milliseconds in the single figures, sometimes less, for this to occur, which creates a huge design constraint for protocol developers in the AoE field.
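One way to get a feel for these numbers is to translate latency into the distance sound travels through air in the same time. The sketch below assumes a speed of sound of 343 metres per second (dry air at about 20 degrees C); that constant and the function names are assumptions of this example, not figures from any protocol specification.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at roughly 20 degrees C


def acoustic_delay_ms(distance_m: float) -> float:
    """Time in milliseconds for sound to travel the given distance in air."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000


def equivalent_distance_m(latency_ms: float) -> float:
    """Distance in metres that sound covers during the given latency."""
    return latency_ms / 1000 * SPEED_OF_SOUND_M_PER_S


if __name__ == "__main__":
    print(equivalent_distance_m(3.0))   # the 3ms threshold as a distance
    print(acoustic_delay_ms(1.0))       # delay from standing 1m away
```

Under that assumption, 3ms of latency is roughly equivalent to standing about a metre from the sound source.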

How manufacturers handle this problem is a core consideration and one that I’m sure many dollars of R&D get poured into. Some, like Ethersound, strive to use existing network standards to stay non-proprietary. This has its advantages, such as the ability to integrate into a pre-existing network with standard IT hardware, provided you can keep latency figures below the acceptable threshold. Others have found different ways to jump this hurdle, which often results in the development of proprietary technologies and hardware to sidestep the latency constraints of a standard 802.3x network infrastructure.

Currently there are many mainstream manufacturers that are Ethersound partners including: Yamaha, Allen & Heath, Digico, Innovason, Martin Audio, Nexo and many more big players in the industry. We’ll let you decide what that means.

CobraNet by Cirrus Logic

CobraNet is also a long-time player in the AoE game, fittingly created in 1996 by Peak Audio in Colorado to provide background music at the Animal Kingdom theme park. It was eventually bought by Cirrus Logic in May 2001.
CobraNet differs from proprietary AoE systems in that it utilises standard Ethernet packets and network infrastructure hardware such as controllers, hubs, switches and routers. Because of this, latency comes in higher than that of Ethersound, at 256 samples, which works out to 5.33 milliseconds at 48kHz. An additional delay of a dozen or so samples per process will also be picked up when analogue-to-digital conversion, digital-to-analogue conversion and sample rate conversion are performed. These figures are deterministic and therefore consistent at every point in the system.

However, although the difference in latency between CobraNet and Ethersound seems meagre, it is actually quite large. When you’re talking about microseconds and samples, milliseconds are huge. This could be why CobraNet has seen a bit of a decline in the past few years, as many other protocols have arrived on the scene that offer better performance, and why it might be best suited to what it was originally designed for: background music. The upside, however, is that there is no need for proprietary hardware, so if you can live with five or so milliseconds of latency, then CobraNet might just be a more cost-effective choice.

One possible solution to the latency problem with CobraNet, however, is to send smaller packets more often, which can be defined by the system programmer. This can reduce latency to as low as 1.33 milliseconds, which is a lot lower but still more than ten times higher than the reported 125 microseconds for Ethersound. How any given CobraNet device handles this lower latency varies on a case-by-case basis, and lower latency does not always come with the same data throughput. There are almost always trade-offs, be it lowered channel counts or glitches in the audio stream, and performance relies heavily on the device’s bundle capacity.
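The “smaller packets more often” trade-off can be sketched numerically. This example assumes that latency is set by the number of samples buffered per packet at 48kHz, as the 256-sample figure suggests, and that the 1.33 millisecond mode corresponds to 64 samples per packet; that second figure is inferred from the arithmetic rather than stated above, and the function name is illustrative.

```python
from typing import Tuple


def packet_stats(samples_per_packet: int,
                 sample_rate_hz: int = 48_000) -> Tuple[float, float]:
    """Return (latency in ms, packets per second) for a given packet size.

    Assumption: latency equals the time needed to fill one packet.
    """
    latency_ms = samples_per_packet / sample_rate_hz * 1000
    packets_per_second = sample_rate_hz / samples_per_packet
    return latency_ms, packets_per_second


if __name__ == "__main__":
    for size in (256, 128, 64):   # 128 and 64 are assumed intermediate sizes
        latency_ms, pps = packet_stats(size)
        print(f"{size} samples: {latency_ms:.2f} ms, {pps:.1f} packets/sec")
```

Cutting the packet from 256 to 64 samples brings latency down from 5.33 to 1.33 milliseconds, but the network then has to carry four times as many packets per second, each with its own header overhead, which is one plausible source of the trade-offs mentioned above.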

The other issue of interest is no doubt channel count. Just like the Ethersound ES-100 protocol, CobraNet can handle 64 bi-directional channels at 48kHz over a single CAT5 cable but these figures increase with increased bandwidth as is found in a gigabit network and when 16-bit audio is used instead of 24-bit. How many extra channels can be picked up is unclear from my research, but suffice to say that it is indeed possible.

Because CobraNet is one of the more mature AoE protocols currently on offer, there are many manufacturers who have implemented the technology, including Biamp, Bosch, Bose, BSS, Clear-Com, Crest Audio, Crown, DBX, Digitech, DOD, Dynacord, EAW, EV, JBL, Klark Teknik, Lab Gruppen, Mackie, QSC, Rane, Shure, Soundcraft, Tascam, Yamaha, plus many others. Indeed, the extensiveness of its licensee list reveals CobraNet’s longevity and might attest to its performance.

Dante by Audinate

Dante by Audinate is a protocol that has been popping up more and more lately, as it offers a one-cable solution to low-latency network audio and multi-track recording via its Virtual Soundcard software.
One of the younger technologies of the bunch, it was originally developed to build and expand upon existing AoE technologies such as Ethersound and CobraNet, and it offers several advantages over them, such as the ability to pass through network routers, native gigabit support, higher channel count, lower latency and auto-configuration.

At a glance, one might be inclined to think that Dante is the obvious choice. Well, it may be, but as always, it’s probably a good idea to know why and delve into the inner workings. I suppose the first point of interest is that Dante is auto-configurable and “plug and play”. Automatic device discovery and system configuration are both now a reality because Dante-enabled devices will seek each other out on any given network and configure themselves. This is a huge selling point.

Another advantage is that Dante runs on standard, inexpensive, off-the-shelf IT hardware and does not require a proprietary network infrastructure. Dante digital media streams are transmitted alongside ordinary data traffic so you can integrate your Dante system into a pre-existing network and with the Dante Virtual Soundcard software, your PC or Mac is recognised as and acts like any other Dante-enabled device on the network. This enables you to record low-latency, high-channel count multi-track audio directly to your computer without the need for extra hardware!

But what kind of figures are we working with here? Well, Dante works over most modern Ethernet flavours, including 100Mbit/sec, 1Gbit/sec and 10Gbit/sec. Both digital audio and control data are distributed with some of the lowest latency figures in the business, and I mean LOW. The point-to-point transmission time of an optimised Dante system has been measured at 83.3 microseconds. One interesting point is that latency can be configured to be different between devices in the same network, which means that more critical connections can be given lower latency and less critical ones, such as a broadcast or recording feed, higher latency.

Now, channel count. Dante supports a mammoth 512 bi-directional channels over standard gigabit Ethernet at 24-bit, 48kHz resolution. As we often see, for sample rates over 48kHz, channels halve to a paltry 256 bi-directional channels. For 100Mbit/sec networks, 48 bi-directional channels are supported at 24-bit, 48kHz resolution and (once again) at higher sample rates channel count is halved. This may be where Ethersound picks up the slack because at the same bandwidth it offers a further 16 channels for a total of 64 bi-directional channels at similar resolutions.

Of course because Dante is relatively new, its licensee list is considerably shorter but it is growing. For now, Dante is supported by Allen & Heath, Bosch, Digico, Dolby, Dynacord, EV, Focusrite, JoeCo, Lab Gruppen, Lake Processing, Peavey Commercial Audio, Symetrix, Turbosound, Whirlwind and Yamaha.

AES50 / SuperMAC / HyperMAC by Klark Teknik

In July 2005 the Audio Engineering Society (AES) released a paper entitled AES50-2005 which outlined a new and exciting way of using a standard 100Mbit/sec CAT5 cable to transmit multi-channel digital audio over a network.
This technology was developed by a team of geniuses at the Sony Pro-Audio Lab in Oxford, England and is now licensed under the names SuperMAC (for 100Mbit/sec networks) and HyperMAC (for 1Gbit/sec networks). We’ll see why shortly, but it was quickly picked up by audio console giant Midas for the audio and control network of its flagship XL8 digital audio console.

Soon after, the Sony Pro-Audio Lab networking division was put up for sale and was picked up by Klark Teknik with equal swiftness. Klark Teknik, in a move of corporate benevolence, has now made the technology available on a royalty-free basis, bless them. It’s interesting to note that Midas and Klark Teknik, along with Behringer and Bugera, are all owned by the Music Group holding company and they are the primary owners and licensers of the technology.

Indeed, the technology, at this point, has been thoroughly road tested on mega tours with such acts as Metallica, AC/DC, Oasis, REM, The Verve, Depeche Mode, OMD, Arctic Monkeys and Led Zeppelin at London’s O2 Arena, not to mention many high profile festival shows such as Glastonbury. Is that a great start, or what?
So, why all the fuss over this AoE technology? Well, it simply has some of the best performance figures money can currently buy, even when compared to some Dante specifications. Although, to be fair, in some cases Dante has the edge. As we have seen before, there are two variants: SuperMAC, based on a 100Mbit/sec network, and HyperMAC, based on a Gigabit network. As we’ve seen with the Dante protocol, we get 48 bi-directional channels at 48kHz for SuperMAC over a 100Mbit/sec network. For HyperMAC, we get 192 bi-directional channels at 96kHz or 384 bi-directional channels at 48kHz. In this case Dante is the clear winner, with capacity for an additional 128 channels over a Gigabit network. However, when it comes to latency, Super/HyperMAC is the victor.

SuperMAC boasts a ridiculously low latency of 62.5 microseconds over a 100Mbit/sec network. As if that wasn’t impressive enough, HyperMAC takes it several steps further with a figure of 41.6 microseconds. These values are as close to real time as one can possibly get at present, and certainly set the bar high for future developers of AoE protocols who wish to up the game, so to speak.
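For a side-by-side view, the sketch below collects the latency and 48kHz channel figures quoted in this article and ranks them. The numbers are only as reliable as the quoted figures, the conversion to whole samples assumes a 48kHz sample rate, and the dictionary layout and function names are made up for this example.

```python
from typing import Dict, List

# Latency figures quoted in this article, in microseconds.
LATENCY_US: Dict[str, float] = {
    "Ethersound": 125.0,
    "CobraNet": 5330.0,
    "Dante": 83.3,
    "SuperMAC": 62.5,
    "HyperMAC": 41.6,
}

# Bi-directional channel counts at 48kHz quoted in this article, per link speed.
CHANNELS_48K: Dict[str, Dict[str, int]] = {
    "100M": {"Ethersound ES-100": 64, "CobraNet": 64, "Dante": 48, "SuperMAC": 48},
    "1G": {"Ethersound ES-Giga": 256, "Dante": 512, "HyperMAC": 384},
}


def rank_by_latency() -> List[str]:
    """Protocol names ordered from lowest to highest quoted latency."""
    return sorted(LATENCY_US, key=lambda name: LATENCY_US[name])


def as_samples(latency_us: float, sample_rate_hz: int = 48_000) -> int:
    """Quoted latency rounded to the nearest whole sample."""
    return round(latency_us * sample_rate_hz / 1_000_000)


def leaders(link: str) -> List[str]:
    """Names with the highest quoted channel count on the given link speed."""
    table = CHANNELS_48K[link]
    top = max(table.values())
    return sorted(name for name, count in table.items() if count == top)


if __name__ == "__main__":
    for name in rank_by_latency():
        print(name, LATENCY_US[name], as_samples(LATENCY_US[name]))
    print(leaders("100M"), leaders("1G"))
```

Seen this way, the quoted latencies correspond to two, three, four and six samples for HyperMAC, SuperMAC, Dante and Ethersound respectively, against 256 samples for CobraNet in its default mode.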

So what else is there to know about Super/HyperMAC? Well, it’s completely proprietary so it’s not compatible with off-the-shelf IT hardware. However, that is how it accomplishes such low latencies – because of its frame-based, point to point approach and because it uses proprietary hardware. Also, is fibre optic cable supported? Absolutely, but only when using HyperMAC. SuperMAC does not support it.

Finally, which manufacturers have licensed the technology? So far, only four: Midas, Klark Teknik, Lynx and Auvitran. Let’s hope that once this technology gets more interest we will see more products hit the shelves that support it but for now, it seems that Dante is the up-and-comer with a wider market behind it.

The wrap

Selecting a suitable AoE protocol can be a daunting task as the technology, despite being a relatively recent phenomenon, is more and more prevalent with many contenders that continue to pop up as time goes on.

Indeed, the four listed here are just a few of many, which also include A-Net by Aviom, AudioRail M11, MaGIC by Gibson, AVB, Roland REAC, Hydra by Calrec, DSPRO, Livewire by Axia Audio, Audio Contribution over IP by the EBU, Q-LAN by QSC and RAVENNA by ALC NetworX. The list goes on. However, it seems that these protocols all work on the same principles, so once you get your head around the fundamentals, the inner workings become clearer and better judgement starts to set in.

One thing’s for sure, though; the need for low-latency, high-bandwidth operation will never go away and knowing what you’re up against when selecting a system / protocol combination is definitely key. Happy hunting!

By Greg Bester