The streaming industry is on an ever-evolving quest to reduce latency and bring it in line with, or even beat, linear broadcast. When streaming started, latency of a minute or more was not uncommon and, whilst there are some simple ways to improve on that, matching the latency of digital TV, around 5 seconds, is not trivial. And that's before we mention auctions, gambling or 'gamification', which need sub-second latency. So the question becomes: why is low-latency streaming hard?
Why is Low-Latency Hard?
Our modern streaming toolset has its foundations in Apple's HLS. This protocol was the first to bring the idea of chunking the video stream into files to a large, mainstream audience. It was released into a world which had recently had great success in scaling web servers to handle millions of requests a second. The important realisation was that if you delivered lots of small files quickly enough, the decoder could piece them together and it would seem as though they had been streamed in one continuous 'broadcast'. As such, it drew on the strengths of file distribution to compensate for the difficulties of doing true streaming.
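To make that concrete, here is a minimal, illustrative HLS media playlist; the segment names, sequence number and durations are invented for the example. The player fetches the playlist, downloads each listed file in turn and plays them back-to-back, giving the impression of one continuous stream:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:1042
#EXTINF:6.0,
segment_1042.ts
#EXTINF:6.0,
segment_1043.ts
#EXTINF:6.0,
segment_1044.ts
```

For a live stream, the player re-requests the playlist periodically and new segments appear at the end as older ones drop off, which is also where much of the latency hides.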
Why not just use a real stream?
Scaling: A 'real' stream is, like a stream of water, a continuous flow of media into a decoder. Within a local network, an encoder can send out such a stream and the network can use multicast to deliver that same data to every decoder at the same time. Unfortunately, multicast isn't available on the internet at large; if it were, this would be a very efficient way of distributing video. Where multicast isn't possible, you have two choices: your encoder sends multiple streams, one for each decoder, or a server downstream 'fans out' the stream. If each server can send out 200 streams, you would need 50,000 servers to serve an audience of 10 million, as the sketch below illustrates. This type of scaling is still difficult and expensive.
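A back-of-the-envelope sketch of that arithmetic (the 200-streams-per-server capacity is just the example figure above):

```python
import math

def servers_needed(audience: int, streams_per_server: int) -> int:
    """How many servers one tier of fan-out requires."""
    return math.ceil(audience / streams_per_server)

# The example above: 10 million viewers, 200 unicast streams per server.
print(servers_needed(10_000_000, 200))  # 50000 edge servers

# Arranging those edges in a tree takes the load off the encoder, but it
# adds tiers of hardware and each tier adds latency: 50,000 edges need
# 250 mid-tier servers, which in turn need 2 more fed by the encoder.
print(servers_needed(50_000, 200))      # 250
print(servers_needed(250, 200))         # 2
```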
Caching: Using files rather than streams allows the video to be cached by a CDN, meaning chunks of the file can leave the encoder once and be distributed around the world quickly and efficiently for decoders to pick up locally. With decoders only connecting to a 'local' CDN point of presence, the high bandwidth multiplication happens only on the last link, meaning 20,000 Indian viewers download only from an Indian server and not over intercontinental fibre.
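In essence, a CDN edge is a shared cache: the first local request pulls a chunk across the long-haul link from the origin, and every subsequent local request is served from the edge. A minimal sketch of that cache-fill behaviour, with hypothetical names throughout:

```python
# Sketch of CDN edge-cache behaviour: a chunk crosses the expensive
# long-haul link to the origin once, then serves every local viewer.
origin_fetches = 0
cache: dict[str, bytes] = {}

def fetch_from_origin(chunk_url: str) -> bytes:
    """Stand-in (hypothetical) for the long-haul request to the origin."""
    global origin_fetches
    origin_fetches += 1
    return b"\x00" * 1024  # pretend this is 6 seconds of video

def serve_chunk(chunk_url: str) -> bytes:
    if chunk_url not in cache:          # cache miss: fill from the origin
        cache[chunk_url] = fetch_from_origin(chunk_url)
    return cache[chunk_url]             # cache hit: served locally

# 20,000 local viewers request the same chunk...
for _ in range(20_000):
    serve_chunk("segment_1042.ts")

print(origin_fetches)  # 1: the chunk crossed the long-haul link once
```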
Perfection: Often missed as a benefit of chunked protocols is that when you see any video from them, it's exactly what you were meant to see; it's a perfect copy. If you're watching satellite TV and it starts to rain heavily, you may start to see artefacts on your screen when the receiver can't compensate for the errors caused by the suddenly poor reception. Chunked streaming, by contrast, deliberately introduces delay and offers each chunk in high- and low-bitrate versions. This gives decoders time to get a good copy of each, say, 6-second chunk of video. If the decoder runs out of chunks, you will see nothing; if it does have a chunk, you'll see a perfect rendition of it. Where the bandwidth is poor, your chunks will be lower resolution, but they are exactly what was sent, so if the service provider put a lot of effort into optimising for that bitrate, you will receive the benefits. If online videos were streamed in the traditional sense, your bad network would mean lost packets which would corrupt your image.
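That trade of buffering and multiple bitrate renditions in exchange for perfect chunks is the heart of adaptive bitrate (ABR) playback. A simplified sketch of the selection logic, with invented bitrate figures and a deliberately naive throughput estimate:

```python
# Illustrative ABR logic: pick the best rendition the measured network
# throughput can sustain, so every chunk arrives intact, just at a lower
# resolution when bandwidth is poor. All figures here are invented.
RENDITIONS_KBPS = [400, 1200, 2800, 5000]  # the encoder's bitrate ladder

def pick_rendition(measured_kbps: float, safety_factor: float = 0.8) -> int:
    """Highest bitrate that fits comfortably inside measured throughput."""
    budget = measured_kbps * safety_factor
    viable = [r for r in RENDITIONS_KBPS if r <= budget]
    return max(viable) if viable else min(RENDITIONS_KBPS)

print(pick_rendition(6000))  # budget 4800 -> the 2800 kbps rendition
print(pick_rendition(900))   # budget 720  -> the 400 kbps rendition
```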
We can see from the above the strong reasons we're still using file-based, or segmentation-based, streaming protocols. Although they have major benefits, they are, by their very nature, not optimised for low latency. They come from a time when succeeding in delivery at all was the hard bit, and adding latency helped overcome problems. We'll look at the solutions in another article but, in the meantime, this video from Akamai's Will Law explains the main approaches.