Content Delivery in P2P networks
Scalability is a hot topic, especially for Content Delivery Networks (CDNs), where scalability, robustness and availability are crucial. A CDN is a system that delivers Web content, such as videos, images and other files, to end users. Content providers such as media companies need to deliver their content with high availability and high performance, and CDNs offer them the infrastructure to do so. There are different approaches, each with its advantages and flaws.
Old approach: client-server model
Nowadays a traditional client-server approach is not effective: it is expensive, and it is neither scalable nor robust. As the network grows, the server hardware has to be upgraded to keep up, at high cost, and a simple DoS attack can take the whole system down.
The most common infrastructure adopted by CDNs is the proxy-client one: a network of distributed servers (or proxies) caching the content coming from the main server. Such a network guarantees high availability and robustness through redundancy of the content, and high performance thanks to the geographical proximity of the proxies to the end users.
Such an infrastructure scales better, since it is sufficient to add servers in overloaded areas. These servers do not need to be very powerful, since each takes only part of the load. However, adding servers is still quite costly and takes time to deploy.
A network infrastructure that solves many of these scalability and cost issues is the peer-to-peer (P2P) one.
In P2P networks each user acts both as a client and as a server, contributing part of its storage and computing resources. There is no central server; instead, the resources of all the peers add up to one large shared pool.
Each peer contributes only a small part of its resources, such as upload bandwidth and memory, so that it is not overloaded. It is very important not to degrade peers’ performance, otherwise users might be discouraged from using the network.
P2P networks scale organically: the more peers join, the more resources are available and the better the network works, at no extra infrastructure cost.
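The scaling argument can be made concrete with a back-of-the-envelope sketch. It assumes an idealised network in which all upload capacity is shared evenly; the function names and the numbers are hypothetical and only illustrate the trend.

```python
def client_server_rate(server_upload: float, users: int) -> float:
    """Average rate per user when a single server feeds everyone."""
    return server_upload / users


def p2p_rate(server_upload: float, peer_upload: float, users: int) -> float:
    """Average rate per user when every peer also contributes `peer_upload`."""
    return (server_upload + users * peer_upload) / users


# Hypothetical numbers: a 1000 Mbit/s origin and peers that each give 5 Mbit/s.
for users in (100, 1_000, 10_000):
    print(users, client_server_rate(1000, users), p2p_rate(1000, 5, users))
```

In this simplified model the client-server share shrinks towards zero as users are added, while the P2P share never drops below what a single peer contributes.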
For all these reasons, P2P networks are very interesting for CDNs. There is a recent trend towards hybrid models that exploit P2P technology. In such a model, P2P helps to alleviate the load on the servers. An example of a CDN using such a hybrid model is Akamai.
However, the biggest problem to be tackled is data availability, since peers tend to churn, that is, they join and leave the network frequently.
It is not possible to rely on a single peer to deliver a particular piece of content, because when that peer leaves, the content is no longer available. It is necessary to keep redundant copies of the content and spread them across many different peers. In such a context, the data allocation strategy is vital.
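A rough sketch helps to get a feel for how much redundancy is needed. It assumes, purely for illustration, that each peer holding a copy is online independently and with the same probability; real churn is correlated and uneven, so the numbers are only indicative. The function names are hypothetical.

```python
import math


def availability(p_online: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is online."""
    return 1.0 - (1.0 - p_online) ** replicas


def replicas_needed(p_online: float, target: float) -> int:
    """Smallest number of replicas whose availability reaches `target`."""
    if not (0.0 < p_online < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p_online and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_online))
```

For example, under these assumptions peers that are online 30% of the time would need 13 copies of a piece of content to keep it reachable 99% of the time, which shows why allocation has to be planned rather than left to chance.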
Another element to take into account is data demand: when the network’s storage is not large enough to guarantee high availability for all the content, it is important to have a strategy that gives better availability to more valuable content (usually the most popular) at the expense of less valuable content (usually the least popular).
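One possible demand-aware strategy, sketched below, keeps a single copy of every item and shares the remaining storage in proportion to demand. This is only one policy among many, not a method prescribed by any particular network; `allocate_replicas`, the demand counts and the slot budget are all hypothetical.

```python
from typing import Dict, List, Tuple


def allocate_replicas(demand: Dict[str, int], total_slots: int) -> Dict[str, int]:
    """Give each item one replica, then share spare slots in proportion to demand."""
    if total_slots < len(demand):
        raise ValueError("not enough storage for one replica per item")
    total_demand = sum(demand.values())
    if total_demand <= 0:
        raise ValueError("total demand must be positive")

    spare = total_slots - len(demand)
    replicas: Dict[str, int] = {}
    remainders: List[Tuple[float, str]] = []
    for item, requests in demand.items():
        share = spare * requests / total_demand
        replicas[item] = 1 + int(share)          # whole part of the share
        remainders.append((share - int(share), item))

    # Hand the slots lost to rounding to the largest fractional shares.
    leftover = total_slots - sum(replicas.values())
    for _, item in sorted(remainders, reverse=True)[:leftover]:
        replicas[item] += 1
    return replicas
```

With demand counts of 70, 20 and 10 and room for 13 replicas, this policy stores 8, 3 and 2 copies respectively, so popular content is far more likely to survive churn.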
Data is usually split into chunks that are spread all over the network. This gives a better data allocation, and therefore better content distribution, since peers do not need to hold an entire file to take part in uploading it.
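Chunking and placement can be sketched in a few lines. The round-robin placement below is an assumption made for illustration (real networks use more elaborate strategies), and all names are hypothetical.

```python
from typing import Dict, List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut `data` into consecutive chunks; the last one may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def reassemble(chunks: List[bytes]) -> bytes:
    """Rebuild the original data from chunks given in order."""
    return b"".join(chunks)


def place_chunks(num_chunks: int, peers: List[str], copies: int) -> Dict[int, List[str]]:
    """Assign each chunk to `copies` distinct peers in round-robin order."""
    if copies > len(peers):
        raise ValueError("cannot place more copies than there are peers")
    return {
        i: [peers[(i + k) % len(peers)] for k in range(copies)]
        for i in range(num_chunks)
    }
```

No peer in this sketch needs the whole file: each one holds only the chunks assigned to it, yet every chunk exists on more than one peer.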
In some P2P networks the reliability of peers is also taken into account: peers that are judged more reliable, and therefore unlikely to leave soon, are given a more important role in the distribution.
A P2P network example: BitTorrent
BitTorrent is probably the most famous P2P file sharing protocol. It is a good example of how the problems above are tackled in P2P models.
The protocol is based on 3 main concepts:
A peer is a user that has only part of a file, a seed is a user that has the whole file, and a swarm is the set of peers and seeds taking part in sharing the file.
Each file is divided into segments that are shared across the swarm. A segment is the sharing unit: a peer downloads an entire segment from a single peer, but it can download different segments from different peers.
Another important actor is the tracker, a server that coordinates the file distribution. Typically the tracker holds information such as which peers are part of the swarm. Some clients, like Azureus, can use a Distributed Hash Table (DHT) instead, eliminating the need for any server and thus using a pure P2P model.
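The tracker's bookkeeping role can be pictured with a toy in-memory model. This is not the real tracker protocol, which runs over the network with its own message format; `ToyTracker`, its methods and the addresses used are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ToyTracker:
    """Keeps, for every shared file, the set of peers currently in its swarm."""

    def __init__(self) -> None:
        self._swarms: Dict[str, Set[str]] = defaultdict(set)

    def announce(self, file_id: str, peer_addr: str) -> List[str]:
        """Register a peer and return the other known members of the swarm."""
        others = sorted(self._swarms[file_id] - {peer_addr})
        self._swarms[file_id].add(peer_addr)
        return others

    def leave(self, file_id: str, peer_addr: str) -> None:
        """Remove a peer that has left the swarm."""
        self._swarms[file_id].discard(peer_addr)
```

A DHT-based client spreads this same mapping from file to swarm members across the peers themselves, so no single machine has to hold it.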
Once a peer knows the swarm, it can start asking for segments, usually in a random order. In some cases a rarest-first approach is used, in which the segments held by the fewest peers are requested first.
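A rarest-first choice can be sketched as follows, assuming the peer already knows which segments each neighbour holds. Breaking ties at random is a design choice of this sketch, and the names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Optional, Set


def pick_rarest(
    have: Set[int],
    peer_segments: Dict[str, Set[int]],
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Return a missing segment held by the fewest peers, or None if nothing is missing."""
    counts: Counter = Counter()
    for segments in peer_segments.values():
        counts.update(segments - have)       # only count segments we still need
    if not counts:
        return None
    rarest = min(counts.values())
    candidates = sorted(seg for seg, n in counts.items() if n == rarest)
    return (rng or random).choice(candidates)
```

Requesting scarce segments first spreads them to more peers early, which reduces the risk that a segment disappears when its few holders leave.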
While downloading, a peer takes an active part in the network, since it also starts uploading the segments it already has to other peers that ask for them. The result is a flood-like spreading of the file throughout the network.
In order to encourage peers to keep sharing the file instead of leaving, BitTorrent uses a tit-for-tat scheme with ‘optimistic unchoking’. Peers prefer to upload segments to peers that have segments they need, in a trade-like fashion. This way, peers that have nothing to upload are strongly penalized. However, new peers with nothing to trade would never be able to start downloading, and so would never become uploaders, which wastes useful resources. For this reason an optimistic unchoking mechanism is used: once in a while, a peer randomly selects someone to upload to, even if that peer has nothing to trade in return.
Because of this mechanism, new peers start downloading very slowly but eventually, once they have enough segments to upload, they download very quickly.
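The selection logic can be sketched as below. The sketch assumes that peers are ranked by how fast they have recently uploaded to us, and that there are three regular slots plus one optimistic slot; both the ranking input and the slot count are assumptions, and the names are hypothetical.

```python
import random
from typing import Dict, Optional, Set


def choose_unchoked(
    download_rates: Dict[str, float],
    regular_slots: int = 3,
    rng: Optional[random.Random] = None,
) -> Set[str]:
    """Pick the peers to upload to: best traders plus one random 'optimistic' peer.

    `download_rates` maps each interested peer to the rate at which it has
    recently uploaded to us (0 for newcomers with nothing to trade).
    """
    rng = rng or random.Random()
    ranked = sorted(download_rates, key=lambda peer: download_rates[peer], reverse=True)
    unchoked = set(ranked[:regular_slots])   # tit-for-tat: reward the best uploaders
    others = ranked[regular_slots:]
    if others:
        unchoked.add(rng.choice(others))     # optimistic unchoke: give someone a chance
    return unchoked
```

Running this selection periodically lets a newcomer be picked sooner or later, receive its first segments, and then start trading like everyone else.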
Typical BitTorrent workflow
Let’s now look at how file sharing typically works in BitTorrent.
To download a file, only the related torrent file and a BitTorrent client are needed.
First, a user creates a torrent file containing metadata about the file (such as its size and the number and order of its segments) and information about the tracker. The torrent file also contains a hash for each segment, used to check that the segment has been received without errors.
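The per-segment hashes can be illustrated with a short sketch. It uses SHA-1, the hash used by the original BitTorrent protocol, but it does not produce a real torrent file, whose encoding is not covered here; the function names are hypothetical.

```python
import hashlib
from typing import List


def segment_hashes(data: bytes, segment_size: int) -> List[bytes]:
    """Compute one SHA-1 digest per segment, in segment order."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [
        hashlib.sha1(data[i:i + segment_size]).digest()
        for i in range(0, len(data), segment_size)
    ]


def verify_segment(index: int, segment: bytes, expected: List[bytes]) -> bool:
    """Check a downloaded segment against the hash published for its index."""
    return hashlib.sha1(segment).digest() == expected[index]
```

Since the hashes come from the torrent file rather than from the uploading peer, a downloader can reject a corrupted segment and request it again from someone else.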
A user who wants to download the file must first obtain the related torrent file. Typically this happens through websites acting as torrent search engines.
Once the torrent file has been obtained, the user (through the BitTorrent client) is able to contact the tracker and get information about the swarm, and so learn whom to download from.
The client starts asking for segments in a random order. Other peers upload according to the tit-for-tat scheme with optimistic unchoking. As a result, the peer starts downloading very slowly and sees its download speed increase over time.
While downloading, the peer already plays an active role by uploading to other peers, and it eventually becomes a seed once the file has been completely downloaded.
P2P networks are an interesting and fascinating solution for content delivery, even if they can be very challenging.
They offer a low-cost and very robust alternative to servers, reducing dependency on them. They guarantee availability through high redundancy of the content, without degrading users’ performance.
They can also guarantee high performance thanks to the proximity of peers, provided the network is big enough. Indeed, not only do they scale by themselves, they also work better as they grow: more users means more resources. This makes P2P networks very resistant to flash crowds, unlike models that depend on servers.
However, availability remains the biggest challenge. P2P networks work very well with popular content but less well with unpopular content, which may not only download slowly but may even be unavailable at times.
Since scalability is a very big problem nowadays, and one destined to get worse in the future, P2P is definitely a model CDNs are moving towards.
Delivering content through P2P networks is a well-established reality today (companies and services like Facebook, Twitter, Amazon S3, Blizzard and Wargaming adopt BitTorrent for some of their services). It still has a large margin for improvement and remains a subject of study.
The full potential of the P2P model is yet to be exploited!