Tuesday, April 9, 2013

Distributing files across a web farm or cluster.

We have hundreds of servers in several locations. As part of our web content management, we need to push content out frequently, sometimes several times an hour or more.

To date, we have used a mixture of HTTP downloads and rsync scripts to accomplish this. Now we are testing a new mixture that we hope will scale out.

In our central location we have a large archive with all the files we need to distribute. Each remote datacenter has a single node that helps with distribution. We take the archive (let's pretend it's a FreeBSD ISO file) and make it available via HTTPS, so we can download it between our datacenters over the internet rather than over our MPLS or other expensive transit links. Using metalink files, you can also specify the internal source as a lower-preference fallback.

Then, within each datacenter, we share the file via BitTorrent, with the single node mentioned above acting as both the seed and the tracker for that datacenter. Encryption is optional.

This works well for single files, such as large tar files. We still need to experiment with many individual small files.

1.     Start a tracker in each datacenter (assume bt.local.dc).
o    bttrack  --dfile /tmp/dfile --port 85 --reannounce_interval 5
2.     Create a torrent file for the archive at the central distribution point (assume bt.master).
o    btmakemetafile  http://bt.local.dc:85/announce FreeBSD-8.4-BETA1-i386-dvd1.iso
3.     Make a metalink file describing both the archive and the torrent.
o     echo -e "external 100 https % https://bt.master.internet.ip \n internal 100 bittorrent % http://bt.master \n internal 10 http http://bt.master" | metalink -d md5 FreeBSD-8.4-BETA1-i386-dvd1.iso | sed  's/<url preference="100" location="internal" type="bittorrent">\(.*\)<\/url>/<url preference="100" location="internal" type="bittorrent">\1.torrent<\/url>/g' > f.metalink
4.     Make the archive, the torrent file and the metalink file available over HTTPS.
o    cp FreeBSD-8.4-BETA1-i386-dvd1.iso /var/www
o    cp f.metalink /var/www
o    cp FreeBSD-8.4-BETA1-i386-dvd1.iso.torrent /var/www
5.     On the node that is your tracker in each datacenter, start aria2c to download the metalink. It will then download the torrent and start seeding while it downloads the archive with a multi-part HTTPS download.
o    aria2c --seed-ratio=0.0 --disable-ipv6 -V -d /var/www http://bt.master/f.metalink
6.     On your endpoints, start aria2c to download the torrent. They will then automatically download the file described by the torrent from the swarm. Set a post-download hook to finish the job.
o    aria2c --seed-ratio=0.0 --disable-ipv6 -V -d /var/www --on-bt-download-complete=nextsteps.sh http://bt.local.dc/FreeBSD-8.4-BETA1-i386-dvd1.iso.torrent
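
The nextsteps.sh hook in step 6 is not shown above. Here is a minimal sketch of what it might look like, assuming the hook only needs to confirm that the file arrived and leave a marker for a later deploy job. aria2 calls hook commands with three arguments: the GID, the number of files, and the path of the first file. The next_steps function name and the .ready marker file are hypothetical, not part of our setup.

```shell
#!/bin/sh
# Hypothetical nextsteps.sh for --on-bt-download-complete.
# aria2 passes: $1 = GID, $2 = number of files, $3 = path of the first file.

next_steps() {
    gid=$1
    nfiles=$2
    path=$3

    # Refuse to continue if the payload is missing or empty.
    if [ ! -s "$path" ]; then
        echo "download $gid: $path is missing or empty" >&2
        return 1
    fi

    # Assumption: a marker file tells the local deploy job the payload is ready.
    touch "$path.ready"
    echo "download $gid complete: $nfiles file(s), first file $path"
}

# Run only when called with arguments, as aria2 does.
if [ "$#" -ge 3 ]; then
    next_steps "$@"
fi
```

--on-bt-download-complete fires when the download has finished but before seeding ends. That matters here, because --seed-ratio=0.0 keeps each endpoint seeding regardless of share ratio.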

Using magnet links and DHT, this process can probably be simplified by removing the need for a tracker. If I figure it out, I'll post it.
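
As a starting point for that experiment, a magnet link needs little more than the torrent's info hash. The sketch below is untested against this setup. make_magnet is a hypothetical helper, and it assumes a 40-character hex info hash and a file name that needs no URL encoding.

```shell
#!/bin/sh
# Build a magnet URI from a BitTorrent info hash and a display name.
# Assumption: the name contains no characters that need URL encoding.

make_magnet() {
    hash=$1
    name=$2

    # A v1 info hash is a SHA-1 digest: exactly 40 hex characters.
    if [ "${#hash}" -ne 40 ]; then
        echo "info hash must be 40 hex characters" >&2
        return 1
    fi
    case $hash in
        *[!0-9a-fA-F]*)
            echo "info hash must be 40 hex characters" >&2
            return 1
            ;;
    esac

    printf 'magnet:?xt=urn:btih:%s&dn=%s\n' "$hash" "$name"
}
```

aria2c -S FreeBSD-8.4-BETA1-i386-dvd1.iso.torrent prints the info hash of an existing torrent. Without a tracker, peers have to find each other over DHT, so each client would presumably need a known DHT node to bootstrap from. aria2c has --dht-entry-point=HOST:PORT for that, but whether this works cleanly inside a private datacenter is something I have not verified.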

Also, initiating this with lsyncd looks like a good thing to do.