At the end of March we migrated the Raspberry Pi website from a very big multi-core server to a tiny cluster of eight Raspberry Pi 3s. Here’s a bit more detail about how it worked.
The Pi rack not fooling anyone on April 1st
Booting
For the Raspberry Pi 3 launch we tried out some Pis running in a data centre environment with high load using the SD card for the root filesystem. They kept crashing, if you exceed the write capability of the card the delays make the kernel think the storage has failed and the system falls over. We also want to be able to remotely rebuild the filesystem so we can fix a broken Pi remotely. So we’ve put the root filesystem on a network file server, which is accessed over NFS.
The Raspberry Pi runs the latest kernel, 4.1.18-v7+ and boots from the SD card with a configuration as follows:
dwc_otg.lpm_enable=0 console=ttyAMA0,115200 console=tty1 root=/dev/nfs rootfstype=nfs
ip=10.46.189.2::10.46.189.1:255.255.255.252::eth0:off
nfsroot=10.46.189.1:/export/10.46.189.2 elevator=deadline
fsck.repair=yes rootwait
This brings up a block of 4 IP addresses on eth0. One address for the network, one for broadcast, one for the Pi and one for the network fileserver. It then mounts the NFS filesystem at:
nfsroot=10.46.189.1:/export/10.46.189.2
and uses that as the root filesystem.
Overly simple introduction to VLANs
On a traditional switch, you plug things and any ethernet port can talk to any other ethernet port. If you want to have two different networks you need two different switches, and any computer that needs to be on both networks needs two network ports. In our case we’re trying to have a private network for storage for each Raspberry Pi, so each Pi requires its own switch and the fileserver needs it’s own network port for every Raspberry Pi connected to keep them separate. This is going to get expensive very quickly.
Instead we turn on virtual LANs (VLAN). We connect our fileserver to port 24 and create a VLAN for ports 1 & 24, another for 2&24, etc. The switch configuration for the fileserver port specifies these VLANs as “tagged”, meaning our switch adds a header to the front of every packet from a Raspberry Pi port that allows the fileserver to tell which VLAN, and therefore which Raspberry Pi, the packet came from. The fileserver can reply with the same header, and that packet will only be sent to that specific Raspberry Pi. It behaves as if each Raspberry Pi has its own switch.
Network on the fileserver
The fileserver sees each VLAN as a separate network card, named eth0.N where N identifies the VLAN. We can configure them like any other network interface:
auto eth0.10
iface eth0.10 inet static
address 10.46.189.1
netmask 255.255.255.252
auto eth0.11
iface eth0.11 inet static
address 10.46.189.5
netmask 255.255.255.252
eth0.10
and eth0.11
appear to be network cards with a tiny network with one Raspberry Pi on the end, but in reality there’s a single physical ethernet connection underneath all of them.
Network on the Raspberry Pi
On the Raspberry Pi, eth0 is already configured on the Raspberry Pi by the boot line above to talk to the fileserver. On our switch configuration, we specify that private network is “untagged” on Raspberry Pi port, which means that it won’t have a VLAN header on it and we can access it as “eth0” rather than “eth0.N” as we did on the fileserver.
In order to do anything useful, we also need to give the Raspberry Pis access to the public network. On our network, the public network is accessible on VLAN 131. We configure this to be a “tagged” VLAN on the Raspberry Pi port, meaning it becomes accessible on the eth0.131 interface. We can configure this in the normal way, and in keeping with other back-end servers on the Raspberry Pi setup, it only has an IPv6 address:
auto eth0.131
iface eth0.131 inet6 static
address 2a00:1098:0:84:1000:1::2
netmask 64
gateway 2a00:1098:0:84::1
Effectively the Raspberry Pi believes it has two network cards, one on eth0
which is a private network shared with the fileserver, one on eth0.131
which has an IPv6 address and is connected to the real internet.
Why all that configuration?
In an ideal world we’d have a single IPv6 address for each Pi, and mount the network filesystem with it. However, with an NFS root filesystem, potentially another user on the LAN who can steal your IPv6 address can access your files. There’s a second complication, IPv4 is built into the standard kernel on the Raspberry Pi and the differences per Pi are constrained to just the kernel command line, with IPv6 we’d have to build it into an initrd which would load up the IPv6 modules and set up the NFS mounts.
Planning for the future we’ve spoken to Gordon about how PXE boot on the Raspberry Pi will work and it’s extremely likely that it’s going to require IPv4 to pull in the bootloader, kernel and initrd. Whilst there is native IPv6 in the Raspberry Pi office, there isn’t any IPv6 on their test lan for developing the boot code and it’s a currently not a major priority for the Pi despite around 5% of the UK having native IPv6.
So if we want to make this commercial, each Pi needs its own storage network and it needs IPv4 on the storage network.
Power over Ethernet
We’ve added a Power over Ethernet HAT to our Raspberry Pis. This means that they receive power over the ethernet cable in addition to the two separate networks. As well as reducing the amount of space used by power bricks, it also means you can power cycle a Raspberry Pi just by re-configuring the switch.
Software
Each Raspberry Pi runs Raspbian with Apache2 installed. We’ve pulled in PHP7 from Debian Stretch to improve PHP performance and then copied all the files for the Raspberry Pi website onto the NFS root for each Raspberry Pi (so the fileserver effectively has 8 copies – one for each Pi). We then just added the IPv6 addresses of the Raspberry Pis into the site’s load balancer, deleted the addresses for the main x86 servers and waited for everything to explode.
Did it work?
Slightly to our surprise, yes and well. We had a couple of issues – the Pi is much slower than the x86 servers, not only clock speed but also the speed of the network card used to access the filesystem and the database server. Some rarely used functions, such as registering a new Raspberry Jam, weren’t really quick enough under the new setup and gave people some error pages as the connections timed out. Uploading images for new WordPress posts was similarly an issue as receiving a 3MB file and distributing eight copies on a 100Mbps network isn’t very fast. But mostly it worked.
Did power cycling the Pis via the switch work?
We never tested it in production, every Pi remained up and stable for the whole 3.5 day duration we had the system in use. In testing it’s been fine.
Can I buy one?
Not yet. At present you can still break a Pi by destroying the flash, and the enclosure doesn’t allow for replacement without taking the whole shelf (which in production would contain 96 Pis) offline. Once we have full netboot for the Pi, it is a service we could offer.
Can I register my interest to buy a Pi in the cloud?
Sure – email us at sales@mythic-beasts.com and we’ll add you to a list to keep you up to date.