The little computer that did

April 13th, 2016

At the end of March we migrated the Raspberry Pi website from a very big multi-core server to a tiny cluster of eight Raspberry Pi 3s. Here’s a bit more detail about how it worked.

The Pi rack not fooling anyone on April 1st

Booting

For the Raspberry Pi 3 launch we tried out some Pis running in a data centre environment with high load, using the SD card for the root filesystem. They kept crashing: if you exceed the write capability of the card, the delays make the kernel think the storage has failed and the system falls over. We also want to be able to rebuild the filesystem remotely so we can fix a broken Pi without a site visit. So we’ve put the root filesystem on a network file server, which is accessed over NFS.

The Raspberry Pi runs the latest kernel, 4.1.18-v7+, and boots from the SD card with the following kernel command line:

dwc_otg.lpm_enable=0 console=ttyAMA0,115200 console=tty1 root=/dev/nfs rootfstype=nfs
  ip=10.46.189.2::10.46.189.1:255.255.255.252::eth0:off 
  nfsroot=10.46.189.1:/export/10.46.189.2 elevator=deadline 
  fsck.repair=yes rootwait

This brings up a /30 – a block of four IP addresses – on eth0: one address for the network, one for broadcast, one for the Pi and one for the network fileserver. It then mounts the NFS filesystem at:

nfsroot=10.46.189.1:/export/10.46.189.2

and uses that as the root filesystem.
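
On the fileserver side, each Pi gets its own export. We haven’t reproduced our exact configuration here, but the corresponding /etc/exports entry would look something like this (the export options are assumptions for the sketch; the path and client address come from the boot line above):

# /etc/exports on the fileserver - one export per Pi, restricted to that Pi's address
/export/10.46.189.2  10.46.189.2(rw,no_root_squash,async,no_subtree_check)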

Overly simple introduction to VLANs

On a traditional switch, you plug things in and any ethernet port can talk to any other ethernet port. If you want two different networks you need two different switches, and any computer that needs to be on both networks needs two network ports. In our case we’re trying to have a private storage network for each Raspberry Pi, so each Pi would require its own switch, and the fileserver would need its own network port for every Raspberry Pi connected, to keep them separate. This is going to get expensive very quickly.

Instead we turn on virtual LANs (VLANs). We connect our fileserver to port 24 and create a VLAN for ports 1 & 24, another for 2 & 24, etc. The switch configuration for the fileserver port specifies these VLANs as “tagged”, meaning our switch adds a header to the front of every packet from a Raspberry Pi port that allows the fileserver to tell which VLAN, and therefore which Raspberry Pi, the packet came from. The fileserver can reply with the same header, and that packet will only be sent to that specific Raspberry Pi. It behaves as if each Raspberry Pi has its own switch.

Network on the fileserver

The fileserver sees each VLAN as a separate network card, named eth0.N where N identifies the VLAN. We can configure them like any other network interface:

auto eth0.10
iface eth0.10 inet static
	address 10.46.189.1
	netmask 255.255.255.252

auto eth0.11
iface eth0.11 inet static
	address 10.46.189.5
	netmask 255.255.255.252

eth0.10 and eth0.11 appear to be separate network cards, each on a tiny network with a single Raspberry Pi at the far end, but in reality there’s a single physical ethernet connection underneath them all.
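
Under the hood these tagged interfaces are provided by the 8021q kernel module. If you wanted to bring one up by hand rather than via /etc/network/interfaces, the equivalent iproute2 commands would be roughly:

# create a tagged interface for VLAN 10 on top of eth0 and give it the fileserver's address
modprobe 8021q
ip link add link eth0 name eth0.10 type vlan id 10
ip addr add 10.46.189.1/30 dev eth0.10
ip link set eth0.10 up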

Network on the Raspberry Pi

On the Raspberry Pi, eth0 is already configured by the boot line above to talk to the fileserver. In our switch configuration, we specify that the private network is “untagged” on the Raspberry Pi’s port, which means packets on it won’t carry a VLAN header and we can access it as “eth0” rather than “eth0.N” as we did on the fileserver.

In order to do anything useful, we also need to give the Raspberry Pis access to the public network. On our network, the public network is accessible on VLAN 131. We configure this to be a “tagged” VLAN on the Raspberry Pi port, meaning it becomes accessible on the eth0.131 interface. We can configure this in the normal way, and in keeping with other back-end servers on the Raspberry Pi setup, it only has an IPv6 address:

auto eth0.131
iface eth0.131 inet6 static
	address	2a00:1098:0:84:1000:1::2
	netmask 64
	gateway	2a00:1098:0:84::1

Effectively the Raspberry Pi believes it has two network cards: one on eth0, which is a private network shared with the fileserver, and one on eth0.131, which has an IPv6 address and is connected to the real internet.

Why all that configuration?

In an ideal world we’d have a single IPv6 address for each Pi and mount the network filesystem over that. However, with an NFS root filesystem, another user on the LAN who managed to steal your IPv6 address could potentially access your files. There’s a second complication: IPv4 NFS root support is built into the standard Raspberry Pi kernel, so the differences per Pi are confined to the kernel command line, whereas with IPv6 we’d have to build an initrd to load the IPv6 modules and set up the NFS mounts.

Planning for the future, we’ve spoken to Gordon about how PXE boot on the Raspberry Pi will work, and it’s extremely likely that it’s going to require IPv4 to pull in the bootloader, kernel and initrd. Whilst there is native IPv6 in the Raspberry Pi office, there isn’t any IPv6 on their test LAN for developing the boot code, and it’s currently not a major priority for the Pi despite around 5% of the UK having native IPv6.

So if we want to make this commercial, each Pi needs its own storage network and it needs IPv4 on the storage network.

Power over Ethernet

We’ve added a Power over Ethernet HAT to our Raspberry Pis. This means that they receive power over the same ethernet cable that carries the two separate networks. As well as reducing the amount of space used by power bricks, it also means you can power cycle a Raspberry Pi just by re-configuring the switch.
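
How you toggle the power depends on the switch, but as an illustration, a switch that exposes the standard POWER-ETHERNET-MIB over SNMP lets you bounce the PoE supply to a port with something like the following (the switch name, community string and port index are made up for the example):

# pethPsePortAdminEnable for group 1, port 3: 2 = off, 1 = on
snmpset -v2c -c private switch1 1.3.6.1.2.1.105.1.1.1.3.1.3 i 2
sleep 5
snmpset -v2c -c private switch1 1.3.6.1.2.1.105.1.1.1.3.1.3 i 1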

Software

Each Raspberry Pi runs Raspbian with Apache2 installed. We’ve pulled in PHP7 from Debian Stretch to improve PHP performance and then copied all the files for the Raspberry Pi website onto the NFS root for each Raspberry Pi (so the fileserver effectively has 8 copies – one for each Pi). We then just added the IPv6 addresses of the Raspberry Pis into the site’s load balancer, deleted the addresses for the main x86 servers and waited for everything to explode.
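
Our exact package setup isn’t reproduced here, but pulling PHP 7.0 from stretch onto an otherwise stable install can be done with apt pinning along these lines (the mirror URL, suite and package names are assumptions for the sketch):

# /etc/apt/sources.list.d/stretch.list
deb http://httpredir.debian.org/debian stretch main

# /etc/apt/preferences.d/php7 - keep stretch at low priority so only what we ask for comes from it
Package: *
Pin: release n=stretch
Pin-Priority: 100

# install just the PHP 7 packages from stretch
apt-get update
apt-get install -t stretch php7.0 libapache2-mod-php7.0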

Did it work?

Slightly to our surprise, yes, and rather well. We had a couple of issues – the Pi is much slower than the x86 servers, not only in clock speed but also in the speed of the network card used to access the filesystem and the database server. Some rarely used functions, such as registering a new Raspberry Jam, weren’t really quick enough under the new setup and gave people some error pages as the connections timed out. Uploading images for new WordPress posts was similarly an issue, as receiving a 3MB file and distributing eight copies of it over a 100Mbps network isn’t very fast. But mostly it worked.

Did power cycling the Pis via the switch work?

We never had to use it in production: every Pi remained up and stable for the whole 3.5 days the system was in use. In testing it worked fine.

Can I buy one?

Not yet. At present you can still break a Pi by destroying the SD card’s flash, and the enclosure doesn’t allow for replacement without taking the whole shelf (which in production would contain 96 Pis) offline. Once we have full netboot for the Pi, it is a service we could offer.

Can I register my interest to buy a Pi in the cloud?

Sure – email us at sales@mythic-beasts.com and we’ll add you to a list to keep you up to date.

Let’s Encrypt SSL Certificates using DNS API – HOWTO

March 16th, 2016

Here at Mythic Beasts, we’ve been busily undermining sales of our SSL certificates by rolling out support for free certificates from Let’s Encrypt, partly because we think that the internet should be secure by default, but mostly because we’re lazy and Let’s Encrypt makes it easy to fully automate certificate issue and deployment.

Domain validated certificates

The majority of SSL certificates in use today are “Domain Validated” certificates. These are issued automatically by a certificate authority once you have completed some action that proves that you are in control of the domain for which the certificate is being requested. This can include responding to an email sent to an address at your domain, or posting a file to a specific location on your website.

Let’s Encrypt DNS challenge

One of the options for validation offered by Let’s Encrypt is a DNS challenge (known as “dns-01”), whereby you prove ownership of your domain by adding a specific entry to its DNS zone. This option is quite interesting, as it allows you to avoid meddling in any way with your web server configuration and, if your DNS is hosted with Mythic Beasts, you can automate the addition of the necessary records using our DNS API.
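
In practice the dns-01 challenge boils down to publishing a TXT record named _acme-challenge under the domain being validated, containing a token digest that the client gives you. Something like this appears in the zone for the duration of the challenge (the value below is just a placeholder):

; temporary record added for the dns-01 challenge
_acme-challenge.www.example.com. 300 IN TXT "dGhpcy1pcy1qdXN0LWEtcGxhY2Vob2xkZXI"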

Automating via our DNS API

In order to support this, we’ve developed a hook script that works with the letsencrypt.sh client.

We’ve also written a step-by-step guide to configuring dns-01 validation using our DNS API.
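
Once the hook script is in place, a certificate request looks roughly like this (the hook filename and domain are illustrative – the step-by-step guide has the real details):

# ask letsencrypt.sh to validate via dns-01, using the DNS API hook to add and remove the TXT records
./letsencrypt.sh --cron --domain www.example.com --challenge dns-01 --hook ./mythic-dns-hook.sh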

Please note, if you’re a hosting account customer, you don’t need to worry about any of this. You can get an SSL certificate for your website simply by hitting a button in the control panel.

Thanks go to David Earl for testing this and providing the initial implementation of the hook script.

Additional Managed Rack Capacity

March 14th, 2016

We’ve spent even more time than usual in data centres recently as we’ve been kitting out our new cage in the Meridian Gate data centre.

Much of the new capacity is being deployed as “managed racks”.  Racks are generally supplied with the bare essentials of electricity, cooling and locked doors.  At Mythic Beasts, we transform them into managed racks, adding all the features you need to effectively administer your equipment remotely, including:

Logging serial consoles

  • Internet connectivity – we’ve got 10Gbps connections onto both LINX networks, connecting at different sites.  We’ve also got multiple transit providers, and are present on the LoNAP peering exchange.   Our network has native IPv6 support, and if you have your own address space, we can provide you with BGP feeds from our routers. We can also offer private LANs, both as VLANs or as physically separate networks.
  • Remote power management – power cycle your server immediately, at any time using our customer control panel.
  • Serial connectivity – a 115.2kbps serial connection may seem a bit old-fashioned in an age when we’re wiring our switches together at 40Gbps, but it remains an extremely effective mechanism for out-of-band control of servers and other equipment, particularly when coupled with our logging serial console software.
  • On-site support – all of our London facilities have 24/7 access to the data centres’ on-site engineers.  We are also able to arrange for our own staff to carry out routine maintenance, such as replacing failed hard drives.

Meridian Gate is the third London data centre in which we have a presence, along with Sovereign House and Harbour Exchange, with the three sites connected by our own dark fibre ring.

One-click DNSSEC – public beta

March 4th, 2016

It’s been a long time coming, but we’re now pleased to announce that we’ve got DNSSEC support in public beta, and you can enable it for your domain at the click of a button.

What is DNSSEC?

DNSSEC is a set of extensions to the DNS protocol that ensures that you can trust the IP addresses that you get back from the DNS system. For example, if you visit www.yourbank.com, the first thing that happens is that your browser uses a DNS server to find out the IP address of your bank’s web server. But how do you know that you can trust the address that you get back? Your request will probably get bounced through multiple DNS servers, such as your home router, your ISP’s servers, and finally the authoritative server for the domain. If any one of those gets compromised (and let’s face it, home routers have a terrible security record) it could easily insert a different IP address and direct your request to an entirely different server.

DNSSEC means that all responses are signed with encryption keys that have been lodged with the registry, so you can’t inject bogus responses just by compromising an intermediate server.  Of course, the system only works if the systems making the requests check the signatures of the responses that they receive, something which certainly doesn’t happen everywhere yet.
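
If you want to see validation in action, you can ask a validating resolver to return the DNSSEC records with dig; a signed zone comes back with RRSIG records and the “ad” (authenticated data) flag set:

# query a validating resolver and request DNSSEC records
dig +dnssec www.example.com A
# a validated answer includes RRSIG records and "flags: qr rd ra ad" in the header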

Sounds complicated?

Yes it is, particularly as it is recommended that the encryption keys that you use are changed (or “rotated”) regularly. Fortunately, we’ve now automated all the hard stuff, and if you’ve got your domain registration and DNS hosting with Mythic Beasts, you can make DNSSEC go just by hitting a big green button.  We’ll take care of the rest.

Unlike some people, we believe that the internet should be a safe place to do business by default, so this service is, and will continue to be, provided at no extra cost.

If you want to try it out, simply visit our control panel, find the domain under “My Domains” and follow the “DNSSEC” link.

Free SSL certificates for hosting accounts

January 29th, 2016

Customers with hosting accounts on either yali or onza can now get free SSL certificates for websites, allowing you to have an https version of your website. We’re using the Let’s Encrypt certificate authority to provide the certificates.

To get a certificate and enable https hosting for your site, simply press the button in the control panel, and within 5 minutes you should have a working https site.  You can find the option under “Web and Email Hosting”.

Free SSL at the press of a button

Let’s Encrypt certificates have a short expiry period, but we will take care of automatically renewing them for you.

Why use HTTPS/SSL?

Using SSL on your website means that traffic between our server and your users’ computers is encrypted and can’t be intercepted (despite David Cameron’s desires).  It allows browsers to guarantee that they are indeed talking to the website shown in the address bar, even if they are using an untrusted network connection.  Even if you don’t view the security aspects as a benefit, Google have previously announced that they will boost the page ranking of SSL-enabled sites.

Sphinx accounts

Unfortunately, this service is not yet available to customers on our sphinx server.  We are working on that, and will have it enabled in the near future.

Testing failure: Raspbian

December 6th, 2015

Programmer art, just say no.

If you’ve had a look at the Raspbian website today you’ll have noticed the big red !!!FAILOVER TEST!!! logo at the top right corner. That’s because today is officially unimportant for Raspberry Pi, whereas in three weeks’ time it will be officially very important. Historically Christmas day sees our highest traffic loads as people unwrap their new Raspberry Pis and try them out. The most critical things for us to worry about are some of the educational and getting-started resources on the website, and Raspbian and the mirror director, so people can download new packages for their existing Raspberry Pis.

The majority of the website has a relatively small amount of data, so pulling an image from backup and redeploying is a relatively quick operation. Raspbian, however, is a bit harder – it’s a big image with around 4TB of data.

So we picked today to schedule a failover of Raspbian from its normal dedicated server to a VM hosted in the Raspberry Pi cloud. The aim is to check:

  • Is the failover server up to date and does it work?
  • Is the failover setup fast enough to keep up with the traffic load?
  • Does every service successfully fail over?

So far the operation has been very smooth. We’ve had to add a couple of missing packages that had been overlooked during setup and testing, but basically we did a DNS flip and the whole site moved over.

If you like to discover that your disaster recovery system works before you have a disaster, have a look at our Managed Services or get in touch – sales@mythic-beasts.com.

Raspberry Pi Zero: Not executing a trillion lines of PHP

November 27th, 2015

A number of people noticed that Raspberry Pi had launched their $5 Pi Zero yesterday. We had advance warning that something was going to happen, even if we didn’t know exactly what. When the Pi 2 launched we had some difficulties keeping up with comment posting and cache invalidation. We gave a very well-received talk on the history and launch at the UK Network Operators Forum, which you can see below.


Since then we’ve worked with Ben Nuttall to rebuild the entire hosting setup into an IPv6-only private cloud, hosted on one of our very large servers. This gives us:

  • Containment: One part of the site can’t significantly impact the performance of another.
  • Scalability: We can pull VMs into our public cloud and duplicate them if required.
  • Flexibility: We no longer have to have a single software stack that supports everything.

For the Pi 2 launch we sustained around 4500 simultaneous users before we really started struggling with comment posting and cache invalidation. So our new plan was to be able to manage over 5,000 simultaneous site users before we needed to start adding more VMs. This equates to around 1000 hits per second.

In order to do this, we need to make sure we can serve any of the 90% of the most common requests without touching the disks or the database; and without using more than 10ms of CPU time. We want to reserve all our capacity for pages that have to be dynamic – comment-posting and forums, for example – and make all the common content as cheap as possible.

So we deployed a custom script, staticify. This automatically takes the most popular and important pages, renders them to static HTML and rewrites the webserver configuration to serve the static pages instead. It runs frequently, so the cache is never more than 60 seconds old, making it appear dynamic.  It also means that we serve a file from filesystem cache (RAM) instead of executing WordPress. During the day we improved and deployed this same code to the MagPi site, including some horrid hackery to cache popular GET request combinations.
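
The staticify script itself isn’t published, but the core idea fits in a few lines of shell: fetch each popular URL from the local WordPress instance, write it to a static directory, and let an Apache rewrite rule serve that file when it exists. A simplified sketch (URLs and paths are illustrative, not the production code):

#!/bin/sh
# render a handful of popular pages to static HTML every minute (run from cron)
STATIC=/var/www/static
for url in / /downloads/ /help/; do
    dir="$STATIC${url%/}"
    mkdir -p "$dir"
    # fetch the freshly rendered page, then move it into place atomically
    curl -s "http://localhost$url" > "$dir/index.html.new" \
        && mv "$dir/index.html.new" "$dir/index.html"
done

# Apache then prefers the static copy when one exists, e.g.:
#   RewriteCond /var/www/static%{REQUEST_URI}index.html -f
#   RewriteRule ^ /static%{REQUEST_URI}index.html [L]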

Some very vague back-of-the-envelope calculations suggest it’s fair to say that we exceeded our target of 5,000 simultaneous users.

Liz Upton was quite pleased.

Not to mention a certain amount of respect from our peers.

If you deployed the blog unoptimised to AWS and just had auto-magic scaling, we’d estimate the bills at many tens of thousands of dollars per month – money that can instead be spent on education. In addition you’d still need to make sure you can effortlessly scale to thousands of cores without a single bottleneck somewhere in the stack causing them all to lie idle. The original version of the site (with a hopeless analytics plugin that processed the complete site logs on every request) would, under the traffic mentioned above, consume more computing power than has ever existed. At this scale optimisation is a necessity, and if you’re going to optimise, you might as well optimise well.

That said, we think some of our peers possibly overstated our importance in the big scheme of things.

IPv4 is so last century

November 11th, 2015

A scary beast that lives in the Fens.

Fenrir is the latest addition to the Mythic Beasts family. It’s a virtual machine in our Cambridge data centre which is running our blog. What’s interesting about it is that it has no IPv4 connectivity.

eth0 Link encap:Ethernet HWaddr 52:54:00:39:67:12
     inet6 addr: 2a00:1098:0:82:1000:0:39:6712/64 Scope:Global
     inet6 addr: fe80::5054:ff:fe39:6712/64 Scope:Link
     UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

It is fronted by our Reverse Proxy service – any connection over IPv4 or IPv6 arrives at one of our proxy servers and is forwarded on over IPv6 to fenrir which generates and serves the page. If it needs to make an outbound connection to another server (e.g. to embed our Tweets) it uses our NAT64 service which proxies the traffic for it.

All of our standard management services are running: graphing, SMS monitoring, nightly backups and security patches. The firewall configuration is simpler because we only need to write a v6 configuration. In addition, we don’t have to devote an expensive IPv4 address to the VM, slightly reducing our marketing budget.

For any of our own services, IPv6 only is the new default. Our staff members have to make a justification if they want to use one of our IPv4 addresses for a service we’re building. We now also need to see how many addresses we can reclaim from existing servers by moving to IPv6 + Proxy.

Rebuilding Software RAID 1 refused to boot

October 30th, 2015

Dear LazyWeb,

Yesterday we did a routine disk replacement on a machine with software RAID. It has two mirrored disks, sda and sdb, with a RAID 1 array, /dev/md1, mirrored across /dev/sda3 and /dev/sdb3. We took the machine offline and replaced /dev/sda. In netboot recovery mode we set up the partition table on /dev/sda, then set the array off rebuilding as normal:

mdadm --manage /dev/md1 --add /dev/sda3

The rebuild was expected to take around three hours to complete, so we told the machine to boot up normally and rebuild in the background while operational. This failed – during bootup in the initrd, the kernel (Debian 3.16) was bringing up the array with /dev/sda3 but not /dev/sdb3, claiming it didn’t have enough disks to start the array, and refusing to boot.

Within the initrd, if we ran:

mdadm --assemble /dev/md1 /dev/sda3 /dev/sdb3

the array refused to start, claiming that it didn’t have sufficient disks to bring itself online, but if we ran:

mdadm --assemble /dev/md1 /dev/sdb3
mdadm --manage /dev/md1 --add /dev/sda3

within the initrd it would bring up the array and start it rebuilding.

Our netboot recovery environment (same kernel) meanwhile correctly identifies both disks, and leaves the array rebuilding happily.
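
For reference, you can watch the rebuild progress from either environment with:

# show rebuild progress and estimated time to completion
cat /proc/mdstat
mdadm --detail /dev/md1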

To solve it we ended up leaving the machine to rebuild in the network recovery mode until the array was fully redundant, at which point it booted without issue. This wasn’t a big problem – it’s a member of a cluster, so downtime didn’t matter – but in general it’s supposed to work better than that.

It’s the first time we’ve ever seen this happen and we’re short on suggestions as to why – we’ve done hundreds of software RAID1 disk swaps before and never seen this issue.

Answers or suggestions in an email or tweet.

IPv6 Graphing

October 15th, 2015

it’s a server graph!

One of the outstanding tasks for full IPv6 support within Mythic Beasts was to make our graphing server support IPv6-only hosts. In theory this is trivial; in practice it required a bit more work.

Our graphing service uses munin, and we built it on munin 1.4 nearly five years ago; we scripted all the configuration and it has basically run itself ever since. When we added our first IPv6-only server it didn’t automatically get configured with graphs. On investigation we discovered that munin 1.4 just didn’t support IPv6 at all, so the first step was to build a new munin server based on Debian Jessie with munin 2.0.

Our code generates the configuration file by printing a line for each server to monitor, which includes the IP address. For IPv4 you print the address as normal, e.g. 127.0.0.1; for IPv6 you have to encase the address in square brackets, e.g. [2a00:1098:0:82:1000:0:1:1]. So a small patch later to spot which type of address is which, and we have a valid configuration file.
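
The generated entries end up looking something like this (hostnames are illustrative):

# /etc/munin/munin.conf - entries generated by our configuration script
[server-v4.example.com]
    address 127.0.0.1

[server-v6.example.com]
    address [2a00:1098:0:82:1000:0:1:1]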

Lastly we needed to add the IPv6 address of our munin server into the configuration file of all the servers that might be talked to over IPv6. Once this was done, as if by magic, thousands of graphs appeared.