High-availability OpenStack Networking: Implementations and Lessons Learned
For organizations looking to build and operate an on-premises private cloud with OpenStack, networking is one of the most important aspects to consider. As deployments scale, so do the challenges of performance, operational stability, and reliability. Below, TSAI engineer Tony Walker explains from a practitioner’s perspective how the company implemented a high-availability network architecture suitable for deployment at enterprise scale and beyond.
When TSAI started building its private cloud, we had key goals that we could not reasonably meet with OpenStack “out of the box,” so we decided to implement our own solution. Since then, many alternatives to building and running custom solutions have emerged, both from the OpenStack project itself (see Distributed Virtual Router) and from vendors specializing in proprietary solutions.
Networking and OpenStack
When we started thinking about our cloud network architecture, three things were critical to us:
- Users should be able to create tenant networks with “real” IPs that are accessible from anywhere in our environment.
- Tenant networks should be highly available with no single point of failure.
- OpenStack network performance should be as close to bare metal as possible.
Transition from Traditional to High Availability
One choice that needs to be made early on is which network isolation technology to use. If your deployment spans multiple racks of servers, those racks will most likely sit in separate Layer 2 domains (for us, this was a deliberate design choice). In that case, the only practical option is an “overlay” network type. An overlay lets OpenStack networks span multiple Layer 2 domains, which gives you the first building block of a high-availability design. We chose VXLAN as the overlay protocol, primarily because our NICs support hardware offload of the encapsulation/decapsulation work. By pushing that computational heavy lifting into the NIC hardware, we were able to realize a roughly 4x performance improvement over a standard software-only setup.
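If you are evaluating similar hardware offload, a useful first sanity check is whether the NIC driver actually exposes UDP-tunnel offload features. The sketch below is a minimal illustration, not part of our tooling: it shells out to `ethtool -k` and reports the commonly seen `tx-udp_tnl-*` feature flags. Exact flag names vary by driver, so treat it as a starting point.

```python
"""Minimal sketch: check whether a NIC advertises VXLAN (UDP tunnel) offload.

Assumes ethtool is installed; the feature names below are common but
driver-dependent, so adjust them for your hardware.
"""
import subprocess
import sys

OFFLOAD_FLAGS = ("tx-udp_tnl-segmentation", "tx-udp_tnl-csum-segmentation")


def vxlan_offload_status(iface):
    """Parse `ethtool -k <iface>` and return the state of tunnel offload flags."""
    out = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    ).stdout
    status = {}
    for line in out.splitlines():
        parts = line.strip().split(":")
        if len(parts) >= 2 and parts[0].strip() in OFFLOAD_FLAGS:
            # Values look like "on", "off", or "off [fixed]".
            status[parts[0].strip()] = parts[1].strip().startswith("on")
    return status


if __name__ == "__main__":
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    for flag, enabled in vxlan_offload_status(iface).items():
        print(f"{iface}: {flag} = {'on' if enabled else 'off'}")
```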
With VXLAN in place, you can start with a large network block (we chose a /20 to begin with) from which OpenStack users carve out their own networks. Vanilla OpenStack lets users specify arbitrary IP ranges when creating subnets, which complicates things when you are working from a fixed block. For this reason, we wanted not only to block that behavior but also to implement a service that automatically selects the next available subnet from the range. A separate service then orchestrates the first phase of the HA configuration.
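To make the “next available subnet” idea concrete, here is a minimal sketch, assuming the openstacksdk, a placeholder cloud name, an example /20 block, and /24 tenant subnets; the actual service does more than this shows.

```python
"""Sketch of "next free subnet" allocation from a fixed supernet.

Assumes openstacksdk is configured via clouds.yaml (the cloud name "private"
and the 10.16.0.0/20 block are placeholders).
"""
import ipaddress
import openstack

SUPERNET = ipaddress.ip_network("10.16.0.0/20")
TENANT_PREFIX = 24


def next_free_subnet(conn):
    """Return the first /24 inside the supernet that no existing subnet overlaps."""
    in_use = [
        ipaddress.ip_network(s.cidr)
        for s in conn.network.subnets()
        if s.cidr and s.ip_version == 4
    ]
    for candidate in SUPERNET.subnets(new_prefix=TENANT_PREFIX):
        if not any(candidate.overlaps(existing) for existing in in_use):
            return candidate
    raise RuntimeError("supernet exhausted")


if __name__ == "__main__":
    conn = openstack.connect(cloud="private")
    cidr = next_free_subnet(conn)
    net = conn.network.create_network(name="tenant-net-example")
    conn.network.create_subnet(
        network_id=net.id, ip_version=4, cidr=str(cidr), name=f"sub-{cidr}"
    )
    print(f"created {net.name} with {cidr}")
```

Worth noting: Neutron’s subnet pools now provide similar next-available CIDR allocation natively, which covers much of this use case out of the box.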
Every time a new network is created, two logical routers (Neutron gateways) are created and attached to that network. Each router lives in a different rack and is connected to the external gateway network, giving the VMs redundant paths out of their tenant network.
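A simplified sketch of that attachment step is shown below, assuming the openstacksdk, that racks are exposed as Neutron availability zones (the “rack-a”/“rack-b” hints and the “ext-net” name are placeholders). It only wires up the two routers; the failover mechanism that actually shifts traffic between them is a separate concern and is not shown here.

```python
"""Sketch: attach two routers in different racks to a freshly created network.

Placeholders: external network "ext-net", availability zones "rack-a"/"rack-b".
Gateway failover between the two routers is handled elsewhere and not shown.
"""
import openstack

EXTERNAL_NET = "ext-net"
AZ_HINTS = (["rack-a"], ["rack-b"])


def attach_redundant_routers(conn, network, subnet):
    ext = conn.network.find_network(EXTERNAL_NET, ignore_missing=False)
    routers = []
    for idx, hints in enumerate(AZ_HINTS):
        router = conn.network.create_router(
            name=f"{network.name}-gw{idx}",
            availability_zone_hints=hints,
            external_gateway_info={"network_id": ext.id},
        )
        if idx == 0:
            # The first router takes the subnet's default gateway address.
            conn.network.add_interface_to_router(router, subnet_id=subnet.id)
        else:
            # Additional routers need their own port with a distinct IP.
            port = conn.network.create_port(
                network_id=network.id,
                fixed_ips=[{"subnet_id": subnet.id}],
            )
            conn.network.add_interface_to_router(router, port_id=port.id)
        routers.append(router)
    return routers
```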
Deployment Strategy
Deciding how many Neutron gateways to deploy, and where, is fairly straightforward at first. At a minimum, there should be two Neutron gateway nodes spread across different racks to meet the basic HA requirement. Another option is to run the Neutron L3 agent on every compute node, which can significantly improve network performance because the load is spread across many more nodes in the cluster. However, overlay networks and Neutron gateways already introduce a new level of complexity, and in our view a gateway per node pushes that complexity to the point where the costs outweigh the benefits. As the number of tenant networks and compute nodes grows, you will also need to scale out the gateways. We have found that running three to six Neutron gateways per rack strikes a good balance between redundancy, performance, and complexity.
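As a small operational aid, a sketch like the following can report how L3 agents are spread across racks, assuming the openstacksdk and a hostname convention from which the rack can be derived (the `rack_of()` helper here is a placeholder you would adapt to your own naming scheme).

```python
"""Sketch: report how many live L3 agents (Neutron gateways) sit in each rack.

Assumes a hostname convention like "rack3-net05"; adjust rack_of() to match.
"""
from collections import Counter
import openstack


def rack_of(hostname):
    """Placeholder: derive the rack name from the agent's hostname."""
    return hostname.split("-")[0]


def gateway_distribution(conn):
    agents = [a for a in conn.network.agents() if a.agent_type == "L3 agent"]
    return Counter(rack_of(a.host) for a in agents)


if __name__ == "__main__":
    conn = openstack.connect(cloud="private")
    for rack, count in sorted(gateway_distribution(conn).items()):
        flag = "" if 3 <= count <= 6 else "  <-- outside the 3-6 guideline"
        print(f"{rack}: {count} L3 agents{flag}")
```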
Looking Ahead
While this sounds great in theory, effectively replicating the router network namespaces onto every compute node brings a host of new concerns, especially from an operations and maintenance perspective. If troubleshooting is already difficult with a limited set of Neutron gateways, the idea of multiplying that number by the compute nodes in each region is daunting. Having every node act as a router also has side effects, such as increased compute usage and contention in the network stack, and it adds overhead when launching new VMs. For example, the ARP tables on all compute nodes are pre-populated with all possible entries, which means every compute node in a tenant network must keep its tables in sync at all times; that carries performance penalties and makes inconsistencies slow to detect at scale.
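To get a feel for that scale on a given node, something like the following sketch can report per-namespace neighbor-table sizes. It assumes root access, the iproute2 tools, and Neutron’s usual `qrouter-` namespace naming; it is an illustration of the monitoring idea, not a diagnostic tool.

```python
"""Sketch: count IPv4 neighbor (ARP) entries in each Neutron router namespace.

Requires root and iproute2; "qrouter-" is Neutron's router namespace prefix.
"""
import subprocess


def router_namespaces():
    out = subprocess.run(
        ["ip", "netns", "list"], capture_output=True, text=True, check=True
    ).stdout
    # Lines look like "qrouter-<uuid> (id: 3)"; keep only the namespace name.
    return [line.split()[0] for line in out.splitlines() if line.startswith("qrouter-")]


def neigh_count(ns):
    out = subprocess.run(
        ["ip", "netns", "exec", ns, "ip", "-4", "neigh", "show"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sum(1 for line in out.splitlines() if line.strip())


if __name__ == "__main__":
    for ns in router_namespaces():
        print(f"{ns}: {neigh_count(ns)} neighbor entries")
```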
There have been many interesting developments in the OpenStack networking space since we began our deployment, and we look forward to exploring the possibilities and challenges ahead.