One of the fun parts of my work as a networking and security specialist is the variety of challenges customers hand you. One of the most interesting tasks I get to do is migrating datacenters. A lot of our customers run 24/7 operations, so the request is often for a zero downtime migration. But are they a thing, or is a zero downtime migration something like the mythical unicorn?
I have executed a lot of zero to low downtime datacenter migrations over different distances. In recent years I did many migrations between datacenters in a metropolitan area (roughly a 50 km radius), but also across greater distances. The last migration I did was from Berlin to Amsterdam, which is about 700 km. All with existing equipment.
In this blog I will tell you a bit about different migration strategies and how you would execute a zero to low downtime migration.
Why zero downtime migrations need to happen
Zero downtime migrations are difficult to execute and need a great deal of planning. Why would someone go through the trouble of starting an endeavour like this?
Often there's a business reason for migrating a datacenter:
- The existing datacenter is out of date and a refresh is unfeasible
- Colocation space is more expensive than a prospective new space
- There's no room for growth in the current space
- Failed negotiations force the end of a contract
Or one organization acquires a new business that needs to be integrated into its own infrastructure.
Organizations have all kinds of reasons to migrate datacenters, and most of them aren't technical in nature. Migrations do pose a big risk to daily operations: there's a chance of hardware failure, data inconsistency or even data loss.
Key 1: Redundancy
Redundancy is important in every 24/7 datacenter, but I still see cases where not all components are redundant. Redundancy is key to every migration scenario, so before you start, verify that the environment really is redundant.
Key 2: Planning
Get all stakeholders in a (regular) meeting and work out the best migration scenario. Map out all the requirements and make sure you know all the applications and their dependencies. This will influence which migration scenario works for your environment. Key to a successful migration is a plan that defines all the steps and the timelines. One of the pitfalls you can run into is wanting to do things too quickly. This leads to scenarios where you plan too much work in a day. Make sure you plan a lot of slack into your day's work. For example: if it takes you 4 hours to migrate a couple of servers, plan for 6 and don't plan extra work on the same day.
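To make the slack rule concrete, here is a minimal sketch of the arithmetic. The 1.5 buffer factor comes from the 4-hours-becomes-6 example above. The 8 hour working day and the function names are assumptions for illustration only.

```python
def planned_hours(estimate_hours, buffer_factor=1.5):
    """Pad a raw estimate with slack (4 hours of work becomes 6 planned hours)."""
    return estimate_hours * buffer_factor


def fits_in_day(estimates, day_hours=8.0, buffer_factor=1.5):
    """Check whether the padded tasks fit in one working day.

    day_hours is an assumed value; use whatever your change window allows.
    """
    total = sum(planned_hours(e, buffer_factor) for e in estimates)
    return total <= day_hours


print(planned_hours(4))     # a 4 hour job padded to 6 hours
print(fits_in_day([4]))     # one padded job fits in the day
print(fits_in_day([4, 2]))  # adding a second job overruns the day
```

If the second check fails, that is the signal to move the extra job to another day rather than to shrink the buffer.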
Key 3: Ownership
Don't make the migration team too big. In most migrations I did, the core migration team consisted of only 2-3 people. This leads to ownership of the migration, and people who own a task are more dedicated to getting it done.
So when you have your kick-ass migration team together, it's time to look at the different migration scenarios. From a network perspective there are two kinds of scenarios, backend and frontend, and the two might call for different approaches. The first three scenarios below concern the backend networks, the last two the public-facing frontend.
- Layer-2 adjacent networks - In this scenario you extend your existing network to the new location by spanning all your VLANs across, and then move your machines one by one. This scenario is the easiest, but it also has a couple of downsides. Getting a temporary layer-2 link between two datacenter locations might be expensive. A second downside is that VLANs stretched across locations pose a risk to your network stability.
- Separated networks - When the backend networks have different IP space, it's impossible to keep the original IPs. This means you need to renumber all the hosts you are migrating to the new datacenter. This might cause problems with applications or services that use hard-coded IP addresses.
- Migrate services to new (or swing) hardware - You can migrate your applications and services to a new set of hardware. This way you don't need to migrate the production environment but swing the frontend to the new backend. I have also seen this scenario used with temporary swing hardware to ease a migration.
- Same public IP space - If you have your own IP space and can advertise it from more than one datacenter, you can migrate services using techniques such as anycast, or by disabling a virtual IP or NAT in the old location and enabling it on the new side.
- Different public IP addresses - When you don't have the luxury of your own IP space, you need to migrate your services by changing DNS records. Resolvers keep a cached record until its TTL expires, so make sure you lower the Time To Live (TTL) of the DNS records to 5 minutes well before the migration. A good strategy is to change the TTL to a day about a week before the migration and to 5 minutes a day before.
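The TTL strategy from the last scenario can be sketched as a small schedule. This is only an illustration under stated assumptions: the function names are hypothetical, the migration date is made up, and the original TTL of one week is an example value rather than something from a real zone.

```python
from datetime import datetime, timedelta


def ttl_schedule(migration):
    """Return (when, ttl_seconds) steps for lowering a record's TTL before a cutover."""
    return [
        (migration - timedelta(days=7), 86400),  # a week before: TTL of one day
        (migration - timedelta(days=1), 300),    # a day before: TTL of five minutes
    ]


def ttl_at(moment, migration, original_ttl):
    """TTL that should be configured on the record at a given moment."""
    ttl = original_ttl
    for when, value in ttl_schedule(migration):
        if moment >= when:
            ttl = value
    return ttl


# Hypothetical cutover date and original TTL (one week), for illustration only.
migration = datetime(2024, 6, 15, 2, 0)
for moment in (datetime(2024, 6, 1), datetime(2024, 6, 10), datetime(2024, 6, 14, 12)):
    print(moment, ttl_at(moment, migration, original_ttl=604800))
```

Keep in mind that each lower value only takes full effect once caches holding the previous, longer TTL have expired, which is why the steps start well ahead of the cutover.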
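For the separated networks scenario, renumbering is easier when every host keeps its offset within the subnet and when hard-coded addresses are found before the move. The following is a rough sketch using Python's standard `ipaddress` module. The subnets, the sample config text and the function names are all made up for illustration.

```python
import ipaddress
import re

# Loose pattern for IPv4-looking strings; real validation happens in ipaddress.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def renumber(old_ip, old_net, new_net):
    """Map a host to the new subnet, keeping its offset from the network address."""
    old_network = ipaddress.ip_network(old_net)
    new_network = ipaddress.ip_network(new_net)
    address = ipaddress.ip_address(old_ip)
    if address not in old_network:
        raise ValueError(f"{old_ip} is not in {old_net}")
    if new_network.num_addresses < old_network.num_addresses:
        raise ValueError("new subnet is smaller than the old one")
    offset = int(address) - int(old_network.network_address)
    return str(new_network.network_address + offset)


def hardcoded_addresses(text, old_net):
    """List addresses from the old subnet that appear literally in a piece of text."""
    network = ipaddress.ip_network(old_net)
    found = set()
    for candidate in IPV4.findall(text):
        try:
            if ipaddress.ip_address(candidate) in network:
                found.add(candidate)
        except ValueError:
            continue  # looked like an address but is not a valid one
    return sorted(found)


config = "db_host=10.1.20.15 cache=10.9.9.9"  # hypothetical config line
for ip in hardcoded_addresses(config, "10.1.20.0/24"):
    print(ip, "->", renumber(ip, "10.1.20.0/24", "172.16.5.0/24"))
```

Running a scan like this over application configs before the migration gives the application owners a list to fix, instead of finding out during the move.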
Mythical Unicorn or not?
Low downtime migrations are pretty manageable. Zero downtime migrations need a great amount of planning and the right type of equipment. But it's definitely doable.
I migrated a 14 cabinet datacenter location across 15 km using a Cisco VSS cluster, HA loadbalancer pairs and a temporary layer-2 connection. As a first step we extended the network to the new location, then migrated most of the servers to the new datacenter over a couple of weeks. When we reached critical mass (most of the servers were running in the new location) we migrated all the standby components. It took about a day to unrack, move and get things running at the new location. The next day we failed over the VSS cluster, and once it was redundant again we failed over the loadbalancer pairs. After that we were able to move the remaining equipment. From start to finish this project took about 6 weeks.
As you can see, a zero downtime migration is possible, and once in a while it's a very nice challenge for a network engineer.