
DRBD (Distributed Replicated Block Device) is an open source storage solution that is best compared to a RAID-1 (mirror) between two servers. I’ve implemented this for both our cloud storage and our cloud management servers.

We’re in the process of replacing both cloud storage nodes (that is: everything except the disks and their RAID array), and of course no downtime is allowed. Although DRBD is made for redundancy (one node can be offline without impact), completely replacing a node is a bit tricky.

Preventing a ‘split brain’ situation
The most important thing to remember is that only one storage node is allowed to be the active node at any time. If this rule is violated, a so-called ‘split brain’ occurs. DRBD has methods for surviving such a state, but it is best to prevent it from happening in the first place.

When discussing this project with our team, it was suggested to boot the replaced storage node without any networking cables attached. Usually this is a safe way to prevent a node from interacting with others. In this case, it is not such a good idea: since the new secondary server has the same disks and configuration as the old one, unexpected things may happen. When booting without networking cables attached, the nodes cannot find one another and the newly booted secondary may decide to become primary itself. A split brain will then occur: both nodes will be primary at the same time and access the data. You won’t be able to recover from this unless you manually decide which node is the master (and lose the changes on the other node). In this case that decision would be easy to make, but it’s a lot of unnecessary trouble.

Instead, boot the replaced node with the network cables connected, so the replication network is up and both nodes immediately see each other. In our case, this means connecting the 10Gbps link between the nodes. This connection is used by DRBD for syncing and by Heartbeat for sending the heartbeats. This prevents entering the ‘split brain’ state and immediately starts the sync.
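
To confirm that the nodes see each other and that the resync has started, a quick check on either node looks like this (output fields vary a bit per DRBD version):

# connection and disk state of all DRBD resources
cat /proc/drbd
# while the resync runs, the replaced node reports 'cs:SyncTarget' and the primary 'cs:SyncSource';
# 'cs:Connected' with 'ds:UpToDate/UpToDate' means redundancy is fully restored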

Note: If you want to replace everything including the disks, you’ll have to manually join the cluster with the new secondary node and then sync the data. In this case it doesn’t matter whether the networking cables are connected or not, since this new node won’t be able to become primary anyway.
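
In that scenario, joining roughly comes down to creating fresh metadata on the new disks and bringing the resource up; a minimal sketch, assuming a resource named r0:

# write new DRBD metadata on the empty disk (the resource name 'r0' is an example)
drbdadm create-md r0
# bring the resource up; it connects to the primary and a full resync starts
drbdadm up r0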

The procedure
Back to our case: replacing all hardware except for the disks. We managed to successfully replace both nodes using this procedure:

  1. shut down the secondary node
  2. replace the hardware, install the existing disks and 10Gbps card
  3. boot the node with at least the 10Gbps connection active
  4. the node should sync with the primary
  5. when syncing finishes, redundancy is restored
  6. make sure all other networking connections are working. Since the main board was replaced, some MAC addresses have changed; update the udev rules accordingly (see the sketch after this list)
  7. when all is fine, check if DRBD and Heartbeat are running without errors on both nodes
  8. then stop Heartbeat on the primary node (see the sketch after this list); a fail-over to the new secondary node will occur
  9. if all went well, you can now safely shut down the old primary
  10. replace the hardware, install the existing disks and 10Gbps card
  11. boot the node with at least the 10Gbps connection active
  12. the node should sync with the primary
  13. when syncing finishes, redundancy is restored
  14. make sure all other networking connections are working. Since the main board was replaced, some MAC addresses have changed; update the udev rules accordingly (as in step 6)
  15. the old primary is now secondary
  16. if you want, initiate another fail-over (in our case we didn’t fail over again, since both nodes are equally powerful)
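
For reference, steps 6 and 8 look roughly like this on a Debian-style system (the udev rules file and the init script path are assumptions; adjust them for your distribution):

# step 6: the new main board brings new NICs, so the persistent-net rules still map
# eth0/eth1 to the old MAC addresses; update (or remove and let udev regenerate) this file
vi /etc/udev/rules.d/70-persistent-net.rules

# step 8: stopping Heartbeat on the current primary hands its resources,
# including the DRBD primary role, over to the new secondary
/etc/init.d/heartbeat stop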

Congratulations: the cluster is redundant again with the new hardware!

Using the above procedure, we replaced both nodes of our DRBD storage cluster without any downtime.

We’ve virtualized many servers already this year. Last month we moved the MD equipment that still needs to be virtualized. Today we removed the old MN servers that are no longer needed due to our new private cloud. All our main equipment is now together in the DC-2 datacenter in Amsterdam!

I’m upgrading our MySQL master/slave setup and am moving it to new (virtual) hardware in our cloud environment. One of the things I did last night was moving the MySQL slaves to a new master that I had prepared in the new environment. This post describes how I connected the slaves to their new master in the cloud.

First, you’ll need to make sure the new master has the same data as the old one.
1. Make sure no more updates occur on the old master
2. Create an SQL dump of the old master using mysqldump
3. Import that dump into the new master using the mysql command-line tool (a sketch of both steps follows below)

At this point both masters should have the same data.
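
For steps 2 and 3, a minimal sketch from the shell (the hostnames and credentials are examples, and the import assumes the database already exists on the new master):

# step 2: dump the replicated database from the old master
mysqldump -h old-master -u root -p database_name > /tmp/database_name.sql
# step 3: load that dump into the new master
mysql -h new-master -u root -p database_name < /tmp/database_name.sql
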
4. Now, shut down the old master as it can be retired 😉
5. Before allowing write access to the new master, note its position by executing this query:

mysql> show master status\G
File: mn-bin.000005
Position: 11365777
Binlog_Do_DB: database_name
Binlog_Ignore_DB:
1 row in set (0.00 sec)

We’ll need this information later on when instructing the slaves to connect to their new master.

6. It’s now safe to allow write access again to the new master
7. Run this on each slave; it will connect the slave to the new master:

CHANGE MASTER TO
master_host='master_hostname',
master_user='replicate_user',
master_password='password',
master_log_file='mn-bin.000005',
master_log_pos=11365777;
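
Depending on how the slave was left, you may have to stop the replication threads before the CHANGE MASTER statement and start them again afterwards; for example (assuming the mysql client can log in without extra options):

# stop the old replication threads (harmless if they weren't running)
mysql -e "STOP SLAVE;"
# ...run the CHANGE MASTER TO statement shown above...
# start replicating from the new master
mysql -e "START SLAVE;"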

Note the ‘master_log_file’ and ‘master_log_pos’ in the CHANGE MASTER statement: their values are the ones we noted on the master in step 5. Then check if it worked (allow the slave a few seconds to connect):

mysql> show slave status\G

Look for these lines; they should both say ‘Yes’:

Slave_IO_Running: Yes
Slave_SQL_Running: Yes

And the status should be:

Slave_IO_State: Waiting for master to send event

That’s it: the slave is now connected to its new master. Test it by updating the master and checking whether the slave receives the update too.
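
A quick way to do that looks like this (the throwaway table is of course made up; anything that writes to the replicated database will do):

# on the master: create and fill a throwaway table
mysql database_name -e "CREATE TABLE repl_test (id INT); INSERT INTO repl_test VALUES (1);"
# on the slave, a moment later: the table and the row should have arrived
mysql database_name -e "SELECT * FROM repl_test;"
# back on the master: clean up (the DROP replicates as well)
mysql database_name -e "DROP TABLE repl_test;"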

After many weeks of preparation we moved MD’s equipment from the EUNetworks datacenter in Amsterdam to Gyrocenter DC-2. This is also our new datacenter, where we’ve built a CloudStack cloud.

Most servers were virtualized before this migration and only had to be removed during this operation. Others will be virtualized in the coming months. Fortunately most services had only some downtime during the night.

We also moved MD’s backup location (Redbus) and integrated it with our equipment in Global Switch.

As you can see, our team of five worked very hard. Enjoy this video impression!

Sometimes you need an easy way to redirect incoming connections to another system, for example when migrating an old box to a new one. Today I came across an old note of mine explaining how to do this, and I thought it might be worth sharing 🙂

You’ll need the ‘redir’ program for this to work:

apt-get install redir

Redir redirects TCP connections arriving on a local port to a specified address/port combination, like this:

redir --laddr=10.10.0.1 --lport=80 --caddr=10.10.10.1 --cport=80

This redirects web requests coming in at 10.10.0.1 to 10.10.10.1.

redir --laddr=10.10.0.1 --lport=21 --caddr=10.10.10.1 --cport=21 --ftp=both

And here the same for FTP. Note the --ftp option; it makes sure both passive and active FTP work.

Many years ago, when I didn’t know about this option, I had to roll back an upgrade in the middle of the night because FTP redirection just didn’t work… all I had to do was add --ftp=both to redir. I found that out the next morning and did the upgrade again the following night. I’ll never forget! I just had to smile when I found the note today 🙂