Replacing both DRBD nodes while keeping the storage online

DRBD (Distributed Replicated Block Device) is an open source storage solution that is best compared to RAID-1 (mirroring) between two servers. I’ve implemented it for both our cloud storage and our cloud management servers.

We’re in the process of replacing both cloud storage nodes (that is, everything except the disks and their RAID array), and of course no downtime is allowed. Although DRBD is made for redundancy (one node can be offline without impact), completely replacing a node is a bit tricky.

Preventing a ‘split brain’ situation
The most important thing to remember is that only one storage node may be the active (primary) node at any time. If this is violated, a so-called ‘split brain’ happens. DRBD has ways to recover from such a state, but it is best to prevent it from happening.

When we discussed this project in our team, it was suggested to boot the replaced storage node without any networking cables. Usually this is a safe way to prevent a node from interacting with others. In this case, it is not such a good idea: since the new secondary server has the same disks and configuration as the old one, unexpected things may happen. When booting without networking cables attached, the nodes cannot find one another and the newly booted secondary may decide to become primary itself. A split brain situation will then occur: both nodes will be primary at the same time and access the data. You won’t be able to recover from this, unless you manually decide which node is the master (and lose the changes on the other node). In this case that decision is easy to make, but it’s a lot of unnecessary trouble.
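To make the ‘both nodes primary’ condition concrete, here is a minimal Python sketch. It assumes the status line format that DRBD 8.x prints in /proc/drbd (newer releases report status differently), and the sample lines are hypothetical, not captured from our cluster. It only compares the role each node reports for itself. It is not how DRBD itself detects a split brain.

```python
import re

# Per-resource status line as printed in /proc/drbd by DRBD 8.x, for example:
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
# The "ro:" field is <local role>/<peer role>.
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<local>\w+)/(?P<peer>\w+)"
)


def local_role(status_line):
    """Return the role a node reports for itself, e.g. 'Primary'."""
    match = STATUS_RE.match(status_line)
    if match is None:
        raise ValueError("not a DRBD status line: %r" % status_line)
    return match.group("local")


def both_primary(line_node_a, line_node_b):
    """True when each node considers itself Primary (the split brain precondition)."""
    return (local_role(line_node_a) == "Primary"
            and local_role(line_node_b) == "Primary")


# Hypothetical samples: one status line taken on each node.
healthy_a = " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----"
healthy_b = " 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----"

# Both nodes booted without seeing each other and both promoted themselves.
isolated_a = " 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"
isolated_b = " 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"

print(both_primary(healthy_a, healthy_b))    # False
print(both_primary(isolated_a, isolated_b))  # True
```

In the isolated case each node shows the peer as Unknown, so neither of them can tell that the other one is primary as well. That is exactly why booting without cables is risky here.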

Instead, boot the replaced node with the network cables connected, so the replication network is up and both nodes see each other immediately. In our case, this means connecting the 10Gbps link between the nodes. This link is used by DRBD for syncing and by Heartbeat for sending the heartbeats. Booting this way prevents the ‘split brain’ state, and syncing starts immediately.

Note: If you want to replace everything including the disks, you’ll have to manually join the new secondary node to the cluster and then sync the data. In this case it doesn’t matter whether the networking cables are connected or not, since this new node won’t be able to become primary anyway.

The procedure
Back to our case: replacing all hardware except for the disks. We managed to successfully replace both nodes using this procedure:

shut down the secondary node
replace the hardware, install the existing disks and 10Gbps card
boot the node with at least the 10Gbps connection active
the node should sync with the primary
when syncing finishes, redundancy is restored
make sure all other networking connections are working. Since the main board was replaced, some MAC addresses have changed. Update the udev rules accordingly
when all is fine, check if DRBD and Heartbeat are running without errors on both nodes
then stop Heartbeat on the primary node. A fail-over to the new secondary node will occur
if all went well, you can now safely shut down the old primary
replace the hardware, install the existing disks and 10Gbps card
boot the node with at least the 10Gbps connection active
the node should sync with the new primary
when syncing finishes, redundancy is restored
make sure all other networking connections are working. Since the main board was replaced, some MAC addresses have changed. Update the udev rules accordingly
the old primary is now secondary
If you want, initiate another fail-over (in our case we didn’t fail over again, since both nodes are equally powerful)
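The ‘when syncing finishes’ and ‘check if DRBD is running without errors’ steps can be sketched as a small check to run before stopping Heartbeat on the primary. This is a rough Python sketch, not the tooling we used. It assumes the DRBD 8.x /proc/drbd format, and the function names and sample texts are made up for illustration.

```python
import re

# Per-resource status line of /proc/drbd (DRBD 8.x), for example:
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
STATUS_RE = re.compile(
    r"^\s*\d+:\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_status(proc_drbd_text):
    """Return one dict (keys cs, ro, ds) per resource found in the text."""
    resources = []
    for line in proc_drbd_text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            resources.append(match.groupdict())
    return resources


def safe_to_fail_over(proc_drbd_text):
    """True only if every resource is connected and fully synced on both sides."""
    resources = parse_status(proc_drbd_text)
    if not resources:
        # No resources found: treat as unsafe rather than guess.
        return False
    return all(
        res["cs"] == "Connected" and res["ds"] == "UpToDate/UpToDate"
        for res in resources
    )


# Hypothetical sample: the replaced node is still receiving data.
SYNCING = " 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----"

# Hypothetical sample: sync finished, redundancy is restored.
IN_SYNC = " 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----"

print(safe_to_fail_over(SYNCING))  # False
print(safe_to_fail_over(IN_SYNC))  # True
```

On a real node you would feed it the contents of /proc/drbd, for example safe_to_fail_over(open("/proc/drbd").read()), and only stop Heartbeat on the primary when it returns True on both nodes.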

Congratulations: the cluster is redundant again with the new hardware!

Using the above procedure, we replaced both nodes of our DRBD storage cluster without any downtime.
