Replace a failed YB-Master

You can replace a failed YB-Master server in a YugabyteDB cluster.

Examples included in this document use the following scenario:

  • The cluster includes three YB-Master servers: M1, M2, M3.
  • YB-Master server M1 failed and needs to be replaced.
  • A new YB-Master server (M4) will replace M1.
  • The default master RPC port is 7100.

If the YB-Master to be replaced is already dead (for example, the VM was terminated), perform the REMOVE step (step 3) first, and then the ADD step (step 2).

  1. Start the replacement YB-Master server in standby mode by setting the --master_addresses flag to an empty string (""), as follows:

    ./bin/yb-master --master_addresses="" --fs_data_dirs=<your_data_directories> [any other flags you would typically pass to this master process]
    

    When --master_addresses is set to "", this YB-Master server starts without joining any existing master quorum. The node is added to the master quorum in a later step.

  2. Add the replacement YB-Master server into the existing cluster by running the yb-admin change_master_config ADD_SERVER command, as follows:

    ./bin/yb-admin -master_addresses M1:7100,M2:7100,M3:7100 change_master_config ADD_SERVER M4 7100
    
  3. Remove the failed YB-Master server from the cluster by using the yb-admin change_master_config REMOVE_SERVER command, as follows:

    ./bin/yb-admin -master_addresses M1:7100,M2:7100,M3:7100,M4:7100 change_master_config REMOVE_SERVER M1 7100
    

    Specify all YB-Master addresses, including M4, so that yb-admin can find the leader even if M4 has become the leader.

  4. Validate the cluster by checking that your set of masters is now M2, M3, and M4, as follows:

    ./bin/yb-admin -master_addresses M2:7100,M3:7100,M4:7100 list_all_masters
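
The ordering of the ADD and REMOVE steps can be sketched as a small dry-run script. It only prints the yb-admin commands and does not run them. The host names and port come from the scenario above; the function names and the "dead"/"alive" argument are hypothetical helpers, and the address lists used in the "dead" case are an assumption rather than something this document prescribes.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints yb-admin commands, never executes them.
# M1..M4 and port 7100 come from the example scenario.
# join_addrs, add_cmd, remove_cmd, and plan_replacement are hypothetical helpers.

PORT=7100

# Join host names into a comma-separated host:port list.
join_addrs() {
  local out="" host
  for host in "$@"; do
    out="${out:+$out,}${host}:${PORT}"
  done
  printf '%s\n' "$out"
}

# Print an ADD_SERVER command: $1 = master address list, $2 = host to add.
add_cmd() {
  printf './bin/yb-admin -master_addresses %s change_master_config ADD_SERVER %s %s\n' "$1" "$2" "$PORT"
}

# Print a REMOVE_SERVER command: $1 = master address list, $2 = host to remove.
remove_cmd() {
  printf './bin/yb-admin -master_addresses %s change_master_config REMOVE_SERVER %s %s\n' "$1" "$2" "$PORT"
}

# Print both membership changes in the order that matches the state of M1.
plan_replacement() {
  if [ "$1" = "dead" ]; then
    # M1 is already gone: REMOVE first, then ADD.
    # Assumption: the address lists below are illustrative choices.
    remove_cmd "$(join_addrs M1 M2 M3)" M1
    add_cmd "$(join_addrs M2 M3)" M4
  else
    # Default order from the steps above: ADD first, then REMOVE.
    add_cmd "$(join_addrs M1 M2 M3)" M4
    remove_cmd "$(join_addrs M1 M2 M3 M4)" M1
  fi
}
```

Calling `plan_replacement alive` prints the commands from steps 2 and 3 in their listed order, while `plan_replacement dead` prints the REMOVE command first. Review the printed commands before running any of them against a real cluster.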
    

Until #1542 is implemented, a YB-TServer is by default aware only of the YB-Master servers listed in the --tserver_master_addrs flag with which it was started. If any of those YB-Master servers is still part of the active quorum, it can propagate the new master quorum to the YB-TServer via heartbeats. If none of the YB-Master servers listed in the flag is part of the current quorum, the YB-TServer cannot join the cluster. Therefore, it is important to update --tserver_master_addrs on every YB-TServer to the new set of master addresses: M2:7100,M3:7100,M4:7100.
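
The overlap condition described above can be illustrated with a short check. This is a minimal sketch, not part of YugabyteDB: the function name `has_live_master` is hypothetical, and it compares addresses as plain strings, so it assumes both lists spell each host and port identically.

```shell
#!/usr/bin/env bash
# has_live_master FLAG_VALUE ACTIVE_SET   (hypothetical helper)
# Succeeds (exit 0) when at least one address in a YB-TServer's
# --tserver_master_addrs value is still in the active master set.
# Both arguments are comma-separated host:port lists, compared as strings.
has_live_master() {
  local flag_value="$1" active_set="$2" addr
  local IFS=','          # split the flag value on commas inside this function only
  for addr in $flag_value; do
    case ",$active_set," in
      *",$addr,"*) return 0 ;;   # this master is still in the quorum
    esac
  done
  return 1                       # no listed master is in the active quorum
}
```

For example, a YB-TServer started with M1:7100,M2:7100,M3:7100 still shares M2 and M3 with the new quorum, so the check succeeds. A YB-TServer whose flag lists only M1:7100 shares nothing with M2:7100,M3:7100,M4:7100, so the check fails, which corresponds to the case where the YB-TServer cannot join the cluster.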