Prerequisites for Debian Hadoop HA
Before configuring high availability (HA) for Hadoop on Debian, ensure the following prerequisites are met:
Java installed on every node, with JAVA_HOME set in /etc/environment.
Hadoop installed on every node, with its environment variables (HADOOP_HOME, PATH) properly configured.
Passwordless SSH between all nodes (set up with ssh-keygen and ssh-copy-id; a minimal sketch follows this list) to enable seamless communication for ZKFC and other services.
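A minimal sketch of the passwordless SSH setup, run from each node as the user that will operate the Hadoop daemons (the hostnames below are placeholders for your own hosts; the fencing configuration later in this guide assumes root's key at /root/.ssh/id_rsa):
# Generate a key pair if the node does not have one yet.
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
# Push the public key to every other node in the cluster.
for host in namenode1 namenode2 journalnode1 journalnode2 journalnode3 zookeeper1 zookeeper2 zookeeper3; do
  ssh-copy-id "$host"
done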
1. Install and Configure ZooKeeper Cluster
ZooKeeper is critical for distributed coordination in Hadoop HA, providing leader election and cluster state management. For fault tolerance, deploy an odd number of ZooKeeper nodes (3 or 5).
Install the ZooKeeper packages on each ZooKeeper node:
sudo apt-get update && sudo apt-get install -y zookeeper zookeeperd
Edit /etc/zookeeper/conf/zoo.cfg on all nodes to include the cluster members:
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
Create a myid file in /var/lib/zookeeper/ on each node with a unique ID (e.g., 1 for zookeeper1).
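For example, a sketch of assigning the IDs on the three nodes from the zoo.cfg above (run each line on the matching host):
echo 1 | sudo tee /var/lib/zookeeper/myid   # on zookeeper1
echo 2 | sudo tee /var/lib/zookeeper/myid   # on zookeeper2
echo 3 | sudo tee /var/lib/zookeeper/myid   # on zookeeper3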
Start and enable the ZooKeeper service on each node:
sudo systemctl start zookeeper && sudo systemctl enable zookeeper
Verify ZooKeeper status with echo stat | nc zookeeper1 2181 (replace with your node name).
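To confirm that leader election worked across the whole ensemble, a quick loop like the following can help (a sketch assuming the hostnames above; recent ZooKeeper releases may require whitelisting the four-letter commands via 4lw.commands.whitelist in zoo.cfg):
# Print the role (leader or follower) reported by each ensemble member.
for host in zookeeper1 zookeeper2 zookeeper3; do
  echo -n "$host: "
  echo srvr | nc "$host" 2181 | grep Mode
done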
2. Configure HDFS High Availability (NameNode HA)
HDFS HA eliminates the single point of failure (SPOF) of the NameNode by using Active/Standby nodes synchronized via JournalNodes.
In core-site.xml, add the HDFS cluster and ZooKeeper configurations:
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value> <!-- Logical cluster name -->
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value> <!-- ZooKeeper quorum -->
</property>
In hdfs-site.xml, define the NameNode roles, shared storage, and failover settings:
<property>
<name>dfs.nameservices</name>
<value>mycluster</value> <!-- Must match fs.defaultFS -->
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value> <!-- Active and Standby NameNode IDs -->
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>namenode1:8020</value> <!-- RPC address for nn1 -->
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>namenode2:8020</value> <!-- RPC address for nn2 -->
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster</value> <!-- JournalNode quorum for shared edits -->
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/var/lib/hadoop/hdfs/journalnode</value> <!-- Local edit log directory on JournalNodes -->
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value> <!-- Enable automatic failover -->
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> <!-- Client-side proxy for failover -->
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value> <!-- Prevent split-brain with SSH fencing -->
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value> <!-- Path to SSH private key for fencing -->
</property>
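Depending on the Hadoop release and its defaults, the Standby bootstrap step below also needs the NameNodes' HTTP addresses configured; a sketch using the same hostnames (port 9870 is the Hadoop 3.x default, older 2.x releases use 50070):
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>namenode1:9870</value> <!-- HTTP address for nn1 (assumes the Hadoop 3.x default port) -->
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>namenode2:9870</value> <!-- HTTP address for nn2 -->
</property>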
Initialize and start HDFS in the following order:
On every JournalNode host, start the JournalNode first so that the shared edits directory is reachable:
hadoop-daemon.sh start journalnode
On the first NameNode (e.g., namenode1), format the NameNode for the first time and then start it:
hdfs namenode -format
hadoop-daemon.sh start namenode
On the second NameNode, sync the metadata from the running Active:
hdfs namenode -bootstrapStandby # Sync metadata to Standby NameNode
Finally, from namenode1, start the remaining daemons:
start-dfs.sh # Start all HDFS daemons (NameNodes, DataNodes, JournalNodes)
Verify the NameNode roles with hdfs haadmin -getServiceState nn1 (and likewise nn2); one should report active and the other standby.
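A quick role check, using the nn1/nn2 IDs configured above (which NameNode is Active depends on which one won the election):
hdfs haadmin -getServiceState nn1   # expected: active (or standby)
hdfs haadmin -getServiceState nn2   # expected: the opposite of nn1
hdfs dfsadmin -report               # DataNode capacity and liveness summary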
3. Configure YARN High Availability (ResourceManager HA)
YARN HA ensures the ResourceManager (RM) remains available by running multiple RMs in Active/Standby mode, coordinated by ZooKeeper.
In yarn-site.xml, add the ResourceManager HA configurations:
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value> <!-- Enable HA -->
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value> <!-- Logical cluster name -->
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value> <!-- RM IDs -->
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value> <!-- ZooKeeper quorum -->
</property>
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value> <!-- Set to rm1 for the first RM, rm2 for the second -->
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value> <!-- Shuffle service for MapReduce -->
</property>
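Each rm-id also needs to be mapped to the host it runs on; the sketch below assumes the two ResourceManagers run on hosts named resourcemanager1 and resourcemanager2 (adjust to your own topology):
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>resourcemanager1</value> <!-- Host running rm1 (assumed hostname) -->
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>resourcemanager2</value> <!-- Host running rm2 (assumed hostname) -->
</property>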
start-yarn.sh # Start all YARN daemons (ResourceManagers, NodeManagers)
Verify the ResourceManager state with yarn rmadmin -getServiceState rm1 (and rm2); the Active RM should report active. Then run yarn node -list to confirm that the NodeManagers have registered with the Active RM.
4. Enable Automatic Failover with ZKFC
The ZK Failover Controller (ZKFC) monitors NameNode health and triggers automatic failover if the Active NameNode fails.
Start ZKFC on both NameNode hosts:
hadoop-daemon.sh start zkfc
ZKFC uses ZooKeeper to manage the Active/Standby state: each controller holds a session in ZooKeeper, the one that acquires the lock znode marks its NameNode Active, and if that session is lost the other controller fences the failed node and promotes its own NameNode. The full first-time bring-up is sketched below.
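A sketch of that bring-up, assuming the configuration above (the ZooKeeper znode only needs to be formatted once, before ZKFC is started for the first time):
# One-time step, run from one NameNode host: create the HA znode in ZooKeeper.
hdfs zkfc -formatZK
# On both NameNode hosts: start the failover controller.
hadoop-daemon.sh start zkfc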
5. Validate HA Configuration
Test the HA setup to ensure it works as expected:
Run hdfs dfsadmin -report to check DataNode health, and hdfs haadmin -getServiceState nn1 / nn2 to confirm the NameNode roles (Active/Standby).
Simulate a NameNode failure (e.g., hadoop-daemon.sh stop namenode on namenode1) and check that the Standby is promoted to Active (hdfs haadmin -getServiceState nn2); a step-by-step drill is sketched below.
Write a test file to HDFS (e.g., hdfs dfs -put /local/file /hdfs/path) and verify it is still accessible after failover.
Run yarn node -list to ensure the ResourceManager is serving requests.
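A simple failover drill, assuming namenode1 currently holds the Active role:
# On namenode1: stop the Active NameNode to simulate a failure.
hadoop-daemon.sh stop namenode
# From any node: the Standby should be promoted within a few seconds.
hdfs haadmin -getServiceState nn2   # expected: active
# Restart namenode1; it should rejoin the cluster as Standby.
hadoop-daemon.sh start namenode
hdfs haadmin -getServiceState nn1   # expected: standby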
6. Monitor and Maintain the Cluster
Proactive monitoring is essential for long-term HA reliability; a lightweight starting point is sketched below.
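For example, a minimal probe that can be run from cron and relies only on the admin commands already used above (the log path and the nn1/nn2, rm1/rm2 IDs match the earlier configuration; adjust as needed):
#!/bin/bash
# Sketch of a periodic HA health probe: record the state of each HA role.
{
  date
  for nn in nn1 nn2; do
    echo "NameNode $nn: $(hdfs haadmin -getServiceState "$nn" 2>/dev/null || echo unreachable)"
  done
  for rm in rm1 rm2; do
    echo "ResourceManager $rm: $(yarn rmadmin -getServiceState "$rm" 2>/dev/null || echo unreachable)"
  done
} >> /var/log/hadoop-ha-health.log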