Prerequisites for HDFS High Availability (HA) on Debian
Before configuring HDFS HA, ensure the following prerequisites are met:
Java: install OpenJDK 11 on every node:
sudo apt install openjdk-11-jdk
Hadoop: install Hadoop on every node and set its environment variables in ~/.bashrc (e.g., export HADOOP_HOME=/usr/local/hadoop and export PATH=$PATH:$HADOOP_HOME/bin).
Hostnames: assign a hostname to each node (e.g., namenode1, namenode2, journalnode1) and update /etc/hosts with IP-hostname mappings for all nodes.
Passwordless SSH: generate a key pair with ssh-keygen -t rsa and copy it to the other nodes with ssh-copy-id.
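For reference, a minimal sketch of the host mappings and environment variables (the IP addresses, the zk1-zk3 ZooKeeper hosts used later in core-site.xml, and the JAVA_HOME path are examples; substitute your own):
# /etc/hosts on every node (example addresses)
192.168.1.10 namenode1
192.168.1.11 namenode2
192.168.1.12 journalnode1
192.168.1.13 journalnode2
192.168.1.14 journalnode3
192.168.1.20 zk1
192.168.1.21 zk2
192.168.1.22 zk3
# ~/.bashrc additions on every node (JAVA_HOME shown for Debian amd64)
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin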
Step 1: Configure JournalNode Nodes
JournalNodes store edit logs (transaction records for HDFS metadata) and ensure consistency between Active and Standby NameNodes.
Create the JournalNode data directory on each JournalNode host:
sudo mkdir -p /usr/local/hadoop/journalnode/data
sudo chown -R $USER:$USER /usr/local/hadoop/journalnode
Add the following to $HADOOP_HOME/etc/hadoop/hdfs-site.xml on all nodes:
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/hadoop/journalnode/data</value>
</property>
Start the JournalNode daemon on each JournalNode host:
hadoop-daemon.sh start journalnode # or, on Hadoop 3+: hdfs --daemon start journalnode
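These per-host steps can also be run in one pass from an admin node; a sketch, assuming the passwordless SSH from the prerequisites and passwordless sudo on each host:
for host in journalnode1 journalnode2 journalnode3; do
  ssh "$host" 'sudo mkdir -p /usr/local/hadoop/journalnode/data &&
    sudo chown -R $USER:$USER /usr/local/hadoop/journalnode &&
    hadoop-daemon.sh start journalnode'
done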
Verify with jps (each JournalNode host should show a JournalNode process).
Step 2: Configure NameNode High Availability
This step enables two NameNodes (Active/Standby) to share metadata via JournalNodes.
Edit $HADOOP_HOME/etc/hadoop/core-site.xml to define the HDFS namespace and the ZooKeeper addresses:
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value> <!-- Logical name for the HDFS cluster -->
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>zk1:2181,zk2:2181,zk3:2181</value> <!-- ZooKeeper ensemble addresses -->
</property>
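Automatic failover (enabled below) depends on this ZooKeeper ensemble being reachable, so a quick connectivity check is worth running first; a sketch using nc from Debian's netcat-openbsd package:
for zk in zk1 zk2 zk3; do
  nc -z -w 2 "$zk" 2181 && echo "$zk: reachable" || echo "$zk: UNREACHABLE"
done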
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml to configure NameNode roles, RPC/HTTP addresses, shared edits, and failover:
<property>
<name>dfs.nameservices</name>
<value>mycluster</value> <!-- Must match fs.defaultFS -->
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value> <!-- Names of NameNodes -->
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>namenode1:8020</value> <!-- RPC address for nn1 -->
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>namenode2:8020</value> <!-- RPC address for nn2 -->
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>namenode1:9870</value> <!-- HTTP address for nn1 -->
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>namenode2:9870</value> <!-- HTTP address for nn2 -->
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster</value> <!-- JournalNode quorum for shared edits -->
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> <!-- Client-side failover proxy -->
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value> <!-- Fencing method to prevent split-brain: SSH to the previously active NameNode and kill its process -->
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/$USER/.ssh/id_rsa</value> <!-- Private key for SSH fencing; write the real username here, as Hadoop does not expand shell variables -->
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value> <!-- Enable automatic failover -->
</property>
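core-site.xml and hdfs-site.xml must be identical on every node. One way to distribute them, as a sketch that assumes passwordless SSH and the same HADOOP_HOME everywhere:
for host in namenode2 journalnode1 journalnode2 journalnode3; do
  scp $HADOOP_HOME/etc/hadoop/core-site.xml \
    $HADOOP_HOME/etc/hadoop/hdfs-site.xml \
    "$host:$HADOOP_HOME/etc/hadoop/"
done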
Format the first NameNode (on nn1):
hdfs namenode -format
Start the NameNode on nn1:
hadoop-daemon.sh start namenode
Bootstrap the standby (run on nn2; this copies the formatted metadata from nn1):
hdfs namenode -bootstrapStandby
Start the NameNode on nn2:
hadoop-daemon.sh start namenode
Initialize the automatic-failover state in ZooKeeper (run once, on either NameNode):
hdfs zkfc -formatZK
Once the ZKFC daemons are running (start-dfs.sh starts them in Step 3), verify the roles with hdfs haadmin -getServiceState nn1 (should return "active") and hdfs haadmin -getServiceState nn2 (should return "standby").
Step 3: Start HDFS Services
Start all HDFS components in the correct order:
start-dfs.sh # Starts JournalNodes, NameNodes, ZKFCs, and DataNodes
Check cluster status with:
hdfs dfsadmin -report # Lists DataNodes and their health
hdfs haadmin -getAllServiceStates # Shows NameNode states (active/standby)
Access the NameNode Web UIs (e.g., http://namenode1:9870 and http://namenode2:9870) to confirm the HA status.
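The HA state is also exposed over HTTP through the NameNode's JMX endpoint, which is handy for scripted checks; a sketch (jq is an optional helper: sudo apt install jq):
curl -s 'http://namenode1:9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus' \
  | jq -r '.beans[0].State' # Prints "active" or "standby"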
Step 4: Test Automatic Failover
Simulate a failure to verify automatic failover works:
On nn1, find the NameNode PID (jps | grep NameNode) and kill it:
kill -9 <NameNode_PID>
On nn2, check its state:
hdfs haadmin -getServiceState nn2 # Should return "active"
Restart the NameNode on nn1 and verify it becomes standby:
hadoop-daemon.sh start namenode
hdfs haadmin -getServiceState nn1 # Should return "standby"
Confirm that HDFS stays readable and writable across the failover:
hdfs dfs -mkdir -p /test
hdfs dfs -put /local/file.txt /test/
hdfs dfs -get /test/file.txt /local/ # Should succeed after failover
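The same drill can be scripted end to end; a sketch that kills the active NameNode on namenode1 over SSH and polls until nn2 takes over (failover normally completes within a few seconds):
ssh namenode1 "jps | awk '/NameNode/{print \$1}' | xargs -r kill -9"
until [ "$(hdfs haadmin -getServiceState nn2 2>/dev/null)" = "active" ]; do
  sleep 2 # Wait for the ZKFC on nn2 to win the election
done
echo "nn2 is now active"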
Step 5: Monitor and Maintain
Set up monitoring to detect issues early:
Check the NameNode logs ($HADOOP_HOME/logs/hadoop-*-namenode-*.log) for errors.
Periodically confirm that exactly one NameNode is active with hdfs haadmin -getAllServiceStates.
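A minimal cron-able health check built on that command; a sketch that exits non-zero when no NameNode reports active, so any alerting tool can pick it up:
#!/bin/bash
# Alert when the cluster has no active NameNode
states=$(hdfs haadmin -getAllServiceStates 2>/dev/null)
echo "$states"
if ! echo "$states" | grep -qw active; then
  echo "WARNING: no active NameNode in mycluster" >&2
  exit 1
fi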