
How to Implement HDFS High Availability on Debian

小樊
2025-10-06 18:19:35
Column: Intelligent O&M

Prerequisites for HDFS High Availability (HA) on Debian
Before configuring HDFS HA, ensure the following prerequisites are met:

  • Debian Nodes: At least 5 nodes (2 NameNodes, 3 JournalNodes, and one or more DataNodes; JournalNodes are commonly co-located with other roles) running identical Debian versions (e.g., Debian 11/12).
  • Java Environment: Install OpenJDK 11 or 17 on all nodes (sudo apt install openjdk-11-jdk).
  • Hadoop Installation: Download and extract the same Hadoop version (e.g., 3.3.6) on all nodes. Configure basic environment variables in ~/.bashrc (e.g., export HADOOP_HOME=/usr/local/hadoop, export PATH=$PATH:$HADOOP_HOME/bin).
  • Hostname & Hosts File: Set unique hostnames (e.g., namenode1, namenode2, journalnode1) and update /etc/hosts with IP-hostname mappings for all nodes.
  • SSH Configuration: Enable passwordless SSH between all nodes (generate keys with ssh-keygen -t rsa and copy to other nodes using ssh-copy-id).
  • ZooKeeper Cluster: Deploy a 3-node ZooKeeper ensemble (critical for HA coordination; follow standard ZooKeeper setup steps on separate nodes).
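
As a point of reference, a minimal ZooKeeper ensemble configuration might look like the sketch below; the hostnames (zk1–zk3), install path, and data directory are assumptions to adapt to your environment:

    # /opt/zookeeper/conf/zoo.cfg -- identical on zk1, zk2 and zk3
    tickTime=2000          # Base time unit in milliseconds
    initLimit=10           # Ticks a follower may take to connect and sync with the leader
    syncLimit=5            # Ticks a follower may lag behind the leader
    dataDir=/var/lib/zookeeper
    clientPort=2181        # Must match ha.zookeeper.quorum in core-site.xml
    server.1=zk1:2888:3888
    server.2=zk2:2888:3888
    server.3=zk3:2888:3888

Each node also needs a myid file whose content matches its server.N line, for example on zk1:

    echo 1 | sudo tee /var/lib/zookeeper/myid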

Step 1: Configure JournalNode Nodes
JournalNodes store edit logs (transaction records for HDFS metadata) and ensure consistency between Active and Standby NameNodes.

  • Create a dedicated directory for JournalNode data:
    sudo mkdir -p /usr/local/hadoop/journalnode/data
    sudo chown -R $USER:$USER /usr/local/hadoop/journalnode
    
  • Add JournalNode configuration to $HADOOP_HOME/etc/hadoop/hdfs-site.xml on all nodes:
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/usr/local/hadoop/journalnode/data</value>
    </property>
    
  • Start the JournalNode service on all JournalNodes (Hadoop 3.x command; the older hadoop-daemon.sh start journalnode still works but is deprecated):
    hdfs --daemon start journalnode
    
    Verify status with jps (each JournalNode host should show a JournalNode process).
    
  • Note that there is no separate command to format the JournalNodes: their shared-edits directories are initialized when the active NameNode is formatted in Step 2 (or with hdfs namenode -initializeSharedEdits when converting an existing non-HA cluster), so the JournalNodes only need to be running before that step.
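
With passwordless SSH in place, the daemons can also be started from a single admin node. A minimal sketch, assuming the hostnames journalnode1–journalnode3 and the /usr/local/hadoop install path from the prerequisites (JAVA_HOME must be set in hadoop-env.sh for non-interactive SSH sessions, and jps must be on the remote PATH):

    # Start a JournalNode on each configured host
    for host in journalnode1 journalnode2 journalnode3; do
        ssh "$host" "/usr/local/hadoop/bin/hdfs --daemon start journalnode"
    done

    # Confirm each host now runs a JournalNode process
    for host in journalnode1 journalnode2 journalnode3; do
        echo "== $host =="; ssh "$host" "jps | grep JournalNode"
    done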

Step 2: Configure NameNode High Availability
This step enables two NameNodes (Active/Standby) to share metadata via JournalNodes.

  • Edit $HADOOP_HOME/etc/hadoop/core-site.xml to define the HDFS namespace and ZooKeeper address:
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value> <!-- Logical name for the HDFS cluster -->
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>zk1:2181,zk2:2181,zk3:2181</value> <!-- ZooKeeper ensemble addresses -->
    </property>
    
  • Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml to configure NameNode roles, RPC/HTTP addresses, shared edits, and failover:
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value> <!-- Must match fs.defaultFS -->
    </property>
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value> <!-- Names of NameNodes -->
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>namenode1:8020</value> <!-- RPC address for nn1 -->
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>namenode2:8020</value> <!-- RPC address for nn2 -->
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>namenode1:9870</value> <!-- HTTP address for nn1 -->
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>namenode2:9870</value> <!-- HTTP address for nn2 -->
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster</value> <!-- JournalNode quorum for shared edits -->
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> <!-- Client-side failover proxy -->
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value> <!-- Fencing method to prevent split-brain: during failover the previously active NameNode is killed over SSH -->
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/$USER/.ssh/id_rsa</value> <!-- Private key for SSH fencing; replace $USER with the literal user name, since XML values are not shell-expanded -->
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value> <!-- Enable automatic failover -->
    </property>
    
  • Format the Active NameNode (run only once, on nn1, with the JournalNode daemons already running):
    hdfs namenode -format
    
  • Start the NameNode on nn1:
    hdfs --daemon start namenode
    
  • Bootstrap the Standby NameNode (run on nn2; this copies the metadata from nn1):
    hdfs namenode -bootstrapStandby
    
  • Start the NameNode on nn2:
    hdfs --daemon start namenode
    
    At this point, with automatic failover enabled, both NameNodes will typically report "standby": check with hdfs haadmin -getServiceState nn1 and hdfs haadmin -getServiceState nn2. One of them is promoted to "active" once the ZKFC daemons come up in Step 3.
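
Because dfs.ha.automatic-failover.enabled is set to true, the HA state in ZooKeeper must be initialized once before the failover controllers come up; this uses the standard zkfc command:

    hdfs zkfc -formatZK   # Run once, on nn1, with the ZooKeeper ensemble reachable

The ZKFC daemons themselves are started automatically by start-dfs.sh in Step 3, or manually on each NameNode host with hdfs --daemon start zkfc.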

Step 3: Start HDFS Services
Start all HDFS components in the correct order:

start-dfs.sh  # Starts JournalNodes, NameNodes, DataNodes and, with automatic failover enabled, the ZKFC daemons

Check cluster status with:

hdfs dfsadmin -report  # Lists DataNodes and their health
hdfs haadmin -getAllServiceStates  # Shows NameNode states (active/standby)

Access NameNode Web UIs (e.g., http://namenode1:9870, http://namenode2:9870) to confirm HA status.
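
To confirm that clients address the cluster through the logical nameservice rather than an individual NameNode, a quick check such as the following can be used (the /tmp path is just an example):

    hdfs getconf -namenodes                  # Should list namenode1 and namenode2
    hdfs dfs -mkdir -p hdfs://mycluster/tmp  # Address the cluster by its logical name
    hdfs dfs -ls hdfs://mycluster/           # Works no matter which NameNode is active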

Step 4: Test Automatic Failover
Simulate a failure to verify automatic failover works:

  1. Kill the Active NameNode Process:
    On nn1, find the NameNode PID (jps | grep NameNode) and kill it:
    kill -9 <NameNode_PID>
    
  2. Verify Standby Takes Over:
    On nn2, check its state:
    hdfs haadmin -getServiceState nn2  # Should return "active"
    
  3. Restore the Original Active NameNode:
    Restart the NameNode on nn1 and verify it becomes standby:
    hdfs --daemon start namenode
    hdfs haadmin -getServiceState nn1  # Should return "standby"
    
  4. Check Data Availability:
    Create a test file in HDFS and verify it is still accessible after the failover:
    hdfs dfs -mkdir -p /test
    hdfs dfs -put /local/file.txt /test/
    hdfs dfs -get /test/file.txt /local/  # Should succeed after failover
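
The whole drill can also be scripted. The sketch below is meant to be run on nn1 and assumes the hostnames used in this guide plus a failover detection time of roughly ten seconds (adjust the sleep values for your cluster):

    #!/usr/bin/env bash
    # Rough failover drill: kill the active NameNode, watch the standby take over, then restore.
    set -e

    echo "Before: nn1=$(hdfs haadmin -getServiceState nn1), nn2=$(hdfs haadmin -getServiceState nn2)"

    # Kill the local NameNode process (assumes nn1 is currently active)
    kill -9 "$(jps | awk '$2 == "NameNode" {print $1}')"

    sleep 10   # Give the ZKFCs time to detect the failure and promote nn2
    echo "After failover: nn2=$(hdfs haadmin -getServiceState nn2)"   # Expect: active

    # Bring nn1 back; it should rejoin as standby
    hdfs --daemon start namenode
    sleep 5
    echo "After restart: nn1=$(hdfs haadmin -getServiceState nn1)"    # Expect: standby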
    

Step 5: Monitor and Maintain
Set up monitoring to detect issues early:

  • Metrics: Use Hadoop’s built-in metrics (exposed via JMX) or tools like Prometheus + Grafana to track NameNode memory, DataNode disk usage, and replication status (see the example after this list).
  • Logs: Regularly check NameNode logs ($HADOOP_HOME/logs/hadoop-*-namenode-*.log) for errors.
  • Backups: Backup critical data (e.g., NameNode metadata, ZooKeeper data) to an offsite location.
  • Updates: Keep Hadoop and ZooKeeper versions up-to-date to patch security vulnerabilities.
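
As an example of the JMX metrics mentioned above, the NameNode exposes its beans over HTTP on the web UI port; the bean names below are the ones published by recent Hadoop 3.x releases, queried here with curl (python3 -m json.tool is only used for pretty-printing):

    # HA state of nn1 (look for the "State" field: active/standby)
    curl -s 'http://namenode1:9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'

    # Capacity, under-replicated blocks, and live DataNodes
    curl -s 'http://namenode1:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState' \
      | python3 -m json.tool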
