
How to Implement Hadoop High Availability on Debian


Prerequisites for Hadoop HA on Debian
Before configuring Hadoop High Availability (HA), ensure the following prerequisites are met:

  • Debian Nodes: At least 5 nodes, with roles possibly co-located: 2 NameNodes (active/standby), 3 JournalNodes, 2 ResourceManagers (active/standby), and multiple DataNodes/NodeManagers.
  • Java Environment: Install OpenJDK 8/11 (e.g., sudo apt install openjdk-11-jdk).
  • Hadoop Installation: Download and extract Hadoop (e.g., version 3.3.6) to a consistent directory (e.g., /opt/hadoop) on all nodes.
  • Hostname Configuration: Assign unique hostnames (e.g., namenode1, journalnode1) and update /etc/hosts with IP-hostname mappings (e.g., 192.168.1.10 namenode1).
  • SSH Key Setup: Generate SSH keys on all nodes (ssh-keygen -t rsa) and distribute public keys (ssh-copy-id user@node-ip) to enable passwordless login.
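
As a concrete example of the hostname and SSH steps above, the sketch below reuses this guide's sample IPs and assumes a hadoop user; the hostnames, addresses, and user name are all placeholders to adapt to your environment:

    # Append IP-to-hostname mappings on every node (example IPs from this guide)
    cat <<'EOF' | sudo tee -a /etc/hosts
    192.168.1.10 namenode1
    192.168.1.11 namenode2
    192.168.1.13 journalnode1
    192.168.1.14 journalnode2
    192.168.1.15 journalnode3
    EOF

    # Generate a key pair and push the public key to every node for passwordless SSH
    ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
    for host in namenode1 namenode2 journalnode1 journalnode2 journalnode3; do
        ssh-copy-id hadoop@"$host"
    done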

1. Configure ZooKeeper Cluster (Critical for Coordination)
ZooKeeper ensures consistent failover by managing locks and leader election for NameNode/ResourceManager.

  • Install ZooKeeper: On each ZooKeeper node (3+ recommended), install via sudo apt install zookeeper zookeeperd.
  • Configure zoo.cfg: Edit /etc/zookeeper/conf/zoo.cfg on all nodes to include server entries (replace 1,2,3 with node IDs and IPs):
    server.1=192.168.1.10:2888:3888
    server.2=192.168.1.11:2888:3888
    server.3=192.168.1.12:2888:3888
    
  • Start ZooKeeper: Run sudo systemctl start zookeeper on all nodes and verify status (sudo systemctl status zookeeper).
  • Initialize HA State in ZooKeeper: After the HDFS HA properties in Section 2 are in place, run hdfs zkfc -formatZK on one NameNode to create the znode used for failover coordination.
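
Each server.N entry must be matched by a myid file in the ZooKeeper data directory on that node. A minimal sketch, assuming the Debian default dataDir of /var/lib/zookeeper (check the dataDir line in your zoo.cfg first):

    # On the node listed as server.1 (repeat with 2 and 3 on the other nodes)
    echo 1 | sudo tee /var/lib/zookeeper/myid
    sudo systemctl restart zookeeper

    # Quick health check: "Mode: leader" or "Mode: follower" indicates a working ensemble
    echo srvr | nc localhost 2181 | grep Mode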

2. Configure HDFS High Availability (NameNode HA)
HDFS HA uses an Active/Passive NameNode pair with Quorum Journal Manager (QJM) for shared edits.

  • Modify core-site.xml: Add default file system and ZooKeeper quorum:
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181</value>
    </property>
    
  • Modify hdfs-site.xml: Define NameNode roles, RPC addresses, shared edits, and failover settings (a fencing method is also required; see the note after this list):
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>192.168.1.10:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>192.168.1.11:8020</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://192.168.1.13:8485;192.168.1.14:8485;192.168.1.15:8485/mycluster</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    
  • Start JournalNodes: On all JournalNode hosts, run hdfs --daemon start journalnode (the older hadoop-daemon.sh start journalnode still works in Hadoop 3.x but is deprecated) so they can serve the shared edits directory.
  • Format and Start the Active NameNode: On namenode1, run hdfs namenode -format, then start it with hdfs --daemon start namenode.
  • Bootstrap the Standby NameNode: On namenode2, run hdfs namenode -bootstrapStandby to copy the formatted metadata from the active NameNode, then start that NameNode as well.
  • Start HDFS and Verify: Run start-dfs.sh (with automatic failover enabled it also starts the ZKFC daemons), then verify with hdfs haadmin -getServiceState nn1 (should return “active”) and hdfs haadmin -getServiceState nn2 (should return “standby”).
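
With automatic failover enabled, hdfs-site.xml must also define a fencing method, which the configuration above omits. A minimal sketch, assuming SSH fencing with the hadoop user's key at /home/hadoop/.ssh/id_rsa (both the method choice and the key path are assumptions; shell(/bin/true) is another common value when the Quorum Journal Manager is relied on to prevent split-brain writes):

    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>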

3. Configure YARN High Availability (ResourceManager HA)
YARN HA enables failover for ResourceManager, which schedules resources for applications.

  • Modify yarn-site.xml: Enable ResourceManager HA and define roles:
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>192.168.1.10</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>192.168.1.11</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181</value>
    </property>
    
  • Start ResourceManagers: On both ResourceManager nodes, run yarn --daemon start resourcemanager (the older yarn-daemon.sh is deprecated in Hadoop 3.x). Verify with yarn rmadmin -getServiceState rm1 and yarn rmadmin -getServiceState rm2 (one should return “active”, the other “standby”).
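
ResourceManager HA also expects a cluster ID so that both RMs coordinate under the same ZooKeeper path, and state recovery can optionally be enabled so running applications survive a failover. A minimal sketch for yarn-site.xml, with yarn-cluster used as an example ID:

    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-cluster</value>
    </property>
    <!-- Optional: keep application state in ZooKeeper so it survives RM failover -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>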

4. Configure Data Redundancy and Backup
Ensure data availability via replication and snapshots:

  • Set Replication Factor: In hdfs-site.xml, configure dfs.replication (default is 3) to store multiple copies of data blocks across nodes.
  • Enable Snapshots: Allow snapshots on the directories you want to protect with hdfs dfsadmin -allowSnapshot /path, then create point-in-time snapshots with hdfs dfs -createSnapshot /path <name>.
  • Regular Backups: Periodically checkpoint the namespace with hdfs dfsadmin -saveNamespace (the NameNode must be in safe mode first) and copy critical metadata and data to external storage, as sketched below.
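
A short command sketch of the snapshot and backup steps above, assuming a /data HDFS directory to protect and a local /backup directory for the copied image (both paths are placeholders):

    # Allow snapshots on the directory, then take a dated snapshot
    hdfs dfsadmin -allowSnapshot /data
    hdfs dfs -createSnapshot /data backup-$(date +%F)

    # Checkpoint the namespace; saveNamespace only works while in safe mode
    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
    hdfs dfsadmin -safemode leave

    # Download the most recent fsimage from the NameNode for off-cluster storage
    hdfs dfsadmin -fetchImage /backup/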

5. Set Up Monitoring and Alerting
Proactively monitor cluster health to detect failures early:

  • Built-in Tools: Use Hadoop’s NameNode web UI (e.g., http://namenode1:9870) and the ResourceManager web UI (default port 8088) to track metrics like node status, disk usage, and job progress.
  • Third-Party Tools: Integrate with Prometheus (for metrics collection) + Grafana (for visualization) or Ambari (for cluster management) to set alerts for thresholds (e.g., node down, high CPU).
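
For a quick scriptable check without a full monitoring stack, the NameNode web UI also serves its JMX metrics as JSON; a minimal sketch using curl (the host name and the bean queried are assumptions to adapt to your cluster):

    # Count live and dead DataNodes as reported by the active NameNode
    curl -s 'http://namenode1:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState' \
      | grep -E '"NumLiveDataNodes"|"NumDeadDataNodes"'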

6. Validate High Availability
Test failover to ensure automatic recovery:

  • Simulate NameNode Failure: Stop the active NameNode (hdfs --daemon stop namenode on namenode1) and verify the standby becomes active (hdfs haadmin -getServiceState nn2 should return “active”).
  • Simulate ResourceManager Failure: Stop the active ResourceManager (yarn --daemon stop resourcemanager on the node hosting rm1) and check that the standby takes over (yarn rmadmin -getServiceState rm2 should return “active”).
  • Check Data Availability: Create a test file (hdfs dfs -put /local/file /test) and access it after failover to confirm data integrity.
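
The NameNode failover check above can be wrapped into a small script; a minimal sketch, assuming passwordless SSH to namenode1 and the Hadoop binaries on the PATH of the hadoop user:

    #!/usr/bin/env bash
    # Show the starting states, stop the active NameNode, then confirm the standby took over
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2

    ssh namenode1 'hdfs --daemon stop namenode'
    sleep 30    # give the ZKFCs time to detect the failure and fail over

    hdfs haadmin -getServiceState nn2           # expected: active
    hdfs dfs -put -f /etc/hostname /test-file   # writes still succeed after failover
    hdfs dfs -cat /test-file

    ssh namenode1 'hdfs --daemon start namenode'  # the old active rejoins as standby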
