
A Complete Walkthrough of Configuring HDFS on Debian

小樊
2025-09-24 00:04:28

Prerequisites
Before configuring HDFS on Debian, ensure the following prerequisites are met:

  • A Debian system (physical/virtual) with root or sudo access.
  • Java Development Kit (JDK) 8 or 11 installed (Hadoop 3.x requires JDK 8+). Verify with java -version.
  • Network connectivity between all cluster nodes (static IPs recommended).
  • Synchronized time across nodes (install ntp or chrony for time synchronization).
  • SSH configured for passwordless login between nodes (generate SSH keys with ssh-keygen and copy the public key to each node with ssh-copy-id, as sketched below).
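
A minimal sketch of that passwordless-SSH setup, run from the NameNode host (node1 and node2 are placeholder hostnames; substitute your own):

# Generate a key pair (accept the defaults, empty passphrase)
ssh-keygen -t rsa -b 4096

# Copy the public key to every node in the cluster, including this one
ssh-copy-id youruser@node1
ssh-copy-id youruser@node2

# Confirm that login no longer prompts for a password
ssh youruser@node1 hostname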

Step 1: Install Hadoop
Download the latest stable Hadoop release from the Apache Hadoop website. Extract it to a dedicated directory (e.g., /usr/local):

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
sudo tar -xzvf hadoop-3.3.6.tar.gz -C /usr/local/
sudo mv /usr/local/hadoop-3.3.6 /usr/local/hadoop  # Rename for simplicity

Set ownership of the Hadoop directory to the current user (replace youruser with your username):

sudo chown -R youruser:youruser /usr/local/hadoop

Step 2: Configure Environment Variables
Edit the ~/.bashrc file (or /etc/profile for system-wide configuration) to add Hadoop environment variables:

nano ~/.bashrc

Add the following lines (adjust paths as needed):

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64  # Update to your JDK path

Apply changes by running:

source ~/.bashrc

Verify Hadoop commands are accessible:

hadoop version
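
The Hadoop startup scripts do not always pick up JAVA_HOME from the shell environment, so it is common to also set it explicitly in $HADOOP_HOME/etc/hadoop/hadoop-env.sh (the JDK path below is the same assumed path as above; adjust it to your system):

# Make the JDK location explicit for daemons launched via start-dfs.sh
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64' >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh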

Step 3: Configure Core HDFS Files
Navigate to the Hadoop configuration directory ($HADOOP_HOME/etc/hadoop) and edit the following files:

a. core-site.xml

This file defines core Hadoop settings, most importantly the default filesystem URI (the NameNode address). Add:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>  <!-- Replace 'namenode' with your NameNode's hostname/IP -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>  <!-- Temporary directory for Hadoop (adjust path) -->
  </property>
</configuration>

b. hdfs-site.xml

This file configures HDFS-specific parameters (e.g., replication factor, data directories). Add:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>  <!-- Replication factor (adjust based on cluster size) -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/namenode</value>  <!-- Persistent storage for NameNode metadata -->
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/datanode</value>  <!-- Data storage directory for DataNodes -->
  </property>
</configuration>
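
The paths referenced in hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir are not guaranteed to exist; creating them up front with the right ownership avoids permission errors at startup (youruser is a placeholder, as before):

# Create the NameNode, DataNode and temporary directories from the XML above
sudo mkdir -p /data/hadoop/tmp /data/hadoop/namenode /data/hadoop/datanode
# Hand them to the user that will run the HDFS daemons
sudo chown -R youruser:youruser /data/hadoop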

c. mapred-site.xml

In Hadoop 3.x this file already ships in $HADOOP_HOME/etc/hadoop, so there is no need to copy it from a template (that step only applies to Hadoop 2.x, where it was created from mapred-site.xml.template). Edit it to set YARN as the MapReduce framework:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

d. yarn-site.xml

Configure YARN (Yet Another Resource Negotiator) for resource management:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager</value>  <!-- Replace with ResourceManager's hostname/IP -->
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>  <!-- Required for MapReduce shuffle service -->
  </property>
</configuration>
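
For a multi-node cluster, start-dfs.sh also needs to know where the DataNodes live; it reads one hostname per line from $HADOOP_HOME/etc/hadoop/workers (called slaves in Hadoop 2.x). A sketch with two placeholder hostnames:

# List every DataNode host, one per line (datanode1/datanode2 are placeholders)
cat > /usr/local/hadoop/etc/hadoop/workers <<EOF
datanode1
datanode2
EOF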

Step 4: Format NameNode
The NameNode stores the metadata of the HDFS filesystem, and formatting initializes that metadata. Run this only once, before the first startup; reformatting an existing NameNode wipes all HDFS metadata:

hdfs namenode -format

This command creates the directories specified in dfs.namenode.name.dir and writes initial metadata.
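
If the format succeeded, the directory from dfs.namenode.name.dir now contains a current/ subdirectory holding the VERSION file and an initial fsimage; a quick sanity check:

# Should list VERSION, an fsimage_* file and seen_txid
ls /data/hadoop/namenode/current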

Step 5: Start HDFS Services
Start the HDFS services (NameNode and DataNode) using the start-dfs.sh script (run from the NameNode):

$HADOOP_HOME/sbin/start-dfs.sh

Check the status of running services:

jps  # Should show NameNode, DataNode, and SecondaryNameNode (if configured)

Verify HDFS is running by accessing the Web UI (default: http://namenode:9870).
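
Besides the Web UI, hdfs dfsadmin gives a quick command-line summary of cluster health, including how many DataNodes have registered and the capacity they report:

# Shows live/dead DataNodes, configured capacity and remaining space
hdfs dfsadmin -report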

Step 6: Verify HDFS Configuration
Run basic HDFS commands to validate functionality:

  • Create a test directory:
    hdfs dfs -mkdir -p /user/youruser/test
    
  • Upload a local file to HDFS:
    hdfs dfs -put /path/to/localfile.txt /user/youruser/test
    
  • List files in the test directory:
    hdfs dfs -ls /user/youruser/test
    
  • View file content:
    hdfs dfs -cat /user/youruser/test/localfile.txt
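
Once these checks pass, the test data can be removed so it does not linger in HDFS:

# Delete the test directory and its contents
hdfs dfs -rm -r /user/youruser/test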
    

Optional: Configure High Availability (HA)
For production clusters, set up HDFS HA to avoid single points of failure. Key steps include:

  1. Configure NameNode HA: Modify hdfs-site.xml to define two NameNodes (e.g., nn1, nn2), a shared edits directory (via JournalNodes), and a failover proxy provider (see the sketch after this list).
  2. Set Up JournalNode: Start 3+ JournalNode processes (on separate nodes) to store edit logs.
  3. Bootstrap Standby NameNode: Use hdfs namenode -bootstrapStandby to sync metadata from the active NameNode.
  4. Start HA Cluster: Use start-dfs.sh to start the JournalNodes, both NameNodes, the ZKFC daemons (when automatic failover is enabled), and the DataNodes.
  5. Verify Failover: Simulate a NameNode failure (stop the active NameNode) and confirm the standby takes over automatically.
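
A minimal hdfs-site.xml sketch of the HA properties from step 1, assuming a nameservice called mycluster, NameNode hosts nn1-host and nn2-host, and JournalNodes on jn1, jn2 and jn3 (all placeholder names). With this layout, fs.defaultFS in core-site.xml changes to hdfs://mycluster, and automatic failover additionally requires a ZooKeeper quorum (ha.zookeeper.quorum in core-site.xml):

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>  <!-- Logical name clients use instead of a single NameNode -->
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1-host:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2-host:8020</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>  <!-- JournalNode quorum for shared edit logs -->
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>  <!-- Prevents two NameNodes from being active at once -->
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/youruser/.ssh/id_rsa</value>  <!-- Key used by the sshfence method -->
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>  <!-- Requires ZKFC daemons and a ZooKeeper quorum -->
  </property>
</configuration>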
