Prerequisites
Before configuring HDFS on Debian, ensure the following prerequisites are met:
- A Java runtime installed (verify with java -version).
- Consistent clocks across all nodes (use ntp or chrony for time synchronization).
- Passwordless SSH between nodes (set up with ssh-keygen and ssh-copy-id).
Step 1: Install Hadoop
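Before downloading Hadoop, make sure the prerequisites above are in place. A minimal setup sketch for a Debian host (assuming OpenJDK 11 is available in your repositories, e.g., on Debian 11; youruser and namenode are placeholder names):
sudo apt update
sudo apt install -y openjdk-11-jdk ssh chrony
java -version                                      # confirm Java is on the PATH
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa   # skip if a key already exists
ssh-copy-id youruser@namenode                      # repeat for every node in the cluster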
Download the latest stable Hadoop release from the Apache Hadoop website. Extract it to a dedicated directory (e.g., /usr/local):
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz # 3.3.6 shown; check the downloads page for the current release
sudo tar -xzvf hadoop-3.3.6.tar.gz -C /usr/local/
sudo mv /usr/local/hadoop-3.3.6 /usr/local/hadoop # Rename for simplicity
Set ownership of the Hadoop directory to the current user (replace youruser with your username):
sudo chown -R youruser:youruser /usr/local/hadoop
Step 2: Configure Environment Variables
Edit the ~/.bashrc file (or /etc/profile for system-wide configuration) to add Hadoop environment variables:
nano ~/.bashrc
Add the following lines (adjust paths as needed):
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 # Update to your JDK path
Apply changes by running:
source ~/.bashrc
Verify Hadoop commands are accessible:
hadoop version
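hadoop version picks up JAVA_HOME from your shell, but the daemons started over SSH in Step 5 may not. Setting it in $HADOOP_HOME/etc/hadoop/hadoop-env.sh avoids "JAVA_HOME is not set" errors (path assumed for Debian's OpenJDK 11 package):
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh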
Step 3: Configure Core HDFS Files
Navigate to the Hadoop configuration directory ($HADOOP_HOME/etc/hadoop) and edit the following files:
core-site.xml
This file defines HDFS defaults (e.g., the NameNode address). Add:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value> <!-- Replace 'namenode' with your NameNode's hostname/IP -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value> <!-- Temporary directory for Hadoop (adjust path) -->
  </property>
</configuration>
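Once Hadoop is on the PATH, you can confirm the setting is picked up with hdfs getconf (a quick sanity check, not a required step):
hdfs getconf -confKey fs.defaultFS   # should print hdfs://namenode:9000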
hdfs-site.xml
This file configures HDFS-specific parameters (e.g., replication factor, data directories). Add:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value> <!-- Replication factor (adjust based on cluster size) -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/namenode</value> <!-- Persistent storage for NameNode metadata -->
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/datanode</value> <!-- Data storage directory for DataNodes -->
  </property>
</configuration>
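The directories referenced above must exist and be writable by the user running Hadoop before the NameNode is formatted. A sketch using the paths from this configuration (adjust to your layout):
sudo mkdir -p /data/hadoop/tmp /data/hadoop/namenode /data/hadoop/datanode
sudo chown -R youruser:youruser /data/hadoop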
mapred-site.xml
In Hadoop 3.x this file already exists in $HADOOP_HOME/etc/hadoop. On older 2.x releases, create it by copying the bundled template:
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
Edit to use YARN as the MapReduce framework:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
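On Hadoop 3.x, MapReduce jobs submitted to YARN also need to know where the MapReduce framework lives. A commonly needed addition to mapred-site.xml (a sketch; the path assumes the install location from Step 1):
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>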
yarn-site.xml
Configure YARN (Yet Another Resource Negotiator) for resource management:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager</value> <!-- Replace with ResourceManager's hostname/IP -->
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value> <!-- Required for MapReduce shuffle service -->
  </property>
</configuration>
Step 4: Format NameNode
The NameNode stores metadata about the HDFS filesystem. Formatting initializes this metadata and is required only once, before the first startup; reformatting an existing NameNode erases all HDFS metadata:
hdfs namenode -format
This command creates the directories specified in dfs.namenode.name.dir and writes initial metadata.
Step 5: Start HDFS Services
Start the HDFS services (NameNode and DataNode) using the start-dfs.sh script (run from the NameNode):
$HADOOP_HOME/sbin/start-dfs.sh
Check the status of running services:
jps # Should show NameNode, DataNode, and SecondaryNameNode (if configured)
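If you configured YARN in Step 3, start it separately; the ResourceManager and NodeManager processes should then appear in jps as well:
$HADOOP_HOME/sbin/start-yarn.sh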
Verify HDFS is running by accessing the NameNode Web UI (default for Hadoop 3.x: http://namenode:9870; Hadoop 2.x used port 50070).
Step 6: Verify HDFS Configuration
Run basic HDFS commands to validate functionality:
hdfs dfs -mkdir -p /user/youruser/test
hdfs dfs -put /path/to/localfile.txt /user/youruser/test
hdfs dfs -ls /user/youruser/test
hdfs dfs -cat /user/youruser/test/localfile.txt
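A cluster-wide health check is also available; it lists live DataNodes along with configured and remaining capacity:
hdfs dfsadmin -report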
Optional: Configure High Availability (HA)
For production clusters, set up HDFS HA to avoid a single point of failure. Key steps include (a configuration sketch follows the list):
- Edit hdfs-site.xml to define two NameNodes (e.g., nn1 and nn2), a shared edits directory (backed by JournalNodes), and a failover proxy provider.
- Run hdfs namenode -bootstrapStandby on the standby NameNode to sync metadata from the active NameNode.
- Run start-dfs.sh to start all NameNodes and DataNodes.
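A minimal hdfs-site.xml sketch of the HA naming pieces (the nameservice mycluster, NameNode IDs nn1/nn2, and the jn1-jn3 JournalNode hosts are placeholders; fencing and automatic failover via ZooKeeper are omitted for brevity):
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1-host:8020</value> <!-- placeholder hostname -->
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn2-host:8020</value> <!-- placeholder hostname -->
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>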