1. Environment Preparation

Prepare three nodes: Master (192.168.1.100), Slave1 (192.168.1.101), and Slave2 (192.168.1.102). Edit the /etc/hosts file on every node to map each IP address to its hostname (e.g., 192.168.1.100 master, 192.168.1.101 slave1) so that the nodes can reach one another by hostname; a sample mapping follows.
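For reference, the /etc/hosts entries on each node would look like this, matching the addresses above:

192.168.1.100 master
192.168.1.101 slave1
192.168.1.102 slave2

You can confirm name resolution afterwards with ping -c 1 slave1 from the master.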
Install Java on every node: sudo yum install -y java-1.8.0-openjdk-devel (CentOS) or sudo apt install -y openjdk-11-jdk (Ubuntu). Verify the installation with java -version, which should print the Java version information.

2. Hadoop Installation and Configuration
Download Hadoop and install it under /usr/local/hadoop: wget https://downloads.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz; tar -xzvf hadoop-3.3.5.tar.gz -C /usr/local/; the archive unpacks to /usr/local/hadoop-3.3.5, so rename it: mv /usr/local/hadoop-3.3.5 /usr/local/hadoop. Then fix ownership: chown -R hadoop:hadoop /usr/local/hadoop (running Hadoop as a dedicated hadoop user is recommended).
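Optionally, verify the integrity of the downloaded tarball; Apache publishes a .sha512 checksum file alongside each release. A hedged sketch (compare the two digests by eye):

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz.sha512
sha512sum hadoop-3.3.5.tar.gz    # should match the digest in the .sha512 file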
Configure environment variables: edit ~/.bashrc (or /etc/profile) and append the following:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64  # adjust to your actual Java path
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Apply the changes with source ~/.bashrc, then verify with hadoop version, which should print the Hadoop version information.
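Note that daemons launched over SSH by the start scripts do not read ~/.bashrc, so JAVA_HOME should also be set in Hadoop's own environment file, $HADOOP_HOME/etc/hadoop/hadoop-env.sh. A minimal sketch (the readlink trick assumes java is on the PATH):

readlink -f "$(which java)"    # prints .../bin/java; drop the trailing /bin/java to get JAVA_HOME
# then, in $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64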
Next, go to the $HADOOP_HOME/etc/hadoop directory and modify the following configuration files.

core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>  <!-- hostname of the Master node -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>  <!-- base temporary directory -->
  </property>
</configuration>
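With fs.defaultFS set, HDFS clients resolve paths against this NameNode by default, so once the cluster is running the following two commands are equivalent:

hdfs dfs -ls /
hdfs dfs -ls hdfs://master:9000/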
hdfs-site.xml (note: with DataNodes only on slave1 and slave2, a replication factor of 3 cannot be fully satisfied; a value of 2 would match this topology):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>  <!-- number of block replicas -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/hdfs/namenode</value>  <!-- NameNode data directory -->
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/hdfs/datanode</value>  <!-- DataNode data directory -->
  </property>
</configuration>
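The directories referenced above should exist and be writable by the hadoop user before the daemons start; a minimal sketch, run on every node with the paths configured above:

mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/hdfs/namenode /usr/local/hadoop/hdfs/datanode
chown -R hadoop:hadoop /usr/local/hadoop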
mapred-site.xml (Hadoop 3.x ships this file directly; on older 2.x releases, first rename mapred-site.xml.template to mapred-site.xml): set the MapReduce execution framework to YARN:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
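On Hadoop 3.x, jobs submitted to YARN must also be able to find the MapReduce libraries; if jobs later fail with class-not-found errors, the official docs suggest adding a property along these lines to mapred-site.xml (a hedged sketch):

<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>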
yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>  <!-- hostname of the ResourceManager node -->
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
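One more file matters in a multi-node setup: the start scripts read the list of worker hosts from $HADOOP_HOME/etc/hadoop/workers (named slaves in Hadoop 2.x) and launch the DataNode and NodeManager daemons on every host listed there. For this topology it should contain:

slave1
slave2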
Distribute the configuration files to both Slave nodes (this assumes Hadoop is already unpacked at the same path on each of them): scp core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slave1:$HADOOP_HOME/etc/hadoop/; scp core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slave2:$HADOOP_HOME/etc/hadoop/.
Set up passwordless SSH from the Master (as the hadoop user): generate a key pair with ssh-keygen -t rsa (press Enter at each prompt to accept the defaults); copy the public key to every Slave node: ssh-copy-id hadoop@slave1; ssh-copy-id hadoop@slave2; then test with ssh slave1 and ssh slave2, which should give you a shell without asking for a password.
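If ssh-copy-id is unavailable on your system, appending the key manually achieves the same result (a sketch assuming the default key path):

cat ~/.ssh/id_rsa.pub | ssh hadoop@slave1 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'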
3. Starting the Cluster
On the Master node, format the NameNode (first startup only; reformatting an existing cluster invalidates the data already stored on the DataNodes): hdfs namenode -format. This command creates the NameNode metadata directory.

Start HDFS with start-dfs.sh; once it is up, the NameNode process runs on the Master and a DataNode process runs on each Slave listed in the workers file. Start YARN with start-yarn.sh; the ResourceManager process starts on the Master and a NodeManager process starts on every Slave node.
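To shut the cluster down later, run the matching stop scripts in reverse order of startup:

stop-yarn.sh
stop-dfs.sh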
4. Verifying the Cluster
Run jps on each node: the Master should show the NameNode and ResourceManager processes, and each Slave should show DataNode and NodeManager. Run hdfs dfsadmin -report to check the DataNode list and storage capacity, and yarn node -list to check the NodeManager list and resource usage. Finally, open http://master:9870 (the HDFS web UI) and http://master:8088 (the YARN web UI) in a browser to inspect the cluster's running state.
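As a final end-to-end smoke test that exercises both HDFS and YARN, you can run the MapReduce example bundled with the release (the jar name assumes the 3.3.5 tarball used above):

hdfs dfs -mkdir -p /user/hadoop
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar pi 2 10

If the cluster is healthy, the job runs on YARN and prints an estimate of pi.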