A Practical Guide to Integrating HDFS and YARN on CentOS
HDFS (Hadoop Distributed File System) provides distributed storage, while YARN (Yet Another Resource Negotiator) handles resource management and job scheduling; integrating the two is a core step in building a Hadoop big-data stack. The following walks through the integration on a CentOS system in detail.
Before integrating, complete the following base configuration to keep the cluster environment consistent:
1. Disable the firewall (systemctl stop firewalld && systemctl disable firewalld) to simplify port testing.
2. Disable SELinux (setenforce 0, and set SELINUX=disabled in /etc/selinux/config so the change survives reboots).
3. Synchronize clocks with chrony (yum install chrony -y && systemctl enable --now chronyd) to avoid time drift between nodes.
4. Edit /etc/hosts on every node and add <IP address> <hostname> mappings (e.g. 192.168.1.100 namenode, 192.168.1.101 datanode1) so hostnames resolve correctly.
5. Generate an SSH key pair on the NameNode host (ssh-keygen -t rsa) and copy the public key (id_rsa.pub) to all DataNode hosts (ssh-copy-id datanode1) to enable passwordless login.
6. Install the JDK (yum install java-1.8.0-openjdk-devel -y) and verify with java -version (it should report 1.8.0).
7. Download Hadoop and extract it to /usr/local/:
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
tar -xzvf hadoop-3.3.1.tar.gz -C /usr/local/
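Before moving on, it can save debugging time to sanity-check these prerequisites on each node. A minimal sketch, assuming the example hostnames above (namenode, datanode1):
getenforce                # should print Disabled (or Permissive)
firewall-cmd --state      # should print "not running"
chronyc tracking          # Leap status should read "Normal" (clock in sync)
ssh datanode1 hostname    # should log in without a password prompt
java -version             # should report openjdk version "1.8.0_..."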
Next, configure the environment variables: create the file /etc/profile.d/hadoop.sh and add the following:
export HADOOP_HOME=/usr/local/hadoop-3.3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Run source /etc/profile.d/hadoop.sh to make the settings take effect.
With the environment ready, configure Hadoop itself. First edit $HADOOP_HOME/etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value> <!-- NameNode address -->
</property>
</configuration>
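One step worth adding here: Hadoop's start scripts read JAVA_HOME from $HADOOP_HOME/etc/hadoop/hadoop-env.sh rather than reliably inheriting it over SSH, so set it there explicitly. The path below is the usual symlink for the CentOS OpenJDK 1.8 package installed earlier; verify yours first (e.g. with readlink -f $(which java)):
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh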
Then edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value> <!-- replication factor (≥3 recommended in production) -->
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop-3.3.1/data/namenode</value> <!-- NameNode data directory -->
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop-3.3.1/data/datanode</value> <!-- DataNode data directory -->
</property>
</configuration>
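The two data directories above are not created by the configuration itself: hdfs namenode -format creates the NameNode directory, and the DataNode creates its own at first start, provided the parent is writable by the user running the daemons. Creating them up front avoids permission surprises; a sketch (chown the tree afterwards if the daemons run as a dedicated non-root user):
mkdir -p /usr/local/hadoop-3.3.1/data/{namenode,datanode}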
Next, edit $HADOOP_HOME/etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode</value> <!-- node hosting the ResourceManager -->
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value> <!-- MapReduce shuffle service -->
</property>
</configuration>
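A side note for small test VMs: by default YARN assumes 8 GB of allocatable memory per NodeManager and enforces a virtual-memory check that often kills otherwise healthy containers. If containers get killed on a low-memory node, these additional yarn-site.xml properties (with illustrative values for a 4 GB machine) are the usual knobs:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>3072</value> <!-- memory YARN may hand out on this node -->
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value> <!-- disable the strict virtual-memory check -->
</property>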
Finally, edit $HADOOP_HOME/etc/hadoop/mapred-site.xml. In Hadoop 3.x this file ships ready to edit; only the older 2.x releases required generating it from a template first (cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml). Add:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value> <!-- bind YARN as the resource-scheduling framework -->
</property>
</configuration>
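On Hadoop 3.x, MapReduce jobs submitted to YARN commonly fail with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster" unless the MapReduce runtime location is handed to the containers. If you hit that error, also add the following to mapred-site.xml (this mirrors the property names in the official single-cluster setup guide):
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>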
With the configuration in place (and synchronized to every node), format the NameNode. Run this once, on the NameNode host only; re-running it on a working cluster generates a new clusterID and strands the existing DataNodes:
hdfs namenode -format
Start HDFS with the start-dfs.sh script, then confirm the daemons with jps:
start-dfs.sh
jps  # should show NameNode and DataNode (plus SecondaryNameNode, if configured)
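If the start scripts are run as root, Hadoop 3.x refuses with "ERROR: Attempting to operate on hdfs namenode as root" unless the operating users are declared. On a throwaway test box, one workaround is to declare them in /etc/profile.d/hadoop.sh (running the daemons as a dedicated non-root user is the better practice in production):
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root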
Open the NameNode web UI (http://namenode:9870 in Hadoop 3.x; port 50070 was the Hadoop 2.x default) to check the cluster state, then create an input directory and upload some sample files:
hdfs dfs -mkdir -p /input
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /input  # upload sample files
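Two quick checks confirm that the upload landed and that every DataNode has registered:
hdfs dfs -ls /input       # lists the uploaded .xml files
hdfs dfsadmin -report     # shows live DataNodes and their capacity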
Start YARN with the start-yarn.sh script. Run from the ResourceManager node with synchronized configuration files, it starts the ResourceManager locally and a NodeManager on every host listed in $HADOOP_HOME/etc/hadoop/workers; to start a NodeManager by hand on an individual node, use yarn --daemon start nodemanager (the yarn-daemon.sh start nodemanager form is its deprecated Hadoop 2.x equivalent):
start-yarn.sh
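As with HDFS, jps gives the quickest confirmation that the YARN daemons came up:
jps  # ResourceManager on the master node; NodeManager on each worker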
Open the YARN web UI (http://namenode:8088) to check the ResourceManager and NodeManager status; the registered nodes can also be listed on the command line (yarn node -list). Now run the classic WordCount job to verify that the HDFS and YARN integration works end to end:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /input /output
Here /input is the input directory in HDFS (created and populated above) and /output is the output directory (it must not exist beforehand; the job creates it automatically). On success, the word-count results are written under /output; inspect them with hdfs dfs -cat /output/part-r-00000. One final note: core-site.xml, hdfs-site.xml, and yarn-site.xml must be kept consistent across all nodes.
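When finished, the companion scripts shut the stack down in reverse order:
stop-yarn.sh
stop-dfs.sh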