Setting up Hadoop on an Ubuntu system involves the following main steps:
sudo apt update && sudo apt upgrade
sudo apt install openjdk-8-jdk
Once the installation finishes, verify that Java is installed correctly:
java -version
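As a rough sketch of how the check above could be automated, the helper below extracts the major Java version from `java -version` style text. The function name `java_major_version` is made up for this illustration; it assumes the usual `version "..."` line and handles both the legacy `1.8.0_342` numbering and the newer `11.0.16` numbering.

```shell
# Print the major Java version found in `java -version` style text on stdin.
# "1.8.0_342" yields 8 (legacy scheme); "11.0.16" yields 11 (modern scheme).
java_major_version() {
    sed -n 's/.*version "\([0-9][0-9._]*\).*/\1/p' | head -n 1 |
        awk -F'[._]' '{ if ($1 == 1) print $2; else print $1 }'
}

# `java -version` writes to stderr, hence the 2>&1 redirect.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | java_major_version
fi
```

With the openjdk-8-jdk package installed as above, this would be expected to print 8.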
sudo apt install openssh-server
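wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
sudo tar -zxvf hadoop-3.3.4.tar.gz -C /opt/
The archive unpacks to /opt/hadoop-3.3.4. Rename it to /opt/hadoop, the path used in the rest of this guide, and give your user ownership of it:
sudo mv /opt/hadoop-3.3.4 /opt/hadoop
sudo chown -R "$USER" /opt/hadoop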
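An archive fetched from a mirror can be checked against the SHA-512 digest that Apache publishes alongside each release. The helper below is a minimal sketch of that comparison; `verify_sha512` is a hypothetical name, and the expected digest has to be copied from the official `.sha512` file by the reader.

```shell
# Return 0 when the SHA-512 digest of file $1 equals the hex string $2.
# The comparison is case-insensitive with respect to the expected digest.
verify_sha512() {
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(sha512sum "$1" | cut -d ' ' -f 1)
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

A call such as `verify_sha512 hadoop-3.3.4.tar.gz "$DIGEST"`, where `DIGEST` holds the published value, succeeds only when the two digests match.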
Add the following lines to the end of ~/.bashrc:
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Then reload the file so the environment variables take effect:
source ~/.bashrc
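To confirm that the variables took effect, a small check along the following lines may help. Both function names (`path_contains`, `check_hadoop_env`) are invented for this sketch; it only inspects `HADOOP_HOME` and `PATH` and does not run any Hadoop command.

```shell
# Succeed when directory $1 is one of the colon-separated entries of $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether HADOOP_HOME is set and its bin and sbin directories are on PATH.
check_hadoop_env() {
    [ -n "${HADOOP_HOME:-}" ] || { echo "HADOOP_HOME is not set"; return 1; }
    for dir in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
        path_contains "$dir" "$PATH" || { echo "$dir is missing from PATH"; return 1; }
    done
    echo "Hadoop environment looks consistent"
}
```

Running `check_hadoop_env` in a fresh terminal shows whether the ~/.bashrc changes are picked up by new shells.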
sudo vi /opt/hadoop/etc/hadoop/hadoop-env.sh
Add the following line (if it is not already present):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
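The JAVA_HOME path differs between machines and architectures, so it may be safer to derive it than to copy it. The sketch below assumes the `java` command resolves, through symlinks, to a path ending in `/bin/java` or `/jre/bin/java`; `java_home_from_binary` is a hypothetical helper name.

```shell
# Strip the trailing /bin/java (and /jre, if present) from a resolved java path.
java_home_from_binary() {
    home=${1%/bin/java}
    home=${home%/jre}
    printf '%s\n' "$home"
}

# Resolve the symlink chain behind `java` and print the directory to use as JAVA_HOME.
if command -v java >/dev/null 2>&1; then
    java_home_from_binary "$(readlink -f "$(command -v java)")"
fi
```

If the printed directory differs from the value written in hadoop-env.sh, the printed one is probably the path to use.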
sudo vi /opt/hadoop/etc/hadoop/core-site.xml
Add the following content:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/data</value>
</property>
</configuration>
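One way to confirm that Hadoop actually reads the edited file is to ask it for the resolved value with `hdfs getconf -confKey`. The wrapper below is only a sketch; `expect_conf` is an invented name, and it assumes the `hdfs` command is on PATH.

```shell
# Compare the value Hadoop resolves for key $1 with the expected value $2.
# Relies on `hdfs getconf -confKey`, so the hdfs command must be available.
expect_conf() {
    actual=$(hdfs getconf -confKey "$1") || return 1
    if [ "$actual" = "$2" ]; then
        echo "OK $1=$actual"
    else
        echo "MISMATCH $1: expected $2, got $actual"
        return 1
    fi
}
```

For the configuration above, `expect_conf fs.defaultFS hdfs://localhost:9000` should report OK once core-site.xml is saved.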
sudo vi /opt/hadoop/etc/hadoop/hdfs-site.xml
Add the following content:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop/data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop/data/hdfs/datanode</value>
</property>
</configuration>
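Hadoop normally creates the NameNode and DataNode directories itself, but creating them up front is a simple way to surface permission problems before the first start. The helper name `make_hdfs_dirs` is hypothetical; the directory layout mirrors the two paths configured above.

```shell
# Create the NameNode and DataNode directories under base directory $1.
make_hdfs_dirs() {
    mkdir -p "$1/hdfs/namenode" "$1/hdfs/datanode"
}
```

Calling `make_hdfs_dirs /opt/hadoop/data` as the user who will run the daemons fails immediately if that user cannot write under /opt/hadoop.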
sudo vi /opt/hadoop/etc/hadoop/mapred-site.xml
Add the following content:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
sudo vi /opt/hadoop/etc/hadoop/yarn-site.xml
Add the following content:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
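Format the NameNode (only once, before the first start):
hdfs namenode -format
Start the daemons as your own user. Hadoop 3 deprecates hadoop-daemon.sh in favor of the --daemon option, and this guide does not create separate hdfs or yarn users:
hdfs --daemon start namenode
hdfs --daemon start datanode
yarn --daemon start resourcemanager
yarn --daemon start nodemanager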
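After the start commands, the JDK tool `jps` lists the running Java processes. The sketch below reads that listing and prints any of the four expected daemons that did not come up; `missing_daemons` is a made-up name, and it assumes the usual "PID ClassName" output format of `jps`.

```shell
# Read `jps` output on stdin and print each expected daemon that is absent.
missing_daemons() {
    running=$(awk '{ print $2 }')
    for daemon in NameNode DataNode ResourceManager NodeManager; do
        # -x matches whole lines, so SecondaryNameNode does not count as NameNode.
        printf '%s\n' "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}

if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

No output means all four daemons are running; any name printed points to a daemon whose log under $HADOOP_HOME/logs is worth checking.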
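Open the NameNode web UI in a browser. Hadoop 3.x serves it on port 9870 (port 50070 was the Hadoop 2.x default):
http://localhost:9870
If the page loads, Hadoop has been installed successfully.
The YARN ResourceManager web UI is available at:
http://localhost:8088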
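Because the default NameNode web UI port changed between major releases (50070 in Hadoop 2.x, 9870 from Hadoop 3.0.0 on), a small lookup like the one below can avoid probing the wrong port. `namenode_ui_port` is a hypothetical helper, and it only covers the defaults, not a port overridden in hdfs-site.xml.

```shell
# Print the default NameNode web UI port for a Hadoop version string such as 3.3.4.
namenode_ui_port() {
    major=${1%%.*}
    if [ "$major" -ge 3 ]; then
        echo 9870
    else
        echo 50070
    fi
}

# Example probe (prints the HTTP status code of the UI):
# curl -s -o /dev/null -w '%{http_code}\n' "http://localhost:$(namenode_ui_port 3.3.4)"
```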
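The steps above cover the basic process of installing and configuring Hadoop on Ubuntu. Depending on the Hadoop version and your own requirements, some steps may need to be adjusted.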