This article walks through an example of setting up a highly available Hadoop cluster. The editor found it quite practical and shares it here for reference; follow along below.
Experiment Environment

master: 192.168.10.131
slave1: 192.168.10.129
slave2: 192.168.10.130
OS: ubuntu-16.04.3, hadoop-2.7.1, zookeeper-3.4.8

Installation Steps

1. Install the JDK
2. Change the hostnames
3. Edit the hosts mapping and configure passwordless SSH
4. Set up time synchronization
5. Install Hadoop under /opt/Data
6. Edit the Hadoop configuration files
7. Install and configure the ZooKeeper cluster
8. Start the cluster
Install the JDK under /opt
tar -zxvf jdk-8u221-linux-x64.tar.gz -C /opt
Configure the environment variables
vim /etc/profile

# jdk
export JAVA_HOME=/opt/jdk1.8.0_221
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

source /etc/profile
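A quick way to confirm the JDK and the environment variables are in place (a minimal check, assuming the paths above):

java -version      # should report java version "1.8.0_221"
echo $JAVA_HOME    # should print /opt/jdk1.8.0_221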
Set the hostnames of the three virtual machines to master, slave1 and slave2 respectively
vim /etc/hostname
Edit the hosts file; this must be done on every host
vim /etc/hosts
192.168.10.131 master
192.168.10.129 slave1
192.168.10.130 slave2
Configure passwordless SSH
First, disable the firewall
# 1. Check the firewall status
sudo ufw status
# 2. Allow a specific port, e.g. 8381
sudo ufw allow 8381
# 3. Enable the firewall
sudo ufw enable
# 4. Disable the firewall
sudo ufw disable
# 5. Reload the firewall
sudo ufw reload
# 6. Remove the allow rule for a port, e.g. 80
sudo ufw delete allow 80
# 7. Check the listening ports
netstat -ltn
During startup the cluster needs to SSH into the other hosts. To avoid typing the remote host's password every time, configure passwordless login (press Enter at every prompt):
ssh-keygen -t rsa
Copy each host's public key to itself and to the other hosts
ssh-copy-id -i ~/.ssh/id_rsa.pub root@master
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave2
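Before continuing, it is worth confirming that every host can be reached without a password prompt; a minimal check from each machine:

ssh root@master hostname   # should print "master" without asking for a password
ssh root@slave1 hostname   # should print "slave1"
ssh root@slave2 hostname   # should print "slave2"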
Install the ntp and ntpdate packages (the ntp daemon serves the time, ntpdate lets the slaves sync against master)

apt-get install ntp ntpdate
Edit the NTP configuration file
vim /etc/ntp.conf

# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help
driftfile /var/lib/ntp/ntp.drift

# Enable this if you want statistics to be logged.
#statsdir /var/log/ntpstats/

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

# The default Ubuntu pool servers stay commented out so that the cluster only
# uses the time sources defined further below.
#pool 0.ubuntu.pool.ntp.org iburst
#pool 1.ubuntu.pool.ntp.org iburst
#pool 2.ubuntu.pool.ntp.org iburst
#pool 3.ubuntu.pool.ntp.org iburst

# Use Ubuntu's ntp server as a fallback.
#pool ntp.ubuntu.com

# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details. Note that "restrict" applies to both servers and clients.
# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery limited
restrict -6 default kod notrap nomodify nopeer noquery limited

# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1

# Needed for adding pool entries
restrict source notrap nomodify noquery

# Allow hosts on the local network to synchronize time with this server,
# but do not let them modify the time on this server.
#restrict 192.168.10.131 mask 255.255.255.0 nomodify notrust
restrict 192.168.10.129 mask 255.255.255.0 nomodify notrust
restrict 192.168.10.130 mask 255.255.255.0 nomodify notrust

# Allow an upstream time server to adjust the local clock.
#restrict times.aliyun.com nomodify
#restrict ntp.aliyun.com nomodify
#restrict cn.pool.ntp.org nomodify

# Time servers to synchronize with ("prefer" marks the preferred source).
server 192.168.10.131 prefer
#server times.aliyun.com iburst prefer
#server ntp.aliyun.com iburst
#server cn.pool.ntp.org iburst

#logfile /var/log/ntpstats/ntpd.log   # ntp log file
#pidfile /var/run/ntp.pid             # pid file path

# If you want to provide time to your local subnet, change the next line.
#broadcast 192.168.123.255

# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines. Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient

# Changes required to use pps synchronisation as explained in documentation:
# http://www.ntp.org/ntpfaq/NTP-s-config-adv.htm#AEN3918
#server 127.127.8.1 mode 135 prefer   # Meinberg GPS167 with PPS
#fudge 127.127.8.1 time1 0.0042       # relative to PPS for my hardware
#server 127.127.22.1                  # ATOM(PPS)
#fudge 127.127.22.1 flag3 1           # enable PPS API

# Fall back to the local clock if no other source is reachable.
server 127.127.1.0
fudge 127.127.1.0 stratum 10
Start the NTP service and check the synchronization status
service ntp start   # start the NTP service (on Ubuntu the service is named "ntp", not "ntpd")
ntpq -p             # show the peers and their synchronization state
ntpstat             # show the synchronization result
On the slave hosts, restart the service and synchronize the time with master
/etc/init.d/ntp restart
ntpdate 192.168.10.131
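Optionally, the slaves can re-synchronize against master on a schedule so the clocks do not drift between manual runs; a minimal sketch using cron (the 30-minute interval is an arbitrary choice, and ntpdate is assumed to live at /usr/sbin/ntpdate):

crontab -e
# add the following line to sync with master every 30 minutes
*/30 * * * * /usr/sbin/ntpdate 192.168.10.131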
Create a Data directory under /opt
cd /opt
mkdir Data
Download and extract Hadoop into /opt/Data
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar -zxvf hadoop-2.7.1.tar.gz -C /opt/Data
Configure the environment variables (add to /etc/profile as before)
# HADOOP
export HADOOP_HOME=/opt/Data/hadoop-2.7.1
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
export HADOOP_YARN_HOME=$HADOOP_HOME
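After reloading the profile, the Hadoop installation can be sanity-checked from any directory (a minimal check, assuming the paths above):

source /etc/profile
hadoop version    # should report Hadoop 2.7.1
which hdfs        # should resolve to /opt/Data/hadoop-2.7.1/bin/hdfs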
The following configuration files are under hadoop-2.7.1/etc/hadoop
Edit hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_221
Edit core-site.xml
<configuration> <!-- 指定hdfs的nameservice為ns1 --> <property> <name>fs.defaultFS</name> <value>hdfs://ns1/</value> </property> <!-- 指定hadoop臨時目錄 --> <property> <name>hadoop.tmp.dir</name> <value>/opt/Data/hadoop-2.7.1/tmp</value> </property> <!-- 指定zookeeper地址 --> <property> <name>ha.zookeeper.quorum</name> <value>slave1:2181,slave2:2181</value> </property> <!--修改core-site.xml中的ipc參數,防止出現連接journalnode服務ConnectException--> <property> <name>ipc.client.connect.max.retries</name> <value>100</value> <description>Indicates the number of retries a client will make to establish a server connection.</description> </property> </configuration>
Edit hdfs-site.xml
<configuration> <!--指定hdfs的nameservice為ns1,需要和core-site.xml中的保持一致 --> <property> <name>dfs.nameservices</name> <value>ns1</value> </property> <!-- ns1下面有兩個NameNode,分別是nn1,nn2 --> <property> <name>dfs.ha.namenodes.ns1</name> <value>nn1,nn2</value> </property> <!-- nn1的RPC通信地址 --> <property> <name>dfs.namenode.rpc-address.ns1.nn1</name> <value>master:9820</value> </property> <!-- nn1的http通信地址 --> <property> <name>dfs.namenode.http-address.ns1.nn1</name> <value>master:9870</value> </property> <!-- nn2的RPC通信地址 --> <property> <name>dfs.namenode.rpc-address.ns1.nn2</name> <value>slave1:9820</value> </property> <!-- nn2的http通信地址 --> <property> <name>dfs.namenode.http-address.ns1.nn2</name> <value>slave1:9870</value> </property> <!-- 指定NameNode的日志在JournalNode上的存放位置 --> <property> <name>dfs.namenode.shared.edits.dir</name> <value>qjournal://master:8485;slave1:8485;slave2:8485/ns1</value> </property> <!-- 指定JournalNode在本地磁盤存放數據的位置 --> <property> <name>dfs.journalnode.edits.dir</name> <value>/opt/Data/hadoop-2.7.1/journal</value> </property> <!-- 開啟NameNode失敗自動切換 --> <property> <name>dfs.ha.automatic-failover.enabled</name> <value>true</value> </property> <!-- 配置失敗自動切換實現方式 --> <property> <name>dfs.client.failover.proxy.provider.ns1</name> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> </property> <!-- 配置隔離機制方法,多個機制用換行分割,即每個機制暫用一行--> <property> <name>dfs.ha.fencing.methods</name> <value> sshfence shell(/bin/true) </value> </property> <!-- 使用sshfence隔離機制時需要ssh免登陸 --> <property> <name>dfs.ha.fencing.ssh.private-key-files</name> <value>/root/.ssh/id_rsa</value> </property> <!-- 配置sshfence隔離機制超時時間 --> <property> <name>dfs.ha.fencing.ssh.connect-timeout</name> <value>30000</value> </property> <!--配置namenode存放元數據的目錄,可以不配置,如果不配置則默認放到hadoop.tmp.dir下--> <property> <name>dfs.namenode.name.dir</name> <value>/opt/Data/hadoop-2.7.1/data/name</value> </property> <!--配置datanode存放元數據的目錄,可以不配置,如果不配置則默認放到hadoop.tmp.dir下--> <property> <name>dfs.datanode.data.dir</name> <value>/opt/Data/hadoop-2.7.1/data/data</value> </property> <!--配置復本數量--> <property> <name>dfs.replication</name> <value>2</value> </property> <!--設置用戶的操作權限,false表示關閉權限驗證,任何用戶都可以操作--> <property> <name>dfs.webhdfs.enabled</name> <value>true</value> </property> </configuration>
Edit mapred-site.xml
Create the file from the template first:

cp mapred-site.xml.template mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit yarn-site.xml
<configuration> <!-- 指定nodemanager啟動時加載server的方式為shuffle server --> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <!--配置yarn的高可用--> <property> <name>yarn.resourcemanager.ha.enabled</name> <value>true</value> </property> <!--執行yarn集群的別名--> <property> <name>yarn.resourcemanager.cluster-id</name> <value>cluster1</value> </property> <!--指定兩個resourcemaneger的名稱--> <property> <name>yarn.resourcemanager.ha.rm-ids</name> <value>rm1,rm2</value> </property> <!--配置rm1的主機--> <property> <name>yarn.resourcemanager.hostname.rm1</name> <value>master</value> </property> <!--配置rm2的主機--> <property> <name>yarn.resourcemanager.hostname.rm2</name> <value>slave1</value> </property> <!--配置2個resourcemanager節點--> <property> <name>yarn.resourcemanager.zk-address</name> <value>slave1:2181,slave2:2181</value> </property> <!--zookeeper集群地址--> <property> <name>yarn.nodemanager.vmem-check-enabled</name> <value>false</value> <description>Whether virtual memory limits will be enforced for containers</description> </property> <!--物理內存8G--> <property> <name>yarn.nodemanager.vmem-pmem-ratio</name> <value>8</value> <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description> </property> </configuration>
Edit the slaves file
master
slave1
slave2
Download and extract zookeeper-3.4.8.tar.gz
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz
tar -zxvf zookeeper-3.4.8.tar.gz -C /opt/Data
Add ZooKeeper to the environment variables in /etc/profile

# zookeeper
export ZOOKEEPER_HOME=/opt/Data/zookeeper-3.4.8
export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin
Enter the conf directory and copy zoo_sample.cfg to zoo.cfg

cp zoo_sample.cfg zoo.cfg
Edit zoo.cfg
dataDir=/opt/Data/zookeeper-3.4.8/tmp   # create the tmp directory under zookeeper-3.4.8 first
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
Create a myid file in the tmp directory
vim myid
1    # the other hosts use a different number: 2 on slave1, 3 on slave2
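Equivalently, the tmp directory and the myid file can be created non-interactively; a minimal sketch using the paths above (run on master; after the Data directory is copied to the slaves in the scp step below, the same file needs the values 2 and 3 there):

mkdir -p /opt/Data/zookeeper-3.4.8/tmp
echo 1 > /opt/Data/zookeeper-3.4.8/tmp/myid
cat /opt/Data/zookeeper-3.4.8/tmp/myid    # should print 1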
Format the NameNode on master. Note that with quorum-journal-based HA the JournalNodes (started further below) must already be running when the NameNode is formatted, because the format writes to the shared edits directory:

hdfs namenode -format
Copy the Data directory to the other two hosts
scp -r /opt/Data root@slave1:/opt
scp -r /opt/Data root@slave2:/opt
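Note that the copied Data directory still contains master's myid (1), so the value has to be corrected on each slave; a minimal sketch assuming the directory layout above:

ssh root@slave1 "echo 2 > /opt/Data/zookeeper-3.4.8/tmp/myid"
ssh root@slave2 "echo 3 > /opt/Data/zookeeper-3.4.8/tmp/myid"

The JDK, the /etc/profile entries and the hosts file also need to be present on the slaves. Alternatively, instead of copying the NameNode metadata via scp, the standby NameNode on slave1 can be initialized with hdfs namenode -bootstrapStandby once the active NameNode is running.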
Start ZooKeeper; run this on every node

zkServer.sh start
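Each node's ZooKeeper role can then be checked; with all three nodes running, one should report leader and the others follower:

zkServer.sh status   # run on every node; expect "Mode: leader" on one node and "Mode: follower" on the others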
Format the HA state in ZooKeeper (this only needs to be run once, on one of the NameNode hosts, e.g. master)
hdfs zkfc -formatZK
Start the JournalNode; repeat on every host listed in dfs.namenode.shared.edits.dir (master, slave1 and slave2)
hadoop-daemon.sh start journalnode
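A quick way to verify the JournalNodes are up on each host before bringing up HDFS (8485 is the JournalNode RPC port from dfs.namenode.shared.edits.dir):

jps                          # the list should include JournalNode
netstat -ntlp | grep 8485    # the JournalNode should be listening here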
Start the cluster
start-all.sh
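In Hadoop 2.x, start-yarn.sh only starts the ResourceManager on the host where it is invoked, so the second ResourceManager on slave1 usually has to be started by hand. Afterwards the HA state of the NameNodes and ResourceManagers can be queried; a sketch using the nn1/nn2 and rm1/rm2 ids configured above:

# on slave1
yarn-daemon.sh start resourcemanager

# on any node
hdfs haadmin -getServiceState nn1    # expect "active" or "standby"
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2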
Check the ports
netstat -ntlup   # shows the ports occupied by the services
View the running Java processes with jps
Thanks for reading! That is all for this example walkthrough of setting up a highly available Hadoop cluster. Hopefully the content above is helpful and teaches you something new. If you found the article useful, feel free to share it so more people can see it!