Architecture (the HA model has no SNN node)

Eight VMs were planned and seven are actually used; the SNN node is not used.

|       | NN | DN | SN | ZKFC | ZK | JNN | RM | NM |
|-------|----|----|----|------|----|-----|----|----|
| node1 | *  |    |    | *    |    |     |    |    |
| node2 | *  |    |    | *    |    |     |    |    |
| node3 |    |    |    |      |    |     |    |    |
| node4 |    |    |    | *    |    |     | *  |    |
| node5 |    |    |    | *    |    |     | *  |    |
| node6 |    | *  |    |      | *  | *   |    | *  |
| node7 |    | *  |    |      | *  | *   |    | *  |
| node8 |    | *  |    |      | *  | *   |    | *  |
Preparation before building the cluster:
* SELinux and the firewall must be disabled on every server before the cluster is built
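The post does not show the commands for this step. The following is a minimal sketch, assuming CentOS/RHEL 7 with firewalld (the distribution is not stated in the post). The helper name `set_selinux_mode` is hypothetical, and the privileged commands are left as comments so that nothing is changed by simply running the snippet.

```shell
# Assumption: CentOS/RHEL 7 with firewalld; run the real commands as root on every node.

# Hypothetical helper: rewrite the SELINUX= line of a config file in place.
# $1 = path to the config file, $2 = mode (enforcing|permissive|disabled)
set_selinux_mode() {
    sed -i "s/^SELINUX=.*/SELINUX=$2/" "$1"
}

# On a real node:
#   setenforce 0                                   # permissive until the next reboot
#   set_selinux_mode /etc/selinux/config disabled  # persistent setting
#   systemctl stop firewalld                       # stop the firewall now
#   systemctl disable firewalld                    # keep it off after a reboot
```

The pattern only matches the `SELINUX=` line, so a neighbouring `SELINUXTYPE=` line is left alone.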
1. Set the hostname on every server and add matching entries to the hosts file
[root@localhost ~]# hostnamectl set-hostname node1
[root@localhost ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.159.129 node1
192.168.159.130 node2
192.168.159.132 node3
192.168.159.133 node4
192.168.159.136 node5
192.168.159.137 node6
192.168.159.138 node7
192.168.159.139 node8
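Since every node needs the same eight entries, a quick check can catch a host that was missed. This is only a sketch; `missing_hosts` is a hypothetical helper that prints each expected name with no entry in a given hosts file.

```shell
# Hypothetical helper: print every expected hostname that has no entry in a hosts file.
# $1 = hosts file, remaining args = hostnames to look for
missing_hosts() {
    file=$1; shift
    for name in "$@"; do
        # look for the name as a whole field after the address column, skipping comment lines
        awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                          END { exit found ? 0 : 1 }' "$file" || echo "$name"
    done
}

# Example: check the eight node names used in this post
missing_hosts /etc/hosts node1 node2 node3 node4 node5 node6 node7 node8
```

No output means all names are present.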
2. Both NameNode nodes need passwordless SSH login to all hosts, themselves included; the two ResourceManager nodes need passwordless login to each other, themselves included
[root@localhost ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:lIvGygyJHycNTZJ0KeuE/BM0BWGGq/UTgMUQNo7Qm2M root@node1
The key's randomart image is:
+---[RSA 2048]----+
|+@=**o |
|*.XB. . |
|oo+*o o |
|.+E=.. o . |
|o=*o+.+ S |
|...Xoo |
| . =. |
| |
| |
+----[SHA256]-----+
[root@localhost ~]# for i in `seq 1 8`;do ssh-copy-id root@node$i;done
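To confirm that the keys were copied everywhere, one option is to attempt a login with `BatchMode=yes`, which makes ssh fail instead of prompting for a password. The sketch below is not from the original post; `check_ssh` and the `SSH` override variable are hypothetical names.

```shell
# SSH can be overridden (for example with a stub) when no hosts are reachable.
SSH=${SSH:-ssh -o BatchMode=yes -o ConnectTimeout=5}

# Hypothetical helper: try a passwordless login to each host and report the result.
# Returns non-zero if any host failed.
check_ssh() {
    failed=0
    for host in "$@"; do
        # $SSH is unquoted on purpose so that its options split into separate words
        if $SSH "root@$host" true 2>/dev/null; then
            echo "ok $host"
        else
            echo "FAILED $host"
            failed=1
        fi
    done
    return $failed
}

# On node1 or node2: check_ssh node1 node2 node3 node4 node5 node6 node7 node8
```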
3. Synchronize the time on all servers
[root@node1 ~]# ansible all -m shell -o -a 'ntpdate ntp1.aliyun.com'
node4 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:37 ntpdate[2477]: adjust time server 120.25.115.20 offset 0.001546 sec
node6 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:37 ntpdate[2470]: adjust time server 120.25.115.20 offset 0.000220 sec
node2 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:37 ntpdate[2406]: adjust time server 120.25.115.20 offset -0.002414 sec
node3 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:37 ntpdate[2465]: adjust time server 120.25.115.20 offset -0.001185 sec
node5 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:37 ntpdate[2466]: adjust time server 120.25.115.20 offset 0.005768 sec
node7 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:43 ntpdate[2503]: adjust time server 120.25.115.20 offset 0.000703 sec
node8 | CHANGED | rc=0 | (stdout) 20 Feb 16:08:43 ntpdate[2426]: adjust time server 120.25.115.20 offset -0.001338 sec
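The `ansible all` call relies on an inventory that the post does not show. As an illustration only, the sketch below writes a small inventory file; the path `/tmp/hadoop-hosts.ini` and the group name `hadoop` are assumptions, not taken from the author's setup.

```shell
# Assumption: inventory path and group name are made up for illustration.
inventory=${INVENTORY:-/tmp/hadoop-hosts.ini}

# Build an inventory that lists node1..node8 under one group
{
    echo '[hadoop]'
    for i in $(seq 1 8); do
        echo "node$i"
    done
} > "$inventory"

# With Ansible installed on the control node, the time sync could then be run as:
#   ansible -i "$inventory" all -m shell -o -a 'ntpdate ntp1.aliyun.com'
```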
4. Install the JDK on all servers and set the environment variables
[root@node1 ~]# tar -xf jdk-8u144-linux-x64.gz -C /usr/
[root@node1 ~]# ln -sv /usr/jdk1.8.0_144/ /usr/java
"/usr/java" -> "/usr/jdk1.8.0_144/"
[root@node1 ~]# cat /etc/profile.d/java.sh
export JAVA_HOME=/usr/java
export PATH=$PATH:$JAVA_HOME/bin
[root@node1 ~]# source /etc/profile.d/java.sh
[root@node1 ~]# java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
ZooKeeper cluster setup
Install ZooKeeper on nodes 6, 7 and 8 as planned (the JDK must already be in place)
#Extract ZooKeeper into /usr
[root@node6 ~]# tar xf zookeeper-3.4.6.tar.gz -C /usr/
#Create the ZooKeeper data directory (-p also creates /usr/data if it is missing)
[root@node6 ~]# mkdir -p /usr/data/zookeeper
#Copy the sample config in ZooKeeper's conf directory to zoo.cfg
[root@node6 ~]# cp /usr/zookeeper-3.4.6/conf/zoo_sample.cfg /usr/zookeeper-3.4.6/conf/zoo.cfg
#Edit the config: change the data directory and add the ensemble members
[root@node6 ~]# vim /usr/zookeeper-3.4.6/conf/zoo.cfg
#changed:
dataDir=/usr/data/zookeeper
#added:
server.1=node6:2888:3888
server.2=node7:2888:3888
server.3=node8:2888:3888
#Copy the configured ZooKeeper directory to the other two nodes
[root@node6 ~]# scp -r /usr/zookeeper-3.4.6/ node7:/usr/zookeeper-3.4.6/
[root@node6 ~]# scp -r /usr/zookeeper-3.4.6/ node8:/usr/zookeeper-3.4.6/
#Write this node's ID into the data directory created above; it must be a number and must differ on each of the three nodes
[root@node6 ~]# echo 1 > /usr/data/zookeeper/myid
#Create the data directory and ID file on the other two nodes as well
[root@node7 ~]# mkdir -p /usr/data/zookeeper
[root@node7 ~]# echo 2 > /usr/data/zookeeper/myid
[root@node8 ~]# mkdir -p /usr/data/zookeeper
[root@node8 ~]# echo 3 > /usr/data/zookeeper/myid
#Start the ZooKeeper cluster once configuration is done
[root@node6 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh start
[root@node7 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh start
[root@node8 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh start
#Check the cluster status (in a fresh ensemble, the member with the largest myid among those forming the first quorum normally becomes leader)
[root@node6 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[root@node7 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[root@node8 ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
#Only the leader listens on port 2888
[root@node8 ~]# netstat -tnlp | grep java
tcp6 0 0 :::2181 :::* LISTEN 33766/java
tcp6 0 0 192.168.159.139:2888 :::* LISTEN 33766/java
tcp6 0 0 192.168.159.139:3888 :::* LISTEN 33766/java
tcp6 0 0 :::43793 :::* LISTEN 33766/java
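The number in each `server.N` line has to match the `myid` file on that host. The sketch below makes the mapping explicit by deriving both from one ordered host list; `zk_server_lines` and `zk_myid_for` are hypothetical helpers, not part of ZooKeeper.

```shell
# Hypothetical helper: print the server.N lines for an ordered list of hosts.
# The N printed for a host is the number that host must store in its myid file.
zk_server_lines() {
    id=1
    for host in "$@"; do
        echo "server.$id=$host:2888:3888"
        id=$((id + 1))
    done
}

# Hypothetical helper: print the myid for one host, given the same ordered list.
# $1 = host to look up, remaining args = ordered host list
zk_myid_for() {
    want=$1; shift
    id=1
    for host in "$@"; do
        if [ "$host" = "$want" ]; then
            echo "$id"
            return 0
        fi
        id=$((id + 1))
    done
    return 1
}

zk_server_lines node6 node7 node8
zk_myid_for node7 node6 node7 node8
```

For the host order used in this post, the first call prints the three `server.N` lines shown above and the second prints `2`.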
Hadoop cluster setup
1. Add the Hadoop environment variables first
[root@node1 ~]# cat /etc/profile.d/hadoop.sh
export HADOOP_HOME=/usr/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
2. Extract the Hadoop package into /usr
[root@node1 ~]# tar xf hadoop-2.9.2.tar.gz -C /usr
[root@node1 ~]# ln -sv /usr/hadoop-2.9.2/ /usr/hadoop
"/usr/hadoop" -> "/usr/hadoop-2.9.2/"
3. Set the JAVA_HOME variable in hadoop-env.sh, mapred-env.sh and yarn-env.sh inside the Hadoop package
[root@node1 ~]# grep 'export JAVA_HOME' /usr/hadoop/etc/hadoop/{hadoop-env.sh,mapred-env.sh,yarn-env.sh}
/usr/hadoop/etc/hadoop/hadoop-env.sh:export JAVA_HOME=/usr/java
/usr/hadoop/etc/hadoop/mapred-env.sh:export JAVA_HOME=/usr/java
/usr/hadoop/etc/hadoop/yarn-env.sh:export JAVA_HOME=/usr/java
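The grep above shows the end result but not the edit itself. One way to make the change non-interactively is sketched below, assuming each file contains an `export JAVA_HOME=` line that may be commented out. `set_java_home` is a hypothetical helper.

```shell
# Hypothetical helper: point every "export JAVA_HOME=" line (commented out or not)
# at the given JDK path. $1 = JDK path, remaining args = files to edit in place.
set_java_home() {
    jdk=$1; shift
    for f in "$@"; do
        sed -i "s|^[[:space:]]*#\{0,1\}[[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=$jdk|" "$f"
    done
}

# On node1 this could be called as:
#   cd /usr/hadoop/etc/hadoop
#   set_java_home /usr/java hadoop-env.sh mapred-env.sh yarn-env.sh
```

Re-running the grep afterwards should give the three lines shown above.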
4. Edit core-site.xml (the NameNode configuration file)
[root@node1 ~]# vim /usr/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop</value>
    <!-- In an HA deployment this refers to the dfs.nameservices value in hdfs-site.xml -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/data/hadoop</value>
    <!-- Directory where Hadoop stores its files -->
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node6:2181,node7:2181,node8:2181</value>
    <!-- Addresses of the ZooKeeper cluster -->
  </property>
</configuration>
5. Create the /usr/data/hadoop directory on all Hadoop nodes
6. Edit hdfs-site.xml
[root@node1 ~]# vim /usr/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <!-- Number of replicas for each data block -->
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
    <!-- Block size; files larger than this are split into blocks (128 MB) -->
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
    <!-- Disables HDFS permission checking -->
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop</value>
    <!-- This value is the HDFS entry point referenced by fs.defaultFS in core-site.xml -->
  </property>
  <property>
    <name>dfs.ha.namenodes.hadoop</name>
    <value>nn1,nn2</value>
    <!-- The cluster has two NameNodes -->
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hadoop.nn1</name>
    <value>node1:9000</value>
    <!-- RPC address of nn1 -->
  </property>
  <property>
    <name>dfs.namenode.http-address.hadoop.nn1</name>
    <value>node1:50070</value>
    <!-- HTTP address of nn1 -->
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hadoop.nn2</name>
    <value>node2:9000</value>
    <!-- RPC address of nn2 -->
  </property>
  <property>
    <name>dfs.namenode.http-address.hadoop.nn2</name>
    <value>node2:50070</value>
    <!-- HTTP address of nn2 -->
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node6:8485;node7:8485;node8:8485/hadoop</value>
    <!-- Where the NameNodes' shared edit log is kept on the JournalNodes (usually deployed alongside ZooKeeper) -->
    <!-- The trailing journal ID can be any name, as long as it differs between clusters -->
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <!-- Whether automatic failover is enabled -->
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/data/journalnode</value>
    <!-- Local directory where each JournalNode stores its data; set it explicitly, since the default is under /tmp -->
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hadoop</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <!-- Proxy class that clients use to find the active NameNode; the property name ends with the nameservice ID and must be changed to match it -->
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
    <!-- Fencing method used during failover; sshfence logs in to the previously active NameNode over SSH and kills its process -->
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
    <!-- sshfence needs an SSH private key; the two NameNodes must trust each other's keys -->
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
    <!-- SSH connect timeout in milliseconds -->
  </property>
</configuration>
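Because `fs.defaultFS` in core-site.xml has to name the nameservice declared in hdfs-site.xml, a small consistency check can catch a typo before anything is formatted. This is a sketch under the assumption that the files keep the one-tag-per-line layout used in this post. `get_prop` and `check_nameservice` are hypothetical helpers, not Hadoop tools.

```shell
# Hypothetical helper: print the <value> of one property from a Hadoop *-site.xml file.
# Assumes <name> and <value> each sit on their own line; it is not a general XML parser.
# $1 = file, $2 = property name
get_prop() {
    awk -v want="$2" '
        /<name>/ { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
        /<value>/ && n == want {
            v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
            print v; exit
        }' "$1"
}

# Check that fs.defaultFS points at the nameservice declared in hdfs-site.xml.
# $1 = core-site.xml, $2 = hdfs-site.xml
check_nameservice() {
    fs=$(get_prop "$1" fs.defaultFS)
    ns=$(get_prop "$2" dfs.nameservices)
    if [ "$fs" = "hdfs://$ns" ]; then
        echo "ok: $fs"
    else
        echo "mismatch: fs.defaultFS=$fs dfs.nameservices=$ns"
        return 1
    fi
}

# On node1:
#   check_nameservice /usr/hadoop/etc/hadoop/core-site.xml /usr/hadoop/etc/hadoop/hdfs-site.xml
```

Once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves, which is the more authoritative check.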
7. Create the /usr/data/journalnode directory on the JournalNode nodes
8. Edit mapred-site.xml (first rename mapred-site.xml.template to mapred-site.xml)
[root@node1 ~]# vim /usr/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node3:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node3:19888</value>
  </property>
</configuration>
9. Edit yarn-site.xml
[root@node1 ~]# vim /usr/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
    <!-- Whether ResourceManager high availability is enabled -->
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>rmcluster</value>
    <!-- Unique identifier for the RM cluster; the property name does not need to change -->
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
    <!-- Logical IDs of the two RMs in the cluster -->
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node4</value>
    <!-- Host of rm1 -->
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>node4:8088</value>
    <!-- Web UI address of rm1 -->
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node5</value>
    <!-- Host of rm2 -->
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>node5:8088</value>
    <!-- Web UI address of rm2 -->
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>node6:2181,node7:2181,node8:2181</value>
    <!-- Addresses of the ZooKeeper cluster -->
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    <!-- Enables RM state recovery; the default is false -->
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    <!-- Stores ResourceManager state in the ZooKeeper cluster; by default it is kept in a file system store -->
  </property>
</configuration>
10. Edit the DataNode list file (it also determines where the NodeManagers are started)
[root@node1 ~]# vim /usr/hadoop/etc/hadoop/slaves
node6
node7
node8
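The post does not show how the configured Hadoop directory reaches the other nodes, although every node that runs a Hadoop daemon needs the same files. One possible approach, mirroring the scp step used for ZooKeeper, is sketched below. It assumes passwordless SSH from node1, the same /usr layout on every host, and a node list taken from the role table (add node3 if the JobHistory server configured in mapred-site.xml is meant to run there). `DRY_RUN`, `run` and `distribute_hadoop` are hypothetical names; with the default `DRY_RUN=1` the commands are only printed.

```shell
# Assumption: run on node1, which holds the edited copy. DRY_RUN=1 only prints commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy the Hadoop tree and profile script to each node and recreate what node1 has locally
distribute_hadoop() {
    for node in "$@"; do
        run scp -r /usr/hadoop-2.9.2 "root@$node:/usr/"
        run ssh "root@$node" "ln -sfn /usr/hadoop-2.9.2 /usr/hadoop"
        run ssh "root@$node" "mkdir -p /usr/data/hadoop"
        run scp /etc/profile.d/hadoop.sh "root@$node:/etc/profile.d/"
    done
}

distribute_hadoop node2 node4 node5 node6 node7 node8
```

After reviewing the printed commands, setting `DRY_RUN=0` executes them for real.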
The following steps are needed only for the first-time initialization:
1. Start the three JournalNodes first
[root@node6 ~]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-journalnode-node6.out
[root@node6 ~]# jps
2965 Jps
2904 JournalNode
2779 QuorumPeerMain
[root@node7 ~]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-journalnode-node7.out
[root@node7 ~]# jps
2119 QuorumPeerMain
2220 JournalNode
2318 Jps
[root@node8 ~]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-journalnode-node8.out
[root@node8 ~]# jps
2229 Jps
2025 QuorumPeerMain
2153 JournalNode
2. Format the primary NameNode (the hdfs command replaces the deprecated "hadoop namenode" form)
[root@node1 ~]# hdfs namenode -format

3. Start the primary NameNode
[root@node1 ~]# hadoop-daemon.sh start namenode
starting namenode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-namenode-node1.out
[root@node1 ~]# jps
7302 Jps
7225 NameNode
4. Bootstrap the standby NameNode (it copies the freshly formatted metadata from the running primary)
[root@node2 ~]# hdfs namenode -bootstrapStandby

5. From the primary NameNode, create the initial HA node in ZooKeeper
[root@node1 ~]# hdfs zkfc -formatZK

5.1 The HDFS information can be inspected from a ZooKeeper node with zkCli.sh
[root@node6 ~]# /usr/zookeeper-3.4.6/bin/zkCli.sh
Connecting to localhost:2181
......
......
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper] #before the NameNode has submitted anything
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper, hadoop-ha] #after running the formatZK command above
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha/hadoop
[]
6. Start the HDFS cluster
[root@node1 ~]# start-dfs.sh
Starting namenodes on [node1 node2]
node2: starting namenode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-namenode-node2.out
node1: namenode running as process 7225. Stop it first.
node8: starting datanode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-datanode-node8.out
node6: starting datanode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-datanode-node6.out
node7: starting datanode, logging to /usr/hadoop-2.9.2/logs/hadoop-root-datanode-node7.out
Starting journal nodes [node6 node7 node8]
node6: journalnode running as process 2904. Stop it first.
node7: journalnode running as process 2220. Stop it first.
node8: journalnode running as process 2153. Stop it first.
Starting ZK Failover Controllers on NN hosts [node1 node2]
node2: starting zkfc, logging to /usr/hadoop-2.9.2/logs/hadoop-root-zkfc-node2.out
node1: starting zkfc, logging to /usr/hadoop-2.9.2/logs/hadoop-root-zkfc-node1.out
[root@node1 ~]# jps
7857 DFSZKFailoverController
7924 Jps
7225 NameNode
[root@node2 ~]# jps
2788 Jps
2633 NameNode
2732 DFSZKFailoverController
[root@node6 ~]# jps
3235 Jps
3125 DataNode
2904 JournalNode
2779 QuorumPeerMain
[root@node7 ~]# jps
2119 QuorumPeerMain
2220 JournalNode
2572 Jps
2462 DataNode
[root@node8 ~]# jps
2483 Jps
2373 DataNode
2025 QuorumPeerMain
2153 JournalNode
7. ZooKeeper now holds NameNode information; only the active node's information is stored
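Besides looking in ZooKeeper, the HA state can be queried directly with `hdfs haadmin -getServiceState`, using the NameNode IDs from hdfs-site.xml (`nn1`, `nn2`). The wrapper below is a sketch; `nn_states` and the `HAADMIN` override variable are hypothetical names, introduced so the loop can be exercised without a live cluster.

```shell
# HAADMIN defaults to the real CLI and can be overridden, for example with a stub.
HAADMIN=${HAADMIN:-hdfs haadmin}

# Print "<id> <state>" for each NameNode ID given
nn_states() {
    for id in "$@"; do
        # $HAADMIN is unquoted on purpose so that "hdfs haadmin" splits into two words
        state=$($HAADMIN -getServiceState "$id" 2>/dev/null) || state=unreachable
        echo "$id $state"
    done
}

# On a cluster node: nn_states nn1 nn2
```

In a healthy cluster one ID should report `active` and the other `standby`. Stopping the active NameNode and running the check again is a simple way to watch the automatic failover happen.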



The HDFS high-availability cluster is now initialized; next, start the YARN cluster
1. Start YARN from the primary NameNode node. Run there, start-yarn.sh effectively starts only the NodeManagers: it launches a ResourceManager on the local host only (see the node1 log line below), so the ResourceManagers have to be started by hand on their own nodes
[root@node1 ~]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-resourcemanager-node1.out
node7: starting nodemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-nodemanager-node7.out
node8: starting nodemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-nodemanager-node8.out
node6: starting nodemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-nodemanager-node6.out
[root@node6 ~]# jps
3125 DataNode
3397 NodeManager
3509 Jps
2904 JournalNode
2779 QuorumPeerMain
[root@node7 ~]# jps
2736 NodeManager
2848 Jps
2119 QuorumPeerMain
2220 JournalNode
2462 DataNode
[root@node8 ~]# jps
2373 DataNode
2646 NodeManager
2758 Jps
2025 QuorumPeerMain
2153 JournalNode
2. Start the ResourceManagers manually on the ResourceManager nodes
[root@node4 ~]# yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-resourcemanager-node4.out
[root@node4 ~]# jps
2840 ResourceManager
3103 Jps
[root@node5 ~]# yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/hadoop-2.9.2/logs/yarn-root-resourcemanager-node5.out
[root@node5 ~]# jps
2994 Jps
2955 ResourceManager

start-dfs.sh
start-yarn.sh
On the ResourceManager nodes:
yarn-daemon.sh start resourcemanager
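The routine start sequence can be wrapped in one script run from node1. This is only a sketch under several assumptions: passwordless SSH from node1 to the ZooKeeper and ResourceManager nodes, the install paths used in this post, and ZooKeeper being started first because the ZKFCs and ResourceManagers depend on it. `DRY_RUN`, `run` and `start_cluster` are hypothetical names; with the default `DRY_RUN=1` the commands are only printed.

```shell
# Assumption: run from a login shell on node1 so that start-dfs.sh and start-yarn.sh are on PATH.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_cluster() {
    # ZooKeeper first; java.sh is sourced because non-interactive SSH skips /etc/profile.d
    for zk in node6 node7 node8; do
        run ssh "root@$zk" ". /etc/profile.d/java.sh && /usr/zookeeper-3.4.6/bin/zkServer.sh start"
    done
    run start-dfs.sh
    run start-yarn.sh
    # ResourceManagers are started on their own nodes, using full paths
    for rm in node4 node5; do
        run ssh "root@$rm" "/usr/hadoop/sbin/yarn-daemon.sh start resourcemanager"
    done
}

start_cluster
```

Once everything is up, `yarn rmadmin -getServiceState rm1` (and `rm2`) shows which ResourceManager is currently active.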
Original article: https://www.cnblogs.com/forlive/p/12345508.html