码迷,mamicode.com
首页 > 其他好文 > 详细

Hadoop全分布模式启动集群

时间:2020-11-23 12:38:36      阅读:12      评论:0      收藏:0      [点我收藏+]

标签:rm -rf   mic   rip   daemon   分布   机器   port   ddr   ima   

一、 初始化工作:

 

1. 准备三台机器

hostanme IP 用途
master 192.168.0.10 namenode
node1 192.168.0.11 datenode
node2 192.168.0.12 datenode

 

 

 

 

 

在三台机器上做下面的动作

 

2. 编辑/etc/hosts

192.168.0.10 master
192.168.0.11 node1
192.168.0.12 node2

 

3.  创建用户hadoop,及数据存储目录

useradd hadoop
echo hadoop|passwd --stdin hadoop
mkdir /hadoop
mkdir /var/log/hadoop
mkdir /var/run/hadoop
chown -R hadoop:hadoop /hadoop
chown -R hadoop:hadoop /var/log/hadoop /var/run/hadoop

 

4. 建立节点之间的免密访问hadoop用户

master节点执行

su - hadoop
ssh-keygen
ssh-copy-id hadoop@node1
ssh-copy-id hadoop@node2

同样在node1和node2执行,后面的主机名需要做相应的修改

如果master也作为了datenode,写在了works文件里,则需要把自身设置免密访问

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

 

5. 安装JAVA运行环境,得到如下效果

[hadoop@node2 ~]$ java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

[hadoop@node2 ~]$ echo $JAVA_HOME
/usr/java/jdk1.8.0_181

 

6. 安装需要的依赖

yum -y install pdsh

 

二、安装并配置Hadoop

 

在Master节点下执行下面的步骤:

1. 下载hadoop

wget -P /usr/local/src https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz

 

2. 解压到/opt下,并指定属主

在root用户下执行

tar xvf hadoop-3.3.0.tar.gz -C /opt
chown -R hadoop:hadoop /opt/hadoop-3.3.0/

 

3. 配置hadoop

在hadoop用户下编辑

 

/opt/hadoop-3.3/etc/hadoop/hadoop-env.sh:

export JAVA_HOME=/usr/java/jdk1.8.0_181
export HADOOP_LOG_DIR=/var/log/hadoop
export HADOOP_PID_DIR=/var/run/hadoop

 

/opt/hadoop-3.3.0/etc/hadoop/workers:

node1
node2
master

 

/opt/hadoop-3.3/etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
   </property>
</configuration>

 

/opt/hadoop-3.3.0/etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <!-- 历史服务器端地址 -->
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
       <!-- 历史服务器web端地址 -->
       <name>mapreduce.jobhistory.webapp.address</name>
       <value>master:19888</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

 

/opt/hadoop-3.3.0/etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
   </property>
   <property>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
   </property>
   <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
   </property>
   <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
   </property>
   <property>
        <description>The https adddress of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.https.address</name>
        <value>${yarn.resourcemanager.hostname}:8090</value>
   </property>
   <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
</property>
   <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
   </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
        <discription>每个节点可用内存,单位MB,默认8182MB</discription>
   </property>
   <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
   </property>
   <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
   <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <!-- 日志聚集功能使能 -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <!-- 日志保留时间设置7天 -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

 

4. 复制/opt/hadoop-3.3.0到node1和node2

scp -r /opt/hadoop-3.3.0 root@node1:/opt
scp -r /opt/hadoop-3.3.0 root@node1:/opt

 

5. 配置hadoop环境变量

在root用户下编辑/etc/profile.d/hadoop.sh

export HADOOP_HOME=/opt/hadoop-3.3.0
export PATH=$PATH:/$HADOOP_HOME/bin

复制到node1和node2

scp /etc/profile.d/hadoop.sh root@node1:/etc/profile.d/
scp /etc/profile.d/hadoop.sh root@node2:/etc/profile.d/

 

6. 格式化文件系统

在haddop用户下运行

hdfs namenode -format

 

7. 启动NameNode守护程序和DataNode守护程序

[hadoop@master hadoop-3.3.0]$ ./sbin/start-dfs.sh 
Starting namenodes on [master]
Starting datanodes
Starting secondary namenodes [master]

查看web界面

  • NameNode http://192.168.0.10:9870

技术图片

 

 

8. 启动ResourceManager守护程序和NodeManager守护程序

[hadoop@master hadoop-3.3.0]$ ./sbin/start-yarn.sh 
Starting resourcemanager
Starting nodemanagers

 

浏览Web界面以找到ResourceManager;默认情况下,它在以下位置可用:

  • ResourceManager- http://192.168.0.10:8088

技术图片

 

 

9. 启动MapReduce JobHistory Server

[hadoop@master hadoop-3.3.0]$ ./bin/mapred --daemon start historyserver

浏览web界面

  • historyserver:http://192.168.0.10:19888/

技术图片

 

三 在本地运行MapReduce作业

1. 设置执行MapReduce作业所需的HDFS目录:

hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/hadoop

 

2. 将本地的输入文件上传到HDFS目录

[hadoop@master hadoop-3.3.0]$ hdfs dfs -mkdir input
[hadoop@master hadoop-3.3.0]$ hdfs dfs -put etc/hadoop/*xml input

 

3. 查看上传到HDFS的文件

[hadoop@master hadoop-3.3.0]$ hdfs dfs -ls /user/hadoop/input
Found 10 items
-rw-r--r--   1 hadoop supergroup       9213 2020-11-19 11:09 /user/hadoop/input/capacity-scheduler.xml
-rw-r--r--   1 hadoop supergroup       1056 2020-11-19 11:09 /user/hadoop/input/core-site.xml
-rw-r--r--   1 hadoop supergroup      11765 2020-11-19 11:09 /user/hadoop/input/hadoop-policy.xml
-rw-r--r--   1 hadoop supergroup        683 2020-11-19 11:09 /user/hadoop/input/hdfs-rbf-site.xml
-rw-r--r--   1 hadoop supergroup       1345 2020-11-19 11:09 /user/hadoop/input/hdfs-site.xml
-rw-r--r--   1 hadoop supergroup        620 2020-11-19 11:09 /user/hadoop/input/httpfs-site.xml
-rw-r--r--   1 hadoop supergroup       3518 2020-11-19 11:09 /user/hadoop/input/kms-acls.xml
-rw-r--r--   1 hadoop supergroup        682 2020-11-19 11:09 /user/hadoop/input/kms-site.xml
-rw-r--r--   1 hadoop supergroup       1388 2020-11-19 11:09 /user/hadoop/input/mapred-site.xml
-rw-r--r--   1 hadoop supergroup       3076 2020-11-19 11:09 /user/hadoop/input/yarn-site.xml

 

3. 利用示例的jar文件,过滤出input目录中的以dfs开头的单词

[hadoop@master hadoop-3.3.0]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar grep input output dfs[a-z.]+

 

4. 查看结果

直接在hdfs中查看

[hadoop@master hadoop-3.3.0]$ hdfs dfs -cat output/*
1    dfsadmin
1    dfs.replication
1    dfs.name.dir
1    dfs.data.dir

或者下载到本地查看

 $ bin/hdfs dfs -get output output
 $ cat output/*

 

 

四、遇到的问题及解决

问题

node1和node2的datenode已经启动,但是在webui上看不到node1和node2

解决办法:

在node1和node2节点进行下面操作:

需要去修改hdfs-site.xml 中找到配置name和data的2个路径,然后删除标记部分

技术图片

 

然后删除实际目录

rm -rf /hadoop/dfs/name

rm -rf /hadoop/dfs/data

 

修改core-site.xml,删除标记部分

技术图片

 

 rm -rf /hadoop/tmp

Hadoop全分布模式启动集群

标签:rm -rf   mic   rip   daemon   分布   机器   port   ddr   ima   

原文地址:https://www.cnblogs.com/zydev/p/14004073.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!