Spark Practice Notes 2: Installing Hadoop

  1. Download Hadoop: I chose hadoop-2.7.0.

  2. Upload hadoop-2.7.0.tar to ubuntu1 and extract it to /usr/local/hadoop:

mkdir /usr/local/hadoop
tar xvf hadoop-2.7.0.tar -C /usr/local/hadoop/
  3. Edit the ~/.bashrc file and add the environment variables:
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.0
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH

Reload the file so the variables take effect: source ~/.bashrc
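
A quick sanity check that the new variables are picked up (the version banner it prints should mention 2.7.0):

hadoop version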

  4. Create the necessary directories:
mkdir ${HADOOP_HOME}/dfs
mkdir ${HADOOP_HOME}/tmp
mkdir ${HADOOP_HOME}/dfs/data
mkdir ${HADOOP_HOME}/dfs/name
  5. Edit the configuration files, which live in ${HADOOP_HOME}/etc/hadoop:
  • Check the JAVA_HOME setting in hadoop-env.sh, yarn-env.sh, and mapred-env.sh; if it is missing or wrong, correct it. JAVA_HOME must be an absolute path in all three files, for example:
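
The line in hadoop-env.sh might then look like the following; the JDK path here is only a placeholder, use the absolute path where your JDK is actually installed:

export JAVA_HOME=/usr/local/jdk1.8.0   # placeholder path; point this at your real JDK directory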

  • Edit the slaves file so it lists the correct slave node names:

root@ubuntu1:/usr/local/hadoop/hadoop-2.7.0/etc/hadoop# vi slaves
root@ubuntu1:/usr/local/hadoop/hadoop-2.7.0/etc/hadoop# more slaves
ubuntu2
ubuntu3
root@ubuntu1:/usr/local/hadoop/hadoop-2.7.0/etc/hadoop#
  • Edit core-site.xml and add the following. This is a minimal configuration; see the official documentation for the meaning of every property.
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ubuntu1:9000/</value>
    <description>The name of the default file system</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-2.7.0/tmp</value>
    <description>A base for other temporary directories</description>
  </property>
</configuration>
  • Edit hdfs-site.xml and add the following. This is a minimal configuration; see the official documentation for the meaning of every property.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/hadoop-2.7.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/hadoop-2.7.0/dfs/data</value>
  </property>
</configuration>

  • Edit mapred-site.xml and add the following. This is a minimal configuration; see the official documentation (mapred-default.xml) for the meaning of every property. The file does not exist by default, so create it from the shipped template first:

cp mapred-site.xml.template mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  • Edit yarn-site.xml and add the following. This is a minimal configuration; see the official documentation for the meaning of every property.
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ubuntu1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
  6. Deploy the installation to the other two machines:
scp -r /usr/local/hadoop ubuntu2:/usr/local
scp -r /usr/local/hadoop ubuntu3:/usr/local

Likewise, update the environment variables on ubuntu2 and ubuntu3, adding HADOOP_HOME and extending PATH as in step 3; one way to do this is sketched below.
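
A minimal sketch, assuming the same install path on all three machines and that JAVA_HOME is already set there (run these on ubuntu2 and ubuntu3, adjusting paths as needed):

echo 'export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.0' >> ~/.bashrc
echo 'export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH' >> ~/.bashrc
source ~/.bashrc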

  7. Start Hadoop
  • Format the NameNode; run this on ubuntu1:

root@ubuntu1:/usr/local/hadoop/hadoop-2.7.0/bin# hadoop namenode -format
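
In Hadoop 2.x the hadoop namenode form still works but prints a deprecation notice; the equivalent current command is:

hdfs namenode -format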

  • Run ./start-dfs.sh from ${HADOOP_HOME}/sbin. Afterwards, run jps: ubuntu1 should show the NameNode (and SecondaryNameNode) process, and ubuntu2 and ubuntu3 should each show a DataNode process.

Running ./start-dfs.sh produced a warning:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

The fix is to build Hadoop from source and replace the bundled native-hadoop library with the freshly built one.
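
Hadoop also ships a small diagnostic for listing which native libraries it can load, which is handy for confirming whether the replacement worked:

hadoop checknative -a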

After starting YARN with ./start-yarn.sh (also run from ${HADOOP_HOME}/sbin), check the NodeManager status on each slave via http://10.211.55.22:8042/ and http://10.211.55.23:8042/

  • Run ./mr-jobhistory-daemon.sh start historyserver to start the job history server; job history can then be viewed at http://10.211.55.21:19888
  8. Verify Hadoop with wordcount
  • Create the input and output directories in HDFS:
hadoop fs -mkdir -p /data/wordcount
hadoop fs -mkdir -p /output/
  • Put Hadoop's own configuration files into HDFS as the input for wordcount:

hadoop fs -put ${HADOOP_HOME}/etc/hadoop/*.xml /data/wordcount/

  • Run the bundled wordcount example:

hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount /data/wordcount /output/wordcount

  • After the job finishes successfully, view the result:

hadoop fs -cat /output/wordcount/part-r-00000
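
To see what the job actually produced (the _SUCCESS marker and the part-r-* output files), list the output directory first:

hadoop fs -ls /output/wordcount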

That completes the Hadoop installation!