Hadoop cluster setup


Install the Java JDK

It is recommended to remove any previously installed Java version first:

rpm -qa | grep jdk
yum remove …
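
The article does not show the JDK installation itself. A minimal sketch, assuming the JDK 8u333 tarball (which matches the /opt/jdk1.8.0_333 path used later) has been downloaded to the current directory:

# extract into /opt, producing /opt/jdk1.8.0_333
tar -zxvf jdk-8u333-linux-x64.tar.gz -C /opt
# append these lines to /etc/profile, then reload with: source /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_333
export PATH=$JAVA_HOME/bin:$PATH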

Add a hadoop user to run Hadoop

useradd hadoop
passwd hadoop
# grant write permission on /etc/sudoers
chmod u+w /etc/sudoers
vim /etc/sudoers
root    ALL=(ALL)       ALL
# add the following line below the root entry
hadoop ALL=(root) NOPASSWD:ALL
# revoke the write permission again
chmod u-w /etc/sudoers
cd /opt
mkdir hadoop

# make the hadoop user the owner of the hadoop directory
chown -R hadoop:hadoop /opt/hadoop
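
The article also never shows unpacking the Hadoop distribution itself. A hedged sketch, assuming a Hadoop 3.x tarball (the exact version is an assumption) and the layout implied by the paths used below (etc/hadoop, bin/ and sbin/ directly under /opt/hadoop):

# unpack the downloaded release and move its contents under /opt/hadoop
tar -zxvf hadoop-3.3.1.tar.gz
mv hadoop-3.3.1/* /opt/hadoop/
chown -R hadoop:hadoop /opt/hadoop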
修改hosts
master192.168.119.110
slave1192.168.119.111
slave2192.168.119.112
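
The same three entries are needed in /etc/hosts on slave1 and slave2 as well. Since /etc/hosts is root-owned, a sketch pushing it from master as root (assuming root SSH login is allowed) before switching to the hadoop user:

scp /etc/hosts root@slave1:/etc/hosts
scp /etc/hosts root@slave2:/etc/hosts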
su - hadoop

Configure core-site.xml

<configuration>
    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <!-- Storage directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/data/tmp</value>
    </property>
</configuration>

Configure hdfs-site.xml

<configuration>
    <!-- Local filesystem path where the NameNode persists the namespace and transaction logs -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/data/name</value>
        <description>Where the NameNode stores the HDFS namespace metadata</description>
    </property>
    <!-- Local filesystem path where DataNodes store block data -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/data/data</value>
        <description>Physical location of data blocks on the DataNodes</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <description>Number of replicas; must not exceed the number of machines in the cluster, default is 3</description>
    </property>
    <property>
        <name>dfs.block.size</name>
        <value>5242880</value>
        <description>5 MB; must be a multiple of 1024 (5 MB = 5*1024*1024 bytes)</description>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave1:50090</value>
        <description>The SecondaryNameNode starts on whichever host is set here</description>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
        <description>Only configure this on clients</description>
    </property>
    <property>
        <name>dfs.hosts.exclude</name>
        <value>/opt/hadoop/etc/hadoop/dfs_exclude</value>
        <description>File listing DataNode hosts to exclude (decommission)</description>
    </property>
    <property>
        <name>dfs.hosts</name>
        <value>/opt/hadoop/etc/hadoop/slaves</value>
        <description>File listing DataNode hosts allowed to join</description>
    </property>
</configuration>
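
The data directories and the exclude file referenced above do not exist yet. A minimal sketch creating them on master as the hadoop user (they are copied to the slaves later together with /opt/hadoop):

mkdir -p /opt/hadoop/data/tmp /opt/hadoop/data/name /opt/hadoop/data/data
touch /opt/hadoop/etc/hadoop/dfs_exclude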

Configure yarn-env.sh: set JAVA_HOME

export JAVA_HOME=/opt/jdk1.8.0_333
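
The HDFS daemons read JAVA_HOME from etc/hadoop/hadoop-env.sh rather than yarn-env.sh; if it is not already set there, the same export is needed (assuming the same JDK path):

# in /opt/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_333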

Configure yarn-site.xml

<configuration>
    <!-- How map output is fetched during shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
        <description>The hostname of the RM; set this to the master node's hostname</description>
    </property>

    <property>
        <name>yarn.resourcemanager.nodes.exclude-path</name>
        <value>/opt/hadoop/etc/hadoop/dfs_exclude</value>
        <description>File listing hosts to exclude (decommission)</description>
    </property>

    <property>
        <name>yarn.resourcemanager.nodes.include-path</name>
        <value>/opt/hadoop/etc/hadoop/slaves</value>
        <description>File listing hosts allowed to join</description>
    </property>
</configuration>
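
Both hdfs-site.xml and yarn-site.xml point at the same include/exclude files. As an illustration of how they are used later, a hedged sketch of decommissioning a node (the hostname is only an example):

# run from /opt/hadoop on master
echo slave2 >> etc/hadoop/dfs_exclude
bin/hdfs dfsadmin -refreshNodes
bin/yarn rmadmin -refreshNodes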

Configure mapred-env.sh: add JAVA_HOME

export JAVA_HOME=/opt/jdk1.8.0_333

Configure mapred-site.xml (job history server)

<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Job history server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <!-- Job history server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>

Configure the masters file

vim /opt/hadoop/etc/hadoop/masters
master
# or use the IP
#192.168.119.110

Configure the slaves file, i.e. the list of worker nodes; keep it identical on all three servers (see the note on the Hadoop 3.x workers file after this block)

vim /opt/hadoop/etc/hadoop/slaves
slave1
slave2
# or use the IPs
192.168.119.111
192.168.119.112
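
Note that Hadoop 3.x reads the worker list from etc/hadoop/workers instead of slaves. A minimal sketch keeping both files identical so either version picks it up:

cp /opt/hadoop/etc/hadoop/slaves /opt/hadoop/etc/hadoop/workers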

Set up passwordless SSH (make sure you are logged in as the hadoop user)
On master:

ssh-keygen -t dsa -f ~/.ssh/id_dsa
cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
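
sshd refuses key-based login if these files are group- or world-writable, so a quick permissions check is worth doing on every node:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys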

Copy the public key to slave1 and slave2

scp ~/.ssh/id_dsa.pub hadoop@slave1:~/.ssh/master.pub
scp ~/.ssh/id_dsa.pub hadoop@slave2:~/.ssh/master.pub

Then run on master

ssh hadoop@slave1 "cat ~/.ssh/master.pub>> ~/.ssh/authorized_keys"
ssh hadoop@slave2 "cat ~/.ssh/master.pub>> ~/.ssh/authorized_keys"

Or log in to slave1 and slave2 as the hadoop user and run on each:

cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys

Copy the environment used here (/etc/profile, the /opt/jdk1.8.0_333 directory, and the /opt/hadoop directory) from master to slave1 and slave2.
[You may need to grant the hadoop user permission on the target directories on slave1 and slave2 first, or copy as root and fix ownership afterwards.]

scp -r /opt/jdk1.8.0_333 hadoop@slave1:/opt
scp -r /opt/jdk1.8.0_333 hadoop@slave2:/opt
scp -r /opt/hadoop hadoop@slave1:/opt
scp -r /opt/hadoop hadoop@slave2:/opt
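
The /etc/profile mentioned above is not covered by these commands; it is root-owned, so a hedged sketch copying it as root and reloading it on each slave:

scp /etc/profile root@slave1:/etc/profile
scp /etc/profile root@slave2:/etc/profile
# then on each slave:
source /etc/profile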

Format the NameNode (run once, on master, from /opt/hadoop):

bin/hdfs namenode -format

Start the services

sbin/start-all.sh
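
A quick sanity check after start-all.sh, based on the configuration above (the SecondaryNameNode was placed on slave1): run jps on each node and compare against the expected daemons:

# on master
jps    # expect: NameNode, ResourceManager
# on slave1
jps    # expect: DataNode, NodeManager, SecondaryNameNode
# on slave2
jps    # expect: DataNode, NodeManager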

NameNode web UI: http://192.168.119.110:9870/
HDFS RPC address: hdfs://192.168.119.110:9000
YARN cluster UI: http://192.168.119.110:8088/cluster
SecondaryNameNode (on slave1 per hdfs-site.xml): http://192.168.119.111:50090
Start the job history server

./sbin/mr-jobhistory-daemon.sh start historyserver

http://192.168.119.110:19888/
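
To exercise HDFS, YARN, and the history server end to end, the bundled examples jar can be submitted; the jar version in the filename depends on the Hadoop release and is only an assumption here:

cd /opt/hadoop
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 2 10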

The following is optional (only needed if the daemons are started as root on Hadoop 3.x)

Edit start-dfs.sh and stop-dfs.sh

Add at the top of each file:

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs 
HDFS_NAMENODE_USER=root 
HDFS_SECONDARYNAMENODE_USER=root

Edit start-yarn.sh and stop-yarn.sh
Add at the top of each file:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

If you later change any of the configuration files, you can push them to the slaves again with scp

scp -r /your/path hadoop@slave1:/your/path
# for example
scp -r /opt/hadoop/etc/hadoop hadoop@slave1:/opt/hadoop/etc/
