Windows/Mac + Maven + Hadoop 2.7.3 development environment setup (with a wordcount example)

Prerequisite: JDK 1.8 installed and configured (not covered here)

1. Download Hadoop 2.7.3

2. Install Hadoop

  1. Unpack the archive to
    • D:\hadoop\hadoop-2.7.3 (Windows)
    • ~/Dev/hadoop/hadoop-2.7.3 (Mac)
  2. Unpack winutils (Windows only)
    • Copy hadoop.dll and winutils.exe into D:\hadoop\hadoop-2.7.3\bin

3. Set environment variables:

    (Windows)
    Create a new system environment variable HADOOP_HOME=D:\hadoop\hadoop-2.7.3
    Edit Path and append %HADOOP_HOME%\bin

    (Mac)
    vim ~/.bash_profile
    export HADOOP_HOME=~/Dev/hadoop/hadoop-2.7.3
    export PATH=$PATH:$HADOOP_HOME/bin

4. Configure Hadoop

  • Create local working directories
    mkdir %HADOOP_HOME%\home
    mkdir %HADOOP_HOME%\home\tmp
    mkdir %HADOOP_HOME%\home\name
    mkdir %HADOOP_HOME%\home\data
  • Edit the configuration files under %HADOOP_HOME%\etc\hadoop:
    <!-- core-site.xml -->
    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/D:/hadoop/hadoop-2.7.3/home/tmp</value>
        </property>
        <property>
            <name>dfs.name.dir</name>
            <value>/D:/hadoop/hadoop-2.7.3/home/name</value>
        </property>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
    <!-- hdfs-site.xml -->
    <configuration>
        <!-- Set replication to 1 because this is a single-node Hadoop setup -->
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.data.dir</name>
            <value>/D:/hadoop/hadoop-2.7.3/home/data</value>
        </property>
    </configuration>
    <!-- mapred-site.xml -->
    <configuration>
        <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
        </property>
        <property>
           <name>mapred.job.tracker</name>
           <value>hdfs://localhost:9001</value>
        </property>
    </configuration>
    <!-- yarn-site.xml -->
    <configuration>
        <!-- Site specific YARN configuration properties -->
        <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
        </property>
        <property>
           <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
           <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
    </configuration>
  • Edit hadoop-env.cmd under %HADOOP_HOME%\etc\hadoop (on Mac, edit hadoop-env.sh)
    Windows: set JAVA_HOME to the JDK installation directory; the path must not contain spaces (use PROGRA~1 in place of Program Files)
    set JAVA_HOME=D:\PROGRA~1\Java\jdk1.8.0
    Mac:
    export HADOOP_HEAPSIZE=2000
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_HOME/lib/native"

5. Run and test

  1. Open a cmd window, cd %HADOOP_HOME%, and run hdfs namenode -format
    • Check the native libraries and configuration with hadoop checknative -a
  2. cd sbin and run start-all.cmd; this starts four processes: namenode, datanode, nodemanager, and resourcemanager
  3. Run hadoop fs -mkdir hdfs://localhost:9000/user/
  4. Run hadoop fs -ls hdfs://localhost:9000/ to see the user directory you just created (the same check can also be done from Java; see the sketch after this list)
  5. To stop everything, cd sbin and run stop-all.cmd
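
As a quick sanity check that the pseudo-distributed HDFS is reachable, the operations from steps 3 and 4 can also be driven from Java through the FileSystem API once the Maven project from section 7 is set up. This is only a sketch; the HdfsCheck class name is illustrative and not part of this repository:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative helper (not part of this repository): talks to the local
    // pseudo-distributed HDFS configured in core-site.xml.
    public class HdfsCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

            // Same effect as: hadoop fs -mkdir hdfs://localhost:9000/user/
            fs.mkdirs(new Path("/user"));

            // Same effect as: hadoop fs -ls hdfs://localhost:9000/
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
            fs.close();
        }
    }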

6. Hadoop's built-in web consoles

  1. ResourceManager (YARN) web UI: http://localhost:8088/
  2. NameNode (HDFS) web UI: http://localhost:50070/

7. Write a Hadoop program in IDEA

  1. Create a Maven project and add the required dependencies to the pom file; refer to this sample (a minimal job sketch is shown after this list)
  2. Copy the relevant configuration files from %HADOOP_HOME%\etc\hadoop (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml) into the resources folder
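
For orientation, a complete MapReduce wordcount job generally follows the standard Hadoop pattern sketched below. This is not a copy of this repository's source; the class name and structure are illustrative and the actual code in this project may differ:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: emit (word, 1) for every whitespace-separated token
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer (also used as combiner): sum the counts per word
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        // Driver: args[0] = input path, args[1] = output path (must not exist yet)
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The *-with-dependencies.jar used in the next section is typically produced by the Maven assembly (or shade) plugin configured in the pom, so the job jar can be submitted with hadoop jar.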

8. Run the program (after coding is done)

  1. Start Hadoop (see section 5)
  2. Package
    • mvn clean package
  3. Create the HDFS directories
    • hadoop fs -mkdir /user
    • hadoop fs -mkdir /user/wordcount
    • hadoop fs -mkdir /user/wordcount/input
    • hadoop fs -mkdir /user/wordcount/output (note: a MapReduce job will not start if its output directory already exists, so remove it with hadoop fs -rm -r /user/wordcount/output before re-running the job)
  4. Upload the input file
    • hadoop fs -rm /user/wordcount/input/words.txt (optional; removes the previously uploaded file)
    • hadoop fs -put docs/words.txt /user/wordcount/input/words.txt
  5. Run the job
    • hadoop jar target/wordCountJob-with-dependencies.jar /user/wordcount/input/words.txt /user/wordcount/output
  6. Inspect the output
    • hadoop fs -cat /user/wordcount/output/part-r-00000
  7. Shut down
    • %HADOOP_HOME%\sbin\stop-all.cmd
    • $HADOOP_HOME/sbin/stop-all.sh (Mac)
