--------------------------------------------------------------------------
第一阶段: Phase 1: Pseudo-distributed Hadoop installation
第二阶段: Phase 2: Mahout installation
第三阶段: Phase 3: Bayes classifier test on the 20newsgroups dataset
-------------------------------------------------------------------------
		Note: after installing VMware Tools, reboot CentOS for it to take effect.
Phase 1: Pseudo-distributed Hadoop installation
		1. Install the JDK
			1.1 Remove the JDK bundled with the system
				1. 	Check the current version:             # java -version
					List the installed Java packages:      # rpm -qa | grep java
					Remove a package (ignoring dependencies):  # rpm -e --nodeps <package>
						To remove OpenJDK, run the following for each package listed:
						[root@Centos 桌面]# rpm -e --nodeps <package name>
					Re-check with # rpm -qa | grep java; no output means the removal is complete.
----------------------------------------------------------------------------------
[root@Centos 桌面]# java -version
java version "1.7.0_09-icedtea"
OpenJDK Runtime Environment (rhel-2.3.4.1.el6_3-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
[root@Centos 桌面]# rpm -qa | grep java
tzdata-java-2012j-1.el6.noarch
java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
[root@Centos 桌面]# rpm -e --nodeps tzdata-java-2012j-1.el6.noarch
[root@Centos 桌面]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
[root@Centos 桌面]# rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
[root@Centos 桌面]# rpm -qa | grep java
[root@Centos 桌面]# 
----------------------------------------------------------------------------------
			1.2 Install the downloaded JDK and configure environment variables
				1. Extract the JDK archive
			------------------------------------------------------------
				[root@Centos 桌面]# cd /root
				[root@Centos ~]# tar zxvf jdk-8u65-linux-x64.gz
			------------------------------------------------------------
				2. Configure environment variables
					1. Open /etc/profile in an editor:
						[root@Centos ~]# vi /etc/profile
					2. Append the JDK paths to /etc/profile:
						export JAVA_HOME=/root/jdk1.8.0_65
						export JRE_HOME=/root/jdk1.8.0_65/jre
						export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
						export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
					3. Reload the file and confirm the variable is set:
						[root@Centos ~]# source /etc/profile
						[root@Centos ~]# echo $JAVA_HOME
				3. Verify the installation
					Run the following commands and check that the output looks normal:
						[root@Centos ~]# java
						[root@Centos ~]# javac
						[root@Centos ~]# java -version
---------------------------------------------------------------------------------
[root@Centos ~]# java
Usage: java [-options] class [args...]
           (to execute a class)
   or  java [-options] -jar jarfile [args...]
           (to execute a jar file)
where options include:
    -d32	  use a 32-bit data model if available
    -d64	  use a 64-bit data model if available
    -server	  to select the "server" VM
                  The default VM is server.
    -cp <class search path of directories and zip/jar files>
    -classpath <class search path of directories and zip/jar files>
                  A : separated list of directories, JAR archives,
                  and ZIP archives to search for class files.
    -D<name>=<value>
                  set a system property
    -verbose:[class|gc|jni]
                  enable verbose output
    -version      print product version and exit
    -version:<value>
                  Warning: this feature is deprecated and will be removed
                  in a future release.
                  require the specified version to run
    -showversion  print product version and continue
    -jre-restrict-search | -no-jre-restrict-search
                  Warning: this feature is deprecated and will be removed
                  in a future release.
                  include/exclude user private JREs in the version search
    -? -help      print this help message
    -X            print help on non-standard options
    -ea[:<packagename>...|:<classname>]
    -enableassertions[:<packagename>...|:<classname>]
                  enable assertions with specified granularity
    -da[:<packagename>...|:<classname>]
    -disableassertions[:<packagename>...|:<classname>]
                  disable assertions with specified granularity
    -esa | -enablesystemassertions
                  enable system assertions
    -dsa | -disablesystemassertions
                  disable system assertions
    -agentlib:<libname>[=<options>]
                  load native agent library <libname>, e.g. -agentlib:hprof
                  see also, -agentlib:jdwp=help and -agentlib:hprof=help
    -agentpath:<pathname>[=<options>]
                  load native agent library by full pathname
    -javaagent:<jarpath>[=<options>]
                  load Java programming language agent, see java.lang.instrument
    -splash:<imagepath>
                  show splash screen with specified image
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.
[root@Centos ~]# javac
Usage: javac <options> <source files>
where possible options include:
  -g                         Generate all debugging info
  -g:none                    Generate no debugging info
  -g:{lines,vars,source}     Generate only some debugging info
  -nowarn                    Generate no warnings
  -verbose                   Output messages about what the compiler is doing
  -deprecation               Output source locations where deprecated APIs are used
  -classpath <path>          Specify where to find user class files and annotation processors
  -cp <path>                 Specify where to find user class files and annotation processors
  -sourcepath <path>         Specify where to find input source files
  -bootclasspath <path>      Override location of bootstrap class files
  -extdirs <dirs>            Override location of installed extensions
  -endorseddirs <dirs>       Override location of endorsed standards path
  -proc:{none,only}          Control whether annotation processing and/or compilation is done.
  -processor <class1>[,<class2>,<class3>...] Names of the annotation processors to run; bypasses default discovery process
  -processorpath <path>      Specify where to find annotation processors
  -parameters                Generate metadata for reflection on method parameters
  -d <directory>             Specify where to place generated class files
  -s <directory>             Specify where to place generated source files
  -h <directory>             Specify where to place generated native header files
  -implicit:{none,class}     Specify whether or not to generate class files for implicitly referenced files
  -encoding <encoding>       Specify character encoding used by source files
  -source <release>          Provide source compatibility with specified release
  -target <release>          Generate class files for specific VM version
  -profile <profile>         Check that API used is available in the specified profile
  -version                   Version information
  -help                      Print a synopsis of standard options
  -Akey[=value]              Options to pass to annotation processors
  -X                         Print a synopsis of nonstandard options
  -J<flag>                   Pass <flag> directly to the runtime system
  -Werror                    Terminate compilation if warnings occur
  @<filename>                Read options and filenames from file
[root@Centos ~]# java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
·······························JDK installation complete·····································
	2. Install Hadoop
		1. In Hadoop's conf directory, configure hadoop-env.sh, core-site.xml, hdfs-site.xml, and mapred-site.xml
			1.1 Point Hadoop at the JDK in hadoop-env.sh
			---------------------------------------------
			[root@Centos ~]# cd hadoop-1.2.1/
			[root@Centos hadoop-1.2.1]# cd conf
			[root@Centos conf]# vi hadoop-env.sh 
			---------------------------------------------
				Add the following line:
				export JAVA_HOME=/root/jdk1.8.0_65	
			1.2 Set the HDFS address and port in core-site.xml
			------------------------------------------------
			[root@Centos conf]# vi core-site.xml
			------------------------------------------------
			Configuration:
				<configuration>
	        		<property>
		                <name>fs.default.name</name>
		                <value>hdfs://localhost:9000</value>
	        		</property>
				</configuration>
			1.3 Set the HDFS replication factor in hdfs-site.xml
			-------------------------------------------------
			[root@Centos conf]# vi hdfs-site.xml
			-------------------------------------------------
			Configuration:
				<configuration>
	        		<property>
	                <name>dfs.replication</name>
	                <value>1</value>
	        		</property>
				</configuration>
			1.4 Set the JobTracker address in mapred-site.xml
			-------------------------------------------------
			[root@Centos conf]# vi mapred-site.xml
			--------------------------------------------
			Configuration:
				<configuration>
        			<property>          
                  	<name>mapred.job.tracker</name>
                  	<value>localhost:9001</value>
        			</property>
				</configuration>
--------------------------------------------------------------------
[root@Centos conf]# vi hadoop-env.sh 
[root@Centos conf]# vi core-site.xml 
[root@Centos conf]# vi hdfs-site.xml 
[root@Centos conf]# vi mapred-site.xml 
--------------------------------------------------------------------
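The four files above are edited interactively in vi; the same XML can also be written non-interactively with a heredoc. A sketch for core-site.xml (CONF_DIR defaults to a temp directory so the sketch runs anywhere; on the tutorial VM it would be /root/hadoop-1.2.1/conf, and hdfs-site.xml and mapred-site.xml follow the same pattern):

```shell
# Write core-site.xml with a heredoc instead of editing it in vi.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
# Show the property we just wrote, as a quick sanity check.
grep 'fs.default.name' "$CONF_DIR/core-site.xml"
```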
		2. Passwordless SSH login
--------------------------------------------------------------------
[root@Centos conf]# cd /root
[root@Centos ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
ed:48:64:29:62:37:c1:e9:3d:84:bf:ad:4e:50:5e:66 root@Centos
The key's randomart image is:
+--[ RSA 2048]----+
|     ..o         |
|      +...       |
|    o.++= E      |
|   . o.B+=       |
|      . S+.      |
|       o.o.      |
|        o..      |
|       ..        |
|       ..        |
+-----------------+
[root@Centos ~]# cd .ssh
[root@Centos .ssh]# ls
id_rsa  id_rsa.pub
[root@Centos .ssh]# cp id_rsa.pub  authorized_keys
[root@Centos .ssh]# ls
authorized_keys  id_rsa  id_rsa.pub
[root@Centos .ssh]# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 3f:84:db:2f:53:a9:09:a6:61:a2:3a:82:80:6c:af:1a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
-------------------------------------------------------------------------------
	Verify passwordless login
-------------------------------------------------------------------------------
[root@Centos ~]# ssh localhost
Last login: Sun Apr  3 23:19:51 2016 from localhost
[root@Centos ~]# exit
logout
Connection to localhost closed.
[root@Centos ~]# ssh localhost
Last login: Sun Apr  3 23:20:12 2016 from localhost
[root@Centos ~]# exit
logout
Connection to localhost closed.
[root@Centos ~]# 
----------------------------Passwordless SSH login configured successfully----------------------------
		3. Format HDFS
			Command: # bin/hadoop namenode -format
-----------------------------------------------------------------------------
[root@Centos ~]# cd /root/hadoop-1.2.1/
[root@Centos hadoop-1.2.1]# bin/hadoop namenode -format
16/04/03 23:24:12 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = java.net.UnknownHostException: Centos: Centos: unknown error
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.8.0_65
************************************************************/
16/04/03 23:24:13 INFO util.GSet: Computing capacity for map BlocksMap
16/04/03 23:24:13 INFO util.GSet: VM type       = 64-bit
16/04/03 23:24:13 INFO util.GSet: 2.0% max memory = 1013645312
16/04/03 23:24:13 INFO util.GSet: capacity      = 2^21 = 2097152 entries
16/04/03 23:24:13 INFO util.GSet: recommended=2097152, actual=2097152
16/04/03 23:24:15 INFO namenode.FSNamesystem: fsOwner=root
16/04/03 23:24:15 INFO namenode.FSNamesystem: supergroup=supergroup
16/04/03 23:24:15 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/04/03 23:24:15 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/04/03 23:24:15 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/04/03 23:24:15 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/04/03 23:24:15 INFO namenode.NameNode: Caching file names occuring more than 10 times 
16/04/03 23:24:17 INFO common.Storage: Image file /tmp/hadoop-root/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
16/04/03 23:24:18 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-root/dfs/name/current/edits
16/04/03 23:24:18 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-root/dfs/name/current/edits
16/04/03 23:24:18 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
16/04/03 23:24:18 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: Centos: Centos: unknown error
************************************************************/
-----------------------------------------------------------------------------
The format step reported "Centos: unknown error". Don't worry; fix the hostname mapping in the next step.
--------------------------------------------------------------------------
[root@Centos hadoop-1.2.1]# vi /etc/hosts
	Add the following mapping:
	127.0.0.1   localhost Centos
-------------------------------------------------------------------------
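A quick sanity check after editing /etc/hosts (a hypothetical helper, not part of the original session; HOSTS_FILE is overridable so the check can be tried against any file):

```shell
# Verify that the hostname "Centos" maps to 127.0.0.1, which is what the
# NameNode needs in order to resolve its own host during the format step.
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}
if grep -qE '^127\.0\.0\.1[[:space:]].*Centos' "$HOSTS_FILE"; then
    STATUS="ok"
    echo "Centos resolves to 127.0.0.1"
else
    STATUS="missing"
    echo "Centos is not mapped in $HOSTS_FILE; hadoop namenode -format will fail" >&2
fi
```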
Format the namenode again
--------------------------------------------------------------------------
[root@Centos hadoop-1.2.1]# vi /etc/hosts
[root@Centos hadoop-1.2.1]# bin/hadoop namenode -format
16/04/03 23:26:30 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Centos/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.8.0_65
************************************************************/
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) Y
16/04/03 23:26:33 INFO util.GSet: Computing capacity for map BlocksMap
16/04/03 23:26:33 INFO util.GSet: VM type       = 64-bit
16/04/03 23:26:33 INFO util.GSet: 2.0% max memory = 1013645312
16/04/03 23:26:33 INFO util.GSet: capacity      = 2^21 = 2097152 entries
16/04/03 23:26:33 INFO util.GSet: recommended=2097152, actual=2097152
16/04/03 23:26:33 INFO namenode.FSNamesystem: fsOwner=root
16/04/03 23:26:33 INFO namenode.FSNamesystem: supergroup=supergroup
16/04/03 23:26:33 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/04/03 23:26:33 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/04/03 23:26:33 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/04/03 23:26:33 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/04/03 23:26:33 INFO namenode.NameNode: Caching file names occuring more than 10 times 
16/04/03 23:26:34 INFO common.Storage: Image file /tmp/hadoop-root/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
16/04/03 23:26:34 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-root/dfs/name/current/edits
16/04/03 23:26:34 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-root/dfs/name/current/edits
16/04/03 23:26:34 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
16/04/03 23:26:34 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Centos/127.0.0.1
************************************************************/
---------------------------namenode formatted successfully------------------------------
	4. Start Hadoop
		Stop the firewall:          # service iptables stop
		Start the Hadoop daemons:   # bin/start-all.sh
		Stop the Hadoop daemons:    # bin/stop-all.sh
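After start-all.sh, a small loop over `jps` output confirms the five daemons came up (a sketch, not part of the original session; assumes `jps` from the JDK is on the PATH):

```shell
# Count which of the five expected Hadoop 1.x daemons show up in jps.
missing=0
for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    if jps 2>/dev/null | grep -q "$daemon"; then
        echo "$daemon is running"
    else
        echo "$daemon is NOT running" >&2
        missing=$((missing + 1))
    fi
done
echo "missing daemons: $missing"
```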
---------------------------------------------------------------------------
Stop the firewall
[root@Centos hadoop-1.2.1]# service iptables stop
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Unloading modules:                               [  OK  ]
Start the Hadoop daemons
[root@Centos hadoop-1.2.1]# bin/start-all.sh 
starting namenode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-Centos.out
localhost: starting datanode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-Centos.out
localhost: starting secondarynamenode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-Centos.out
starting jobtracker, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-Centos.out
localhost: starting tasktracker, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-Centos.out
Verify the daemons started: if all five appear in the jps listing, startup succeeded.
Check once more:
[root@Centos hadoop-1.2.1]# cd mahout-distribution-0.6/
[root@Centos mahout-distribution-0.6]# jps
30692 SecondaryNameNode
30437 NameNode
31382 Jps
30903 TaskTracker
30775 JobTracker
30553 DataNode
[root@Centos mahout-distribution-0.6]# jps
30692 SecondaryNameNode
31477 Jps
30437 NameNode
30903 TaskTracker
30775 JobTracker
30553 DataNode
[root@Centos mahout-distribution-0.6]# cd ..
Stop the Hadoop daemons
[root@Centos hadoop-1.2.1]# bin/stop-all.sh 
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
[root@Centos hadoop-1.2.1]# 
------------------------Pseudo-distributed Hadoop installation complete------------------------
**********************************************************************
**********************************************************************
Phase 2: Mahout installation
			1. Extract the Mahout archive
				[root@Centos hadoop-1.2.1]#  tar zxvf mahout-distribution-0.6.tar.gz 
			2. Configure environment variables (note the directory name is mahout-distribution-0.6, with a hyphen)
					export HADOOP_HOME=/root/hadoop-1.2.1
					export HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
					export MAHOUT_HOME=/root/hadoop-1.2.1/mahout-distribution-0.6
					export MAHOUT_CONF_DIR=/root/hadoop-1.2.1/mahout-distribution-0.6/conf
					export PATH=$PATH:$MAHOUT_HOME/conf:$MAHOUT_HOME/bin
			3. Test that Mahout runs
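The exports above only last for the current shell. A minimal sketch of making them persistent; the walkthrough itself uses /etc/profile, while the snippet file here is a temp file (on CentOS the natural target would be /etc/profile.d/mahout.sh):

```shell
# Persist the variables by writing a small profile snippet, then reload it
# (equivalent to `source /etc/profile` after editing that file by hand).
SNIPPET=${SNIPPET:-$(mktemp)}
cat > "$SNIPPET" <<'EOF'
export HADOOP_HOME=/root/hadoop-1.2.1
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export MAHOUT_HOME=$HADOOP_HOME/mahout-distribution-0.6
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$PATH:$MAHOUT_HOME/conf:$MAHOUT_HOME/bin
EOF
. "$SNIPPET"
echo "MAHOUT_HOME=$MAHOUT_HOME"
```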
-------------------------------------------------------------------------
[root@Centos mahout-distribution-0.6]# cd ..
[root@Centos hadoop-1.2.1]# bin/stop-all.sh 
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
[root@Centos hadoop-1.2.1]# cd ..
You have mail in /var/spool/mail/root
[root@Centos ~]# cd ruanjian/
[root@Centos ruanjian]# tar zxvf
tar: old option 'f' requires an argument
Try 'tar --help' or 'tar --usage' for more information.
[root@Centos ruanjian]# cd ..
[root@Centos ~]# cd hadoop-1.2.1/
[root@Centos hadoop-1.2.1]# export HADOOP_HOME=/root/hadoop-1.2.1
[root@Centos hadoop-1.2.1]# export HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
[root@Centos hadoop-1.2.1]# export MAHOUT_HOME=/root/hadoop-1.2.1/mahout-distribution-0.6
[root@Centos hadoop-1.2.1]# export MAHOUT_CONF_DIR=/root/hadoop-1.2.1/mahout-distribution-0.6/conf
[root@Centos hadoop-1.2.1]# export PATH=$PATH:$MAHOUT_HOME/conf:$MAHOUT_HOME/bin
[root@Centos hadoop-1.2.1]# cd mahout-distribution-0.6/
[root@Centos mahout-distribution-0.6]# bin/mahout 
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
An example program must be given as the first argument.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  dirichlet: : Dirichlet Clustering
  eigencuts: : Eigencuts spectral clustering
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  fpg: : Frequent Pattern Growth
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lda: : Latent Dirchlet Allocation
  ldatopics: : LDA Print Topics
  lucene.vector: : Generate Vectors from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  meanshift: : Mean Shift clustering
  minhash: : Run Minhash clustering
  pagerank: : compute the PageRank of a graph
  parallelALS: : ALS-WR factorization of a rating matrix
  prepare20newsgroups: : Reformat 20 newsgroups data
  randomwalkwithrestart: : compute all other vertices' proximity to a source vertex in a graph
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  svd: : Lanczos Singular Value Decomposition
  testclassifier: : Test the text based Bayes Classifier
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainclassifier: : Train the text based Bayes Classifier
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
  wikipediaDataSetCreator: : Splits data set of wikipedia wrt feature like country
  wikipediaXMLSplitter: : Reads wikipedia data and creates ch
[root@Centos mahout-distribution-0.6]# 
**********If "An example program must be given as the first argument." is printed, Mahout is installed successfully**********
--------------------------------Mahout installation complete-----------------------------------------------------------------------
Phase 3: Bayes classifier test on the 20newsgroups dataset
			1. Extract the 20newsgroups archive
				1. Create a data directory under /root and extract the downloaded 20news-bydate archive into it
				----------------------------------------------------------------------
				[root@Centos mahout-distribution-0.6]# cd ..
				[root@Centos hadoop-1.2.1]# cd ..
				[root@Centos ~]# mkdir data
				[root@Centos ~]# ls
				anaconda-ks.cfg  install.log         ruanjian  视频  下载
				data             install.log.syslog  公共的    图片  音乐
				hadoop-1.2.1     jdk1.8.0_65         模板      文档  桌面
				[root@Centos ~]# cd data/
				[root@Centos data]# ls
				20news-bydate.tar.gz
				[root@Centos data]# tar zxvf
				tar: old option 'f' requires an argument
				Try 'tar --help' or 'tar --usage' for more information.
				[root@Centos data]# tar zxvf 20news-bydate.tar.gz 
				[root@Centos data]# ls
				20news-bydate.tar.gz  20news-bydate-test  20news-bydate-train
				[root@Centos data]# 
				-----------------------------------------------------------------------------------
				2. Start the cluster and launch Mahout
				----------------------------------------------------------------------------------
				[root@Centos data]# cd /root/hadoop-1.2.1/mahout-distribution-0.6/
				[root@Centos mahout-distribution-0.6]# jps
				34338 Jps
				[root@Centos mahout-distribution-0.6]# cd ..
				[root@Centos hadoop-1.2.1]# bin/start-all.sh 
				Warning: $HADOOP_HOME is deprecated.
				starting namenode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-Centos.out
				localhost: starting datanode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-Centos.out
				localhost: starting secondarynamenode, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-Centos.out
				starting jobtracker, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-Centos.out
				localhost: starting tasktracker, logging to /root/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-Centos.out
				[root@Centos hadoop-1.2.1]# cd mahout-distribution-0.6/
				[root@Centos mahout-distribution-0.6]# jps
				34979 Jps
				34757 JobTracker
				34886 TaskTracker
				34663 SecondaryNameNode
				34408 NameNode
				34524 DataNode
				[root@Centos mahout-distribution-0.6]# bin/mahout 
				-------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------
************************************************************************************************************************
Bayes algorithm test: automatic text classification of 20newsgroups
		Step 1: Build the training and test sets
		bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
		-p /root/data/20news-bydate-test \
		-o /root/data/bayes-test-input \
		-a org.apache.mahout.vectorizer.DefaultAnalyzer \
		-c UTF-8
		bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
		-p /root/data/20news-bydate-train \
		-o /root/data/bayes-train-input \
		-a org.apache.mahout.vectorizer.DefaultAnalyzer \
		-c UTF-8
			-----------------------------------------------------------------------------------------------------
			Build the data sets
			------------------------------------------------------------------------------------------------------
			[root@Centos mahout-distribution-0.6]# bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
			> -p /root/data/20news-bydate-train \
			> -o /root/data/bayes-test-input \
			> -a org.apache.mahout.vectorizer.DefaultAnalyzer \
			> 
			MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
			Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
			HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
			MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
			Warning: $HADOOP_HOME is deprecated.
			16/04/04 08:59:20 WARN driver.MahoutDriver: No org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.props found on classpath, will use command-line arguments only
			Usage:                                                                          
			 [--analyzerName <analyzerName> --charset <charset> --outputDir <outputDir>     
			--parent <parent> --help]                                                       
			Options                                                                         
			  --analyzerName (-a) analyzerName    The class name of the analyzer            
			  --charset (-c) charset              The name of the character encoding of the 
			                                      input files                               
			  --outputDir (-o) outputDir          The output directory                      
			  --parent (-p) parent                Parent dir containing the newsgroups      
			  --help (-h)                         Print out help                            
			16/04/04 08:59:20 INFO driver.MahoutDriver: Program took 167 ms (Minutes: 0.0027833333333333334)
			[root@Centos mahout-distribution-0.6]# bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
			> -p /root/data/20news-bydate-test \
			> -o /root/data/bayes-test-input \
			> -a org.apache.mahout.vectorizer.DefaultAnalyzer \
			> -c UTF-8
			MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
			Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
			HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
			MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
			Warning: $HADOOP_HOME is deprecated.
			16/04/04 08:59:41 WARN driver.MahoutDriver: No org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.props found on classpath, will use command-line arguments only
			16/04/04 09:00:29 INFO driver.MahoutDriver: Program took 47897 ms (Minutes: 0.7982833333333333)
			[root@Centos mahout-distribution-0.6]# bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
			> -p /root/data/20news-bydate-train \
			> -o /root/data/bayes-train-input \
			> -a org.apache.mahout.vectorizer.DefaultAnalyzer \
			> -c UTF-8
			MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
			Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
			HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
			MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
			Warning: $HADOOP_HOME is deprecated.
			16/04/04 09:01:07 WARN driver.MahoutDriver: No org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.props found on classpath, will use command-line arguments only
			16/04/04 09:01:27 INFO driver.MahoutDriver: Program took 19347 ms (Minutes: 0.32245)
			---------------View the output directories
			[root@Centos mahout-distribution-0.6]# cd ..
			[root@Centos hadoop-1.2.1]# cd ..
			[root@Centos ~]# cd data
			[root@Centos data]# ls
			20news-bydate.tar.gz  20news-bydate-train  bayes-train-input
			20news-bydate-test    bayes-test-input
			[root@Centos data]# 
			-------------------bayes-test-input and bayes-train-input are present: data sets built successfully---
	
	
	Step 2: Upload to HDFS
		Create the target directory:   bin/hadoop fs -mkdir 20news
		Upload a local directory:      bin/hadoop fs -put <local dir> 20news
		List the uploads:              bin/hadoop fs -ls
		                               bin/hadoop fs -ls 20news
-----------------------------------------------------------------------------------------------------------------------
[root@Centos hadoop-1.2.1]# cd /root/hadoop-1.2.1/
[root@Centos hadoop-1.2.1]# bin/hadoop fs -mkdir  20news
[root@Centos hadoop-1.2.1]# bin/hadoop fs -ls
Warning: $HADOOP_HOME is deprecated.
Found 1 items
drwxr-xr-x   - root supergroup          0 2016-04-04 09:08 /user/root/20news
[root@Centos hadoop-1.2.1]# bin/hadoop fs -put ../data/bayes-train-input/ ./20news
[root@Centos hadoop-1.2.1]# bin/hadoop fs -ls  20news
Warning: $HADOOP_HOME is deprecated.
Found 1 items
drwxr-xr-x   - root supergroup          0 2016-04-04 09:08 /user/root/20news/bayes-train-input
[root@Centos hadoop-1.2.1]# bin/hadoop fs -put ../data/bayes-test-input/ ./20news
[root@Centos hadoop-1.2.1]# bin/hadoop fs -ls  20news
Warning: $HADOOP_HOME is deprecated.
Found 2 items
drwxr-xr-x   - root supergroup          0 2016-04-04 09:08 /user/root/20news/bayes-test-input
drwxr-xr-x   - root supergroup          0 2016-04-04 09:08 /user/root/20news/bayes-train-input
[root@Centos hadoop-1.2.1]# bin/hadoop fs -ls  20news/bayes-train-input
Warning: $HADOOP_HOME is deprecated.
Found 20 items
-rw-r--r--   1 root supergroup     773301 2016-04-04 09:08 /user/root/20news/bayes-train-input/alt.atheism.txt
-rw-r--r--   1 root supergroup     687018 2016-04-04 09:08 /user/root/20news/bayes-train-input/comp.graphics.txt
-rw-r--r--   1 root supergroup    1371301 2016-04-04 09:08 /user/root/20news/bayes-train-input/comp.os.ms-windows.misc.txt
-rw-r--r--   1 root supergroup     605082 2016-04-04 09:08 /user/root/20news/bayes-train-input/comp.sys.ibm.pc.hardware.txt
-rw-r--r--   1 root supergroup     539488 2016-04-04 09:08 /user/root/20news/bayes-train-input/comp.sys.mac.hardware.txt
-rw-r--r--   1 root supergroup     924668 2016-04-04 09:08 /user/root/20news/bayes-train-input/comp.windows.x.txt
-rw-r--r--   1 root supergroup     457202 2016-04-04 09:08 /user/root/20news/bayes-train-input/misc.forsale.txt
-rw-r--r--   1 root supergroup     649942 2016-04-04 09:08 /user/root/20news/bayes-train-input/rec.autos.txt
-rw-r--r--   1 root supergroup     610103 2016-04-04 09:08 /user/root/20news/bayes-train-input/rec.motorcycles.txt
-rw-r--r--   1 root supergroup     648313 2016-04-04 09:08 /user/root/20news/bayes-train-input/rec.sport.baseball.txt
-rw-r--r--   1 root supergroup     870760 2016-04-04 09:08 /user/root/20news/bayes-train-input/rec.sport.hockey.txt
-rw-r--r--   1 root supergroup    1139592 2016-04-04 09:08 /user/root/20news/bayes-train-input/sci.crypt.txt
-rw-r--r--   1 root supergroup     616166 2016-04-04 09:08 /user/root/20news/bayes-train-input/sci.electronics.txt
-rw-r--r--   1 root supergroup     901841 2016-04-04 09:08 /user/root/20news/bayes-train-input/sci.med.txt
-rw-r--r--   1 root supergroup     913047 2016-04-04 09:08 /user/root/20news/bayes-train-input/sci.space.txt
-rw-r--r--   1 root supergroup    1004842 2016-04-04 09:08 /user/root/20news/bayes-train-input/soc.religion.christian.txt
-rw-r--r--   1 root supergroup     973157 2016-04-04 09:08 /user/root/20news/bayes-train-input/talk.politics.guns.txt
-rw-r--r--   1 root supergroup    1317255 2016-04-04 09:08 /user/root/20news/bayes-train-input/talk.politics.mideast.txt
-rw-r--r--   1 root supergroup     980920 2016-04-04 09:08 /user/root/20news/bayes-train-input/talk.politics.misc.txt
-rw-r--r--   1 root supergroup     623882 2016-04-04 09:08 /user/root/20news/bayes-train-input/talk.religion.misc.txt
[root@Centos hadoop-1.2.1]# bin/hadoop fs -ls  20news/bayes-test-input
Warning: $HADOOP_HOME is deprecated.
Found 20 items
-rw-r--r--   1 root supergroup     773301 2016-04-04 09:08 /user/root/20news/bayes-test-input/alt.atheism.txt
-rw-r--r--   1 root supergroup     687018 2016-04-04 09:08 /user/root/20news/bayes-test-input/comp.graphics.txt
-rw-r--r--   1 root supergroup    1371301 2016-04-04 09:08 /user/root/20news/bayes-test-input/comp.os.ms-windows.misc.txt
-rw-r--r--   1 root supergroup     605082 2016-04-04 09:08 /user/root/20news/bayes-test-input/comp.sys.ibm.pc.hardware.txt
-rw-r--r--   1 root supergroup     539488 2016-04-04 09:08 /user/root/20news/bayes-test-input/comp.sys.mac.hardware.txt
-rw-r--r--   1 root supergroup     924668 2016-04-04 09:08 /user/root/20news/bayes-test-input/comp.windows.x.txt
-rw-r--r--   1 root supergroup     457202 2016-04-04 09:08 /user/root/20news/bayes-test-input/misc.forsale.txt
-rw-r--r--   1 root supergroup     649942 2016-04-04 09:08 /user/root/20news/bayes-test-input/rec.autos.txt
-rw-r--r--   1 root supergroup     610103 2016-04-04 09:08 /user/root/20news/bayes-test-input/rec.motorcycles.txt
-rw-r--r--   1 root supergroup     648313 2016-04-04 09:08 /user/root/20news/bayes-test-input/rec.sport.baseball.txt
-rw-r--r--   1 root supergroup     870760 2016-04-04 09:08 /user/root/20news/bayes-test-input/rec.sport.hockey.txt
-rw-r--r--   1 root supergroup    1139592 2016-04-04 09:08 /user/root/20news/bayes-test-input/sci.crypt.txt
-rw-r--r--   1 root supergroup     616166 2016-04-04 09:08 /user/root/20news/bayes-test-input/sci.electronics.txt
-rw-r--r--   1 root supergroup     901841 2016-04-04 09:08 /user/root/20news/bayes-test-input/sci.med.txt
-rw-r--r--   1 root supergroup     913047 2016-04-04 09:08 /user/root/20news/bayes-test-input/sci.space.txt
-rw-r--r--   1 root supergroup    1004842 2016-04-04 09:08 /user/root/20news/bayes-test-input/soc.religion.christian.txt
-rw-r--r--   1 root supergroup     973157 2016-04-04 09:08 /user/root/20news/bayes-test-input/talk.politics.guns.txt
-rw-r--r--   1 root supergroup    1317255 2016-04-04 09:08 /user/root/20news/bayes-test-input/talk.politics.mideast.txt
-rw-r--r--   1 root supergroup     980920 2016-04-04 09:08 /user/root/20news/bayes-test-input/talk.politics.misc.txt
-rw-r--r--   1 root supergroup     623882 2016-04-04 09:08 /user/root/20news/bayes-test-input/talk.religion.misc.txt
[root@Centos hadoop-1.2.1]# bin/hadoop fs -cat 20news/bayes-train-input/talk.politics.misc.txt
rce most part uninformed ignorant public democracy i don‘t think so society‘s sense justice judged basis treatment people who make up society all those people yes includes gays lesbians bisexuals whose crimes have victims who varied diverse society wich part frank jordan d d d c c c gay arab bassoonists unite 
talk.politics.misc	from steveh thor.isc br.com steve hendricks subject re limiting govt re employment re why concentrate summary promoting competition does depend upon libertarians organization free barbers inc lines 60 nntp posting host thor.isc br.com article c5kh8g 961 cbnewse.cb.att.com doctor1 cbnewse.cb.att.com patrick.b.hailey writes article 1993apr15.170731.8797 isc br.isc br.com steveh thor.isc br.com steve hendricks writes two paragraphs from two different posts splicing them together my intention change steve‘s meaning misrepresent him any way i don‘t think i‘ve done so noted another thread limiting govt problem libertarians face insuring limited government seek does become tool private interests pursue own agenda failure libertarianism ideology does provide any reasonable way restrain actions other than utopian dreams just marxism fails specify how pure communism achieved state wither away libertarians frequently fail show how weakening power state result improvement human condition patrick‘s example anti competitive regulations auto dealers deleted here‘s what i see libertarianism offering you does seem me utopian dream basic human decency common sense real grass roots example freedom liberty yes having few people acting our masters approving rejecting each our basic transactions each other does strike me wonderful way improve human condition thanks awfully patrick let me try drag discussion back original issues i‘ve noted before i‘m necessarily disputing benefits eliminating anti competitive legislation regard auto dealers barbers etc one need however swallow entire libertarian agenda accomplish end just because one grants benefits allowing anyone who wishes cut hair sell his her services without regulation does mean same unregulated barbers should free bleed people medical service without government intervention some many libertarians would argue case case basis cost benefit ratio government regulation obviously worthwhile libertarian agenda however 
does call assessment assumes costs regulation any kind always outweigh its benefits approach avoids all sorts difficult analysis strikes many rest us dogmatic say least i have objection analysis medical care education national defense local police suggests free market can provide more effective efficient means accomplishing social obj
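As the `fs -cat` output above shows, each line of the prepared input is the category label, a tab, and then the whitespace-tokenized document text. A minimal Python sketch of a parser for this format (the sample string is taken from the transcript above; the function name is ours, not part of Mahout):

```python
def parse_bayes_input_line(line):
    """Split one prepared 20newsgroups line into (label, tokens).

    Format (as seen in the `fs -cat` output):
        <category>\t<whitespace-separated tokens>
    """
    label, _, text = line.rstrip("\n").partition("\t")
    return label, text.split()

sample = "talk.politics.misc\tfrom steveh thor.isc br.com steve hendricks"
label, tokens = parse_bayes_input_line(sample)
print(label)       # talk.politics.misc
print(tokens[:2])  # ['from', 'steveh']
```

Lines before the first tab-prefixed record (such as the truncated fragment at the top of the `cat` output) are continuations of earlier documents, so a real loader would accumulate them rather than parse each physical line independently.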
Step 3: Train the Bayes classifier
	1. Train the model. The training text set has already been uploaded, so the Bayes classifier model is now trained against it.
	Option reference: -i input path of the training set (an HDFS path); -o output path for the classifier model; -type classifier type, cbayes here (plain bayes is the other option); -ng size of the n-gram model, default 1; -source location of the data, hdfs or hbase. The same options apply to the test step below.
	bin/mahout trainclassifier \
	-i /user/root/20news/bayes-train-input \
	-o /user/root/20news/newsmodel \
	-type cbayes \
	-ng 2 \
	-source hdfs
---------------------------------------------------------------------------------------------------------------
[root@Centos hadoop-1.2.1]# cd mahout-distribution-0.6/
[root@Centos mahout-distribution-0.6]# bin/mahout 
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
[root@Centos mahout-distribution-0.6]# bin/mahout trainclassifier \
> -i /user/root/20news/bayes-train-input \
> -o /user/root/20news/newsmodel \
> -type cbayes \
> -ng 2 \
> -source hdfs
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
16/04/04 09:21:58 WARN driver.MahoutDriver: No trainclassifier.props found on classpath, will use command-line arguments only
16/04/04 09:21:58 INFO bayes.TrainClassifier: Training Complementary Bayes Classifier
16/04/04 09:21:59 INFO cbayes.CBayesDriver: Reading features...
16/04/04 09:22:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/04/04 09:22:02 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/04/04 09:22:02 WARN snappy.LoadSnappy: Snappy native library not loaded
16/04/04 09:22:02 INFO mapred.FileInputFormat: Total input paths to process : 20
16/04/04 09:22:04 INFO mapred.JobClient: Running job: job_201604040854_0001
16/04/04 09:22:05 INFO mapred.JobClient:  map 0% reduce 0%
16/04/04 09:22:48 INFO mapred.JobClient:  map 1% reduce 0%
16/04/04 09:22:49 INFO mapred.JobClient:  map 2% reduce 0%
16/04/04 09:23:11 INFO mapred.JobClient:  map 3% reduce 0%
16/04/04 09:23:12 INFO mapred.JobClient:  map 4% reduce 0%
···· (intermediate map/reduce progress output omitted) ····
16/04/04 10:04:11 INFO mapred.JobClient: Job complete: job_201604040854_0004
16/04/04 10:04:11 INFO mapred.JobClient: Counters: 30
16/04/04 10:04:11 INFO mapred.JobClient:   Map-Reduce Framework
16/04/04 10:04:11 INFO mapred.JobClient:     Spilled Records=4309
16/04/04 10:04:12 INFO mapred.JobClient:     Map output materialized bytes=1473
16/04/04 10:04:12 INFO mapred.JobClient:     Reduce input records=41
16/04/04 10:04:12 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=7733702656
16/04/04 10:04:12 INFO mapred.JobClient:     Map input records=3146637
16/04/04 10:04:12 INFO mapred.JobClient:     SPLIT_RAW_BYTES=416
16/04/04 10:04:12 INFO mapred.JobClient:     Map output bytes=965613985
16/04/04 10:04:12 INFO mapred.JobClient:     Reduce shuffle bytes=1473
16/04/04 10:04:12 INFO mapred.JobClient:     Physical memory (bytes) snapshot=682602496
16/04/04 10:04:12 INFO mapred.JobClient:     Map input bytes=150138778
16/04/04 10:04:12 INFO mapred.JobClient:     Reduce input groups=20
16/04/04 10:04:12 INFO mapred.JobClient:     Combine output records=2128
16/04/04 10:04:12 INFO mapred.JobClient:     Reduce output records=20
16/04/04 10:04:12 INFO mapred.JobClient:     Map output records=28673441
16/04/04 10:04:12 INFO mapred.JobClient:     Combine input records=28675528
16/04/04 10:04:12 INFO mapred.JobClient:     CPU time spent (ms)=210830
16/04/04 10:04:12 INFO mapred.JobClient:     Total committed heap usage (bytes)=498544640
16/04/04 10:04:12 INFO mapred.JobClient:   File Input Format Counters 
16/04/04 10:04:12 INFO mapred.JobClient:     Bytes Read=150140285
16/04/04 10:04:12 INFO mapred.JobClient:   FileSystemCounters
16/04/04 10:04:12 INFO mapred.JobClient:     HDFS_BYTES_READ=150140770
16/04/04 10:04:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=383730
16/04/04 10:04:12 INFO mapred.JobClient:     FILE_BYTES_READ=152894
16/04/04 10:04:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=932
16/04/04 10:04:12 INFO mapred.JobClient:   File Output Format Counters 
16/04/04 10:04:12 INFO mapred.JobClient:     Bytes Written=932
16/04/04 10:04:12 INFO mapred.JobClient:   Job Counters 
16/04/04 10:04:12 INFO mapred.JobClient:     Launched map tasks=3
16/04/04 10:04:12 INFO mapred.JobClient:     Launched reduce tasks=1
16/04/04 10:04:12 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=214633
16/04/04 10:04:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/04/04 10:04:12 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=320403
16/04/04 10:04:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/04/04 10:04:12 INFO mapred.JobClient:     Data-local map tasks=3
16/04/04 10:04:14 INFO common.HadoopUtil: Deleting /user/root/20news/newsmodel/trainer-docCount
16/04/04 10:04:15 INFO common.HadoopUtil: Deleting /user/root/20news/newsmodel/trainer-termDocCount
16/04/04 10:04:15 INFO common.HadoopUtil: Deleting /user/root/20news/newsmodel/trainer-featureCount
16/04/04 10:04:15 INFO common.HadoopUtil: Deleting /user/root/20news/newsmodel/trainer-wordFreq
16/04/04 10:04:15 INFO common.HadoopUtil: Deleting /user/root/20news/newsmodel/trainer-tfIdf/trainer-vocabCount
16/04/04 10:04:16 INFO driver.MahoutDriver: Program took 2537700 ms (Minutes: 42.29723333333333)
[root@Centos mahout-distribution-0.6]# 
------------------------------------------------------------------------------------------------------------------------
Step 4: Test the Bayes model
	bin/mahout testclassifier \
	 -m /user/root/20news/newsmodel \
	 -d /user/root/20news/bayes-test-input \
	 -type cbayes \
	 -ng 2 \
	 -source hdfs \
	 -method mapreduce
---------------------------------------------------------------------------------
Step 4: the model has been generated
Step 5: test the Bayes classifier
---------------------------------
[root@Centos mahout-distribution-0.6]# bin/mahout testclassifier \
> -m /user/root/20news/newtestsmodel \
> -d /user/root/20news/bayes-test-input \
> -type cbayes \
> -ng 2 \
> -source hdfs \
> -method mapreduce
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/root/hadoop-1.2.1
HADOOP_CONF_DIR=/root/hadoop-1.2.1/conf
MAHOUT-JOB: /root/hadoop-1.2.1/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
16/04/04 14:10:54 WARN driver.MahoutDriver: No testclassifier.props found on classpath, will use command-line arguments only
16/04/04 14:10:56 INFO common.HadoopUtil: Deleting /user/root/20news/bayes-test-input-output
16/04/04 14:10:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/04/04 14:11:00 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/04/04 14:11:00 WARN snappy.LoadSnappy: Snappy native library not loaded
16/04/04 14:11:00 INFO mapred.FileInputFormat: Total input paths to process : 20
16/04/04 14:11:02 INFO mapred.JobClient: Running job: job_201604040854_0011
16/04/04 14:11:03 INFO mapred.JobClient:  map 0% reduce 0%
16/04/04 14:11:47 INFO mapred.JobClient:  map 5% reduce 0%
16/04/04 14:11:52 INFO mapred.JobClient:  map 10% reduce 0%
16/04/04 14:12:33 INFO mapred.JobClient:  map 19% reduce 0%
16/04/04 14:12:45 INFO mapred.JobClient:  map 19% reduce 6%
16/04/04 14:12:58 INFO mapred.JobClient:  map 29% reduce 6%
16/04/04 14:13:09 INFO mapred.JobClient:  map 29% reduce 10%
16/04/04 14:13:36 INFO mapred.JobClient:  map 39% reduce 10%
16/04/04 14:13:45 INFO mapred.JobClient:  map 39% reduce 13%
16/04/04 14:13:53 INFO mapred.JobClient:  map 44% reduce 13%
16/04/04 14:13:54 INFO mapred.JobClient:  map 50% reduce 13%
16/04/04 14:14:01 INFO mapred.JobClient:  map 50% reduce 16%
16/04/04 14:14:03 INFO mapred.JobClient:  map 55% reduce 16%
16/04/04 14:14:04 INFO mapred.JobClient:  map 60% reduce 16%
16/04/04 14:14:11 INFO mapred.JobClient:  map 60% reduce 20%
16/04/04 14:14:22 INFO mapred.JobClient:  map 70% reduce 20%
16/04/04 14:14:31 INFO mapred.JobClient:  map 70% reduce 23%
16/04/04 14:14:34 INFO mapred.JobClient:  map 80% reduce 23%
16/04/04 14:14:41 INFO mapred.JobClient:  map 80% reduce 26%
16/04/04 14:14:43 INFO mapred.JobClient:  map 85% reduce 26%
16/04/04 14:14:44 INFO mapred.JobClient:  map 90% reduce 26%
16/04/04 14:14:47 INFO mapred.JobClient:  map 90% reduce 30%
16/04/04 14:14:52 INFO mapred.JobClient:  map 95% reduce 30%
16/04/04 14:14:53 INFO mapred.JobClient:  map 100% reduce 30%
16/04/04 14:15:02 INFO mapred.JobClient:  map 100% reduce 66%
16/04/04 14:15:11 INFO mapred.JobClient:  map 100% reduce 100%
16/04/04 14:15:16 INFO mapred.JobClient: Job complete: job_201604040854_0011
16/04/04 14:15:28 INFO mapred.JobClient: Counters: 30
16/04/04 14:15:28 INFO mapred.JobClient:   Map-Reduce Framework
16/04/04 14:15:28 INFO mapred.JobClient:     Spilled Records=40
16/04/04 14:15:28 INFO mapred.JobClient:     Map output materialized bytes=993
16/04/04 14:15:28 INFO mapred.JobClient:     Reduce input records=20
16/04/04 14:15:28 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=40516427776
16/04/04 14:15:28 INFO mapred.JobClient:     Map input records=11314
16/04/04 14:15:28 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2573
16/04/04 14:15:28 INFO mapred.JobClient:     Map output bytes=470632
16/04/04 14:15:28 INFO mapred.JobClient:     Reduce shuffle bytes=993
16/04/04 14:15:28 INFO mapred.JobClient:     Physical memory (bytes) snapshot=4085964800
16/04/04 14:15:28 INFO mapred.JobClient:     Map input bytes=16607880
16/04/04 14:15:28 INFO mapred.JobClient:     Reduce input groups=20
16/04/04 14:15:28 INFO mapred.JobClient:     Combine output records=20
16/04/04 14:15:28 INFO mapred.JobClient:     Reduce output records=20
16/04/04 14:15:28 INFO mapred.JobClient:     Map output records=11314
16/04/04 14:15:28 INFO mapred.JobClient:     Combine input records=11314
16/04/04 14:15:28 INFO mapred.JobClient:     CPU time spent (ms)=34980
16/04/04 14:15:28 INFO mapred.JobClient:     Total committed heap usage (bytes)=3097051136
16/04/04 14:15:28 INFO mapred.JobClient:   File Input Format Counters 
16/04/04 14:15:28 INFO mapred.JobClient:     Bytes Read=16607880
16/04/04 14:15:28 INFO mapred.JobClient:   FileSystemCounters
16/04/04 14:15:28 INFO mapred.JobClient:     HDFS_BYTES_READ=16610453
16/04/04 14:15:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1166412
16/04/04 14:15:28 INFO mapred.JobClient:     FILE_BYTES_READ=879
16/04/04 14:15:28 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1092
16/04/04 14:15:28 INFO mapred.JobClient:   File Output Format Counters 
16/04/04 14:15:28 INFO mapred.JobClient:     Bytes Written=1092
16/04/04 14:15:28 INFO mapred.JobClient:   Job Counters 
16/04/04 14:15:28 INFO mapred.JobClient:     Launched map tasks=20
16/04/04 14:15:28 INFO mapred.JobClient:     Launched reduce tasks=1
16/04/04 14:15:28 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=195607
16/04/04 14:15:28 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/04/04 14:15:28 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=406966
16/04/04 14:15:28 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/04/04 14:15:28 INFO mapred.JobClient:     Data-local map tasks=20
16/04/04 14:15:38 INFO bayes.BayesClassifierDriver: =======================================================
Confusion Matrix
-------------------------------------------------------
a    	b    	c    	d    	e    	f    	g    	h    	i    	j    	k    	l    	m    	n    	o    	p    q    	r    	s    	t    	<--Classified as
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	a     = soc.religion.christian
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	b     = rec.autos
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	c     = talk.religion.misc
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	d     = comp.windows.x
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	e     = rec.sport.baseball
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	f     = comp.graphics
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	g     = talk.politics.mideast
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	h     = comp.sys.ibm.pc.hardware
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	i     = sci.med
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	j     = comp.os.ms-windows.misc
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	k     = sci.crypt
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	l     = comp.sys.mac.hardware
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	m     = misc.forsale
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	n     = rec.motorcycles
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	o     = talk.politics.misc
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	p     = sci.electronics
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	q     = rec.sport.hockey
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	r     = sci.space
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	s     = alt.atheism
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    0    	0    	0    	0    	 |  0     	t     = talk.politics.guns
16/04/04 14:15:38 INFO driver.MahoutDriver: Program took 283133 ms (Minutes: 4.718883333333333)
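Note that the confusion matrix above is all zeros, meaning no test documents were scored. A likely cause: the test run above passed `-m /user/root/20news/newtestsmodel`, while training wrote the model to `/user/root/20news/newsmodel` (as in the command listing for Step 4). On a successful run, overall accuracy is the matrix trace divided by its total. A small Python sketch with made-up counts, just to show how such a matrix is read:

```python
def accuracy(matrix):
    """Overall accuracy = correctly classified (diagonal) / all classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0

# Hypothetical 3-class matrix: row = actual class, column = predicted class.
m = [[50, 2, 3],
     [4, 45, 6],
     [1, 2, 47]]
print(accuracy(m))  # 0.8875
```

The all-zero matrix correctly yields 0.0 here, matching the failed run above; rerunning `testclassifier` with `-m /user/root/20news/newsmodel` should produce nonzero counts.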
-------------------------------
hadoop + mahout deployment and a classic-algorithm test on 20newsgroups
Original post: http://www.cnblogs.com/learningforever/p/5350460.html