To survey the tooling in the Hadoop ecosystem, I looked into tools for importing and exporting data between an RDBMS and HDFS, and evaluated several similar products. The conclusion: they are all secondary development on top of Sqoop, or web UI wrappers around it, and still use Sqoop underneath. Pentaho's PDI and Oracle's ODI, for example, are both built this way. Hortonworks' Sandbox, the Hue web UI, and Cloudera Manager go even further: they integrate almost every component in the Hadoop ecosystem, are not especially hard to deploy, and are quite powerful.
Apache Sqoop currently ships two product lines: Sqoop1 and Sqoop2. By comparison, Sqoop1 is more mature and has fewer bugs, but its architecture is monolithic; the current stable release is 1.4.6. Sqoop2 builds on Sqoop1 with substantial improvements: the client and server are separated, and jobs and connections are managed in an integrated way, so it is much simpler to use than Sqoop1. Deployment, however, is more complex, and Sqoop2 is not compatible with Sqoop1, so existing application scripts mostly have to be rewritten. The broader trend, though, is that Sqoop2 will become the mainstream choice.
# Note: since 1.99.2, Sqoop2 has been unable to import data into HBase; this is expected to be fixed in the stable 2.0.0 release.
Environment setup follows the official documentation. The items below deserve special attention:
1. Changes needed in server/conf/sqoop.properties:
org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/sqoop;create=true
The default repository shown above is Derby. If you instead point the repository at MySQL, the sqoop database must be created on the MySQL side beforehand, with privileges granted:
create database sqoop; create user sqoop identified by '123456'; grant all privileges on sqoop.* to sqoop; flush privileges;
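The MySQL preparation can also be scripted. A minimal sketch, assuming a local mysqld reachable as root; the file path and password are placeholders:

```shell
# Sketch: generate the SQL that prepares the Sqoop metastore database
# and user. The password '123456' and '%' host are illustrative only.
cat > /tmp/prepare_sqoop.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS sqoop;
CREATE USER 'sqoop'@'%' IDENTIFIED BY '123456';
GRANT ALL PRIVILEGES ON sqoop.* TO 'sqoop'@'%';
FLUSH PRIVILEGES;
SQL

# Run it against a local server (assumes root access):
# mysql -u root -p < /tmp/prepare_sqoop.sql
echo "wrote /tmp/prepare_sqoop.sql"
```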
2. Also in sqoop.properties, set the location of your Hadoop configuration:
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=your-hadoop-cluster-location
3. In server/conf/catalina.properties, append all the jar files under hadoop/share to common.loader:
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,your-hadoop-libs
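Listing every Hadoop jar by hand is tedious; a sketch for generating the comma-separated list, assuming a Hadoop layout like the one used later in this post (adjust HADOOP_HOME to your install):

```shell
# Sketch: build a comma-separated list of all jars under share/hadoop,
# suitable for appending to common.loader. HADOOP_HOME is an assumption.
HADOOP_HOME=${HADOOP_HOME:-/home/project/hadoop-2.5.2}

# Collect every *.jar under share/hadoop and join with commas.
hadoop_libs=$(find "$HADOOP_HOME/share/hadoop" -name '*.jar' 2>/dev/null \
  | sort | paste -sd ',' -)

echo "common.loader=\${catalina.base}/lib,\${catalina.base}/lib/*.jar,\${catalina.home}/lib,\${catalina.home}/lib/*.jar,\${catalina.home}/../lib/*.jar,$hadoop_libs"
```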
4. [Important] Add the following to Hadoop's yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
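Before restarting YARN, it is worth a quick sanity check that the shuffle service made it into the file; a sketch, with the yarn-site.xml path being an assumption about your layout:

```shell
# Sanity check (assumed path): confirm yarn-site.xml declares the
# mapreduce_shuffle aux service and its handler class.
YARN_SITE=${YARN_SITE:-$HADOOP_HOME/etc/hadoop/yarn-site.xml}

if grep -q 'mapreduce_shuffle' "$YARN_SITE" 2>/dev/null \
   && grep -q 'org.apache.hadoop.mapred.ShuffleHandler' "$YARN_SITE" 2>/dev/null; then
  echo "shuffle service configured"
else
  echo "shuffle service missing" >&2
fi
```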
Starting the Hadoop and Sqoop environments.
1. After starting the Sqoop server as a daemon, you will see output like the following:
[root@sv001 sqoop-1.99.3-bin-hadoop200]# ./bin/sqoop.sh server run
Sqoop home directory: /home/project/sqoop-1.99.3-bin-hadoop200
Setting SQOOP_HTTP_PORT: 12000
Setting SQOOP_ADMIN_PORT: 12001
Using CATALINA_OPTS:
Adding to CATALINA_OPTS: -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
Using CATALINA_BASE: /home/project/sqoop-1.99.3-bin-hadoop200/server
Using CATALINA_HOME: /home/project/sqoop-1.99.3-bin-hadoop200/server
Using CATALINA_TMPDIR: /home/project/sqoop-1.99.3-bin-hadoop200/server/temp
Using JRE_HOME: /usr/java/jdk1.7.0_67
Using CLASSPATH: /home/project/sqoop-1.99.3-bin-hadoop200/server/bin/bootstrap.jar
May 11, 2016 6:56:00 PM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
May 11, 2016 6:56:00 PM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-12000
May 11, 2016 6:56:00 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 634 ms
May 11, 2016 6:56:00 PM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
May 11, 2016 6:56:00 PM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.36
May 11, 2016 6:56:00 PM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive sqoop.war
2016-05-11 18:56:00,972 INFO [main] core.SqoopServer (SqoopServer.java:initialize(47)) - Booting up Sqoop server
2016-05-11 18:56:00,979 INFO [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(96)) - Starting config file poller thread
log4j: Parsing for [root] with value=[WARN, file].
log4j: Level token is [WARN].
log4j: Category root set to WARN
log4j: Parsing appender named "file".
log4j: Parsing layout options for "file".
log4j: Setting property [conversionPattern] to [%d{ISO8601} %-5p %c{2} [%l] %m%n].
log4j: End of parsing for "file".
log4j: Setting property [file] to [@LOGDIR@/sqoop.log].
log4j: Setting property [maxBackupIndex] to [5].
log4j: Setting property [maxFileSize] to [25MB].
log4j: setFile called: @LOGDIR@/sqoop.log, true
log4j: setFile ended
log4j: Parsed "file" options.
log4j: Parsing for [org.apache.sqoop] with value=[DEBUG].
log4j: Level token is [DEBUG].
log4j: Category org.apache.sqoop set to DEBUG
log4j: Handling log4j.additivity.org.apache.sqoop=[null]
log4j: Parsing for [org.apache.derby] with value=[INFO].
log4j: Level token is [INFO].
log4j: Category org.apache.derby set to INFO
log4j: Handling log4j.additivity.org.apache.derby=[null]
log4j: Finished configuring.
log4j: Could not find root logger information. Is this OK?
log4j: Parsing for [default] with value=[INFO,defaultAppender].
log4j: Level token is [INFO].
log4j: Category default set to INFO
log4j: Parsing appender named "defaultAppender".
log4j: Parsing layout options for "defaultAppender".
log4j: Setting property [conversionPattern] to [%d %-5p %c: %m%n].
log4j: End of parsing for "defaultAppender".
log4j: Setting property [file] to [@LOGDIR@/default.audit].
log4j: setFile called: @LOGDIR@/default.audit, true
log4j: setFile ended
log4j: Parsed "defaultAppender" options.
log4j: Handling log4j.additivity.default=[null]
log4j: Finished configuring.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/project/sqoop-1.99.3-bin-hadoop200/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/project/sqoop-1.99.3-bin-hadoop200/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/project/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
May 11, 2016 6:56:03 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory ROOT
May 11, 2016 6:56:03 PM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-12000
May 11, 2016 6:56:03 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 3605 ms
2. Start the Sqoop client.
Command: sqoop.sh client
[root@sv001 sqoop-1.99.3-bin-hadoop200]# ./bin/sqoop.sh client
Sqoop home directory: /home/project/sqoop-1.99.3-bin-hadoop200
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000>
Check the version information:
sqoop:000> show version -all
client version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
server version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
Protocol version:
  [1]
Point the client at the server:
sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
sqoop:000> create connection --cid 2
Creating connection for connector with id 2
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:Server has returned exception
sqoop:000> create connection --cid 1
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: test-mysql2hdfs
Connection configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://your-mysql-ip:3306/sqoop
Username: sqoop
Password: ******
JDBC Connection Properties:
There are currently 0 values in the map:
entry#
Security related configuration options
Max connections: 10
New connection was successfully created with validation status FINE and persistent id 6
# Note: the JDBC connection string, username, and password entered above must match the database and user prepared in MySQL beforehand, and the sqoop database must be the same one referenced in sqoop.properties.
The connection created here has id=6.
Create a job [MySQL --> HDFS] based on this connection, as follows:
sqoop:000> create job --xid 6 --type import
Creating job for connection with id 6
Please fill following values to create new job object
Name: importmysql2hdfs
Database configuration
Schema name: sqoop
Table name: t1
Table SQL statement:
Table column names:
Partition column name: id
Nulls in partition column:
Boundary query:
Output configuration
Storage type:
  0 : HDFS
Choose: 0
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
Choose: 0
Output directory: /sqoopuse
Throttling resources
Extractors:
Loaders:
New job was successfully created with validation status FINE and persistent id 4
mysql> select * from t1;
+------+---------+----------+
| id   | int_col | char_col |
+------+---------+----------+
|    2 |       2 | b        |
|    4 |       4 | d        |
|    1 |       1 | a        |
|    3 |       3 | c        |
+------+---------+----------+
4 rows in set (0.00 sec)
The job's id is 4.
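Because the job names id as the partition column, the import is divided into ranges of id values, one per map task. A small awk sketch of this range-style splitting, purely as an illustration (Sqoop's real boundary query and split computation happen on the server):

```shell
# Illustration only: range-partition ids 1..4 into 3 splits, the way a
# Sqoop import divides work across map tasks on the partition column.
min=1; max=4; splits=3
awk -v min="$min" -v max="$max" -v n="$splits" 'BEGIN {
  step = (max - min + 1) / n;
  for (i = 0; i < n; i++) {
    lo = min + i * step;
    hi = (i == n - 1) ? max + 1 : min + (i + 1) * step;
    printf "split %d: id >= %.2f AND id < %.2f\n", i, lo, hi;
  }
}'
```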
Create the job [HDFS --> MySQL]:
sqoop:000> create job --xid 4 --type export
Creating job for connection with id 4
Please fill following values to create new job object
Name: hdfs2mysqlInfo
Database configuration
Schema name: sqoop
Table name: t1
Table SQL statement:
Table column names:
Stage table name:
Clear stage table:
Input configuration
Input directory: /sqoopuse
Throttling resources
Extractors:
Loaders:
New job was successfully created with validation status FINE and persistent id 11
1. Start the job [MySQL --> HDFS]:
sqoop:000> start job --jid 4
Submission details
Job ID: 4
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-05-11 19:19:53 JST
Lastly updated by: root
External ID: job_1462962692840_0001
http://sv004:8088/proxy/application_1462962692840_0001/
2016-05-11 19:19:53 JST: BOOTING - Progress is not available
sqoop:000> status job --jid 4
Submission details
Job ID: 4
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-05-11 19:37:16 JST
Lastly updated by: root
External ID: job_1462962692840_0001
http://sv004:8088/proxy/application_1462962692840_0001/
2016-05-11 19:37:57 JST: SUCCEEDED
Counters:
org.apache.hadoop.mapreduce.JobCounter
SLOTS_MILLIS_MAPS: 38212
MB_MILLIS_MAPS: 39129088
TOTAL_LAUNCHED_MAPS: 3
MILLIS_MAPS: 38212
VCORES_MILLIS_MAPS: 38212
SLOTS_MILLIS_REDUCES: 0
OTHER_LOCAL_MAPS: 3
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 32
org.apache.hadoop.mapreduce.TaskCounter
MAP_INPUT_RECORDS: 0
MERGED_MAP_OUTPUTS: 0
PHYSICAL_MEMORY_BYTES: 497262592
SPILLED_RECORDS: 0
FAILED_SHUFFLE: 0
CPU_MILLISECONDS: 3520
COMMITTED_HEAP_BYTES: 603979776
VIRTUAL_MEMORY_BYTES: 2741444608
MAP_OUTPUT_RECORDS: 4
SPLIT_RAW_BYTES: 346
GC_TIME_MILLIS: 96
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_READ_OPS: 0
FILE_WRITE_OPS: 0
FILE_BYTES_READ: 0
FILE_LARGE_READ_OPS: 0
HDFS_BYTES_READ: 346
FILE_BYTES_WRITTEN: 318117
HDFS_LARGE_READ_OPS: 0
HDFS_BYTES_WRITTEN: 32
HDFS_READ_OPS: 12
HDFS_WRITE_OPS: 6
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_READ: 4
Job executed successfully
[root@sv001 bin]# ./hadoop fs -ls /sqoopuse
16/05/11 19:43:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
-rw-r--r--   3 root supergroup          0 2016-05-11 19:37 /sqoopuse/_SUCCESS
-rw-r--r--   3 root supergroup          8 2016-05-11 19:37 /sqoopuse/part-m-00000
-rw-r--r--   3 root supergroup          8 2016-05-11 19:37 /sqoopuse/part-m-00001
-rw-r--r--   3 root supergroup         16 2016-05-11 19:37 /sqoopuse/part-m-00002
[root@sv001 bin]# ./hadoop fs -cat /sqoopuse/part*
16/05/11 19:43:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1,1,'a'
2,2,'b'
4,4,'d'
3,3,'c'
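Row order in the part files differs from MySQL's, so a byte-for-byte diff would fail even with no data loss. A sketch of an order-insensitive comparison; the file names and sample rows below are stand-ins for `hadoop fs -cat /sqoopuse/part*` and a real mysql dump:

```shell
# Sketch: confirm no rows were lost by comparing the exported part
# files against a dump of the source table, ignoring row order.
workdir=$(mktemp -d)
printf "1,1,'a'\n2,2,'b'\n"                   > "$workdir/part-m-00000"
printf "4,4,'d'\n3,3,'c'\n"                   > "$workdir/part-m-00001"
printf "2,2,'b'\n4,4,'d'\n1,1,'a'\n3,3,'c'\n" > "$workdir/mysql_dump.csv"

# Sort both sides before diffing, so ordering differences are ignored.
cat "$workdir"/part-m-* | sort > "$workdir/hdfs_sorted.txt"
sort "$workdir/mysql_dump.csv" > "$workdir/mysql_sorted.txt"

if diff "$workdir/hdfs_sorted.txt" "$workdir/mysql_sorted.txt" >/dev/null; then
  echo "row sets match"
else
  echo "row sets differ" >&2
fi
```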
Test the job [HDFS --> MySQL].
1. Clear the data on the MySQL side:
mysql> select * from t1;
+------+---------+----------+
| id   | int_col | char_col |
+------+---------+----------+
|    2 |       2 | b        |
|    4 |       4 | d        |
|    1 |       1 | a        |
|    3 |       3 | c        |
+------+---------+----------+
4 rows in set (0.00 sec)
mysql> delete from t1;
Query OK, 4 rows affected (0.27 sec)
mysql> select * from t1;
Empty set (0.00 sec)
2. Start the job [HDFS --> MySQL]:
sqoop:000> start job --jid 11
Submission details
Job ID: 11
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-05-11 19:50:42 JST
Lastly updated by: root
External ID: job_1462962692840_0002
http://sv004:8088/proxy/application_1462962692840_0002/
2016-05-11 19:50:42 JST: BOOTING - Progress is not available
sqoop:000> status job --jid 11
Submission details
Job ID: 11
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-05-11 19:50:42 JST
Lastly updated by: root
External ID: job_1462962692840_0002
http://sv004:8088/proxy/application_1462962692840_0002/
2016-05-11 19:51:39 JST: SUCCEEDED
Counters:
org.apache.hadoop.mapreduce.JobCounter
SLOTS_MILLIS_MAPS: 204363
MB_MILLIS_MAPS: 209267712
TOTAL_LAUNCHED_MAPS: 8
MILLIS_MAPS: 204363
VCORES_MILLIS_MAPS: 204363
SLOTS_MILLIS_REDUCES: 0
OTHER_LOCAL_MAPS: 8
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.TaskCounter
MAP_INPUT_RECORDS: 0
MERGED_MAP_OUTPUTS: 0
PHYSICAL_MEMORY_BYTES: 1327665152
SPILLED_RECORDS: 0
COMMITTED_HEAP_BYTES: 1610612736
CPU_MILLISECONDS: 7590
FAILED_SHUFFLE: 0
VIRTUAL_MEMORY_BYTES: 7262990336
SPLIT_RAW_BYTES: 1224
MAP_OUTPUT_RECORDS: 4
GC_TIME_MILLIS: 316
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_WRITE_OPS: 0
FILE_READ_OPS: 0
FILE_LARGE_READ_OPS: 0
FILE_BYTES_READ: 0
HDFS_BYTES_READ: 1320
FILE_BYTES_WRITTEN: 839664
HDFS_LARGE_READ_OPS: 0
HDFS_WRITE_OPS: 0
HDFS_READ_OPS: 32
HDFS_BYTES_WRITTEN: 0
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_READ: 4
Job executed successfully
mysql> select * from t1;    <-------- confirm with select
+------+---------+----------+
| id   | int_col | char_col |
+------+---------+----------+
|    1 |       1 | a        |
|    2 |       2 | b        |
|    4 |       4 | d        |
|    3 |       3 | c        |
+------+---------+----------+
4 rows in set (0.00 sec)
The export succeeded, with no data loss.
---over----
Apache Sqoop 1.99.3 + Hadoop 2.5.2 + MySQL 5.0.7: environment setup and data import/export
Original article: http://blog.csdn.net/huyangshu87/article/details/51372495