Exporting Hive Data to MySQL with Sqoop

Date: 2021-06-25 17:26:06

Background:
For the basic usage of Sqoop export, see section 10, sqoop-export, of the Sqoop User Guide: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_syntax_4

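Summary:
This article verifies how the --update-mode argument behaves. The conclusions are as follows.

--update-mode has two values: updateonly (the default) and allowinsert.

updateonly: this mode reconciles rows that exist on both sides but differ. When a row in the Hive table and the row with the same key in the target table (in MySQL, Oracle, and so on) disagree, the Hive values are written to the target. For example, if both the Hive table and the MySQL table contain a record with id=1 but one column holds a different value, this mode removes the difference. Rows that exist in Hive but not in the target table are ignored.

allowinsert: this mode copies rows that exist in Hive but not in the target table, and it also synchronizes rows that differ. In other words, it covers everything updateonly does, which is presumably why it was not named insertonly.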
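
The row-level difference between the two modes can be sketched without a cluster. The script below is a simulation only, not Sqoop itself: it treats two comma-separated files as the Hive side and the MySQL side and uses the first column as the update key. The function name simulate_export, the column layout (c_id,name,age), and every sample value are hypothetical.

```shell
#!/bin/sh
# Simulation only: mimics the row-level effect of --update-mode on CSV files.
# $1 = mode (updateonly | allowinsert), $2 = source file (Hive side),
# $3 = target file (MySQL side). The first column is the update key.
# Assumes a non-empty source file with unique keys.
simulate_export() {
  awk -F',' -v mode="$1" '
    FNR == NR { src[$1] = $0; order[++n] = $1; next }  # load source rows by key
    ($1 in src) { print src[$1]; seen[$1] = 1; next }  # key matches: take source row
    { print }                                          # target-only row: unchanged
    END {
      if (mode == "allowinsert")                       # add rows missing in target
        for (i = 1; i <= n; i++)
          if (!(order[i] in seen)) print src[order[i]]
    }
  ' "$2" "$3"
}

# Hypothetical sample data: c_id,name,age
tmpdir=$(mktemp -d)
printf '1,tom,31\n2,ann,25\n3,bob,40\n' > "$tmpdir/hive_side.csv"
printf '1,tom,30\n2,ann,25\n' > "$tmpdir/mysql_side.csv"

echo "updateonly:"
simulate_export updateonly "$tmpdir/hive_side.csv" "$tmpdir/mysql_side.csv"
echo "allowinsert:"
simulate_export allowinsert "$tmpdir/hive_side.csv" "$tmpdir/mysql_side.csv"
```

In this sketch both modes bring the age of c_id=1 in line with the Hive side, while only allowinsert adds the row with c_id=3.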

Test scenario 1: full export
1. Prepare the source data:

To keep things simple, first create the source table wht_test1 in MySQL and add some test data.
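
For readers who want to follow along, the setup might look roughly like the sketch below. Only the table name wht_test1 and the columns c_id and age come from this article; the c_name column, the column types, the function name print_setup_sql, and all row values are assumptions made for illustration. The primary key on c_id is there because Sqoop's MySQL connector is generally understood to implement allowinsert with INSERT ... ON DUPLICATE KEY UPDATE, which needs a unique key to recognize existing rows.

```shell
#!/bin/sh
# Prints hypothetical setup SQL for wht_test1. Only c_id and age are named in
# the article; c_name, the types, and the row values are made up.
print_setup_sql() {
  cat <<'SQL'
CREATE TABLE IF NOT EXISTS wht_test1 (
  c_id INT NOT NULL,
  c_name VARCHAR(32),
  age INT,
  PRIMARY KEY (c_id)  -- a unique key lets MySQL upserts match existing rows
);
INSERT INTO wht_test1 (c_id, c_name, age) VALUES
  (1, 'tom', 30),
  (2, 'ann', 25),
  (3, 'bob', 40);
SQL
}

print_setup_sql
```

The output can be piped into the MySQL client, for example `print_setup_sql | mysql -u root -p wht`.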

2. Import the data from the source table into Hive.

sqoop import --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table wht_test1 --fields-terminated-by ',' --hive-import --hive-table default.wht_test1 --hive-overwrite -m 1

After this runs, the imported data is in the HDFS directory /user/hive/warehouse/wht_test1.

3. Create the export table.

Create a table with the same structure in MySQL to receive the exported data:

CREATE TABLE wht_test2 LIKE wht_test1;

4. Export the data from Hive (HDFS).

sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table wht_test2 --fields-terminated-by ',' --export-dir /user/hive/warehouse/wht_test1

After this runs, the rows from Hive have been inserted into the MySQL table wht_test2.

Test scenario 2: incremental export. Add two rows to the source data and compare what each mode exports
1. Edit the data file in HDFS and add two new rows.

• updateonly mode (the default when only --update-key is given):

sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table wht_test2 --fields-terminated-by ',' --update-key c_id --export-dir /user/hive/warehouse/wht_test1

Result: updateonly mode does not export the newly added rows.

• allowinsert mode:

sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table wht_test2 --fields-terminated-by ',' --update-key c_id --update-mode allowinsert --export-dir /user/hive/warehouse/wht_test1

Result: the newly added rows are exported to wht_test2.
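
The two export commands in this scenario differ only in the --update-mode argument, so they can be generated from one small helper. The following is a sketch under assumptions: the function name sqoop_export_cmd is hypothetical, the connection settings are copied from this article, and -P (prompt for the password on the console) is used in place of --password so the password does not appear on the command line. The helper only prints the command; it does not run Sqoop.

```shell
#!/bin/sh
# Prints (does not run) a sqoop export command for the setup in this article.
# $1 = target table, $2 = HDFS export dir, $3 = optional update key,
# $4 = optional update mode (updateonly | allowinsert).
sqoop_export_cmd() {
  table=$1; export_dir=$2; update_key=${3:-}; update_mode=${4:-}
  # Rebuild the positional parameters as the command line to print.
  set -- sqoop export \
    --connect jdbc:mysql://localhost:3306/wht \
    --username root -P \
    --table "$table" \
    --fields-terminated-by , \
    --export-dir "$export_dir"
  if [ -n "$update_key" ]; then set -- "$@" --update-key "$update_key"; fi
  if [ -n "$update_mode" ]; then set -- "$@" --update-mode "$update_mode"; fi
  echo "$@"
}

# updateonly (default mode), then allowinsert
sqoop_export_cmd wht_test2 /user/hive/warehouse/wht_test1 c_id
sqoop_export_cmd wht_test2 /user/hive/warehouse/wht_test1 c_id allowinsert
```

To execute the command instead of printing it, the final `echo "$@"` could be replaced with `"$@"` on a machine where Sqoop is installed.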

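Test scenario 3: modify the Hive data (change an age value and add one new row), export again, and check whether the target table is updated
1. Edit the data file in HDFS.

sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table wht_test2 --fields-terminated-by ',' --update-key c_id --update-mode updateonly --export-dir /user/hive/warehouse/wht_test1

Result: the rows modified on the Hive side are updated in the target table, but updateonly mode does not export the newly inserted row.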

Test scenario 4: allowinsert mode (exporting new rows from a different HDFS source directory)
A Hive table may have several partitions. To mimic that, add a new directory that holds data with the same structure and export it in allowinsert mode.

The new data directory is wht_test1_part.

Run the export command: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table wht_test2 --fields-terminated-by ',' --update-key c_id --update-mode allowinsert --export-dir /user/hive/warehouse/wht_test1_part

Result: the new rows are exported to wht_test2.

————————————————
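Copyright notice: this is an original article by CSDN blogger 「汀桦坞」, licensed under CC 4.0 BY-SA. When reposting, include the link to the original source and this notice.
Original link: https://blog.csdn.net/wiborgite/article/details/80958201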

Source: https://www.cnblogs.com/javalinux/p/14930927.html
