码迷,mamicode.com
首页 > 其他好文 > 详细

Data Transfter

时间:2021-01-05 10:36:37      阅读:0      评论:0      收藏:0      [点我收藏+]

标签:traffic   test   links   ensure   sse   main   today   time   tar   

There are many ways to load text data file into the database. In general, we would use applications that database service providers‘ product. For example, in Oracle, we have sqlldr, or use sqlcmd in MSSQL. This practice was based on the case of loading 30GB text data file into a database, which has details in the links in the reference section.

 

 

Today, we would like to explorer a Pro*C/C++ option to import data into the database. The application was built on Oracle database. Below are the elements that used in the application.

1. Multi-threads application

2. Stack check to ensure we have enough memory to run the application.

3. Connection pool

We have following specfied format of the text file.

AA||BB||CC||DD

601330||8.43||0.08||2.45

600751||2.64||0.05||16.81

603939||95.34||2.31||3.41

601702||8.42||0.19||44.35

There are 4 columns, and each column are separated by “||”. This time, we have 4 text files named “sameplefile0001.txt”, “sameplefile0002.txt”,“sameplefile0003.txt”,“sameplefile0004.txt”. Each file has about 15MB in size, which is 584778 lines of data.

 

We use 4 threads to load the data. It means that each thread processes one text file. The connection pool parameter is used; there are 2 connections when the application starts. In total, we have 4 connections. Oracle offers connection pool technology, which is to improve client side performance. In this case, this technology would not provide that much performance gain due to the nature of the application, which is mainly I/O read from files and write to the database. On contrary, it is more like a resource controller.

 

We load data in batches. The application has the capability of processing data with full batch and leftover operations. Data conversion is in the process. However, it can also be done in many ways, such as converting data types via functions coded by C/C++, Java, or PL/SQL after they were in the database.

 

Two tests had been completed. The first one is in a small scope:

There are 25 lines/batch, and 4 files; each file has about 50 lines of data. The total time taken of the program execution was 1.097034 seconds.

The second test had above detailed parameters with 50000 lines/batch; and total time taken was 35.124050 seconds.

 

We can use this type of loading data in different workstations or different time frames. The one on server would have a better performance because it would not be impacted by any network traffic. We also could have more numbers of threads or connections increased, that would lead to shorter loading time, but will required more resources like CPU, memory, faster disks.

 


Reference section:

https://www.cnblogs.com/goodAndyxublog/p/14182457.html

https://studyidea.cn/import_large_file

 

Data Transfter

标签:traffic   test   links   ensure   sse   main   today   time   tar   

原文地址:https://www.cnblogs.com/zwang562020/p/14218466.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!