
Working With Data Sources 10

Date: 2016-11-24 09:32:07


Preparing Data for SQL:

Sometimes we would like to store data in a SQL database. However, the dataset needs to be cleaned before it is sent. So here we use pandas to clean a dataset stored as a .csv file.

1. Read the file with read_csv and set the encoding explicitly:

  import pandas as pd
  file = pd.read_csv("academy_awards.csv", encoding="ISO-8859-1")
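As a rough illustration of why the encoding argument matters, here is a minimal sketch. It does not use the real academy_awards.csv; the single row and the column names are made up for the example, and the bytes are built in memory so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical one-row sample, encoded as ISO-8859-1 (not the real file).
raw = "Year,Nominee\n2010 (83rd),Penélope Cruz\n".encode("ISO-8859-1")

# Passing the matching encoding lets pandas decode the accented character.
sample = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
nominee = sample.loc[0, "Nominee"]
print(nominee)
```

If the encoding passed to read_csv does not match the bytes in the file, accented characters are usually the first thing to break.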

2. Use the .str accessor to keep only the first 4 characters of every string in the column:

  file["Year"] = file["Year"].str[0:4]
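A small sketch of the slicing step follows. The sample values such as "2010 (83rd)" are an assumption about what the Year column looks like, and the conversion to integers plus the year filter are also assumptions: the original steps do not show how later_than_2000 is built, so this is only one plausible way to get there.

```python
import pandas as pd

# Hypothetical Year values; the real column format is assumed, not confirmed.
sample = pd.DataFrame({"Year": ["2010 (83rd)", "1999 (72nd)", "2001 (74th)"]})

# .str[0:4] slices every string in the column at once.
sample["Year"] = sample["Year"].str[0:4]

# Assumed follow-up: convert to integers so the years can be compared.
sample["Year"] = sample["Year"].astype("int64")
later_than_2000 = sample[sample["Year"] > 2000]
print(later_than_2000)
```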

3. Use .isin() to select the rows I need:

  award_categories = ["Actor -- Leading Role", "Actor -- Supporting Role", "Actress -- Leading Role", "Actress -- Supporting Role"]

  # later_than_2000 is defined in a step not shown here; judging by its name, it holds the rows for years after 2000.
  nominations = later_than_2000[later_than_2000["Category"].isin(award_categories)]
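To make the row selection concrete, here is a minimal sketch on a made-up three-row dataframe. The Nominee column and its values are placeholders for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for later_than_2000.
later_than_2000 = pd.DataFrame({
    "Category": ["Actor -- Leading Role", "Directing", "Actress -- Supporting Role"],
    "Nominee": ["A", "B", "C"],
})

award_categories = [
    "Actor -- Leading Role",
    "Actor -- Supporting Role",
    "Actress -- Leading Role",
    "Actress -- Supporting Role",
]

# isin returns a boolean Series: True where Category is one of the listed values.
mask = later_than_2000["Category"].isin(award_categories)

# Indexing with the boolean Series keeps only the matching rows.
nominations = later_than_2000[mask]
print(nominations)
```

Calling isin on the single column, rather than on the whole dataframe, avoids comparing every other column against the category list.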

4. Use .map() to replace every value in the column according to a dictionary:

  won_dic = {
  "NO": 0,
  "YES": 1
  }
  # Attention: nominations is a slice of another dataframe, so assigning to it directly triggers a SettingWithCopyWarning. Take an explicit copy first. (Older pandas code set nominations.is_copy = False for this; newer pandas versions no longer support that attribute.)
  nominations = nominations.copy()
  nominations["Won?"] = nominations["Won?"].map(won_dic)
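The mapping step can be sketched as follows, using made-up rows. The sketch also shows one thing worth knowing about .map() with a dictionary: values that have no key in the dictionary become NaN.

```python
import pandas as pd

# Hypothetical data; only the "Won?" values mirror the original step.
full = pd.DataFrame({"Nominee": ["A", "B", "C"], "Won?": ["YES", "NO", "NO"]})

# Take an explicit copy of the filtered rows before modifying them.
nominations = full[full["Nominee"] != "B"].copy()

won_dic = {"NO": 0, "YES": 1}
nominations["Won?"] = nominations["Won?"].map(won_dic)

# Values missing from the dictionary are mapped to NaN.
unmapped = pd.Series(["YES", "MAYBE"]).map(won_dic)
print(nominations)
print(unmapped)
```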

5. Use .drop() to get rid of the columns I do not need (delete_list is the list of column names to remove):

  final_nominations = nominations.drop(delete_list, axis=1)
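Here is a minimal sketch of the drop step. The contents of delete_list are not given in the original, so the column names below are purely hypothetical.

```python
import pandas as pd

# Hypothetical columns; the real delete_list is not shown in the original.
nominations = pd.DataFrame({
    "Year": [2010],
    "Won?": [1],
    "Unnamed: 5": [None],
})
delete_list = ["Unnamed: 5"]

# axis=1 tells drop to remove columns rather than rows.
# drop returns a new dataframe and leaves nominations unchanged.
final_nominations = nominations.drop(delete_list, axis=1)
print(list(final_nominations.columns))
```

nominations.drop(columns=delete_list) is an equivalent spelling that some readers find clearer than axis=1.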

6. Use vectorized string methods to modify every string in a dataframe column:

  additional_info_one = final_nominations["Additional Info"].str.rstrip("'}")  # rstrip removes any trailing ' and } characters from the right end of each string.
  additional_info_two = additional_info_one.str.split("{.")  # pandas treats a multi-character pattern as a regular expression, so this splits on "{" followed by any one character.
  movie_names = additional_info_two.str[0]
  characters = additional_info_two.str[1]
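The string handling can be sketched on sample data. The format "Movie {'Character'}" is an assumption inferred from the variable names, not something the original states. This sketch also swaps in a literal separator " {'" with regex=False (a parameter I believe was added in pandas 1.4) instead of the regular-expression pattern, so that the movie name does not keep a trailing space.

```python
import pandas as pd

# Hypothetical values in the assumed "Movie {'Character'}" format.
info = pd.Series(["Biutiful {'Uxbal'}", "True Grit {'Rooster Cogburn'}"])

# rstrip("'}") strips any run of trailing ' and } characters.
trimmed = info.str.rstrip("'}")

# Split on the literal three-character separator: space, brace, quote.
parts = trimmed.str.split(" {'", regex=False)

# .str[i] picks element i from each list in the Series.
movie_names = parts.str[0]
characters = parts.str[1]
print(movie_names.tolist())
print(characters.tolist())
```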

7. Use to_sql() to save the dataset into a SQL table (conn is an open database connection):

  final_nominations.to_sql("nominations", conn, index=False)
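The original does not show how conn is created. As a sketch under that caveat, the example below assumes a SQLite connection from the standard sqlite3 module and uses an in-memory database with made-up rows, then reads the table back to check what was written.

```python
import sqlite3

import pandas as pd

# Hypothetical cleaned data.
final_nominations = pd.DataFrame({"Year": [2010, 2010], "Won?": [1, 0]})

# Assumption: conn is a sqlite3 connection; ":memory:" avoids creating a file.
conn = sqlite3.connect(":memory:")

# index=False keeps the dataframe index from being written as an extra column.
final_nominations.to_sql("nominations", conn, index=False)

# Read back the rows and the column names to verify the result.
rows = conn.execute('SELECT "Year", "Won?" FROM nominations').fetchall()
column_names = [row[1] for row in conn.execute("PRAGMA table_info(nominations)")]
conn.close()
print(column_names, rows)
```

By default to_sql raises an error if the table already exists; the if_exists parameter ("fail", "replace", or "append") controls that behavior.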


Original article: http://www.cnblogs.com/kingoscar/p/6096320.html
