
Learning Python: extracting a chromosome sequence from a FASTA file by chromosome ID, using command-line arguments

Date: 2019-05-15 09:37:43


The goal is to extract the sequence of chromosome 14 from the FASTA file genome_test.fa, whose contents are as follows:

>chr1
ATATATATAT
>chr2
ATATATATATCGCGCGCGCG
>chr3
ATATATATATCGCGCGCGCGATATATATAT
>chr4
ATATATATATCGCGCGCGCGATATATATATCGCGCGCGCG
>chr5
ATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATAT
>chr6
ATCGATGCAGCATG
>chr7
TATCGCGCGCGCGATATAT
>chr8
ATATCGCGCGCGCGATATATATATCGCG
>chr9
ATCGCGCGCGCGATATATATATCGCG
>chr10
GCGCGCGATATAT
>chr11
CGCGATATATATATC
>chr12
ATATATCGCGCGCGCGATATAT
>chr13
ATATATCGCGCGCGCGATATATGCGATATATATATC
>chr15
ATATATGCGAT
>chr14
GCGCGCGCGATATATGCGAT
>chr16
GCGATATATGCGATATATATATC
>chr17
GCGCGCGCGATATATATATCGCGCGCGCGATATATATAT
>chr18
GCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGATATATATATC
>chr19
ATATGCGATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA
>chr20
TATGCGATATATATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA
>chr21
TATATCGCGCGCGCGATATATATATCGCGCGCGCGATATATATATATATGCGA
>chr22
ATATATATCGCGCGCGCGATATATATATATATGCGA
>chrX
CGCGCGCGATATATATATATATGCGA
>chrY
CGCGCGCGATATATATATATATGCGACGCGCGCGATATATATATATATGCGACGCGCGCGATATATATATATATGCGA
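Each record in the file above is a header line starting with ">" followed by its sequence. As a minimal sketch of that layout, the snippet below reports the length of each record. The helper name `sequence_lengths` is hypothetical, and the first three records are inlined so the example runs without the file.

```python
import io

# First three records of genome_test.fa, inlined so the sketch is standalone.
SAMPLE = """>chr1
ATATATATAT
>chr2
ATATATATATCGCGCGCGCG
>chr3
ATATATATATCGCGCGCGCGATATATATAT
"""


def sequence_lengths(handle):
    """Return {record_id: sequence_length} for an open FASTA handle.

    Hypothetical helper, not part of the original script.
    """
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:]  # the ID is the header without ">"
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


lengths = sequence_lengths(io.StringIO(SAMPLE))
print(lengths)  # {'chr1': 10, 'chr2': 20, 'chr3': 30}
```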

Implementation with Python and command-line arguments

Create a new .py file named "GetSeqFromChrID.py".

The Python script is as follows:

import argparse


def read_fasta(path):
    """Read a FASTA file into a dict mapping each header to its sequence."""
    fasta = {}
    header = None
    with open(path, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of failing on line[0]
            if line.startswith('>'):
                header = line[1:]
            elif header is not None:
                # a sequence may span several lines, so append each one
                fasta[header] = fasta.get(header, '') + line
    return fasta


if __name__ == '__main__':
    # read arguments
    parser = argparse.ArgumentParser(description="this program is used to extract a single "
                                                 "sequence from genome")
    parser.add_argument('--input', '-i',
                        type=str,
                        required=True,
                        help='input file in fasta format')
    parser.add_argument('--output', '-o',
                        type=str,
                        required=True,
                        help='output file')
    parser.add_argument('seq_id',
                        type=str,
                        help='sequence id')
    args = parser.parse_args()

    fasta = read_fasta(args.input)
    with open(args.output, 'w') as f:
        # write the record, or a message in place of the sequence if the id is missing
        f.write('>{:s}\n{:s}\n'.format(args.seq_id, fasta.get(args.seq_id, 'cannot find this sequence')))
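The script reads every record into a dictionary before looking up a single ID. For a large genome, one possible alternative is to stream the file and stop once the requested record has been read. The sketch below is only an illustration of that idea: `extract_sequence` is a hypothetical helper, and the sample data is a three-record excerpt of genome_test.fa inlined so the code runs on its own.

```python
import io


def extract_sequence(handle, seq_id):
    """Return the sequence for seq_id, or None if the ID is absent.

    Hypothetical helper: reads line by line and stops at the first
    header after the wanted record, so other records are never stored.
    """
    parts = []
    found = False
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if found:
                break  # reached the record after the one we wanted
            found = (line[1:] == seq_id)
        elif found:
            parts.append(line)  # sequence may span several lines
    return "".join(parts) if found else None


# Excerpt of genome_test.fa; note that chr15 precedes chr14 in the file.
SAMPLE = """>chr15
ATATATGCGAT
>chr14
GCGCGCGCGATATATGCGAT
>chr16
GCGATATATGCGATATATATATC
"""

chr14 = extract_sequence(io.StringIO(SAMPLE), "chr14")
missing = extract_sequence(io.StringIO(SAMPLE), "chr99")
print(chr14)    # GCGCGCGCGATATATGCGAT
print(missing)  # None
```

Returning None for a missing ID leaves it to the caller to decide whether to write a message, raise an error, or skip the output file.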

Run the script from the command line as follows; everything after the prompt is typed input:

(base) e:\15_python\DEBUG>python GetSeqFromChrID.py -i genome_test.fa -o chr14.fa chr14
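To see how argparse maps this command line onto attributes, the same arguments can be passed to `parse_args` as a list. This is a small sketch that rebuilds the parser with the options described in the script; the description string is shortened here.

```python
import argparse

# Same three arguments as the script: two options and one positional.
parser = argparse.ArgumentParser(description="extract a single sequence from a genome")
parser.add_argument("--input", "-i", type=str, help="input file in fasta format")
parser.add_argument("--output", "-o", type=str, help="output file")
parser.add_argument("seq_id", type=str, help="sequence id")

# Equivalent to: python GetSeqFromChrID.py -i genome_test.fa -o chr14.fa chr14
args = parser.parse_args(["-i", "genome_test.fa", "-o", "chr14.fa", "chr14"])
print(args.input, args.output, args.seq_id)  # genome_test.fa chr14.fa chr14
```

The options `-i` and `-o` may appear in any order, while `chr14` is taken as the positional `seq_id` because it is not attached to an option.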

The result is as follows:

[Image: screenshot of the result]

 

  


Original article: https://www.cnblogs.com/caicai2019/p/10867405.html
