05 Computing GC Content

时间：2017-07-30 14:50:09 阅读：153 评论：0 收藏：0 [点我收藏+]

标签：ros absolute problem read data div where max tool

Problem

The GC-content of a DNA string is given by the percentage of symbols in the string that are ‘C‘ or ‘G‘. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, the string is introduced by a line that begins with ‘>‘, followed by some labeling information. Subsequent lines contain the string itself; the first line to begin with ‘>‘ indicates the label of the next string.

In Rosalind‘s implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.

Sample Dataset

>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT

Sample Output

Rosalind_0808
60.919540


方法一：

# -*- coding: utf-8 -*-


# to open FASTA format sequence file:
s=open(‘Computing_GC_Content.txt‘,‘r‘).readlines()

# to create two lists, one for names, one for sequences
name_list=[]
seq_list=[]

data=‘‘ # to put the sequence from several lines together

for line in s:
    line=line.strip()
    for i in line:
        if i == ‘>‘:
            name_list.append(line[1:])
            if data:
                seq_list.append(data)         #将每一行的的核苷酸字符串连接起来
                data=‘‘                       # 合完后data 清零
            break
        else:
            line=line.upper()
    if all([k==k.upper() for k in line]):    #验证是不是所有的都是大写
        data=data+line
seq_list.append(data)                         # is there a way to include the last sequence in the for loop?
GC_list=[]
for seq in seq_list:
    i=0
    for k in seq:
        if k=="G" or k==‘C‘:
            i+=1
    GC_cont=float(i)/len(seq)*100.0
    GC_list.append(GC_cont)


m=max(GC_list)
print name_list[GC_list.index(m)]              # to find the index of max GC
print "{:0.6f}".format(m)                    # 保留6位小数

05 Computing GC Content

标签：ros absolute problem read data div where max tool

原文地址：http://www.cnblogs.com/think-and-do/p/7258980.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行