Python Tutorial (Part 5)

2017-06-26 陈同 生信宝典

Assignment (2)

  1. Rewrite the code blocks from "Assignment (1)" as functions, then call those functions to run them.

  • Knowledge points used

    • def func(para1,para2,…):

    • func(para1,para2,…)

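  • Notes:

    • Each "Knowledge points used" entry lists only the points that are new relative to the preceding exercises, so consider them together with the earlier ones. Depending on your approach, you may not need every point listed, and you may need points that are not listed. All of them, however, are covered in the earlier lecture notes.

    • Every one of these programs is easy for anyone around you who already knows how to program, so resist the urge to ask them. Work out the answer on your own, read the error messages carefully, and keep comparing the program's output with the expected result.

    • Practice "reading the program": with the input file in front of you, walk through the whole reading and processing flow by hand to spot possible logic errors.

    • A program that runs without errors has not necessarily done what you need. Always check whether the output is what you wanted.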

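  • On debugging

    • When you first start writing programs, all sorts of errors can occur. Common ones include inconsistent indentation, misspelled variable names, missing colons, and file names without quotes. Use the error message to find out what type of error it is and which line raised it. Note that the reported line is sometimes not the faulty one; the mistake may be on a line before or after it.

    • When the result is not what you expect, learn to use print to check whether each step is correct. For example, after reading data into a dictionary, print the dictionary to see whether it holds what you intended and whether it contains characters that should not be there. You can also print a marker inside each conditional branch and at each function call to trace the path the program takes.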
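The print-based debugging described above can be sketched as follows. This is only a minimal illustration and not part of the original exercises: the function name `read_pairs`, the `DEBUG` flag and the sample lines are all hypothetical. It calls `print(...)` with a single argument, which behaves the same under Python 2.7 and Python 3, and it uses `%r` so that hidden characters such as a trailing newline or a tab become visible.

```python
# Hypothetical example: the names and data below are made up for illustration.
DEBUG = True  # set to False to silence the tracing output


def read_pairs(lines):
    """Build a dict from 'key<TAB>value' lines, tracing each step."""
    result = {}
    for line in lines:
        if DEBUG:
            # %r shows invisible characters such as '\n' and '\t'
            print("raw line: %r" % line)
        fields = line.strip().split('\t')
        if len(fields) != 2:
            if DEBUG:
                # Marker printed in this branch to trace the program's path
                print("skipped (expected 2 fields): %r" % fields)
            continue
        key, value = fields
        result[key] = value
    return result


# In-memory lines standing in for a file
sample_lines = ["geneA\t12\n", "geneB\t7 \n", "broken line\n"]
pairs = read_pairs(sample_lines)
# Print the dictionary right after reading it to confirm its contents
print("parsed dict: %r" % sorted(pairs.items()))
```

Once the program behaves as expected, setting `DEBUG = False` removes the tracing output without touching the logic.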

    %%writefile script/cat.py
    #Assignments 1 and 2, cat.py
    #One of three possible ways to handle the newline: a trailing comma after print

    def cat(file):
       for line in open(file):
           print line,
    #------END of cat---------
    cat("data/test1.fa")
    cat("data/test2.fa")
    cat("data/test1.fq")
    Overwriting script/cat.py

    %run script/cat
    >NM_001011874 gene=Xkr4 CDS=151-2091
    gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaagccagggcg
    >NM_001195662 gene=Rp1 CDS=55-909
    AGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCA
    >NM_011283 gene=Rp1 CDS=128-6412
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC
    >NM_0112835 gene=Rp15 CDS=128-6412
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC
    >NM_001011874 gene=Xkr4 CDS=151-2091
    gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaagccagggcg
    aggcgaggaggtggcgggaggaggagacagcagggacaggTGTCAGATAAAGGAGTGCTCTCCTCCGCTG
    CCGAGGCATCATGGCCGCTAAGTCAGACGGGAGGCTGAAGATGAAGAAGAGCAGCGACGTGGCGTTCACC
    CCGCTGCAGAACTCGGACAATTCGGGCTCTGTGCAAGGACTGGCTCCAGGCTTGCCGTCGGGGTCCGGAG
    >NM_001195662 gene=Rp1 CDS=55-909
    AAGCTCAGCCTTTGCTCAGATTCTCCTCTTGATGAAACAAAGGGATTTCTGCACATGCTTGAGAAATTGC
    AGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCA
    AGTTCCTTCCCCTCGCCATTCAAATATCACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGT
    GGAGACCCACAGTTTGGCGGCGTTCGGGTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGC
    TGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACA
    CAGCATCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTG
    >NM_011283 gene=Rp1 CDS=128-6412
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC
    ACAAACTATTTACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACAC
    CTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATAT
    CACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGG
    GTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCC
    TGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGA
    GGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTGCCAGTTGACCTGGACAAGGCCCGCAGG
    CGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATA
    TGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA
    >NM_0112835 gene=Rp1 CDS=128-6412
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC
    ACAAACTATTTACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACAC
    CTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATAT
    CACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGG
    GTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCC
    TGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGA
    GGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTGCCAGTTGACCTGGACAAGGCCCGCAGG
    CGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATA
    TGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA
    @HWI-ST1223:80:D1FMTACXX:2:1101:1243:2213 1:N:0:AGTCAA
    TCTGTGTAGCCNTGGCTGTCCTGGAACTCACTTTGTAGACCAGGCTGGCATGCACCACCACNNNCGGCTCATTTGTCTTTNNTTTTTGTTTTGTTCTGTA
    +
    BCCFFFFFFHH#4AFHIJJJJJJJJJJJJJJJJJIJIJJJJJGHIJJJJJJJJJJJJJIIJ###--5ABECFFDDEEEEE##,5=@B8?CDD<AD>C:@>
    @HWI-ST1223:80:D1FMTACXX:2:1101:1375:2060 1:N:0:AGTCAA
    NTGCTGAGCCACGACAAGGATCCCAGAGGGCCNAGCCCTGCATCTTGTATGGACCAGTTACNCATCAAAAGAGACTACTGTAGGCACCATCAATCAGATC
    +
    #1:DDDD;?CFFHDFEEIGIIIIIIG;DHFGG#)0?BFBDHBFF<FCFEFD;@DD@A=7?E#,,,;=(>3;=;;C>ACCC@CCCCCBBBCCAACCCCCCC
    @HWI-ST1223:80:D1FMTACXX:2:1101:1383:2091 1:N:0:AGTCAA
    NGTTCGTGTGGAACCTGGCGCTAAACCATTCGTAGACGACCTGCTTCTGGGTCGGGGTTTCGTACGTAGCAGAGCAGCTCCCTCGCTGCGATCTATTGAA
    +
    #1=DDFDFHHHHHJGJJJJJJJJJJJJJJJIJIGDHIHIGIJJJJJJJIIIGHHFDD3>BDDBDDDDDDDDDDBDCCBDDDDDDDDDDDBBDDDDEEACD
    @HWI-ST1223:80:D1FMTACXX:2:1101:1452:2138 1:N:0:AGTCAA
    NTCTAGGAGGTCTAGAAAGCCCAGGCCACCGGTACAAACATCAAGGGTGTTACGGATGTGCCGCTCTGAACCTCCAGGACGACTTTGATTTCAACTACAA
    +
    #4=DFFEFHHHHHJJJJJIJJJJHIIJGJJJJ@GIIJJJJJJIJJJJFGHIIIJJHHHDFFFFDDDDDDDDDDDDCDDDDDDDDDDDCCCEDEDDDDDDD

    %%writefile script/splitName.py
    #Assignment 3
    filename = "data/test2.fa"
    def splitName(filename):
       for line in open(filename):
           if line[0] == '>':
               lineL = line.split(' ')
               print lineL[0]
           else:
               print line,
    #------END splitName---------------
    splitName(filename)
    Overwriting script/splitName.py

    %run script/splitName
    >NM_001011874
    gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaagccagggcg
    aggcgaggaggtggcgggaggaggagacagcagggacaggTGTCAGATAAAGGAGTGCTCTCCTCCGCTG
    CCGAGGCATCATGGCCGCTAAGTCAGACGGGAGGCTGAAGATGAAGAAGAGCAGCGACGTGGCGTTCACC
    CCGCTGCAGAACTCGGACAATTCGGGCTCTGTGCAAGGACTGGCTCCAGGCTTGCCGTCGGGGTCCGGAG
    >NM_001195662
    AAGCTCAGCCTTTGCTCAGATTCTCCTCTTGATGAAACAAAGGGATTTCTGCACATGCTTGAGAAATTGC
    AGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCA
    AGTTCCTTCCCCTCGCCATTCAAATATCACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGT
    GGAGACCCACAGTTTGGCGGCGTTCGGGTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGC
    TGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACA
    CAGCATCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTG
    >NM_011283
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC
    ACAAACTATTTACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACAC
    CTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATAT
    CACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGG
    GTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCC
    TGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGA
    GGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTGCCAGTTGACCTGGACAAGGCCCGCAGG
    CGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATA
    TGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA
    >NM_0112835
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCAC
    ACAAACTATTTACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACAC
    CTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATAT
    CACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGG
    GTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCC
    TGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGA
    GGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTGCCAGTTGACCTGGACAAGGCCCGCAGG
    CGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATA
    TGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA

    %%writefile script/formatFasta.py
    #Assignment 4
    filename = "data/test2.fa"

    def formatFasta(filename):
       aList = []
       for line in open(filename):
           if line[0] == '>':
               lineL = line.split(' ')
               if aList:
                   print ''.join(aList)
                   aList = []
               name = lineL[0]
               print name
           else:
               aList.append(line.strip())
       #Do not forget the last sequence
       print ''.join(aList)
    #-----End formatFasta----------------
    formatFasta(filename)
    Overwriting script/formatFasta.py

    %run script/formatFasta
    >NM_001011874
    gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaagccagggcgaggcgaggaggtggcgggaggaggagacagcagggacaggTGTCAGATAAAGGAGTGCTCTCCTCCGCTGCCGAGGCATCATGGCCGCTAAGTCAGACGGGAGGCTGAAGATGAAGAAGAGCAGCGACGTGGCGTTCACCCCGCTGCAGAACTCGGACAATTCGGGCTCTGTGCAAGGACTGGCTCCAGGCTTGCCGTCGGGGTCCGGAG
    >NM_001195662
    AAGCTCAGCCTTTGCTCAGATTCTCCTCTTGATGAAACAAAGGGATTTCTGCACATGCTTGAGAAATTGCAGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATATCACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGGGTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTG
    >NM_011283
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCACACAAACTATTTACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATATCACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGGGTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTGCCAGTTGACCTGGACAAGGCCCGCAGGCGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATATGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA
    >NM_0112835
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCACACAAACTATTTACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATATCACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGGGTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTGCCAGTTGACCTGGACAAGGCCCGCAGGCGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATATGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA

    %%writefile script/formatFasta-2.py
    #Assignment 5
    filename = "data/test2.fa"
    length = 80
    def formatSeq(aList, length):
       #Join the collected lines and print the sequence in chunks of `length` bases
       seq = ''.join(aList)
       len_seq = len(seq)
       for i in range(0,len_seq,length):
           print seq[i:i+length]
    #----END formatSeq-------------------

    def formatFasta(filename,length):
       aList = []
       for line in open(filename):
           if line[0] == '>':
               lineL = line.split(' ')
               if aList:
                   formatSeq(aList, length)
                   aList = []
               name = lineL[0]
               print name
           else:
               aList.append(line.strip())
       #Do not forget the last sequence
       formatSeq(aList, length)
    #------------END of formatFasta------------
    formatFasta(filename,length)
    Overwriting script/formatFasta-2.py

    %run script/formatFasta-2
    >NM_001011874
    gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaagccagggcgaggcgaggag
    gtggcgggaggaggagacagcagggacaggTGTCAGATAAAGGAGTGCTCTCCTCCGCTGCCGAGGCATCATGGCCGCTA
    AGTCAGACGGGAGGCTGAAGATGAAGAAGAGCAGCGACGTGGCGTTCACCCCGCTGCAGAACTCGGACAATTCGGGCTCT
    GTGCAAGGACTGGCTCCAGGCTTGCCGTCGGGGTCCGGAG
    >NM_001195662
    AAGCTCAGCCTTTGCTCAGATTCTCCTCTTGATGAAACAAAGGGATTTCTGCACATGCTTGAGAAATTGCAGGTCTCACC
    CAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATT
    CAAATATCACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGGGTG
    GTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGT
    AAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCT
    CCCACAATAAGAAGGTGCTG
    >NM_011283
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCACACAAACTATT
    TACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATG
    ATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATATCACTCATCCTGTAGTGGCTAAACGCATCAG
    TTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGGGTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTC
    TGCTGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATC
    ACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTGCCAGTTGACCTGGACAA
    GGCCCGCAGGCGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATA
    TGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA
    >NM_0112835
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCACACAAACTATT
    TACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATG
    ATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATATCACTCATCCTGTAGTGGCTAAACGCATCAG
    TTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGGGTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTC
    TGCTGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATC
    ACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTGCCAGTTGACCTGGACAA
    GGCCCGCAGGCGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATA
    TGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA

    %%writefile script/sortFasta.py
    #Assignment 6

    filename = "data/test2.fa"

    def readFasta(filename):
       aDict = {}
       for line in open(filename):
           if line[0] == '>':
               key = line.split()[0]
               aDict[key] = [] #The dictionary value is a list; the key on the left of = can be seen as the name of the list on the right
           else:
               aDict[key].append(line.strip())
       return aDict
    #-----END readFasta-------------------
    def sortFasta(aDict):
       keyL = sorted(aDict.keys())

       for key in keyL:
           print "%s\n%s" % (key, ''.join(aDict[key]))
    #--------END sortFasta--------------------
    aDict = readFasta(filename)
    sortFasta(aDict)
    Overwriting sortFasta.py

    %run script/sortFasta
    >NM_001011874
    gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaagccagggcgaggcgaggaggtggcgggaggaggagacagcagggacaggTGTCAGATAAAGGAGTGCTCTCCTCCGCTGCCGAGGCATCATGGCCGCTAAGTCAGACGGGAGGCTGAAGATGAAGAAGAGCAGCGACGTGGCGTTCACCCCGCTGCAGAACTCGGACAATTCGGGCTCTGTGCAAGGACTGGCTCCAGGCTTGCCGTCGGGGTCCGGAG
    >NM_001195662
    AAGCTCAGCCTTTGCTCAGATTCTCCTCTTGATGAAACAAAGGGATTTCTGCACATGCTTGAGAAATTGCAGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATATCACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGGGTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTG
    >NM_011283
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCACACAAACTATTTACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATATCACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGGGTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTGCCAGTTGACCTGGACAAGGCCCGCAGGCGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATATGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA
    >NM_0112835
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCACACAAACTATTTACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATATCACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGGGTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTGCCAGTTGACCTGGACAAGGCCCGCAGGCGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATATGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA

    %%writefile script/grepFasta.py
    #Assignment 7.1

    seqFile = "data/test2.fa"
    nameFile = "data/fasta.name"

    def readFasta(filename):
       aDict = {}
       for line in open(filename):
           if line[0] == '>':
               key = line.split()[0][1:] #Drop the leading '>' and keep only the sequence name
               aDict[key] = []  #The dictionary value is a list
           else:
               aDict[key].append(line.strip())
       return aDict
    #--------END readFasta--------------------
    aDict = readFasta(seqFile)
    for line in open(nameFile):
       name = line.strip()
       print ">%s\n%s" % (name, ''.join(aDict[name]))
    Overwriting grepFasta.py

    %run script/grepFasta.py
    >NM_001011874
    gcggcggcgggcgagcgggcgctggagtaggagctggggagcggcgcggccggggaaggaagccagggcgaggcgaggaggtggcgggaggaggagacagcagggacaggTGTCAGATAAAGGAGTGCTCTCCTCCGCTGCCGAGGCATCATGGCCGCTAAGTCAGACGGGAGGCTGAAGATGAAGAAGAGCAGCGACGTGGCGTTCACCCCGCTGCAGAACTCGGACAATTCGGGCTCTGTGCAAGGACTGGCTCCAGGCTTGCCGTCGGGGTCCGGAG
    >NM_011283
    AATAAATCCAAAGACATTTGTTTACGTGAAACAAGCAGGTTGCATATCCAGTGACGTTTATACAGACCACACAAACTATTTACTCTTTTCTTCGTAAGGAAAGGTTCAACTTCTGGTCTCACCCAAAATGAGTGACACACCTTCTACTAGTTTCTCCATGATTCATCTGACTTCTGAAGGTCAAGTTCCTTCCCCTCGCCATTCAAATATCACTCATCCTGTAGTGGCTAAACGCATCAGTTTCTATAAGAGTGGAGACCCACAGTTTGGCGGCGTTCGGGTGGTGGTCAACCCTCGTTCCTTTAAGACTTTTGACGCTCTGCTGGACAGTTTATCCAGGAAGGTACCCCTGCCCTTTGGGGTAAGGAACATCAGCACGCCCCGTGGACGACACAGCATCACCAGGCTGGAGGAGCTAGAGGACGGCAAGTCTTATGTGTGCTCCCACAATAAGAAGGTGCTGCCAGTTGACCTGGACAAGGCCCGCAGGCGCCCTCGGCCCTGGCTGAGTAGTCGCTCCATAAGCACGCATGTGCAGCTCTGTCCTGCAACTGCCAATATGTCCACCATGGCACCTGGCATGCTCCGTGCCCCAAGGAGGCTCGTGGTCTTCCGGAATGGTGACCCGAA

    %%writefile script/grepFastq.py
    #Assignment 7.2
    seqFile = "data/test1.fq"
    nameFile = "data/fastq.name"

    def readFastq(seqFile):
       aDict = {}
       count = 0
       for line in open(seqFile):
           count += 1
           if count % 4 == 1:
               name = line.strip()[1:] #Drop the leading @
               aDict[name] = [line] #Store the name line
           else:
               aDict[name].append(line)
       #----END reading--------------
       return aDict
    #-------END readFastq----------this line is only a comment and is optional------
    aDict = readFastq(seqFile)

    for line in open(nameFile):
       name = line.strip()
       print ''.join(aDict[name]),
    Overwriting grepFastq.py

    %run script/grepFastq
    @HWI-ST1223:80:D1FMTACXX:2:1101:1243:2213 1:N:0:AGTCAA
    TCTGTGTAGCCNTGGCTGTCCTGGAACTCACTTTGTAGACCAGGCTGGCATGCACCACCACNNNCGGCTCATTTGTCTTTNNTTTTTGTTTTGTTCTGTA
    +
    BCCFFFFFFHH#4AFHIJJJJJJJJJJJJJJJJJIJIJJJJJGHIJJJJJJJJJJJJJIIJ###--5ABECFFDDEEEEE##,5=@B8?CDD<AD>C:@>
    @HWI-ST1223:80:D1FMTACXX:2:1101:1375:2060 1:N:0:AGTCAA
    NTGCTGAGCCACGACAAGGATCCCAGAGGGCCNAGCCCTGCATCTTGTATGGACCAGTTACNCATCAAAAGAGACTACTGTAGGCACCATCAATCAGATC
    +
    #1:DDDD;?CFFHDFEEIGIIIIIIG;DHFGG#)0?BFBDHBFF<FCFEFD;@DD@A=7?E#,,,;=(>3;=;;C>ACCC@CCCCCBBBCCAACCCCCCC
    @HWI-ST1223:80:D1FMTACXX:2:1101:1383:2091 1:N:0:AGTCAA
    NGTTCGTGTGGAACCTGGCGCTAAACCATTCGTAGACGACCTGCTTCTGGGTCGGGGTTTCGTACGTAGCAGAGCAGCTCCCTCGCTGCGATCTATTGAA
    +
    #1=DDFDFHHHHHJGJJJJJJJJJJJJJJJIJIGDHIHIGIJJJJJJJIIIGHHFDD3>BDDBDDDDDDDDDDBDCCBDDDDDDDDDDDBBDDDDEEACD
    @HWI-ST1223:80:D1FMTACXX:2:1101:1452:2138 1:N:0:AGTCAA
    NTCTAGGAGGTCTAGAAAGCCCAGGCCACCGGTACAAACATCAAGGGTGTTACGGATGTGCCGCTCTGAACCTCCAGGACGACTTTGATTTCAACTACAA
    +
    #4=DFFEFHHHHHJJJJJIJJJJHIIJGJJJJ@GIIJJJJJJIJJJJFGHIIIJJHHHDFFFFDDDDDDDDDDDDCDDDDDDDDDDDCCCEDEDDDDDDD

    %%writefile script/screenResult.py
    #Assignment 8
    filename = "data/test.expr"

    def screenResult(filename, fc_value, adjP_value):
       head = 1 #Number of header lines to skip
       for line in open(filename):
           if head:
               head -= 1
               continue
           #-----Begin data lines------
           lineL = line.split()

           foldChange = float(lineL[9])
           adjP = float(lineL[12])
           if foldChange > fc_value and adjP < adjP_value:
               print lineL[0]
    #-------END screenResult-------------------
    screenResult(filename, 2, 0.05)
    Overwriting screenResult.py

    %run script/screenResult
    Novel00011
    Novel00043
    Novel00047
    Novel00077
    Novel00079
    Novel00080
    Novel00084
    Novel00085
    Novel00086
    Novel00087
    Novel00090
    Novel00124
    Novel00148
    Novel00156
    Novel00162
    Novel00166

    %%writefile script/transferMultipleColumToMatrix.py
    #Assignment 9
    filename = "data/multipleColExpr.txt"

    def readMultipleColFile(filename):
       head = 1
       tissueSet = set()
       aDict = {}
       for line in open(filename):
           if head: #skip header line
               head -= 1
               continue
           #-------------------------------
           lineL = line.split()
           gene = lineL[0]
           tissue = lineL[1]
           expr = lineL[2]
           if gene not in aDict:
               aDict[gene] = {}
           assert tissue not in aDict[gene], "Duplicate tissues"
           aDict[gene][tissue] = expr
           tissueSet.add(tissue)
       #---END reading----------------------------
       tissueL = list(tissueSet)
       tissueL.sort()
       return tissueL, aDict
    #----END readMultipleColFile--------------------------

    tissueL, aDict = readMultipleColFile(filename)

    print "Gene\t%s" % '\t'.join(tissueL)

    for gene, tissueD in aDict.items():
       exprL = [gene]
       for tissue in tissueL:
           exprL.append(tissueD[tissue])
       print '\t'.join(exprL)
    Overwriting transferMultipleColumToMatrix.py

    %run script/transferMultipleColumToMatrix
    Gene    A-431    A-549    AN3-CA    BEWO    CACO-2
    ENSG00000000460    25.2    14.2    10.6    24.4    14.2
    ENSG00000000938    0.0    0.0    0.0    0.0    0.0
    ENSG00000000457    2.8    3.4    3.8    5.8    2.9
    ENSG00000000419    73.8    38.6    33.9    53.7    155.5
    ENSG00000000003    21.3    32.5    38.2    31.4    63.9
    ENSG00000000005    0.0    0.0    0.0    0.0    0.0

    %%writefile script/reverseComplementary.py
    #Assignment 10
    str1 = "ACGTACGTACGTCACGTCAGCTAGAC"

    def reverseComplementary(string):
       ATCG_dict = {'A':'T','T':'A','C':'G','G':'C', 'a':'t','t':'a','c':'g','g':'c'}

       tmpL = []
       for i in string:
           tmpL.append(ATCG_dict[i])
       tmpL.reverse()
       return ''.join(tmpL)
    #------END reverseComplementary-------
    print reverseComplementary(str1)
    print reverseComplementary('ACGT')
    Overwriting reverseComplementary.py

    %run script/reverseComplementary
    GTCTAGCTGACGTGACGTACGTACGT
    ACGT

    %%writefile script/collapsemiRNAreads.py
    #Assignment 11
    smRNA_file = "data/mir.collapse"
    def collapsemiRNAreads(smRNA_file, sample, head = 1):
       lineno = 0
       for line in open(smRNA_file):
           if head:
               head -= 1
               continue
           #----end skip header line---
           seq, value = line.split()
           lineno += 1
           print ">%s_%d_x%s\n%s" % (sample, lineno, value, seq)
    #-------end collapsemiRNAreads------------
    collapsemiRNAreads(smRNA_file, "ESB")
    Overwriting collapsemiRNAreads.py

    %run script/collapsemiRNAreads
    >ESB_1_x2
    ACTGCCCTAAGTGCTCCTTCTGGC
    >ESB_2_x2
    ATAAGGTGCATCTAGTGCAGATA
    >ESB_3_x1
    TGAGGTAGTAGTTTGTGCTGTTT
    >ESB_4_x1
    TCCTACGAGTTGCATGGATTC
    >ESB_5_x1
    ACCGGGTGGAGCCGCCGCA
    >ESB_6_x1
    ACTGCCCTAAGTGCTCCTTCTGGT
    >ESB_7_x1
    TACAGGGCTGGGGATGG
    >ESB_8_x1
    CCCTGGATGCTGTAGGATG
    >ESB_9_x1
    AAAATGCTACTACTTTTGAGTC
    >ESB_10_x1
    TCCCTGGTGGTCTAGTGGCTAGGAT
    >ESB_11_x4
    CTCTTAGATCGATGTGGTGCTC
    >ESB_12_x1
    TTGGTGGTTCAGTGGTA
    >ESB_13_x1
    TCACAATTCCCCTATACCATGGGCCAT
    >ESB_14_x1
    CAAATGCTAGGGTTGGTGG
    >ESB_15_x4
    AGGCAGCTTCCACAGCA
    >ESB_16_x3
    GCACTGAGATGGAGTGGTGTAA
    >ESB_17_x3
    CTCCATTTGTTTGATGATGGA
    >ESB_18_x1
    GCACTGAGATGGAGTGGTGTAT
    >ESB_19_x2
    AGTGCCCAGAGTTTGTAGTGT
    >ESB_20_x1
    ATAAGGTGCATCTAGTGCTGTTAGA
    >ESB_21_x9
    AAAATGCTTCCACTTTGTGTG
    >ESB_22_x2
    GGTAATTCTAGAGCTAATACATGCCG
    >ESB_23_x6
    CCGGGCGGAAACACCA
    >ESB_24_x1
    TGAAAGGCTGACCCGCCGGGC
    >ESB_25_x1
    TTCGTACAGTGGTCGTAAGTTCGTGC
    >ESB_26_x1
    GGTTAGCACTCTGGACTC

    %%writefile script/map.py
    #Assignment 12
    ref = "data/ref.fa"
    short = "data/short.fa"

    def readRef(ref):
       refD = {}
       for line in open(ref):
           if line[0] == '>':
               name = line[1:-1]
               assert name not in refD
               refD[name] = []
           else:
               refD[name].append(line.strip())
       #---------END reading----------------
       for key, valueL in refD.items():
           refD[key] = ''.join(valueL)
       return refD
    #---------END readRef----------------
    def mapSeq(seq, refD):
       len_seq = len(seq)
       for key, value in refD.items():
           pos = value.find(seq)
           while pos != -1:
               print "%s\t%d\t%d\t%s" % (key, pos, pos+len_seq, seq)
               #Search again from the next base so overlapping matches are also reported
               pos = value.find(seq, pos + 1)
    #----------END mapSeq------------
    refD = readRef(ref)
    #------------------------------------
    for line in open(short):
       if line[0] == '>':
           pass
       else:
           seq = line.strip()
           mapSeq(seq, refD)
    Overwriting map.py

    %run script/map
    ERROR: File `u'script/map.py'` not found.
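
The overlapping search at the heart of `mapSeq` can be isolated into a small helper that returns the match positions instead of printing them, which makes the logic easy to check. The sketch below is an illustration under stated assumptions, not the original script: the helper name `find_all` and the toy sequences are made up, and it relies on the optional start argument of `str.find`. It runs under both Python 2.7 and Python 3.

```python
def find_all(text, pattern):
    """Return the 0-based start positions of every occurrence of
    pattern in text, including overlapping ones."""
    positions = []
    pos = text.find(pattern)
    while pos != -1:
        positions.append(pos)
        # Restart one character after the previous hit so overlaps are kept
        pos = text.find(pattern, pos + 1)
    return positions


# Toy reference and read, made up for illustration
hits = find_all("ACACACGT", "ACAC")
print(hits)
```

Returning a list rather than printing keeps the search separate from the output format, so the same helper can be reused for each reference sequence.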

