
CDS Extraction Tool

Don Gilbert gilbertd at bio.indiana.edu
Thu Feb 24 15:00:49 EST 2000


Readseq, version 2, will extract (or remove) any set of features or
fields from GenBank or EMBL format sequence files.
See http://iubio.bio.indiana.edu/soft/molbio/readseq/java
One caveat: this only works for feature ranges within the
given sequence.  Where a range refers to other sequence records,
those parts are not included in the extraction.
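
To illustrate the caveat, here is a minimal Python sketch (not Readseq's
own code) of how a feature location can be split into spans on the given
sequence and spans that point at another record through an "accession:"
prefix.  The function names split_spans and extract are hypothetical, and
only plain start..end spans and join() are handled; complement() and
single-base positions are ignored.

```python
import re

# One span of a GenBank/EMBL location, e.g. "<31..63" or "J00194.1:100..202".
# An "accession:" prefix marks a span that lives in another sequence record.
SPAN = re.compile(
    r"(?:(?P<acc>[A-Za-z_0-9.]+):)?[<>]?(?P<start>\d+)\.\.[<>]?(?P<end>\d+)"
)

def split_spans(location):
    """Return (local, remote) span lists for a simple or join() location."""
    local, remote = [], []
    for m in SPAN.finditer(location):
        span = (int(m.group("start")), int(m.group("end")))
        if m.group("acc"):
            remote.append((m.group("acc"),) + span)
        else:
            local.append(span)
    return local, remote

def extract(sequence, location):
    """Concatenate the local spans (1-based, inclusive); remote spans are skipped."""
    local, _ = split_spans(location)
    return "".join(sequence[start - 1:end] for start, end in local)

if __name__ == "__main__":
    # Illustrative location with one local and one remote span.
    print(split_spans("join(1..100,J00194.1:100..202)"))
```

With a location such as join(1..100,J00194.1:100..202), only 1..100 would
end up in the extracted sequence under this sketch.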

Example to pull all CDS entries from a GenBank format file:
 jre -cp readseq.jar run -feature=CDS -format=gb out=gbinvcds.gb data/gbinv1a.seq
Output example:

LOCUS       AAAAGC        147 bp    mRNA            INV       28-NOV-1994 
FEATURES             Location/Qualifiers
     CDS             join(<31..63,64..177)
                     /codon_start=1
                     /product="alpha globin"
                     /db_xref="PID:g402359"
                     /translation="INRKISGDAFGSIIEPMKETLKARMGSYYSDDVAGAWAALIGVVQAAL"
     extracted_range join(<31..63,64..177)
                     /note="Range of sequence extracted from original, due to feature
                     selection.  Feature locations are not valid for this
                     sequence, but for original."
ORIGIN      
        1 atcaacagga aaatcagcgg tgacgcattc gggtcaatca ttgaaccaat gaaggagaca
       61 ctgaaggcca ggatgggcag ttattacagt gatgatgtcg ctggagcatg ggccgctctg
      121 attggtgtag ttcaggctgc tttgtaa
//
LOCUS       AAABDA        224 bp    DNA             INV       05-AUG-1992 
FEATURES             Location/Qualifiers
     CDS             1016..1239
                     /partial
                     /gene="abd-A"
                     /codon_start=3
                     /product="abdominal-A homologue"
                     /db_xref="PID:g5554"
                     /db_xref="SWISS-PROT:P29552"
                     /translation="PNGCPRRRGRQTYTRFQTLELEKEFHFNHYLTRRRRIEIAHALCLTERQIKIWFQN
                     RRMKLKKELRAVKEINEQ"
     extracted_range 1016..1239
                     /note="Range of sequence extracted from original, due to feature
                     selection.  Feature locations are not valid for this
                     sequence, but for original."
ORIGIN      
        1 gtcccaacgg atgcccgcgt cgacgaggcc ggcaaacgta cacccgcttc cagacgctcg
       61 agctggagaa agagttccac ttcaaccact acctgacccg gcgacggagg atcgaaattg
      121 cgcacgccct gtgtctgacc gagcggcaga tcaaaatctg gttccaaaat cgccggatga
      181 agctgaagaa ggaactgcgg gcggtgaagg aaattaacga acag
//
...
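
The extracted records can be sanity-checked: the length on the LOCUS line
should match both the total of the extracted_range spans and the number of
residues under ORIGIN (147 and 224 bp in the two records above).  A rough
Python sketch of such a check follows.  The function name check_records is
hypothetical, and the sketch assumes the extracted_range location fits on
one line and contains only spans within the given sequence.

```python
import re

def check_records(text):
    """Yield (name, locus_bp, range_bp, origin_bp) per GenBank-format record."""
    for record in text.split("\n//"):
        locus = re.search(r"^LOCUS\s+(\S+)\s+(\d+) bp", record, re.M)
        rng = re.search(r"^\s+extracted_range\s+(\S+)", record, re.M)
        head, sep, tail = record.partition("\nORIGIN")
        if not (locus and rng and sep):
            continue  # not a complete extracted record
        # Sum the inclusive lengths of all start..end spans in the location.
        range_bp = sum(
            int(end) - int(start) + 1
            for start, end in re.findall(r"(\d+)\.\.[<>]?(\d+)", rng.group(1))
        )
        # Count letters on the sequence lines that follow the ORIGIN line.
        seq_lines = tail.split("\n", 1)[1] if "\n" in tail else ""
        origin_bp = sum(c.isalpha() for c in seq_lines)
        yield locus.group(1), int(locus.group(2)), range_bp, origin_bp

if __name__ == "__main__":
    # File name taken from the command line above.
    with open("gbinvcds.gb") as handle:
        for name, locus_bp, range_bp, origin_bp in check_records(handle.read()):
            status = "ok" if locus_bp == range_bp == origin_bp else "MISMATCH"
            print(name, locus_bp, range_bp, origin_bp, status)
```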

--
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at bio.indiana.edu





More information about the Bio-soft mailing list
