IUBio GIL .. BIOSCI/Bionet News .. Biosequences .. Software .. FTP

Staden scripting

Tim Cutts timc at chiark.greenend.org.uk
Thu Jun 28 11:16:58 EST 2001


In article <3B3B47A6.281EE8C at bbsrc.ac.uk>,
Simon Andrews  <simon.andrews at bbsrc.ac.uk> wrote:
>1) open a database of a given name

load_package gap

set io [open_db -name foobar -version 0 -access rq -create 1]

>2) add in exp files from pregap.passed using normal shotgun assembly

Not sure here, but it might be somethign like:

assemble_new_contigs -io $io -files {read1 read2 read3}

>3) count how many contigs are in the database
>4) halt if there is more than one contig (return an error or something)

set num_contigs [db_info num_contigs $io]

if {$num_contigs > 1} {
  ...
}

>5) if there is one contig extract its consensus (with all gaps removed)

get_consensus -io $io -contigs contigname -outfile filename ...

there are a few more options to that to precisely control the way the
consensus is generated.

>... and that's it.  This will then be wrapped in a larger iterative Perl
>script which will do all sorts of fancy stuff around it - but the whole
>thing relies on being able to get this simple Staden script working.

One thing I would advise is not firing up gap4 repeatedly from a perl
script.  Starting up the tcl shell and invoking load_package gap4 tajes
a significant time, and if you want to do this to a lot of gap4 dbs,
like I do here, then you'll find it's at least an order of magnitude
faster if you can write the whole script in TCL.  I dump annotation from
about 20,000 gap4 databases on a nightly basis using a tcl script.  It
used to be done your way, with the outer loop in perl.  It's now
entirely in tcl, and is well over ten times faster.


The wrapping loop is quite easy.  If you have a list of databases to
work on in file foobar:

set f_list [open foobar r]

while {[gets $f_list db] >= 0} {
   #Parse the filename
    set froot [file root $db]
    set ftail [file tail $db]
    set version [string trimleft [file extension $db] .]

    if {[file exists "$froot.$version"] == 0} {
        if {$verbose == 1} {
            puts "ERROR:  File $db does not exist"
        }
        continue
    }

    set io [open_db -name $froot -version $version -access rw -create 1]

    # Skip this if we failed to open the database
    if {$io == ""} {
        puts "WARNING:  Could not open $ftail"
        continue
    }

    # Do all your funky stuff with the open database in $io here

    # Close the database
    close_db -io $io
}

close $f_list

exit

>Can anyone help me out on this one?  Even examples of working Staden
>scripts would be useful so I can sort out the information in James'
>scripting manual.

I can't give you full examples because of (boo hiss) company
confidentiality clauses, but hopefully the above will help a bit.

Finding a good tcl book helps, too.  :-)

Tim.




More information about the Staden mailing list

Send comments to us at archive@iubioarchive.bio.net