Przemko Tylzanowski <przemko at med.kuleuven.ac.be> wrote in message
news:3C4EAA7E.2842EADA at med.kuleuven.ac.be...
> A long time ago there was a program called TargetFinder; it was used to
> identify sequences of interest (e.g. targets for transcription factors)
> in promoters. The advantage of that program over EPD was that it was
> not limited to 600 bp of the promoter. Anyway, the Italians took it offline
> (after publishing it!). So, I am stuck now.
>
> What I would like to do is the following. Identify in the GenBank (or
> EMBL- does not matter) all sequences containing promoters, enhancers
> and/or sequences upstream of TATA of mouse or human origin (at this
> point forget about TATA-less). This bit is easy. I can do it using SRS
> (funny part is- it will work on the server in England but not
> Brussels...). But here problems start. What I get as an output is the
> Feature Sequence (I ask for it) but also the rest of the gene. In cases
> of large genomic sequences this is VERY PAINFUL... What I would like to
> do is to extract from these initial hits (between 2000 and 4500 depending
> on the selection of databases) ONLY the sequences containing the
> promoter part (IT WAS POSSIBLE IN SRS4- command line). There I could
> say- get me the feature and all sequences that are -2000 and +100 from
> it. Then I would like to build a database and then run Findpatterns or
> something like that. So, I guess I need a combination of SRS and
> So, HOW DO I DO IT IN SRS6? I know that I could probably write something
> in PERL, the problem is I don't really know it.
>
> Any suggestions or solutions are welcome!
You can probably do it with awk.
I have an idea of what you need. I'll ask permission first (now):
if you need a hand implementing this, just yell. I hate banging
on mailboxes unannounced.
I'd like to make the solution open source if possible.
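
To make the windowing step concrete, here is a rough sketch in Python
(rather than awk or Perl). It assumes you already have each entry's
sequence as a plain string plus the promoter feature's start and end
positions (1-based, inclusive, forward strand); how you pull those out
of the SRS output is not covered. The names extract_window and
write_fasta are made up for illustration, and reading "-2000 and +100"
as 2000 bases before the feature start and 100 bases after its end is
my assumption.

```python
def extract_window(sequence, feat_start, feat_end, upstream=2000, downstream=100):
    """Return the feature plus flanking bases, clipped to the entry's ends.

    feat_start and feat_end are 1-based and inclusive (GenBank/EMBL style).
    Assumption: the feature lies on the forward strand.
    """
    if not (1 <= feat_start <= feat_end <= len(sequence)):
        raise ValueError("feature coordinates fall outside the sequence")
    # Convert to 0-based slice bounds and clip at both ends of the entry.
    left = max(0, feat_start - 1 - upstream)
    right = min(len(sequence), feat_end + downstream)
    return sequence[left:right]


def write_fasta(records, handle, width=60):
    """Write (name, sequence) pairs as FASTA, wrapped at `width` columns."""
    for name, seq in records:
        handle.write(">%s\n" % name)
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

Run extract_window over each hit, collect the (name, window) pairs, and
pass them to write_fasta with an open file; the resulting FASTA file can
then serve as the small database to search with Findpatterns or similar.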
> Przemko Tylzanowski Ph.D.
> LSD & Joint
> O & N
> University of Leuven
> Herestraat 49
> 3000 Leuven
>
> phone: (32-16)34-61-96
> fax : (32-16)34-62-00