MAILFASTA and GETENTRY

NAME
     mailfasta - send a sequence file for comparison with sequence
                 databases or for ORF prediction
     getentry  - get an entry from the sequence databases

SYNOPSIS
     mailfasta [-d # sequencefile | sequencefile1 sequencefile2 ...]
               # = Number of service to be used
     getentry database entry1 [entry2 ...]

DESCRIPTION
  MAILFASTA 3.2
     The EMBL site in Germany and the NCBI site in the US contain
     several sequence databases. A sequence can be compared with these
     databases by sending a specially formatted email message to the
     site's mail server. At the site the sequence is compared to the
     databases with the FASTA and MPsrch programs (EMBL) or the BLAST
     program (NCBI), and the results are emailed back.

     The Pythia site in the US contains two small databases of Human
     repetitive elements. The sequence query can also be compared
     against these databases.

     Two other mail servers, GRAIL and GeneID, take a DNA sequence and
     try to predict exon-intron splice sites and gene organisation.

     Another way to find homology is to compare a sequence with a
     database of blocks of AA sequences derived from the PROSITE
     database. This can be done with the BLOCKS mail server. DNA
     sequences are translated in the 6 reading frames and a BLOCKS
     search is done on the 6 resulting AA sequences.

     The PredictProtein mail server at EMBL can give some insight into
     the secondary structure of an AA sequence query. It produces a
     multiple sequence alignment if it finds a block of homologous
     proteins in the Swiss-Prot database.

     Mailfasta is a csh program which reads a sequence file and sends a
     specially formatted email message to the different mail servers,
     where it is processed; the results are emailed back.

     The program can operate in interactive mode and in default mode.
     In interactive mode the user is prompted for all the settings to
     be used.
     In the default mode (option -d), only the service number and one
     sequence file are given, and the sequence is processed with the
     default settings of the requested service. In default mode it's
     'no questions asked'. The number of the service to be used can be
     found in the interactive mode. In interactive mode more than one
     sequence file can be given, and the program prompts for each of
     the sequences in turn.

     The sequence file(s) can have several formats. Files containing
     more than one sequence can also be used, but only in the
     interactive mode. The program readseq is used to translate all the
     different formats. This program MUST be present and is not a part
     of the mailfasta package. It can be obtained via anonymous ftp
     from several sources (for instance:
     ftp.bio.indiana.edu::/molbio/readseq).

     The ORF-predicting GRAIL server can only be used if a valid user
     id is given. If you don't have a GRAIL user id, you can apply for
     one from within the program. It will ask you for your name and
     address and will send your request to GRAIL. GRAIL will provide
     you with the user id via email. If you do have a GRAIL user id,
     you can make a global variable GRAIL_USERID by adding this line to
     your .cshrc file:

          setenv GRAIL_ID your_grail_user_id

     but you can also enter it when the program asks for one.

     The PredictProtein server needs a name, address and email address.
     You can define the global variables NAME_2D, ADRES_2D and EMAIL_2D
     in your .cshrc file to contain your name, address and email
     address. (Use double quotes if they contain spaces.) If one or
     more of these variables is not defined, the program will ask you
     for the missing data.

     The program checks whether the sequence is DNA or PROTEIN simply
     by counting the percentage of the bases G, A, T and C. For this, a
     small C program called cid.c is used, which comes with the
     package. It must be present and compiled to 'cid'. The shar file
     will compile it.
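     The kind of check that cid performs can be illustrated with a
     minimal sketch in C. This is NOT the cid.c shipped with mailfasta.
     The function names are hypothetical, and ignoring non-letter
     characters (digits, spaces, newlines) is an assumption. Only the
     85% threshold is taken from the FILES section of this document.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not taken from cid.c.
 * Returns the percentage (0..100) of the letters in seq that are
 * A, C, G or T (case-insensitive). Non-letters are ignored (an
 * assumption). Returns 0 if seq contains no letters at all. */
double acgt_percentage(const char *seq)
{
    size_t letters = 0, acgt = 0;

    for (const char *p = seq; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalpha(c))
            continue;               /* skip digits, blanks, newlines */
        letters++;
        switch (toupper(c)) {
        case 'A': case 'C': case 'G': case 'T':
            acgt++;
            break;
        default:
            break;
        }
    }
    if (letters == 0)
        return 0.0;
    return 100.0 * (double)acgt / (double)letters;
}

/* Returns 1 if more than 85% of the letters are A, C, G or T
 * (the threshold given under FILES), otherwise 0. */
int looks_like_dna(const char *seq)
{
    return acgt_percentage(seq) > 85.0;
}
```

     With this sketch a short protein string such as
     "MKTAYIAKQRQISFVKSHFSRQ" is classified as protein, while
     "ACGTACGTACGTNACGT" is classified as DNA despite the single
     ambiguous base.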
     If you did not use the shar file to install 'mailfasta', you will
     have to compile cid.c yourself, using the command
     'cc -o cid cid.c'.

  GETENTRY
     Entries of interest can be obtained with the program getentry.
     The entry argument is the locus name, which can be found in the
     FASTA or BLAST output. The entries will be emailed back.

     An entry from the BLOCKS database can be retrieved by sending an
     email to blocks@howard.fhcrc.org with the block number in the
     subject line.

     GOPHER is a much better way to retrieve sequences from databases.
     I strongly recommend everybody to use GOPHER. Ask your local
     network guru for more information on GOPHER.

DATABASES
     FASTA searches are sent to the EMBL site in Germany. There the
     sequence can be compared to the GenBank and EMBL DNA databases and
     the Swiss-Prot, PIR and PDB AA databases. It is also possible to
     compare DNA sequences to subsets of the EMBL database.

     BLITZ searches of the Swiss-Prot database are also sent to EMBL.

     BLAST searches are sent to the NCBI server in the US. There the
     sequence can be compared to:

          GenBank
          GenBank Update
          EMBL
          EMBL Update
          Vector subset of GB
          Expressed Sequence Tags (EST)
          Eukaryotic promoter database (EPD)
          Swiss-Prot
          Swiss-Prot Update
          PIR
          GenPept
          Transcription Factor Database (TFD)
          Kabat (sequences of immunological interest)
          Alu (Human repetitive elements)
          Sequences in the Brookhaven 3D structure DB

     If an AA database is chosen for a DNA sequence, the sequence is
     translated in the 6 reading frames and the resulting AA sequences
     are compared to the AA database. If a DNA database is chosen for
     an AA sequence, the database is translated in the 6 reading frames
     and the AA sequence is compared to the translated DNA database.
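     What "translated in the 6 reading frames" means can be illustrated
     with a minimal sketch in C. This is not the code used by the
     servers or by mailfasta. It assumes the standard genetic code, all
     function names are hypothetical, and any codon containing a
     character other than A, C, G, T or U is written as 'X'.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Standard genetic code, indexed as 16*b1 + 4*b2 + b3 with
 * T=0, C=1, A=2, G=3. '*' marks a stop codon. */
static const char CODON_TABLE[] =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

static int base_index(char b)
{
    switch (toupper((unsigned char)b)) {
    case 'T': case 'U': return 0;
    case 'C':           return 1;
    case 'A':           return 2;
    case 'G':           return 3;
    default:            return -1;   /* ambiguous or unknown base */
    }
}

static char complement(char b)
{
    switch (toupper((unsigned char)b)) {
    case 'A':           return 'T';
    case 'C':           return 'G';
    case 'G':           return 'C';
    case 'T': case 'U': return 'A';
    default:            return 'N';
    }
}

/* Writes the reverse complement of dna (length n) into out,
 * which must hold at least n + 1 bytes. */
void reverse_complement(const char *dna, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = complement(dna[n - 1 - i]);
    out[n] = '\0';
}

/* Translates dna (length n) starting at offset 0, 1 or 2.
 * out must hold at least n / 3 + 1 bytes. */
void translate_frame(const char *dna, size_t n, size_t offset, char *out)
{
    size_t k = 0;

    for (size_t i = offset; i + 3 <= n; i += 3) {
        int a = base_index(dna[i]);
        int b = base_index(dna[i + 1]);
        int c = base_index(dna[i + 2]);
        out[k++] = (a < 0 || b < 0 || c < 0)
                       ? 'X'
                       : CODON_TABLE[16 * a + 4 * b + c];
    }
    out[k] = '\0';
}

/* frames[0..2] receive the forward frames, frames[3..5] the frames
 * of the reverse complement. rc is a scratch buffer of
 * strlen(dna) + 1 bytes supplied by the caller. */
void six_frames(const char *dna, char *rc, char *frames[6])
{
    size_t n = strlen(dna);

    reverse_complement(dna, n, rc);
    for (size_t f = 0; f < 3; f++) {
        translate_frame(dna, n, f, frames[f]);
        translate_frame(rc, n, f, frames[f + 3]);
    }
}
```

     For the 9-base query ATGGCCTAA the sketch gives the forward frames
     MA*, WP and GL, and the reverse frames LGH, *A and RP.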
HELP
     Help or more information on interpreting the results can be
     obtained by sending HELP in an email message to the following
     addresses:

          FASTA                : fasta@embl-heidelberg.de
          BLAST                : blast@ncbi.nlm.nih.gov
          BLOCKS               : blocks@howard.fhcrc.org
          GRAIL                : grail@ornl.gov
          GENEID               : geneid@darwin.bu.edu
          PYTHIA               : pythia@anl.gov
          Predict 2D structure : predictprotein@embl-heidelberg.de
          BLITZ                : blitz@embl-heidelberg.de
          RETRIEVE             : retrieve@ncbi.nlm.nih.gov (getentry)

BUGS
     you tell me :-)

FILES
     cid
          A C program which checks if a file is a DNA sequence or a
          protein sequence. If more than 85% of a file consists of
          A, C, G and T, it is considered to be a DNA file. This file
          MUST be present and compiled from cid.c.

     readseq
          A file conversion program. Must be present. The program can
          be obtained via anonymous ftp from fly.bio.indiana.edu.

     mailfasta.doc
          This documentation.

     mailfasta.changes
          Changes since the last version (starting at 3.0).

     /tmp/mflist$$
     /tmp/mfseq$$
     /tmp/mf$$
          Temporary storage of the email message.

     /tmp/ge$$
          Temporary storage for getentry.

CHANGES SINCE THE LAST VERSIONS
  Version 3.2 (July 11, 1993)
     * The EMBL FASTA server has returned in this version. It has
       replaced the FASTA server in Japan. The response time of the
       server at EMBL in Germany is much shorter now and it has a
       wider range of databases to be searched.

     * The BLITZ server at EMBL has been added. This service runs the
       MPsrch program of Shane Sturrock and John Collins, Edinburgh,
       UK. It performs extremely fast "best local similarity" searches
       of the Swiss-Prot protein sequence database, using the
       well-known Smith and Waterman algorithm.
     * The PDB database has been added to the databases to be searched
       using the BLAST server at NCBI.

  Version 3.1 (February 18, 1993)
     * Two new servers have been added:
            Pythia (Human repetitive DNA and ALU subfamily membership)
            Predict 2D structure of an AA sequence

     * Three new databases have been added to the BLAST search:
            Alu (Human repetitive elements)
            Kabat (Sequences of immunological interest)
            Swiss-Prot Update (Cumulative weekly update)

     * Option -d has been added. If option -d is used, mailfasta
       operates in the default mode. After -d one number and one
       sequence file can be given. The number is the number of the
       service to be used for the sequence file. The sequence will be
       emailed to the requested service with the default settings for
       that service. If there are no errors, mailfasta will return no
       messages. The '-d # sequencefile' part can be repeated several
       times. It is also possible to give sequence file names after a
       '-d # file' part; these are processed in normal interactive
       mode. So:

            mailfasta -d 18 file1 -d 1 file2 file3 file4 -d 13 file5

       is possible.

  Version 3.0 (August 22, 1992)
     There have been a lot of changes since the last version of
     mailfasta (2.1).

     Version 3.0 no longer uses the FASTA and BLAST servers at the
     GenBank site (genbank.bio.net), as those services will be
     terminated soon. Instead it uses the FASTA server at the FLAT site
     in Japan. This site does not have the vector subset, so this
     cannot be searched using the FASTA program.

     Version 3.0 now uses the NCBI site with the BLAST program, which
     can search almost all DNA and PROTEIN databases.

     It can also search a query for PROSITE blocks using the BLOCKS
     server.

     ORF prediction is now possible using the GRAIL and GeneID/NetGene
     servers.

     Retrieving sequence entries is now handled by the NCBI RETRIEVE
     server. Getentry now needs a database name and a LOCUS name from
     that database to be retrieved. A better way to retrieve sequence
     entries is to use the gopher hole at iubio and the EMBNET hole in
     Switzerland.
Last change: February 18, 1993