From usenet.ucs.indiana.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!biosci!afrc.ac.uk!odonnell Tue Sep 21 17:52:04 EST 1993
Article: 583 of bionet.announce
Path: usenet.ucs.indiana.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!biosci!afrc.ac.uk!odonnell
From: odonnell@afrc.ac.uk
Newsgroups: bionet.announce
Subject: Training manual available on ftp
Message-ID: <1993Sep21.084559.22770@gserv1.dl.ac.uk>
Date: 21 Sep 93 08:46:00 GMT
Sender: kristoff@net.bio.net
Lines: 491
Approved: bionews-moderator@net.bio.net


           Molecular Biology Software Training Manual on ftp
           =================================================

    The AFRC's training manual is available on anonymous ftp from

       ftp.embl-heidelberg.de

     as the compressed tar file

       /pub/doc/afrc_manual.tar.Z 

     and from the EMBL file server by e-mail to: netserv@embl-heidelberg.de

      using the request "get doc:afrc_man.uaa"

Summary
     The training manual comprises a contents section, 16 chapters
     and four appendices. There are three illustrations, a front
     cover and two diagrams for Chapter 6; these are separate files
     and must be added to the manual separately.

     Chapter 1 is very AFRC-specific. Chapter 2 is essentially a
     GCG reference section with some AFRC-specific information included.
     Chapters 3-16 contain worked exercises and background notes. All
     the examples show how the software behaves on Agrenet VAXes; some
     programs default to batch submission, using local queue names. All
     sequences used in the exercises can be obtained from the exercises
     themselves or located using Appendix D.

     The chapter files are in PostScript format. Each chapter is
     formatted for double-sided printing, so some pages are blank.

     The contents and preface pages (in plain text) are shown below,
     along with a list of the tar file contents.

     Dare I say "comments are welcome"?

 *************************************************************************
 Cary O'Donnell     Scientific Support Group, AFRC Computing Division,   
                    West Common, Harpenden, Herts AL5 2JE, United Kingdom
                    (AFRC = Agricultural & Food Research Council)
 Tel: (+44) 582 762271     Internet e-mail: odonnell@afrc.ac.uk            
 Fax: (+44) 582 761710     Long/Lat 00d 21m 45s West  51d 48m 30s North
 -------------------------------------------------------------------------


=============================================================================


                            MOLECULAR BIOLOGY SOFTWARE

                                  TRAINING MANUAL


                                     CONTENTS



CHAPTER 1       STARTING SEQUENCE ANALYSIS ON AGRENET

        1.1     LOGGING ON . . . . . . . . . . . . . . . . . . . . 1-3
        1.2     STARTING UP THE MAIN PACKAGES  . . . . . . . . . . 1-3
        1.3     USING LOGIN.COM  . . . . . . . . . . . . . . . . . 1-4
        1.4     OTHER SOFTWARE PACKAGES  . . . . . . . . . . . . . 1-4
        1.5     HELP INFORMATION . . . . . . . . . . . . . . . . . 1-4
        1.6     DOCUMENTATION  . . . . . . . . . . . . . . . . . . 1-4
        1.7     GRAPHICAL OUTPUT . . . . . . . . . . . . . . . . . 1-4
        1.7.1     Unipict files  . . . . . . . . . . . . . . . . . 1-5
        1.8     SOFTWARE AND DATABASES ON AGRENET  . . . . . . . . 1-5
        1.9     BROOKHAVEN DATABASE  . . . . . . . . . . . . . . . 1-6
        1.10    USEFUL VMS COMMANDS  . . . . . . . . . . . . . . . 1-6
        1.11    QUEUES ON AGRENET  . . . . . . . . . . . . . . . . 1-6
        1.11.1    Batch queues . . . . . . . . . . . . . . . . . . 1-6
        1.11.2    Printer queues . . . . . . . . . . . . . . . . . 1-7



CHAPTER 2       THE GENETICS COMPUTER GROUP PACKAGE

        2.1     WHAT IS GCG ?  . . . . . . . . . . . . . . . . . . 2-3
        2.1.1     Program Examples . . . . . . . . . . . . . . . . 2-3
        2.1.2     Command Line Modifiers . . . . . . . . . . . . . 2-3
        2.2     DATABASES WITH GCG . . . . . . . . . . . . . . . . 2-4
        2.2.1     Database Sequence Names  . . . . . . . . . . . . 2-4
        2.2.2     Database Accession Numbers . . . . . . . . . . . 2-4
        2.2.3     Searching Databases  . . . . . . . . . . . . . . 2-4
        2.3     DATABASES ON AGRENET . . . . . . . . . . . . . . . 2-5
        2.3.1     Nucleic Acid Databases . . . . . . . . . . . . . 2-5
        2.3.1.1   Database Divisions . . . . . . . . . . . . . . . 2-5
        2.3.2     Protein databases  . . . . . . . . . . . . . . . 2-5
        2.4     GROUPS OF SEQUENCES  . . . . . . . . . . . . . . . 2-6
        2.4.1     Files Of Sequence Names  . . . . . . . . . . . . 2-6
        2.4.2     Multiple Sequence Files  . . . . . . . . . . . . 2-6
        2.5     PROGRAM DEFAULT VALUES . . . . . . . . . . . . . . 2-6
        2.6     GRAPHICS WITH THE GCG PACKAGE  . . . . . . . . . . 2-7
        2.6.1     Graphics Driver-Selection  . . . . . . . . . . . 2-7
        2.6.2     Fonts  . . . . . . . . . . . . . . . . . . . . . 2-7
        2.6.3     GCG 'Figure' Files . . . . . . . . . . . . . . . 2-7
        2.7     LOCAL DATA FILES . . . . . . . . . . . . . . . . . 2-8
        2.7.1     Enzyme Data Files  . . . . . . . . . . . . . . . 2-8
        2.8     NUCLEOTIDE SYMBOLS IN GCG  . . . . . . . . . . . . 2-9
        2.9     AMINO ACID SYMBOLS AND THE STANDARD TRANSLATION
                TABLE  . . . . . . . . . . . . . . . . . . . . .  2-10
        2.10    NUCLEOTIDE SYMBOL COMPARISON TABLE FOR BESTFIT .  2-11
        2.11    AMINO ACID SYMBOL COMPARISON TABLE . . . . . . .  2-11
        2.11.1    Analysis of symbol comparison values . . . . .  2-12
        2.12    PROGRAMS IN THE GCG PACKAGE (RELEASE 7.2)  . . .  2-14
        2.12.1    Supplementary Programs . . . . . . . . . . . .  2-16



CHAPTER 3       GENERAL SEQUENCE MANIPULATION

        3.1     HELP INFORMATION AND DOCUMENTATION . . . . . . . . 3-3
        3.1.1     GENHELP - Help on GCG programs.  . . . . . . . . 3-3
        3.1.2     GENMANUAL - Help by program function . . . . . . 3-3
        3.1.3     EASYGCG - Finding your way . . . . . . . . . . . 3-4
        3.2     FORMATTING A GCG SEQUENCE  . . . . . . . . . . . . 3-4
        3.2.1     Formatting raw sequence data . . . . . . . . . . 3-4
        3.2.2     DNA <-> RNA conversion . . . . . . . . . . . . . 3-5
        3.2.3     Sequence complementing . . . . . . . . . . . . . 3-5
        3.3     SEQHELP - HELP FOR OTHER ANALYSIS PROGRAMS . . . . 3-6
        3.4     COPYING A DATABASE SEQUENCE  . . . . . . . . . . . 3-6
        3.5     RESTRICTION MAPPING PROGRAMS . . . . . . . . . . . 3-7
        3.5.1     MAP  . . . . . . . . . . . . . . . . . . . . . . 3-7
        3.5.1.1   Selecting enzymes  . . . . . . . . . . . . . . . 3-7
        3.5.2     MAPSORT (and selecting enzymes by region)  . . . 3-8
        3.5.2.1   Digest (and selecting enzymes by name) . . . . . 3-8
        3.5.2.2   Creating a plasmid map.  . . . . . . . . . . . . 3-9
        3.5.3     The enzyme list  . . . . . . . . . . . . . . . . 3-9
        3.5.4     PROTEIN SEQUENCE MAPPING . . . . . . . . . . . . 3-9
        3.5.5     SEQUENCE EDITING . . . . . . . . . . . . . . .  3-10
        3.5.5.1   Editing "Modes"  . . . . . . . . . . . . . . .  3-10
        3.5.5.2   Adding Sequence Data From a File . . . . . . .  3-10
        3.5.5.3   Moving Around in the Sequence  . . . . . . . .  3-10
        3.5.5.4   Editing Comments . . . . . . . . . . . . . . .  3-11
        3.5.5.5   Writing Part of the Sequence to a File . . . .  3-11
        3.5.5.6   Deleting Part of the Sequence  . . . . . . . .  3-11
        3.5.5.7   Help . . . . . . . . . . . . . . . . . . . . .  3-11
        3.5.5.8   Exiting  . . . . . . . . . . . . . . . . . . .  3-11
        3.5.5.9   Changing your keyboard . . . . . . . . . . . .  3-11
        3.5.6     INTERCONVERTING SEQUENCE FORMATS . . . . . . .  3-12
        3.5.6.1   PIR format . . . . . . . . . . . . . . . . . .  3-12
        3.5.6.2   STADEN format  . . . . . . . . . . . . . . . .  3-12


CHAPTER 4       PROTEIN ANALYSIS

        4.1     IDENTIFY OPEN READING FRAMES . . . . . . . . . . . 4-3
        4.2     IDENTIFYING POTENTIAL CODING REGIONS . . . . . . . 4-4
        4.2.1     Base composition of bulk DNA . . . . . . . . . . 4-4
        4.2.2     Base composition in the third codon position . . 4-5
        4.2.3     Codon usage bias . . . . . . . . . . . . . . . . 4-5
        4.3     TRANSLATING RNA INTO PROTEIN . . . . . . . . . . . 4-6
        4.3.1     Three and one-letter abbreviations . . . . . . . 4-6
        4.4     PREDICTING SECONDARY STRUCTURE IN PROTEINS . . . . 4-7
        4.4.1     PEPTIDESTRUCTURE & PLOTSTRUCTURE . . . . . . . . 4-7
        4.4.2     MOMENT . . . . . . . . . . . . . . . . . . . . . 4-8
        4.4.3     PEPPLOT  . . . . . . . . . . . . . . . . . . . . 4-8
        4.5     TRANSLATING PROTEIN INTO RNA . . . . . . . . . . . 4-9
        4.5.1     Best Sequence Option . . . . . . . . . . . . . . 4-9
        4.5.2     Most Ambiguous Sequence Option . . . . . . . . . 4-9



CHAPTER 5       COMPARING SEQUENCES
        5.1     IDENTIFYING SEQUENCE HOMOLOGY  . . . . . . . . . . 5-3
        5.1.1     WORD COMPARISON  . . . . . . . . . . . . . . . . 5-3
        5.1.2     DOTPLOTTING  . . . . . . . . . . . . . . . . . . 5-3
        5.1.2.1   Interpreting the Plot  . . . . . . . . . . . . . 5-3
        5.1.2.2   The Effect of Word Size  . . . . . . . . . . . . 5-3
        5.1.2.3   Types of patterns  . . . . . . . . . . . . . . . 5-4
        5.1.3     WINDOW COMPARISON  . . . . . . . . . . . . . . . 5-4
        5.1.4     COMPARISON OF PROTEINS . . . . . . . . . . . . . 5-5
        5.1.5     Symbol comparison tables . . . . . . . . . . . . 5-5
        5.2     SEQUENCE ALIGNMENTS  . . . . . . . . . . . . . . . 5-6
        5.2.1     BESTFIT  . . . . . . . . . . . . . . . . . . . . 5-6
        5.2.2     GAP  . . . . . . . . . . . . . . . . . . . . . . 5-6
        5.2.3     Protein sequence alignment . . . . . . . . . . . 5-7
        5.2.4     ALIGNMENT MEASUREMENTS . . . . . . . . . . . . . 5-7
        5.2.4.1   Quality  . . . . . . . . . . . . . . . . . . . . 5-7
        5.2.4.2   Ratio  . . . . . . . . . . . . . . . . . . . . . 5-7
        5.2.4.3   Identity . . . . . . . . . . . . . . . . . . . . 5-7
        5.2.4.4   Similarity . . . . . . . . . . . . . . . . . . . 5-7

CHAPTER 6       SEARCHING DATABASES

        6.1     SEARCHING BY TEXT - STRINGSEARCH . . . . . . . . . 6-3
        6.1.1     Definition search  . . . . . . . . . . . . . . . 6-3
        6.1.2     Full text search . . . . . . . . . . . . . . . . 6-3
        6.2     INTERACTIVE TEXT SEARCHES - XQS  . . . . . . . . . 6-4
        6.2.1     Nucleotide databases . . . . . . . . . . . . . . 6-4
        6.2.2     Protein databases  . . . . . . . . . . . . . . . 6-5
        6.3     SEQUENCE HOMOLOGY SEARCH . . . . . . . . . . . . . 6-6
        6.3.1     FASTA (direct search)  . . . . . . . . . . . . . 6-6
        6.3.2     TFASTA (translation search)  . . . . . . . . . . 6-6
        6.3.3     EXHAUSTIVE HOMOLOGY SEARCHING  . . . . . . . . . 6-7
        6.4     INTERPRETING FASTA OUTPUT  . . . . . . . . . . . . 6-8
        6.4.1     The FASTA algorithm  . . . . . . . . . . . . . . 6-8
        6.4.1.1   Disadvantages of the FASTA algorithm . . . . . . 6-8
        6.4.2     A FASTA Strategy . . . . . . . . . . . . . . . . 6-8
        6.4.3     The histogram  . . . . . . . . . . . . . . . .  6-10
        6.4.4     Mean Scores and CPU  . . . . . . . . . . . . .  6-10
        6.4.5     Example FASTA histogram: . . . . . . . . . . .  6-11
        6.4.6     The best scores  . . . . . . . . . . . . . . .  6-12
        6.4.7     The alignments . . . . . . . . . . . . . . . .  6-13
        6.5     INTERPRETING PROSRCH OUTPUT  . . . . . . . . . .  6-14
        6.5.1     Data check . . . . . . . . . . . . . . . . . .  6-14
        6.5.2     Symbol comparison table  . . . . . . . . . . .  6-14
        6.5.3     Additional information . . . . . . . . . . . .  6-14
        6.5.4     How PROSRCH works  . . . . . . . . . . . . . .  6-14
        6.5.5     Figure: Score vs log (number of entries).  . .  6-15
        6.5.6     The score distribution and statistics  . . . .  6-16
        6.5.7     The alignments . . . . . . . . . . . . . . . .  6-17
        6.5.8     The individual alignment scores  . . . . . . .  6-17
        6.5.9     Score ratios and PAM tables  . . . . . . . . .  6-17
        6.5.10    Mapping  . . . . . . . . . . . . . . . . . . .  6-18
        6.5.11    Additional alignments  . . . . . . . . . . . .  6-18
        6.5.12    A PROSRCH strategy . . . . . . . . . . . . . .  6-19
        6.6     OTHER BIOSEARCH PROGRAMS . . . . . . . . . . . .  6-19



CHAPTER 7       MULTIPLE SEQUENCE ALIGNMENT

        7.1     CLUSTER ALIGNMENTS . . . . . . . . . . . . . . . . 7-3
        7.1.1     PILEUP . . . . . . . . . . . . . . . . . . . . . 7-3
        7.1.2     CLUSTALV . . . . . . . . . . . . . . . . . . . . 7-4
        7.2     MANUAL ALIGNMENT . . . . . . . . . . . . . . . . . 7-6
        7.3     ALIGNMENT DISPLAYS . . . . . . . . . . . . . . . . 7-7
        7.3.1     Threshold, Plurality and Weightings  . . . . . . 7-7
        7.4     BOXED GRAPHIC DISPLAYS . . . . . . . . . . . . . . 7-8
        7.4.1     PRETTYPLOT . . . . . . . . . . . . . . . . . . . 7-8
        7.4.2     PRETTYBOX  . . . . . . . . . . . . . . . . . . . 7-8



CHAPTER 8       FRAGMENT ASSEMBLY SYSTEM

        8.1     INTRODUCTION . . . . . . . . . . . . . . . . . . . 8-3
        8.1.1     Goals  . . . . . . . . . . . . . . . . . . . . . 8-3
        8.1.2     Summary of programs  . . . . . . . . . . . . . . 8-3
        8.2     FRAGMENT ASSEMBLY TUTORIAL . . . . . . . . . . . . 8-4
        8.2.1     NEWGELSTART  . . . . . . . . . . . . . . . . . . 8-4
        8.2.2     GELENTER . . . . . . . . . . . . . . . . . . . . 8-4
        8.2.3     GELMERGE . . . . . . . . . . . . . . . . . . . . 8-5
        8.2.4     GELVIEW  . . . . . . . . . . . . . . . . . . . . 8-5
        8.2.5     GELASSEMBLE  . . . . . . . . . . . . . . . . . . 8-6
        8.2.6     GELOVERLAP . . . . . . . . . . . . . . . . . . . 8-7
        8.2.7     GELENTER as an editor  . . . . . . . . . . . . . 8-8
        8.2.8     Redefining the project . . . . . . . . . . . . . 8-8
        8.2.9     Detecting vector the unofficial way  . . . . . . 8-8



CHAPTER 9       FINDING SEQUENCE MOTIFS

        9.1     LOCATING PARTIAL SEQUENCES . . . . . . . . . . . . 9-3
        9.2     PROTEIN STRUCTURE MOTIFS . . . . . . . . . . . . . 9-4
        9.2.1     Retrieving PROSITE documentation . . . . . . . . 9-4



CHAPTER 10      SEQUENCE PROFILING

        10.1    THE PROFILING METHOD . . . . . . . . . . . . . .  10-3
        10.2    DESCRIPTION OF THE PROFILE TABLE . . . . . . . .  10-3
        10.2.1    A profile as a symbol comparison table . . . .  10-4
        10.3    FINDING A NEW MEMBER OF THE ALIGNMENT  . . . . .  10-4
        10.4    PROFILING TUTORIAL . . . . . . . . . . . . . . .  10-5
        10.4.1    PROFILEMAKE  . . . . . . . . . . . . . . . . .  10-5
        10.4.2    PROFILEGAP . . . . . . . . . . . . . . . . . .  10-5
        10.4.3    PROFILESEARCH  . . . . . . . . . . . . . . . .  10-6
        10.4.4    PROFILESEGMENTS  . . . . . . . . . . . . . . .  10-6
        10.4.5    PROFILESCAN  . . . . . . . . . . . . . . . . .  10-7
        10.5    NUCLEOTIDE PROFILING . . . . . . . . . . . . . .  10-7



CHAPTER 11      RNA SECONDARY STRUCTURE

        11.1    IDENTIFYING INVERTED REPEATS . . . . . . . . . .  11-3
        11.2    CALCULATING RNA FOLDING  . . . . . . . . . . . .  11-4
        11.3    DISPLAY OF FOLDING STRUCTURES  . . . . . . . . .  11-4
        11.4    ALTERNATIVE STRUCTURES . . . . . . . . . . . . .  11-5



CHAPTER 12      GCG COMMAND FILES
        12.1    WHAT ARE THEY? . . . . . . . . . . . . . . . . .  12-3
        12.2    EDITING GCLUSTALV.COM  . . . . . . . . . . . . .  12-3
        12.3    STRINGSEARCH . . . . . . . . . . . . . . . . . .  12-4
        12.4    OTHER COMMAND FILES  . . . . . . . . . . . . . .  12-4



CHAPTER 13      GCG DATA FILES

        13.1    LOCAL DATA FILES . . . . . . . . . . . . . . . .  13-3
        13.1.1    Enzyme Tables  . . . . . . . . . . . . . . . .  13-3
        13.1.2    Codon Usage (or Codonpreference) Tables  . . .  13-4
        13.1.3    Symbol Comparison Tables . . . . . . . . . . .  13-4
        13.1.4    Translation Tables . . . . . . . . . . . . . .  13-5
        13.1.5    Yet more data!!  . . . . . . . . . . . . . . .  13-5
        13.2    PLASMIDMAP FILES . . . . . . . . . . . . . . . .  13-6
        13.2.1    Displaying blocks and ranges   . . . . . . . .  13-6

CHAPTER 14      DATABASE HANDLING

        14.1    ORGANISING YOUR OWN DATABASES  . . . . . . . . .  14-3
        14.1.1    Using files of sequence names  . . . . . . . .  14-3
        14.1.2    Create an indexed database . . . . . . . . . .  14-3



CHAPTER 15      PHYLOGENY INFERENCING

        15.1    THE PHYLIP PACKAGE . . . . . . . . . . . . . . .  15-3
        15.2    CONVERTING TO PHYLIP FORMAT  . . . . . . . . . .  15-3
        15.3    THE DNADIST PROGRAM  . . . . . . . . . . . . . .  15-3
        15.4    THE NEIGHBOR AND FITCH PROGRAMS  . . . . . . . .  15-4



CHAPTER 16      BEYOND GCG .......

        16.1    SUBMITTING A SEQUENCE TO THE DATABASES . . . . .  16-3
        16.1.1    Copy the Submission Form . . . . . . . . . . .  16-3
        16.1.2    Enter the Details  . . . . . . . . . . . . . .  16-3
        16.1.3    Mail the Sequence  . . . . . . . . . . . . . .  16-3
        16.1.4    Acknowledgement  . . . . . . . . . . . . . . .  16-3
        16.1.5    Authorin . . . . . . . . . . . . . . . . . . .  16-3
        16.2    OBTAINING SOFTWARE FROM REMOTE SITES . . . . . .  16-4
        16.2.1    The EMBL file server . . . . . . . . . . . . .  16-4
        16.2.2    The Indiana FTP site . . . . . . . . . . . . .  16-4
        16.3    BIOSCI BULLETIN BOARD  . . . . . . . . . . . . .  16-5
        16.3.1    Topics . . . . . . . . . . . . . . . . . . . .  16-5
        16.3.2    Subscription requests  . . . . . . . . . . . .  16-6
        16.3.3    Sending a message to a bulletin board. . . . .  16-6
        16.3.4    Reading the bulletin board . . . . . . . . . .  16-6
        16.3.5    Cancelling subscriptions . . . . . . . . . . .  16-7
        16.3.6    Biosci and local bulletins on Agrenet  . . . .  16-7
        16.4    THE INTERNET GOPHER  . . . . . . . . . . . . . .  16-8



APPENDIX A      DATABASE SEARCH RESULTS

        A.1     FASTA SEARCH OF PLATELET.SEQ WITH A WORD SIZE = 6  A-3
        A.1.1     Histogram  . . . . . . . . . . . . . . . . . . . A-3
        A.1.2     The 100 best scores  . . . . . . . . . . . . . . A-4
        A.1.3     The alignments   . . . . . . . . . . . . . . . . A-6
        A.2     FASTA SEARCH OF PLATELET.SEQ WITH A WORD SIZE = 1  A-7
        A.2.1     The histogram  . . . . . . . . . . . . . . . . . A-7
        A.2.2     The 100 best scores  . . . . . . . . . . . . . . A-8
        A.3     TFASTA SEARCH OF PLATELET.PEP WITH A WORD SIZE =
                1  . . . . . . . . . . . . . . . . . . . . . . .  A-10
        A.3.1     The histogram  . . . . . . . . . . . . . . . .  A-10
        A.3.2     The best 100 scores  . . . . . . . . . . . . .  A-11
        A.3.3     The alignments . . . . . . . . . . . . . . . .  A-13



APPENDIX B      USING A SEQUENCE DIGITISER

        B.1     DIGISEQ  . . . . . . . . . . . . . . . . . . . . . B-3
        B.2     INTERPRETATION OF SEQUENCE GELS  . . . . . . . . . B-4
        B.2.1     Band distribution  . . . . . . . . . . . . . . . B-4
        B.2.2     Variable band intensity  . . . . . . . . . . . . B-4
        B.2.2.1   The C rules: . . . . . . . . . . . . . . . . . . B-4
        B.2.2.2   The A rules: . . . . . . . . . . . . . . . . . . B-4
        B.2.2.3   Other rules: . . . . . . . . . . . . . . . . . . B-4
        B.3     KERMIT FILE TRANSFERS  . . . . . . . . . . . . . . B-5
        B.4     EMUTEK FILE TRANSFERS  . . . . . . . . . . . . . . B-6



APPENDIX C      DOTPLOT DIAGRAMS

        C.1     A SEQUENCE COMPARED TO ITSELF  . . . . . . . . . . C-3
        C.2     SEQUENCE DIVERGENCE  . . . . . . . . . . . . . . . C-4
        C.3     INSERTIONS AND DELETIONS . . . . . . . . . . . . . C-4
        C.4     TANDEM DUPLICATION . . . . . . . . . . . . . . . . C-5
        C.5     INTERNAL REPEATS . . . . . . . . . . . . . . . . . C-5



APPENDIX D      MISCELLANEOUS

        D.1     SEQUENCES USED IN THE EXERCISES  . . . . . . . . . D-3
        D.1.1     Main example mRNA sequence.  . . . . . . . . . . D-3
        D.1.2     Other RNA sequences  . . . . . . . . . . . . . . D-3
        D.1.3     Protein sequences  . . . . . . . . . . . . . . . D-3
        D.2     FURTHER READING  . . . . . . . . . . . . . . . . . D-3


=============================================================================

                                  PREFACE
     History

        This document began in 1989 as a set of exercises for a training
        course in the use of the GCG package. Its main aim was to
        introduce research workers, most of whom were novice computer
        users, to molecular biology software. The document has since
        evolved to include background notes and other software covered in
        the training course. Its revised aim is to provide a brief
        introduction to the facilities available within the AFRC's VAX/VMS
        network, AGRENET, and beyond.

        This document was never intended to cover the subject
        comprehensively. Many details are omitted that were covered in
        short verbal presentations during the course. Given the shortage
        of such training material, it is not surprising that there have
        been many requests for copies of the document from outside the
        AFRC; this is the main reason for making it generally available
        at FTP sites. If the user is prepared to experiment with some data
        and explore, this document may be of some use. Many users will be
        working on UNIX-based systems, in which case the changes needed to
        run the GCG programs are minor: command-line options are introduced
        by a space and a minus sign instead of a slash. For example, on
        page 3-8 use:  mapsort -exclude=388,1020 -six
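        The slash-to-minus rule above can be sketched in Python as
        follows. This is a minimal illustration, not part of GCG: the
        function name vms_to_unix is hypothetical, and it assumes that
        the VMS form of the page 3-8 command is
        mapsort/exclude=388,1020/six and that every slash in a command
        introduces a qualifier. That holds for VMS file specifications,
        which use brackets rather than slashes, but would break on UNIX
        paths.

```python
def vms_to_unix(command: str) -> str:
    """Rewrite a VMS-style GCG command line in UNIX style (sketch).

    Assumption: every "/" starts a qualifier. This is true for VMS
    file specifications (which use brackets) but NOT for UNIX paths.
    """
    # Text before the first slash: program name plus any parameters.
    program, *qualifiers = command.strip().split("/")
    # Each VMS "/qualifier" becomes a separate UNIX "-qualifier" word.
    options = ["-" + q.strip() for q in qualifiers if q.strip()]
    return " ".join([program.strip()] + options)


if __name__ == "__main__":
    # Assumed VMS form of the page 3-8 example.
    print(vms_to_unix("mapsort/exclude=388,1020/six"))
    # prints: mapsort -exclude=388,1020 -six
```

        Qualifier values such as 388,1020 pass through unchanged; only
        the separator changes.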

     Contents summary

        Chapter 1 is very AFRC-specific. Chapter 2 is essentially a
        GCG reference section with some AFRC-specific information included.
        Chapters 3-16 contain worked exercises and background notes. All
        the examples show how the software behaves on Agrenet VAXes; some
        programs default to batch submission, using local queue names. All
        sequences used in the exercises can be obtained from the exercises
        themselves or located using Appendix D.

     The course

        In its present form the course is given over two days, although
        a three-day course might be more appropriate. The chapters are
        ordered to start with the easiest material and move on to
        progressively more complex programs, or to those needing more
        explanation. A good case could also be made for running the
        course with the chapters in a completely different order.

     Acknowledgements

        My thanks to the many people (hundreds of them) who have attended
        the course, for their comments, criticisms and suggestions for
        improvements. I am particularly grateful to the following people
        for their comments on the document and for their contributions to
        my own understanding.

        David Judge - Department of Genetics, University of Cambridge, U.K.
        Sarah McQuay - BRU, Kings Buildings, University of Edinburgh, U.K.
        Frank Wright - SASS, Kings Buildings, University of Edinburgh, U.K.

                            Cary O'Donnell 06-Sep-1993

==============================================================================
Contents of ftp tar file:

     Files without file extensions are plain text.

     2971 Sep 14 10:04 0ADVERT        - Training Course timetable
    18459 Sep 13 17:58 0CONTENTS      - Contents pages of manual
    45080 Sep 13 17:58 0Contents.PS
     3166 Sep 13 17:58 0PREFACE       - Preface to manual
     1894 Sep 20 16:19 0README        - This file
    46197 Sep 13 17:58 Chapter01.PS
   154711 Sep 13 17:58 Chapter02.PS
   134566 Sep 13 17:58 Chapter03.PS
    72367 Sep 13 17:58 Chapter04.PS
    52480 Sep 13 17:58 Chapter05.PS
   329050 Sep 13 17:58 Chapter06.PS
    79583 Sep 13 17:58 Chapter07.PS
    65286 Sep 13 17:58 Chapter08.PS
    26510 Sep 13 17:58 Chapter09.PS
    41700 Sep 13 17:58 Chapter10.PS
    27116 Sep 13 17:58 Chapter11.PS
    19249 Sep 13 17:58 Chapter12.PS
    47195 Sep 13 17:58 Chapter13.PS
    15447 Sep 13 17:58 Chapter14.PS
    27291 Sep 17 11:20 Chapter15.PS
    73666 Sep 13 17:58 Chapter16.PS
    81145 Sep 13 17:58 appendixA.PS
    50164 Sep 13 17:58 appendixB.PS
   270018 Sep 20 15:57 appendixC.PS
    25011 Sep 17 11:19 appendixD.PS
    46631 Sep 17 15:53 cover.ps     - Cover page
   338776 Sep 13 17:58 dapjob.ps    - Chapter 6 diagram, page 6-15
    56764 Sep 13 17:58 fasta.ps     - Chapter 6 diagram, page 6-9
 Cary O'Donnell 20-Sep-1993


