DNA Stacks Info

Features of DNA Stacks
HyperCard 2.0 stacks
Version 1.1 (12/95)
Copyright 1990-1995 D. J. Eernisse

Email: DEernisse@fullerton.edu
WWW: http://biology.fullerton.edu/people/faculty/doug-eernisse
Mailing address: Dept. Biol. Sci. MH282, Calif. St. Univ., Fullerton, CA 92634

For a description of version 1.0 see:
Eernisse, D. J. 1992. DNA Translator and Aligner: HyperCard utilities to aid phylogenetic analysis of molecules. CABIOS 8: 177-184.

DNA Stacks is a software package of HyperCard 2.x stacks providing complementary sets of utilities for viewing and manipulating molecular data on a Macintosh computer. The best way to get the most current version of DNA Stacks is from the author's home page (see the WWW address above).

The stack package has three main stacks: DNA Translator, Aligner, and Codon Usage. The package also includes a stack called Startup, which is used internally by the other stacks, and File Combiner, a more general utility stack useful for combining folders of sequence files.

The DNA Translator stack has two kinds of "cards": gene mapping cards and utility cards. A gene mapping facility draws and displays two linearized gene maps for comparison, automatically adjustable to desired scales and locations along all or part of the mapped molecule. Maps and selected corresponding complete sequence data are provided for most of the available fully documented mitochondrial and chloroplast DNA (mtDNA and cpDNA) sequences, which include 26 animal, 1 yeast, and 1 ciliate mtDNAs and 3 green plant cpDNAs. Additional gene maps may be created by the user through direct file conversion of standard GenBank- or EMBL-format documented sequences and their features tables. A user can extract the sequence of any particular mapped gene or region by clicking on the corresponding place on the gene map or by menu selection of the gene or feature and mapped taxon. DNA Translator utility cards are a workbench of specialized sequence manipulation tools, catering especially to those with interests in phylogenetic analysis.
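The gene extraction described above amounts to slicing a sequence by the coordinates recorded in a gene map. The following is only a minimal Python sketch of that idea, not the stack's HyperTalk code. The `Feature` record, the function names, and the toy genome are all hypothetical, and the 1-based inclusive coordinates with a strand flag are assumed from the usual GenBank features-table convention.

```python
from dataclasses import dataclass

# Complement table for unambiguous DNA bases; other symbols pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass(frozen=True)
class Feature:
    """One mapped gene (hypothetical record): 1-based, inclusive coordinates."""
    name: str
    start: int
    end: int
    strand: int = 1  # +1 = forward strand, -1 = complementary strand


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_feature(genome: str, feature: Feature) -> str:
    """Slice the mapped region out of the genome, honoring the strand flag."""
    if not (1 <= feature.start <= feature.end <= len(genome)):
        raise ValueError(f"{feature.name}: coordinates outside the sequence")
    region = genome[feature.start - 1:feature.end]
    return reverse_complement(region) if feature.strand == -1 else region


# Toy data, invented for illustration only.
genome = "AAATGCCCTAAGG"
gene_map = {f.name: f for f in (Feature("geneA", 3, 11), Feature("geneB", 1, 4, -1))}

print(extract_feature(genome, gene_map["geneA"]))  # ATGCCCTAA
print(extract_feature(genome, gene_map["geneB"]))  # ATTT
```

Keeping the map as a name-to-coordinates table mirrors the spreadsheet-style gene mapping matrices the package exports, so a new molecule only needs a new table.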
DNA Translator can also import single or multiple sequences in a variety of formats, including multiple aligned sequence output of various programs (EuGene, Prophet, CLUSTAL, Nexus, PHYLIP, etc.), and can then further manipulate, interleave, compare, or translate the gene sequences to amino acids. Multiple aligned sequences can be converted to Nexus, Hennig86, or PHYLIP format for subsequent phylogenetic analysis. Most known deviations from the "universal" code, which are typical for mtDNAs, may optionally be used during translation. Note that extraction of animal mtDNA sequences and many of the translation/report utilities are now available, and easier to use, from the "Data" menu of the Aligner stack, described next.

Aligner is a stack for manual editing and display of multiple sequence alignments. Aligner
1) handles up to 100 sequences, each up to 30,000 bp or amino acid residues in length;
2) displays sequences as "interleaves," with the number of lines automatically adjusting for the number of sequences imported;
3) imports or exports multiple sequences by default in "named" string format (i.e., 'Name AGCTGA...'), which is the same as PAUP's "simple text" format;
4) additionally will import/export numerous other widely used formats, including support for interleaved sequences;
5) additionally will export a wide variety of alignment/sequence reports or filtered sequence data;
6) can toggle between two window widths, either 640 or 512 pixels ("standard" or "classic" monitor display widths), and can be expanded up to a 1200 pixel window length;
7) can quickly toggle between showing match characters (dashes) relative to the first sequence and showing no match characters;
8) can, more slowly, color-code base pairs, amino acids, or codons, including support for alternative genetic codes;
9) supports addition/deletion of one or more gaps to all but the current sequence;
10) speaks nucleotide/peptide characters during keyboard entry, or after entry starting from the current insertion point;
11) extracts
a selection of about 15 DNA or corresponding amino acid sequences to Aligner, where the gene is any animal mtDNA gene and the sequences are a selection of either metazoan or mostly vertebrate sequences;
12) allows one to align amino acids, then introduce gaps into the corresponding unaligned nucleotide strings to preserve the amino acid alignment;
13) has an error checking facility to check the current alignment against the starting strings, disregarding any added gaps;
14) has optional help facilities, including this general overview, a tutorial on using Aligner, and a balloon-like help feature that optionally displays the function of fields or buttons as the cursor enters them.

The Codon Usage stack displays codon and amino acid usage data as they differ across a wide variety of organisms and organelles. Dynamically constructed graphs display percentage usage for a particular codon relative to the other 63 codons, or only to the other codons that code for the same amino acid. A user can select from a diverse list of some 70 taxon-organelle combinations. The codon usage, translation, and gene mapping data can be exported or imported in spreadsheet format for incorporating additional molecules of interest, or in various standard formats (UWGCG, GenBank, EMBL, Intelligenetics). A built-in editor supports sequence entry with optional computer-speaking for error checking, and a variety of output conversion options.

Summary of enhancements to versions since 1.0
(Numerous bug fixes are not detailed here)

New features added since version 1.0 (Eernisse, 1992)

General:

DNA Translator stack:
* gene mapping matrices can now be exported/imported from the menu of gene mapping cards, whereas before these features were hidden
* added a feature to report sequence composition statistics by codon
* modified "interleave" format conversion so that first and last taxon names are guessed at before making the user enter them manually
* added support for Phylip input files and for the output created by GCG's "Pileup" and Jotun Hein's alignment algorithms
* multiple animal mtDNA gene sequences can now be exported to a single file or directly to Aligner, and, in the case of coding DNA sequences, amino acid sequences can be exported at the same time
* added capabilities to test how likely it is that "signal" in a sequence alignment could be due to experimental error, rather than historical (phylogenetic) signal
* added a recoding option so that amino acid alignments can be treated as if they were DNA or RNA data (assuming equal weighting is used)
* it is now possible to create Nexus-format "Charsets" for use in PAUP from a comment line in a user's alignment
* it is now possible to extract only those sites in an alignment that were included in a charset to a second alignment file
* one can now output matrices of pairwise transition/transversion tallies or pairwise percent identity for any number of named strings and, for the "Substitution" (transition/transversion) matrices, there is an option to specify whether the tallies should be reported by codon position or for all positions
* added a feature for combining two alignments into a single alignment, provided at least one sequence is common to both alignments
* streamlined the GenBank/EMBL conversion so that a minimum of prompts are required from the user
* improved certain hierarchical menus in DNA Translator
* added facilities to create PAUP blocks to automate the calculation of "Support Index" values for all nodes of all input trees in Nexus format
* added the ability to calculate synonymous and nonsynonymous changes, using the method of Nei and Gojobori
* added a special purpose data conversion to make it possible to treat 2nd positions normally (A, C, G, or T) while ignoring transitions at all 1st and 3rd positions
* added the ability to compute a log determinant distance matrix (see Mol. Biol. Evol. 7/94)
* added support for newer versions of HyperCard or HyperCard Player, but did not use any HyperTalk commands specific to the newer versions, so the stacks should, in theory, work on all versions back to version 2.0

Aligner stack:
* added many options to color the alignments, including the ability to color triplets of DNA that correspond to amino acid coding, or coloring according to chemical, functional, hydrophobic/hydrophilic, and ionic charge groups of amino acid sequences (or DNA triplets that code for amino acids)
* added an online tutorial that will give users practice working with some of the more advanced features of Aligner
* sequences with different genetic codes can be colored in a single alignment
* improved sequence selection, sequence entry, and key filtering for editing data in Aligner
* improved several of the navigation buttons and indicators
* the card size in Aligner is now adjustable so that if one has ample RAM, it is possible to draw even an entire lib
* improved handling of RAM shortages
* added the ability to add n gaps to all but the currently selected sequence
* added the ability to calculate a DNA alignment that is identical to the corresponding protein alignment
* added an error checking facility to check manually edited sequences against the starting strings, disregarding added gaps, etc.
* added a color editor facility so that the colors used for nucleotides or peptides can be specified by a user

New features added since version 1.0n6 (last version posted to Indiana archives, 8/94)

General:
* a new stack "Codon Usage" was created to split off what was formerly part of DNA Translator
* revised help facilities throughout
* added a more colorful "About DNA Stacks" dialog
* enhanced the appearance of the startup process
* new icons
* resources and scripts common to Aligner and DNA Translator were moved to a new "Startup" stack used by both of the other stacks
* the new "Startup" stack also helps with more reliable expansion of the card size in Aligner
* added support for newer versions of HyperCard or HyperCard Player, but did not use any HyperTalk commands specific to the newer versions, so the stacks should, in theory, work on all versions back to version 2.0

DNA Translator stack gene mapping cards:
* reorganized menus on gene mapping cards, with some options now available from a new "Extract" menu
* import/export features were enhanced for gene maps
* gene map selection lists now include common names as well as scientific names, and these are listed in approximately phylogenetic, rather
than alphabetic, order
* more than 25 animal mtDNA maps are now available, and these have been split into a "Metazoan mtDNA" card and a "Vertebrate mtDNA" card

DNA Translator stack utility cards:
* these are still used, but mostly without the user ever knowing it, because most conversion/report options that were formerly available only from utility cards are now more directly accessible from the "Data" menu of Aligner
* added a "Multistate -> Binary" conversion
* added a feature to find all repeated sequences of length n in a large genomic sequence, with an option to decrement n until such repeats are found

Aligner stack:
* completely reorganized menus so that all file opening, saving, and manipulation of data in the editor can be performed using the "Data" menu
* most of the data format conversions or reports that were formerly possible only from utility cards in DNA Translator are now available from the simpler menu structure of Aligner
* added support for Phylip input files
* it is now possible to use Aligner's menu to extract multiple animal mtDNA sequences for any specified gene regions directly to Aligner, whereas it was formerly necessary to extract them from the gene mapping facility of DNA Translator
* Aligner now automatically detects the proper genetic code if the sequences were extracted from the gene mapping facility
* improved export of Nexus (PAUP/MacClade) format from within the Aligner stack, including automatic provisions to interleave, match, and set data type appropriately

The following is a somewhat out-of-date listing of conversions supported from either the "Data" menu of the Aligner stack or (if you can't find it there) from the "Convert" menu or "field menus" of a utility card in DNA Translator.

A. Import Formats Accepted
  1. Multiple sequence alignments
    a. Simple text files of string sequences, with or without interleaves or match characters
    b. MBIR (EuGene, Prophet) "Doolittle" progressive alignments
    c. CLUSTAL nucleotide or peptide output
    d. Nexus (PAUP, MacClade) files
    e. Phylip input files (from Aligner stack only)
    f. Phylip output files (i.e., matrix display option specified) [Also normal Phylip input files, at least from the "Data" menu of the Aligner stack]
    g. Wisconsin GCG Pileup output files
    h. Treealign (by Jotun Hein) output files
  2. String sequences
    a. String sequence text files of the form: Name AGCTACCT...
    b. Data input from the built-in sequence entry editor
    c. Sequences extracted from the provided gene mapper
    d. String output generated by many commonly used programs
  3. GenBank/EMBL or PIR-CODATA documented sequence files
    a. Converted to string sequences for use in the conversions below
    b. GenBank or EMBL documented features to spreadsheet matrix
    c. Matrix directly convertible to a rescalable gene map
    d. Gene map allows extraction of any mapped or custom subsequence

B. Conversions provided
  1. Multiple sequence alignments (A1) converted/exported in the following formats:
    a. Nexus (PAUP, MacClade) with optional cost matrices added
    b. Hennig86 (nucleotide data only)
    c. Phylip 3.x formats (i.e., with or without interleaves)
    d. Multiple sequence strings (A2a) with gaps preserved or deleted
  2. String sequences (A2a, B1d) converted/exported in the following formats:
    a. Unmodified, to be reimported or converted as needed
    b. As straight single sequence strings for import by MBIR (EuGene, Prophet), Authorin, Gene Construction Kit, etc.
    c. Intelligenetics format for import by many programs
    d. GenBank, EMBL, FASTP, or PIR-CODATA formats for viewing or import by many programs
    e. Simple interleaved format with optional match characters for manual alignment or reimport (A1a)
    f. Direct export of corresponding sequences (e.g., all animal mtDNA cyt. B genes) from the gene mapping facility to the Aligner stack [Animal mtDNAs can now be extracted directly from the "Data" menu of Aligner.]
    g. A subblock of aligned strings to MULFOLD (RNA folding) format
    h. Consensus sequence created by combining two or more sequences, employing ambiguous nucleotide symbols (IUPAC-IUB)
    i. Subsequence, DNA <-> RNA, upper <-> lower case, ACGTU -> 01233, filter IUPAC-IUB codes, and complementary strand sequence conversions
    j. Tally and display autapomorphies (site-unique nucleotides or gaps) for sets of aligned nucleotide strings
    k. Compute compositional statistics of each nucleotide string
    l. Compute pairwise (uncorrected) identity comparison matrices
    m. Compute pairwise transversion/transition comparison matrices, either for all positions or by codon position
  3. DNA/RNA -> peptide translation
    > 1st, 1st & 2nd, or all 3 possible reading frames output
    > Formatted output displays codons below amino acid abbreviations, with either 1- or 3-letter amino acid abbreviations
    > String output is useful for export or conversion (B2)
    > Can use standard or any of the provided codon usage tables
    > Custom codon usage tables can be added using a table editor
    > Codons with ambiguous nucleotide symbols (IUPAC-IUB) are translated as appropriate when translation is unambiguous
    > Optional termination when stop codons are encountered
  4. Peptide -> DNA translation
    > Peptide strings can be backtranslated, using the reverse of whichever codon usage table the user selects
    > Backtranslation uses IUPAC-IUB conventions for ambiguous nucleotides

D. Nucleotide or peptide sequence entry [Aligner now does most of this, so DNA Translator's editor is not recommended.]
  > Special Editor field has autoformatting capabilities
  > Adjustable computer speaking supported during or after entry
  > Sequences can be edited, manipulated, or exported (B2)

E. Data provided with stack
  > Codon usage patterns for about 70 organism/organelle combinations [Codon Usage is now a separate stack.]
  > Most mitochondrial variation in coding supported (3)
  > Gene maps of 28+ mtDNAs plus string sequences for most
  > Gene maps and matrices for 3 green plant chloroplast DNAs

F. General Features
  > Online help facility is available from any card in stack
  > References, taxon names, and sample data provided
  > Pulldown or popup menus or dialogs used for commands
  > Buttons or pulldown menus are used for stack navigation
  > Data fields can be locked or unlocked for text entry
  > Data fields shrink down or expand by clicking
  > Contents of fields exportable as text or printable
  > Codon usage tables output for any incorporated taxon [Codon Usage is now a separate stack.]
  > Gene maps and codon usage graphs have special printing options
  > Default word processor for exported text files can be chosen
  > PAUP/MacClade specification for exported Nexus files
  > PAUP/MacClade open directly from the stack
  > Add gene mapping or utility cards as required
  > All data, scripts, and resources used are easily accessible
  > Custom XFCNs by Nigel Perry enable efficient manipulations

G. Hardware and System Requirements
  > Requires any Macintosh that can run HyperCard 2.0 or greater
  > HyperCard 2.x (2.0 or greater) or HyperCard Player (widely available for Macintosh users)
  > Macintosh System 6.0.5 or greater
  > 2 MB or more of RAM for HyperCard's partition recommended

H. Availability of current version
  > By WWW at the author's home page (address at the top of this document)
  > By anonymous ftp from at least the following site: Ftp.Bio.Indiana.Edu (after connecting: 'cd molbio/mac' and 'get DNAStacks_xxx.sea.hqx'), where xxx is the current version number (although this site is not guaranteed to have the most current version)
  > Request it from me by email and I will attach the current version (assuming your mail program can handle Eudora attachments for Macintosh).

For further information, tips, troubleshooting, and version history, see the online help facilities and tutorials.

These stacks are free for noncommercial use only and are not in the public domain. The DNA Stacks package is copyrighted 1990-1995 by D. J. Eernisse.
The external resources (XFCN/XCMD) used are copyrighted and have copyright restrictions similar to those of DNA Stacks (see the DNA Translator and Aligner stack scripts for details).
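As an illustration of the translation options listed above (the standard code or a deviation typical of mtDNAs, with optional termination at stop codons), here is a minimal Python sketch. It is not the stack's HyperTalk implementation. The function and table names are hypothetical, the tables follow the standard and vertebrate mitochondrial codes as commonly tabulated, and, unlike DNA Translator, this sketch simply writes "X" for any codon containing an ambiguous symbol.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (standard "universal" code).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code: four codons differ from the standard code.
VERT_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate(dna, table=STANDARD, stop_at_terminator=False):
    """Translate reading frame 1; unknown or ambiguous codons become 'X'."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table.get(dna[i:i + 3], "X")
        if aa == "*" and stop_at_terminator:
            break
        protein.append(aa)
    return "".join(protein)


seq = "ATGATATGAAGA"  # toy sequence: ATG ATA TGA AGA
print(translate(seq))                                      # MI*R
print(translate(seq, VERT_MITO))                           # MMW*
print(translate(seq, VERT_MITO, stop_at_terminator=True))  # MMW
```

Expressing an alternative code as a small set of overrides on the standard table keeps each deviation visible at a glance, which is roughly what a user-editable codon table provides.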