Hi all,
As part of its recent winter update, the Protein Data Bank in Europe (PDBe;
http://pdbe.org) introduced a new, chemistry-based module of its PDB archive
browser (a.k.a. PDBeXplore). It can be accessed at:
http://pdbe.org/compounds
As you may (or may not) know, the PDB browser is an interface that enables you
to retrieve and analyse information on subsets of structures in the PDB using
various biological or chemical classifications. Previously released modules
enable browsing of the archive based on the Enzyme Class (http://pdbe.org/ec),
CATH domains (http://pdbe.org/cath), Pfam families (http://pdbe.org/pfam) or
FASTA-based sequence-similarity searches (http://pdbe.org/fasta).
The new compound-based browser allows you to enter the name of a chemical
compound of interest and analyse all the PDB entries that contain that
compound. Once you start typing the name (or three-letter code, if you happen
to know it) of a compound, a drop-down menu will show you matching compound
names and you can select the compound of interest. For instance, if you are
interested in Sildenafil, just start typing the name and once you get to
"sild", the only remaining matching compound is:
VIA -
5-{2-ETHOXY-5-[(4-METHYLPIPERAZIN-1-YL)SULFONYL]PHENYL}-1-METHYL-3-PROPYL-1H,6H,7H-PYRAZOLO[4,3-D]PYRIMIDIN-7-ONE
(Note: the auto-complete function uses information about synonyms from the
wwPDB chemical component dictionary.)
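For those who like to see the idea in code, here is a minimal Python sketch of this kind of matching. It is not the browser's actual implementation: the three-entry dictionary, the matching_compounds helper and the case-insensitive "starts with" rule are all assumptions made for illustration, whereas the real auto-complete draws on the full wwPDB chemical component dictionary.

```python
# Hypothetical miniature of a chemical component dictionary:
# three-letter code -> names/synonyms (illustrative only, far from complete).
COMPONENTS = {
    "VIA": ["SILDENAFIL"],
    "ATP": ["ADENOSINE-5'-TRIPHOSPHATE"],
    "NAD": ["NICOTINAMIDE-ADENINE-DINUCLEOTIDE"],
}

def matching_compounds(typed, components=COMPONENTS):
    """Return the codes whose code or any name starts with the typed text."""
    query = typed.strip().upper()
    if not query:
        return []
    hits = []
    for code, names in components.items():
        # Assumed rule: a compound matches if its code or a name has the prefix.
        candidates = [code] + names
        if any(candidate.upper().startswith(query) for candidate in candidates):
            hits.append(code)
    return sorted(hits)

print(matching_compounds("sild"))
```

With this toy dictionary, typing "sild" leaves VIA as the only match, just as in the browser.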
Select this compound, click on the "Submit" button and the central panel of
the browser will soon be filled with a table of all PDB entries that contain
this compound (currently there are only five). The right-hand panel will
contain more information about the compound you have selected, including a
chemical diagram, formula and SMILES codes.
Note: if you don't know if your compound occurs in the PDB or what its name
is, you can use the search options of PDBeChem - at http://pdbe.org/pdbechem -
including an option to draw a (sub)structure (to do this, click on the "edit"
button for the "Non-Stereo SMILES (Has Sub-Structure)" field in the PDBeChem
search form).
In order to demonstrate the powerful analysis options in the compound browser,
select a more abundant compound, e.g. ATP, and hit the "Submit" button again
(or click on this link: http://pdbe.org/compounds?ligand=ATP). The central
panel will show a list of the PDB entries containing the compound you
selected. The information here can be sorted by clicking on any of the column
headings in the table (clicking again reverses the sort order).
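If you want to generate such links for a list of compounds, a small Python sketch along the following lines should do. The base address and the "ligand" parameter are taken from the ATP link above; that other three-letter codes work in the same way is an assumption, and compound_browser_url is just an illustrative helper name.

```python
from urllib.parse import urlencode

BASE_URL = "http://pdbe.org/compounds"  # address given in this message

def compound_browser_url(code):
    """Build a compound-browser link for a three-letter compound code."""
    # Assumption: 'ligand' accepts any three-letter code, as in the ATP link.
    return BASE_URL + "?" + urlencode({"ligand": code.strip().upper()})

print(compound_browser_url("ATP"))
```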
You will notice a number of tabs at the top of the central panel - they are
labelled "PDB entries", "Ligands", etc. Selecting one of these tabs gives you
a new "perspective" on the selected set of PDB entries (in this case, all
entries containing ATP or whichever compound you selected):
* PDB entries: this is the default view that the browser will present once you
have selected a compound. To download the entire table as a text file, use the
link in the right-hand panel. If you move your mouse over the PDB code of an
entry, it will show a miniature image of the structure; clicking the link will
open the PDBe summary page for that entry. Clicking on the "view" link will
load the structure in an interactive viewer so that you can study it in
detail.
* Ligands: this view displays a table of information about the additional
compounds found in all the PDB entries that contain your compound of interest.
The table is ordered such that the compounds that occur most often are at the
top. Each row in the table gives the three-letter code of the compound, its
chemical structure, chemical formula and systematic name.
The second column contains a link to information about the interaction
statistics of the compound with the standard amino-acid types. The link "Get
PDB entries" generates a list of all PDB entries containing both that compound
and our compound of interest.
* Structure folds: this view displays information about the fold families
(based on the CATH classification) encountered in the PDB entries containing
the selected compound. The tab also shows the distribution of CATH classes and
CATH architectures for the selected PDB entries as a pie chart. If you click
on a pie slice (or in the legend), only the appropriate CATH categories will
be shown in the table. By the way, the pie charts can also be printed or
downloaded.
* Assemblies: this view provides information about the possible quaternary
structure(s) of the selected PDB entries. A small table shows how many entries
are monomeric, homomeric and heteromeric, and two (clickable) pie charts show
a further breakdown of the homomeric and heteromeric structures respectively.
The main table in the tab shows the possible quaternary structure(s) for the
entries, together with (for non-monomeric structures) the accessible and
buried surface areas of the complex and the estimated free energy gain upon
formation of the complex.
* Sequence families: this view lists all Pfam families that are present in the
selected PDB entries.
* Organisms: the source organisms found in all selected PDB entries are shown
in a table. The clickable pie charts show the distribution of these organisms
based on superkingdom (bacteria, archaea, etc.) and genus (Homo, Rattus,
Bacillus, etc.).
* Publications: this table contains details about the (primary) publications
of all the PDB entries with the selected compound.
* Authors: this tab lists the names of all the authors of the structures
containing the selected compound in the PDB, sorted by the number of such
entries on which each of them is an author. This information is useful to
biologists and journal editors who wish to get in touch with, for instance,
crystallographers who have solved many structures containing a particular
ligand.
The information presented by the browser is taken from the PDBe database,
which means that it is always up to date.
Using this browser, it is now child's play to dig up titbits such as:
- the compound that occurs most commonly in entries that also contain ATP is
magnesium
- about 1 in 10 entries that contain NAD also contain FAD
- 95% of CATH domains occurring in entries with NAD are of the alpha-beta
class
- there is only one hetero-hexameric assembly in all the entries that contain
NAD, namely http://pdbe.org/3ket
- Johan Weigelt has deposited more structures of NAD-containing proteins than
Michael Sundstrom
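A figure such as "1 in 10 entries that contain NAD also contain FAD" is simply the overlap between two sets of entries divided by the size of the first set. Here is a rough Python sketch of that arithmetic; the identifiers are invented placeholders (not real PDB codes or actual browser output) and cooccurrence_fraction is a hypothetical helper, not something the browser provides.

```python
def cooccurrence_fraction(entries_with_a, entries_with_b):
    """Fraction of the entries containing compound A that also contain B."""
    a = set(entries_with_a)
    if not a:
        return 0.0  # avoid dividing by zero when A occurs nowhere
    return len(a & set(entries_with_b)) / len(a)

# Placeholder identifiers, invented for illustration only.
with_nad = [f"entry{i:02d}" for i in range(1, 11)]  # ten entries with "NAD"
with_fad = ["entry03", "entry42"]                   # one of them overlaps

print(cooccurrence_fraction(with_nad, with_fad))
```

The lists of entry codes needed for a real calculation could come from the text files that can be downloaded from the "PDB entries" tab for each compound.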
Note: currently, the statistics presented by the browser are based on all the
PDB entries that contain your compound of interest, i.e. not only the
macromolecules to which it is actually bound in those entries.
By the way, all the previously released browser modules have been updated
recently to include clickable pie charts and retrieve results much faster than
before.
We welcome your comments, bug reports and feature requests on the compound
browser (and the other browser modules). Please use the feedback button at the
top of any PDBe web page.
--Gerard
---
Gerard J. Kleywegt, PDBe, EMBL-EBI, Hinxton, UK
gerard from ebi.ac.uk ..................... pdbe.org
Secretary: Pauline Haslam pdbe_admin from ebi.ac.uk