Grid computing notes from GlobusWorld for bioinformatics

Don Gilbert gilbertd at bio.indiana.edu
Mon Jan 20 11:24:59 EST 2003

Here are some notes, from bioinformatics grid/biogrid perspectives, on
the recent GlobusWorld conference http://www.globusworld.com/ and
its life sciences workshop, which I attended last week. Debating
with myself: should I stay there another 2 weeks for the O'Reilly
bioinfo. conference? The warm sun is a nice change from 10°F
snowstorms here in Indiana, but when they turned off the wireless
network on Thursday, I knew I couldn't survive a netless
fortnight there  :)

The new Global Grid Forum lifesci. work group
looks like it may be useful, though many of those involved are
approaching life sci. from a computer industry perspective, and
could use help from bioinformatics folks in steering it toward our
needs for grid computing.

The gist of the GlobusWorld take-home messages, to me, is:

-- Globus Toolkit 3 (GT3) will be all OGSA, meaning Web Services and
XML messaging, with the reference/alpha implementation
mostly in Java (Servlets/SOAP/Tomcat/Jakarta/...), and the
compiled C parts to be copied over from GT2 or added
later.  This is good from my perspective, as I have yet
to get GT2 to fully compile on MacOS X.  From a quick view of the
docs, much of the core grid services for GT3 will run thru
Tomcat-based servlets and web services.  It will be work to
change from GT2 to GT3, but if you haven't invested time in
GT2 and are not in a hurry to use it, start with the alpha
GT3, just released.  Read also the FAQs at

-- My perspective on biogrids is that data grid issues most
critically need work, and several lifesci. talks tended to
agree with this.  One new hope from grid infrastructure is the
OGSA-DAI (data access and integration) project, which defines a
data service registry and search/retrieval methods thru web
services (OGSA grid services). Still in alpha stage, it has
testable software; see http://www.ogsa-dai.org.uk/ .  It will form
a core part of the expanded globus 3 toolkit.

-- The globusworld lifesciences workshop had many interesting
talks.  Outline at
where you can get better summaries.

  --- Avaki.com is doing well selling usable, data-centric grid systems
  to lifesci. companies and academics.  Their product literature
  is well worth reading, and if you want to buy a usable grid for
  lifesci. now, this is probably your best bet.  I think this is great,
  and it shows the need in bioinfo. for such systems; my hope is that public,
  open-source, freely usable grids for bioinfo. will catch up to
  some extent w/ Avaki's work.  Of special interest, Avaki grids will
  become OGSA compliant, and hopefully thru this will interoperate
  with any public academic grids.  It would be very good for
  our community if we can keep public and proprietary grids interoperating,
  at least from a data exchange perspective.
  The way Avaki's data grid works now is to provide a 'grid file
  system' view of distributed data, and the important clue on how
  they do this is that it works as a Network File System (NFS) daemon,
  available on most OSes (predominantly MSWindows for lifesci.).
  --- Japan has biogrid efforts underway, including an interesting
  one at http://www.biogrid.jp/ .  From the data perspective, they
  are working at reformatting a lot of biology databanks into a 
  common XML structure (a formidable task... hope there is another
  --- Novartis now has a functioning PC grid linking some
  2,000 PC (MSWindows) workstations, and is actively bringing
  grid computing in as an addition to, and eventual replacement for,
  high-performance computing's centralized 'big boxes'.
  --- the eDiamond project in the UK is building a very interesting
  practical biomedical grid
  for breast cancer screening and diagnosis; should be an example
  of how grid infrastructure can do important things otherwise
  not feasible. http://www.gridoutreach.org.uk/docs/pilots/ediamond.htm
  --- Singapore is building a biomedical grid using globus

  --- IBM Almaden is doing research on OptimalGrid, based on globus,
  for computational biology
  --- Protein Databank (PDB), SDSC & UCSD folks have various grid
  things for biology in progress.  The NIH sponsored BIRN project
  http://birn.ncrr.nih.gov/ for neurosciences and biomedical computing
  is using globus and related grid infrastructure for practical

-- Trust & security issues are big all around in grid computing,
and much will be changing in globus and related grid systems to
address this (certificates, limiting internet port/firewall problems
by putting everything behind port 80, ...)

-- some important grid issues others mentioned that resonate w/ me:
 --- moving data on grid: costly, time consuming, an unresolved issue
 --- queries and answers: no standard grid query method yet;
     SQL not good enough, XML query maybe.
 --- complexity management is a problem for usable grids

-- IBM, Microsoft, Sun, HP, Oracle, the software industry in
general, the financial services industry, and others all have growing
interest and investments in grid, globus, and esp. OGSA/OGSI
as a common framework.

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at bio.indiana.edu

More information about the Bio-soft mailing list