Interesting discussion, guys.
As a programmer (PhD student) and ex-biologist (3 years of undergrad genetics)
I see a few of these things as orthogonal issues. When you develop
something, you should use the best in each area, provided they can be
made to talk to one another - and if it's the best in its area,
then it should be flexible enough to talk. Despite the criticisms
and its still-under-construction nature, I think C++ is the best
general purpose programming language around. It's well supported,
a lot safer than C, more flexible, and allows a high level of
abstraction, or low level interaction if needed.
The previous comments about variation between compilers are salient:
we have found the best way to develop code is to run it on a
wide range of platforms and compilers (we use cfront, Borland
and gcc), although we still avoid templates for portability.
Class libraries are a separate issue - if you want to use generic
data types (strings, arrays, lists, hash tables, trees etc.),
and you should be, then there are plenty of libraries around.
We are writing a small, clean one here to support our needs, so
we can control it, but I use libraries like LEDA for most of these
sorts of things.
Interfaces: This is one area where I prefer not to use C++, for
several reasons - it doesn't handle the event-driven nature of
most modern interface systems very well. To its credit, it handles
things a lot better than most other languages, but something designed
specifically for interactive interface construction is more
useful: Tcl/Tk under X (although joining it to C++ is a little icky
the first time you try it) and Visual Basic under Windows.
If you want biology-specific software systems, then C++ classes
are the way to go - the main problem being data formats. NCBI have
had quite a lot of success with a limited number of formats, but
in my short experience as a programmer in the biological sciences,
the sheer diversity of variations and varieties of data defeats
any ad-hoc attempts to succinctly capture them. Well thought out
data definitions, maybe using a proper data description language,
e.g. ASN.1 (Abstract Syntax Notation One), are a good way to start.
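
A minimal sketch of what an explicit data definition might look like
as a C++ class, as opposed to ad-hoc text munging. The record layout
(name, chromosome, position) and the tab-separated input format are
hypothetical, chosen only for illustration; this is not ASN.1 or any
NCBI format, and it assumes a C++17 compiler.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record for one mapped marker; field names are illustrative only.
struct MarkerRecord {
    std::string name;
    std::string chromosome;
    double position = 0.0;  // map position, units left to the caller
};

// Parse one tab-separated line: name<TAB>chromosome<TAB>position.
// Returns std::nullopt when a field is missing or malformed,
// rather than guessing at a value.
std::optional<MarkerRecord> parse_marker(const std::string& line) {
    std::istringstream in(line);
    MarkerRecord rec;
    std::string pos;
    if (!std::getline(in, rec.name, '\t')) return std::nullopt;
    if (!std::getline(in, rec.chromosome, '\t')) return std::nullopt;
    if (!std::getline(in, pos, '\t')) return std::nullopt;
    if (rec.name.empty() || rec.chromosome.empty()) return std::nullopt;
    try {
        std::size_t used = 0;
        rec.position = std::stod(pos, &used);
        // Reject trailing junk such as "12.5x".
        if (used != pos.size()) return std::nullopt;
    } catch (const std::exception&) {
        // std::stod throws on non-numeric or out-of-range input.
        return std::nullopt;
    }
    return rec;
}
```

The point of the design is only that the expected fields and their
types are written down in one place, so malformed input is rejected
up front instead of propagating through later conversions.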
I am personally looking at the issues involved with integrating
different kinds of data into the mapping process, so I face these
issues - I'm forever writing small gawk scripts to convert data
type A into data type B, usually throwing something away in
the process. I applaud the intent, but make sure that you don't
(a) reinvent the wheel or (b) attempt things in an ad-hoc
manner - and please, if you write anything for public distribution,
make it (a) easy to install (GNU autoconf is a good way to go),
(b) cleanly written and (c) documented.
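
On the point about conversions from data type A to data type B
throwing something away: a rough sketch of one way to make that loss
explicit rather than silent. The key/value representation, the
`Conversion` struct and the `convert` function are all hypothetical
names invented for this example, not part of any existing library,
and the code assumes a C++17 compiler.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical loosely structured input: arbitrary named fields.
using FieldMap = std::map<std::string, std::string>;

// Result of a conversion: the fields the target format keeps,
// plus the names of the fields that had to be dropped.
struct Conversion {
    FieldMap kept;
    std::vector<std::string> dropped;
};

// Keep only the fields the (hypothetical) target format supports and
// record the names of the rest instead of discarding them silently.
Conversion convert(const FieldMap& input,
                   const std::vector<std::string>& supported) {
    Conversion out;
    for (const auto& [key, value] : input) {
        const bool known =
            std::find(supported.begin(), supported.end(), key) != supported.end();
        if (known) {
            out.kept[key] = value;
        } else {
            out.dropped.push_back(key);
        }
    }
    return out;
}
```

A caller can then log or report `dropped` after each conversion, so
whoever uses the output at least knows what did not survive.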