Well, it's good to be a computer scientist for a change in this field ;-)
I am pretty sure that this problem is NP-Hard (a concept related to
NP-completeness, which you may have heard of), i.e., it takes exponential
time to solve this problem.
I had grabbed my undergraduate Algorithms book off the shelf, Horowitz &
Sahni, _Fundamental of Computer Algorithms_ ISBN 0-914894-22-6, Computer
Science Press, 1984. Originally I was trying to see if your problem could
be transformed into the classical minimum spanning tree problem, for which
there is a well-known fast algorithm known as Kruskal's algorithm. I played
with this for a bit without success, and then found the following problem
listed as exercise 20 on page 554:
[Set Cover] Let F = {Sj} be a finite family of sets. Let T be a
subset of F. T is a cover of F iff:
the union of all Si where Si is an element of T = the
union of all the Sj's over all j
The set cover decision problem is to determine if F has a cover T
containing no more than _k_ sets.
Show that the node cover decision problem is reducible to this
problem.
I looked earlier in the chapter, and learned that the Node Cover Decision
Problem is NP-hard, so therefore the Set Cover Decision Problem is as well.
The good news is, now that you know the name of the problem ("set cover"),
you can look around the net for suboptimal heuristics. Here are some
resources:
http://mscmga.ms.ic.ac.uk/jeb/jeb.html#sethttp://x86.cs.duke.edu/~mlittman/courses/Archive/cps130-97/lectures/lect23/n
ode16.html
HTH,
-Jonathan
"Charlie" <cckim at stanford.edu> wrote in message
news:Pine.GSO.4.31.0109131835160.232-100000 at saga12.Stanford.EDU...
>> I have an algorithmic problem that I suspect has already been solved long
> ago, but, alas, I am a biologist.
>> And so:
>> I have a large number of sets of data, which includes many members each.
> I would like to known the smallest possible union of these sets of data
> that would include every member.
>> An example:
> D1 = (A, B, C, D) (dataset 1 contains members A, B, C)
> D2 = (A, B, D)
> D3 = (C, D, E)
> D4 = (Z)
>> The smallest set of data which would include all members would be D1, D3,
> and D4.
>> Does anyone know an algorithm that does this?
>> Thanks in advance,
> Charlie
>