Software simulates life at the molecular level

Read an article today in NY Times Science on software emulating an organism. I suppose it was only a matter of time, but they have finally taken apart an entire bacterium’s DNA and other mechanisms, creating a complete simulation of its life. Apparently, the simulation runs on a cluster of 128 nodes and takes 9 to 10 hours to simulate the bacterium dividing.

They chose Mycoplasma genitalium because it’s relatively simple, has only 525 genes, and is relatively well researched, with over 900 papers. The simulation heralds a significant advance in the new field of computational biology.

As I understand it, the simulation is at the molecular level and emulates the 28 molecular processes present in the bacterium. In order to do all this, they had to map out the cell’s metabolome (small-molecule metabolites), transcriptome (the cell’s complete set of RNA transcripts), genome (the cell’s complete hereditary information) and proteome (the cell’s full set of proteins). I suppose this is what went into the over 1900 parameters in the simulation.

It’s still unclear to me what they plan to do with the simulation other than verify its operation against the real thing, which has already been done. The caption on the visual abstract in the Cell article shows a couple of things they hope to get out of the simulation, such as predicting cell behavior, discovering new cell biological processes, and understanding cell evolution, among other things. [I wanted to republish it here but wasn’t allowed to use the graphic without purchasing rights to it??]

It turns out this bacterium causes genital disease, but they chose it because it was the simplest standalone organism they could find.

In any event, this is just the start. There are plenty of other cells and organisms that computational biologists would be interested in simulating, like a human neuron and other parts of the nervous system.

Let’s see: with 525 genes it takes 128 processing nodes, so if the work scales linearly with gene count, a human cell with ~23K genes should take ~5600 nodes. I am thinking Google or Yahoo should be able to take this on if they wanted to. It’s just unclear whether all the information about the human metabolome, transcriptome, genome, and proteome is completely available yet.
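
For anyone who wants to check my back-of-the-envelope math, here is a minimal Python sketch of the node estimate. It assumes the node count grows linearly with the number of genes, which is my own guess and not something the researchers claim; the function name and its defaults are just mine for illustration.

```python
import math

def nodes_for_genes(target_genes, ref_genes=525, ref_nodes=128):
    """Estimate cluster nodes for a cell with `target_genes` genes.

    Assumption (mine, not from the paper): compute scales linearly
    with gene count, using the M. genitalium run as the reference.
    """
    return math.ceil(target_genes * ref_nodes / ref_genes)

# Human cell with roughly 23,000 genes
print(nodes_for_genes(23_000))  # 5608, i.e. ~5600 nodes
```

If interactions between genes and proteins matter more than the raw gene count, the real number could be far higher, so treat this as a floor rather than a forecast.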


I didn’t see any indication of what the storage requirements were for the simulation, but with 100+ servers my guess is that it’s on the order of 10TB, if not more. A human cell would take substantially more storage.

Image: Wikipedia