Bar Refaeli, DNA Sequencing and Cloud Computing

Much like Bar Refaeli and Leonardo DiCaprio, DNA sequencing and cloud computing go hand in hand.

I had a very interesting conversation with a friend yesterday about DNA sequencing and cloud computing.

My friend is leading one of the largest cancer genome research projects in the world (and yes, he is extremely bright).

It appears there has been great progress in DNA sequencing technology, which is based on a chemical process. The pace is much faster than Moore’s law. As a result, budgets are shifting from the chemistry side to the computational side.

In the past, the budget split would be 90% for biology and 10% for analyzing the data coming out of the DNA.

As sequencing costs have fallen by orders of magnitude, there is more and more data (a single patient's genome data is one terabyte).

The more data there is, the more computing power is needed to analyze it, and hence the budget split becomes 50-50.

Each computation can take up to 24 hours, running on a 100-core mini grid.
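To put that figure in perspective, here is a small back-of-the-envelope sketch in Python. It only uses the numbers quoted above (24 hours, 100 cores); the 400-core case is a hypothetical, and it assumes the work scales perfectly across cores, which real pipelines rarely do.

```python
def core_hours(cores: int, hours: float) -> float:
    """Total compute consumed by one job, in core-hours."""
    return cores * hours


def wall_clock_hours(total_core_hours: float, cores: int) -> float:
    """Elapsed time for the same work on a different core count.

    Assumes perfect linear scaling, which is an optimistic simplification.
    """
    return total_core_hours / cores


# Figures from the post: up to 24 hours on a 100-core mini grid.
job = core_hours(cores=100, hours=24)
print(job)                               # 2400 core-hours per computation

# Hypothetical: the same job spread over 400 rented cores.
print(wall_clock_hours(job, cores=400))  # 6.0 hours, if scaling were perfect
```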

In theory, such tasks are a great fit for cloud computing IaaS (Infrastructure as a Service) platforms, or even PaaS (Platform as a Service) solutions with MapReduce capabilities. This EC2 Bioinformatics post provides interesting examples.
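For readers who have not met MapReduce, here is a toy sketch of the pattern in plain Python. It is not my friend's pipeline and it is not tied to any cloud platform; counting short DNA substrings (k-mers) is just an illustrative task I picked, and the two reads are made up.

```python
from collections import Counter
from functools import reduce


def map_kmers(read: str, k: int = 3) -> Counter:
    """Map step: count every substring of length k in a single read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial counts into one."""
    return left + right


# Two made-up reads; in a real setting each worker would map its own chunk.
reads = ["ACGTAC", "GTACGT"]
total = reduce(reduce_counts, map(map_kmers, reads), Counter())

for kmer, count in sorted(total.items()):
    print(kmer, count)
```

The map step touches each read independently, so it can be spread over as many machines as are available; only the small per-read counts need to be merged at the end.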

In practice, there are three main challenges:

  1. Since cancer research facilities need this server power every day, it is cheaper for them to build the solutions internally.
  2. To make things even more challenging, the highest cost in most clouds is the bandwidth in and out of the cloud. It would cost $150 to store one patient's data on Amazon S3, but $100-$170 to transfer it into S3.
  3. Even if the cost gap can be mitigated, there can be regulatory problems with the privacy of patients' data. After all, it is one person's entire DNA we are talking about. Encryption would probably be too expensive, but splitting and randomizing the data can probably solve this hurdle.
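The bandwidth point is easier to see with the arithmetic written out. The sketch below uses only the figures quoted in the list above ($150 to store, $100-$170 to transfer one patient's data); these are the numbers from my conversation, not current Amazon pricing, and the function names and the ten-genome example are mine.

```python
# Figures quoted in the post, per one-terabyte patient genome.
STORAGE_USD = 150.0
TRANSFER_IN_USD_RANGE = (100.0, 170.0)


def upload_and_store_cost(genomes: int) -> tuple[float, float]:
    """Low and high estimate for moving genomes into the cloud and storing them."""
    low = genomes * (STORAGE_USD + TRANSFER_IN_USD_RANGE[0])
    high = genomes * (STORAGE_USD + TRANSFER_IN_USD_RANGE[1])
    return low, high


def transfer_share(transfer_usd: float) -> float:
    """Fraction of the per-genome bill that is bandwidth rather than storage."""
    return transfer_usd / (STORAGE_USD + transfer_usd)


print(upload_and_store_cost(10))         # (2500.0, 3200.0)
print(transfer_share(100.0))             # 0.4
print(round(transfer_share(170.0), 2))   # 0.53
```

In other words, with these figures somewhere between 40% and a bit over half of the bill is spent just getting the data in.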

So, where do clouds make the most sense for this kind of biological research?

One use case is the testing of a new, improved algorithm. The researchers then want to run the algorithm on all the existing data, not just the new data.

They need to compare the results of the new algorithm with those of the old algorithms on the same data set. They also need to finish the paper in time for the submission deadline 🙂.

In such scenarios, a huge burst of computation is needed on static data over a very short period of time. Moreover, if the data can be stored on a shared cloud and used by researchers from across the world, then data transport would not be so expensive in the overall calculation.
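A rough sketch of why the burst case favors the cloud: it reuses the 24-hours-on-100-cores figure from earlier and assumes one such computation per genome, perfect scaling, and a made-up workload of 50 genomes with a 72-hour deadline. All of those are my assumptions for illustration.

```python
import math

# From the post: one computation takes up to 24 hours on 100 cores.
CORE_HOURS_PER_GENOME = 100 * 24


def in_house_days(genomes: int, grid_cores: int = 100) -> float:
    """Days needed to re-run every genome on the existing mini grid."""
    return genomes * CORE_HOURS_PER_GENOME / grid_cores / 24


def cores_needed(genomes: int, deadline_hours: float) -> int:
    """Cores to rent so the whole re-run finishes before the deadline.

    Assumes the work parallelizes perfectly across genomes and cores.
    """
    return math.ceil(genomes * CORE_HOURS_PER_GENOME / deadline_hours)


# Hypothetical workload: 50 stored genomes, paper due in 72 hours.
print(in_house_days(50))      # 50.0 days on the 100-core grid
print(cores_needed(50, 72))   # 1667 cores for three days
```

Buying 1,667 cores for a three-day job makes little sense; renting them for three days might.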

These ideas are fascinating and will hopefully drive new solutions, cures and treatments for cancer.

2 Responses to “Bar Refaeli, DNA Sequencing and Cloud Computing”

  1. Yaniv Says:

    An internal cloud (not for the startups or the faint of heart) could be a good solution for such tasks.

  2. Efi P Says:

    These days they use farms of SONY PS3 for that.
