When genomics pioneer George Church recently announced that he and his team at Harvard’s Wyss Institute for Biologically Inspired Engineering will vie in a September 2013 competition to rapidly and accurately sequence 100 whole human genomes at a cost of $1,000 or less each, he did not say which technology they would use to do it. That’s because quite possibly it has not yet been invented.
Church’s Harvard Medical School lab is known for developing or collaborating on some of today’s most advanced DNA analysis tools, including the Polonator instrument and a recent upgrade, “LFR,” or long fragment read sequencing (recently shown to be highly accurate and cost-effective), and nanopore sequencing, which has been licensed and commercialized by several companies. Since the dawn of the Human Genome Project, Church and his colleagues have been instrumental in bringing the cost of DNA sequencing down one million-fold. But genomics technology continues to evolve so quickly that it’s impossible to predict what the state of the art will look like nine months from now.
So far, Church’s team has only one opponent for the $10 million Archon Genomics X Prize. Corporate giant Life Technologies entered the contest with its Ion Proton Sequencer, invented by the team’s leader, Jonathan Rothberg.
Organizers say the challenge aims to produce the world’s first “medical grade genome.” Being able to cost-effectively sequence DNA at such a level of accuracy will “transform genomic research into usable medical information to improve patient diagnosis and treatment,” they say.
What’s more, the contest will promote research into preventive care and the genetics of well-being in a field typically focused on investigating the genetic causes of disease. By sequencing the genomes of people who have reached age 100, X Prize organizers say the challenge “presents an unprecedented opportunity to identify those ‘rare genes’ that protect against diseases, while giving researchers valuable clues to health and longevity.”
Church recently sat down with Techonomy in Boston to discuss the competition and its goals.
Do you expect to use technology developed in your own lab to meet the X Prize challenge?
We’re keeping our options open and possibly will be using hybrid methods, but up until September we can change our strategy. We just need to be transparent about what strategy we’re using, which is fine with us; we’re academics—we have no secrets.
There are a total of 44 genome sequencing technologies. Nine are available, and six more will be soon. There are no fewer than four or five sequencing technologies that we’ve been developing intellectual property on in our lab. And we are working on two to four more modifications for those techniques.
We’re happy to collaborate with just about anyone—we already collaborate with just about everyone. Any other technology providers could enter the contest within the next year with us or as a third contestant.
We felt there needed to be at least two teams. And it’s kind of a nice story to have a fairly academic team competing with a corporate team. The advantage we have over a company team is that we’re not a company. We can use the Ion Torrent, which is, in principle, our competitor. They have the advantage that they can use the latest Ion Torrent, but they can’t use all the companies we’re using because that wouldn’t look good.
Will you test all these different technologies during the next nine months?
We will sequence 100 genomes in practice before going in [to the competition]. We’re not going to go in there without testing a technology or combination. But no, we’re not going to test lots of technologies, just the best one or two at a given moment, and that may change by summer.
Do you expect that a combination of technologies will be better than a single technology?
We’re not trying to play favorites, just pick the best combination, and it probably will be a combination of technologies. If you combine two technologies, now you have a hybrid single technology.
What is meant by a “medical grade” genome sequence?
What Archon X Prize means by medical grade is that we need standards and community ways of testing whether a sequencing technology is ready for a particular standard of accuracy and phasing. Quality requires that you sequence real genomes where you have some kind of gold standard and can say how many errors you get.
With LFR we estimate that errors are one in 10 million. Most other technologies have a 100-times worse error rate and almost none of them present any phasing. That’s incredibly important and it’s hard to do that with exome sequencing or selective sequencing or, for that matter, all the other whole genome sequencing approaches. We think the methods we’re working on—the Polonator, LFR, and Nanopore sequencing—are, so far, the only ones addressing that.
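To put those error rates in perspective, the arithmetic can be sketched as follows. The rate of one error in 10 million bases and the 100-fold difference come from Church's answer; the genome length of roughly 3 billion bases is an assumption added here for illustration (a rounded haploid human genome), and the function name is hypothetical.

```python
# Figures quoted in the interview.
LFR_ONE_ERROR_PER_BASES = 10_000_000   # "errors are one in 10 million"
FOLD_WORSE = 100                       # "a 100-times worse error rate"

# Assumption for illustration only: a rounded haploid human genome length.
ASSUMED_GENOME_BASES = 3_000_000_000


def expected_errors(genome_bases: int, one_error_per_bases: int) -> int:
    """Expected number of miscalled bases at a rate of one error per N bases."""
    return genome_bases // one_error_per_bases


# A rate 100 times worse than one in 10 million is one in 100,000.
other_one_error_per_bases = LFR_ONE_ERROR_PER_BASES // FOLD_WORSE

lfr_errors = expected_errors(ASSUMED_GENOME_BASES, LFR_ONE_ERROR_PER_BASES)
other_errors = expected_errors(ASSUMED_GENOME_BASES, other_one_error_per_bases)

print(f"LFR estimate: about {lfr_errors} errors per genome")
print(f"100-fold worse rate: about {other_errors} errors per genome")
```

Under that assumed genome length, the quoted LFR rate works out to a few hundred miscalled bases per genome, against tens of thousands at a rate 100 times worse.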
What will be the difference between the quality of the genomes to come out of this contest and, say, genomes from the Personal Genome Project or those produced by 23andMe?
One of the Personal Genome Project genomes was done by LFR, and we hope to do more that way, so that’s the medical grade standard. We don’t consider a genome done [being sequenced] until it’s a whole genome sequence, and all Personal Genome Project genomes are whole genome sequences, not exome sequences (which present only the coding regions of DNA).
I’m an advisor for 23andMe and what they provide is incredible—they give people something to think about, it’s very educational, and it engages people in research. But 23andMe does not even provide exome sequences. They provide SNP sequences aimed at common [genetic traits], which tend not to be medical. Occasionally they’ll include a few medical alleles but it’s not their intention.
Does the reduced approach have something to do with how much storage space a whole genome sequence takes up?
Most people would say it takes 100 or 1,000 gigabytes to store a genome sequence. But that’s because they’re bad sequences full of errors. You have to store up all the redundant reads and error estimates, information about which base is more probable than which base, and you keep all that raw data. That fills up 100 gigabytes easily, a terabyte if you’re undisciplined or have very high coverage. But if you could reduce that down to an actual sequence and compare it to a reference, computer science has shown that you can compress that down to 4 megabytes and store 4,000 of them on a 16-gig USB.
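The storage figures in that answer can be checked with a little arithmetic, and the idea of comparing a sequence to a reference can be shown with a toy example. The sketch below is only an illustration, not the compression method Church has in mind: the function names are hypothetical, decimal units (1 gigabyte = 1,000 megabytes) are assumed, and the toy encoding handles single-base substitutions only.

```python
# Figures quoted in the interview (decimal units assumed: 1 GB = 1,000 MB).
COMPRESSED_GENOME_MB = 4
USB_DRIVE_GB = 16
RAW_DATA_GB = 100


def genomes_per_drive(drive_gb: int, genome_mb: int) -> int:
    """How many compressed genomes fit on a drive of the given size."""
    return (drive_gb * 1000) // genome_mb


def diff_against_reference(reference: str, sample: str) -> list[tuple[int, str]]:
    """Toy reference-based encoding: keep only the positions that differ.

    Assumes equal-length sequences (substitutions only, no insertions
    or deletions), which real genomes do not guarantee.
    """
    if len(reference) != len(sample):
        raise ValueError("toy encoding needs sequences of equal length")
    return [(i, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def rebuild(reference: str, diffs: list[tuple[int, str]]) -> str:
    """Recover the sample sequence from the reference plus stored differences."""
    bases = list(reference)
    for position, base in diffs:
        bases[position] = base
    return "".join(bases)


print(genomes_per_drive(USB_DRIVE_GB, COMPRESSED_GENOME_MB))  # genomes per drive
print(RAW_DATA_GB * 1000 // COMPRESSED_GENOME_MB)  # raw-to-compressed size ratio

reference = "ACGTACGT"
sample = "ACGAACGT"
diffs = diff_against_reference(reference, sample)
print(diffs, rebuild(reference, diffs) == sample)
```

With these numbers, 4,000 four-megabyte genomes fit on a 16-gigabyte drive, and 100 gigabytes of raw data is 25,000 times larger than the compressed sequence. The toy encoding stores one (position, base) pair for the eight-base sample instead of all eight bases, which is the intuition behind storing a genome as its differences from a reference.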
So then the quality of what you’ll produce for the prize challenge means that these 100 genomes of 100-year-olds will all fit on one USB drive?
We’re not guaranteeing that. We’re aiming for that. But the criteria of the prize have nothing to do with verbosity. They’re worried about accuracy, completeness, haplotyping, cost, and speed.
Will your lab focus exclusively on this challenge in the coming year?
This will not greatly change our priorities. The X Prize work goes hand in hand with what we’re already doing.