

Posted: 2020-10-05 22:25:33



Has the Race to the $1,000 Genome Proceeded at the Expense of Quality? New Series on The Rise of Long Read Sequencing


Author: Theral Timpson

According to a 2010 article in Bio-IT World, the term $1,000 Genome has been around since 2001.  The University of Wisconsin’s David Schwartz claims to have coined the term at an NHGRI retreat during a breakout session.  Whatever its origin, the $1,000 Genome soon became the target for the rapid development of next-gen sequencing (NGS).

With Illumina, the dominant player in the NGS market, claiming this year that they’ve reached that target with their HiSeq X Ten system, it’s fair to stop and ask just what has been achieved.  What do you get for that $1,000?  And furthermore, where does NGS go from here?

Beginning next week, we’re launching a new series, The Rise of Long Read Sequencing.

I first heard “long read” sequencing differentiated from “short read” in an interview last year with Mike Hunkapiller, CEO of Pacific Biosciences.  I had asked him the obvious question about how he expects to compete with Illumina, and he responded that “short read technologies” had serious drawbacks.

“Wait a minute,” I remember thinking at the time, “did Mike just dismiss Illumina’s technology outright?  And what are these long reads he’s talking about?”

There’s no doubt that Illumina is a major success story.  In the current edition of Forbes, Matthew Herper crowns Illumina with a glowing article, naming the rapid decrease in the price of sequencing “Flatley’s Law” after its CEO.  This is no small praise for Illumina’s Jay Flatley, who has led the company from a startup that once offered oligos for $0.15/base to the dominant player in the sequencing space, now strongly poised as an upcoming contender in the clinical diagnostics industry.

But this is the story you’ll hear everywhere.  

Less well known is the turnabout of Pacific Biosciences and the rise of long read sequencing.  PacBio had a much-touted beginning, raising north of $600 million.  But they disappointed the industry by not delivering on some early hype that they could compete with Illumina on throughput by sequencing a human genome in fifteen minutes.  In fact, not only did PacBio fail to improve on Illumina’s high throughput, their technology had an unattractively high error rate of 15%.  And to top that, their machine was more expensive.

However, for over a year now, we’ve been following an emerging trend among researchers toward using PacBio’s long reads not only for de novo sequencing but also to probe areas of the human genome that have defied short read technologies.  From better characterization of RNA isoforms to raising the quality of the human reference genome, more and more papers are being published touting the new possibilities of PacBio’s long reads.

There’s also now some data coming from Oxford Nanopore’s new MinION that is exciting the first round of users.  This is long read data.  In addition, I recently toured Genia Technologies’ facility in Mountain View and was shown their new sequencer, now in alpha testing.  Genia’s CEO, Stefan Roever, says their new chip will produce over a million long reads per run.

Once you have long reads and high throughput, is there any use for short read technology? I asked Stefan.  “Not really,” he confirmed.

To chronicle the rise of long reads, we went to PacBio and asked them if they’d introduce us to some of their users and sponsor a series on the topic. They did. 

Take the story of Gene Myers, for instance.  Gene helped develop the BLAST algorithm for sequence alignment back in the ’90s and later worked on sequencing the human genome at Celera.  Then he got out of sequencing to pursue “more interesting science.”  He thought that the future of sequencing was pretty straightforward and not that provocative for a scientist.

“Everything basically went short because that’s where you could get the reduction in cost,” says Myers in our upcoming interview.  “Today everyone does it routinely but I don’t think they should be. . . . They’re using 100 bp reads, and the assemblies are crappy,” Gene says.

Gene is now back into sequencing, working at the Max Planck Institute in Germany.  And he’s very excited about long reads.  He says that for the first time ever it is theoretically possible to get to 100% accuracy with PacBio’s technology.  

Wait a minute.  What about PacBio’s terrible accuracy rate?  

It turns out that even though the error rate of the PacBio SMRT system was quite high, the errors were random.  So if you stacked the sequences deep enough, you could greatly improve the accuracy.
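
The arithmetic behind that claim can be sketched with a toy model. The Python below is only an illustration under assumptions made for this example, not PacBio’s actual consensus method: every read is wrong at a given position independently with probability 0.15 (the error rate quoted above), all wrong reads are pessimistically assumed to agree on the same wrong base, and the consensus is a simple majority vote. The function name `consensus_error` and the depths tried are hypothetical.

```python
from math import comb


def consensus_error(per_read_error: float, depth: int) -> float:
    """Chance that a majority vote over `depth` reads calls one position wrong.

    Toy model (an assumption, not a real consensus caller): reads err
    independently, every wrong read shows the same wrong base, and a tie
    counts as a wrong call.
    """
    # Smallest number of wrong reads that defeats (or ties) the correct base.
    min_wrong = (depth + 1) // 2
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(min_wrong, depth + 1)
    )


if __name__ == "__main__":
    # 0.15 is the per-read error rate quoted in the article; depths are arbitrary.
    for depth in (1, 3, 9, 15, 31):
        print(f"depth {depth:>2}: consensus error ~ {consensus_error(0.15, depth):.2e}")
```

In this model a single read is wrong 15% of the time, three stacked reads are wrong about 6% of the time, and the error keeps shrinking as depth grows. The same trick would not work for systematic errors, which repeat in every read and survive the vote.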

We ask Gene how it is that the industry has bought into short read technology for so long.

“I think it’s because they weren’t offered anything else.  It’s what you got,” says Myers.

We start off the series with Mike Snyder from Stanford, who explains how PacBio’s long read technology has opened up his research into the transcriptome.  Often there are various RNA isoforms that are hard to analyze with Illumina’s short read technology, Mike says.  He’s recently published a couple of papers showing that with PacBio’s long reads he is able to completely cover the full-length RNA molecules, thereby characterizing areas that had not previously been annotated.

After that we’ll be talking with the former CSO of PacBio, Eric Schadt, now at the Icahn Institute at Mt. Sinai in New York.  In his current job he’s working to bring sequencing to the clinic and says that the PacBio long reads are very important for getting a better picture of the genome.  From Eric’s interview:

“In order to drive the throughput super high, we’ve been ignoring a lot of the structural features in the genome that are as important as some of the single nucleotide hits, whether it’s long tandem repeats that vary, or bigger structural variations, or focal variants that are important in cancer--those things are difficult to characterize unambiguously with the current short read technology.  [Short reads] were attuned to certain problems and had certain advantages that enabled this big advance, but they are absolutely not hitting the entire problem like we need [to].”

In addition to improving our understanding of the transcriptome and structural variation of the genome, the long read technology is helping us nail down that troublesome area of the genome known as the HLA region.  This is a region that holds much promise for biomedical research because not only has it defied easy characterization, it just happens to be connected to many of the common diseases we have.  

Dan Geraghty has been sequencing the HLA region for many years.  Some of his work was used in the original Human Genome Project.  Dan says that long read sequencing is a game changer.

“Long reads is the NGS story of the year,” he told me in our pre-interview chat.

For now this long read story is pretty much owned by PacBio.  But all of these researchers say they are platform agnostic and are happy to see new technologies on the horizon that are promising long reads.  There’s Oxford Nanopore and Genia and others, including Nabsys, which we’ve profiled here as well.  Illumina offers their Moleculo technology, which assembles long reads from shorter reads, but not many have seen the datasets or other details about this technology.

So what does this mean for the future of NGS?  Do long reads open up vast new territories in genomics that have yet to be discovered, or are they just a nice bonus?  We’ll be pursuing these questions with other guests as well, including upcoming chats with Shawn Baker, CSO of the sequencing marketplace Allseq, and with George Church of Harvard.

Original source: https://www.cnblogs.com/wangprince2017/p/13770505.html
