An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data

Bibliographic details
Journal title:	Genome Research
Authors:	Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R.; Kang, Hyun Min
In:	Genome Research, vol. 25, no. 6 (2015), pp. 918-925
Format:	E-Article
Language:	English
Published by:	Cold Spring Harbor Laboratory
Subjects:	Genetics (clinical), Genetics

author	Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R.; Kang, Hyun Min
author_sort	jun, goo
doi_str_mv	10.1101/gr.176552.114
facet_avail	Online, Free
finc_class_facet	Biology
format	ElectronicArticle
id	ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTEwMS9nci4xNzY1NTIuMTE0
institution	DE-15, DE-Pl11, DE-Rs1, DE-105, DE-14, DE-Ch1, DE-L229, DE-D275, DE-Bn3, DE-Brt1, DE-Zwi2, DE-D161, DE-Gla1, DE-Zi4
imprint	Cold Spring Harbor Laboratory, 2015
issn	1088-9051, 1549-5469
language	English
mega_collection	Cold Spring Harbor Laboratory (CrossRef)
match_str	jun2015anefficientandscalableanalysisframeworkforvariantextractionandrefinementfrompopulationscalednasequencedata
publishDateSort	2015
publisher	Cold Spring Harbor Laboratory
recordtype	ai
record_format	ai
series	Genome Research
source_id	49
title	An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data
topic	Genetics (clinical), Genetics
url	http://dx.doi.org/10.1101/gr.176552.114
publishDate	2015
physical	918-925
description	The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false-positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies.
container_issue	6
container_start_page	918
container_title	Genome Research
container_volume	25
format_de105	Article, E-Article
format_de14	Article, E-Article
format_de15	Article, E-Article
format_de520	Article, E-Article
format_de540	Article, E-Article
format_dech1	Article, E-Article
format_ded117	Article, E-Article
format_degla1	E-Article
format_del152	Book
format_del189	Article, E-Article
format_dezi4	Article
format_dezwi2	Article, E-Article
format_finc	Article, E-Article
format_nrw	Article, E-Article
_version_	1792347830387474434
geogr_code	not assigned
last_indexed	2024-03-01T18:01:00.698Z
geogr_code_person	not assigned
openURL	url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=An+efficient+and+scalable+analysis+framework+for+variant+extraction+and+refinement+from+population-scale+DNA+sequence+data&rft.date=2015-06-01&genre=article&issn=1549-5469&volume=25&issue=6&spage=918&epage=925&pages=918-925&jtitle=Genome+Research&atitle=An+efficient+and+scalable+analysis+framework+for+variant+extraction+and+refinement+from+population-scale+DNA+sequence+data&aulast=Kang&aufirst=Hyun+Min&rft_id=info%3Adoi%2F10.1101%2Fgr.176552.114&rft.language%5B0%5D=eng