author_facet Jun, Goo
Wing, Mary Kate
Abecasis, Gonçalo R.
Kang, Hyun Min
author Jun, Goo
Wing, Mary Kate
Abecasis, Gonçalo R.
Kang, Hyun Min
spellingShingle Jun, Goo
Wing, Mary Kate
Abecasis, Gonçalo R.
Kang, Hyun Min
Genome Research
An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data
Genetics (clinical)
Genetics
author_sort jun, goo
spelling Jun, Goo Wing, Mary Kate Abecasis, Gonçalo R. Kang, Hyun Min 1088-9051 1549-5469 Cold Spring Harbor Laboratory Genetics (clinical) Genetics http://dx.doi.org/10.1101/gr.176552.114 The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires less computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data Genome Research
doi_str_mv 10.1101/gr.176552.114
facet_avail Online
Free
finc_class_facet Biologie
format ElectronicArticle
fullrecord blob:ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTEwMS9nci4xNzY1NTIuMTE0
id ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTEwMS9nci4xNzY1NTIuMTE0
institution DE-15
DE-Pl11
DE-Rs1
DE-105
DE-14
DE-Ch1
DE-L229
DE-D275
DE-Bn3
DE-Brt1
DE-Zwi2
DE-D161
DE-Gla1
DE-Zi4
imprint Cold Spring Harbor Laboratory, 2015
imprint_str_mv Cold Spring Harbor Laboratory, 2015
issn 1088-9051
1549-5469
issn_str_mv 1088-9051
1549-5469
language English
mega_collection Cold Spring Harbor Laboratory (CrossRef)
match_str jun2015anefficientandscalableanalysisframeworkforvariantextractionandrefinementfrompopulationscalednasequencedata
publishDateSort 2015
publisher Cold Spring Harbor Laboratory
recordtype ai
record_format ai
series Genome Research
source_id 49
title An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data
title_unstemmed An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data
title_full An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data
title_fullStr An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data
title_full_unstemmed An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data
title_short An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data
title_sort an efficient and scalable analysis framework for variant extraction and refinement from population-scale dna sequence data
topic Genetics (clinical)
Genetics
url http://dx.doi.org/10.1101/gr.176552.114
publishDate 2015
physical 918-925
description The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires less computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies.
container_issue 6
container_start_page 918
container_title Genome Research
container_volume 25
format_de105 Article, E-Article
format_de14 Article, E-Article
format_de15 Article, E-Article
format_de520 Article, E-Article
format_de540 Article, E-Article
format_dech1 Article, E-Article
format_ded117 Article, E-Article
format_degla1 E-Article
format_del152 Buch
format_del189 Article, E-Article
format_dezi4 Article
format_dezwi2 Article, E-Article
format_finc Article, E-Article
format_nrw Article, E-Article
_version_ 1792347830387474434
geogr_code not assigned
last_indexed 2024-03-01T18:01:00.698Z
geogr_code_person not assigned
openURL url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=An+efficient+and+scalable+analysis+framework+for+variant+extraction+and+refinement+from+population-scale+DNA+sequence+data&rft.date=2015-06-01&genre=article&issn=1549-5469&volume=25&issue=6&spage=918&epage=925&pages=918-925&jtitle=Genome+Research&atitle=An+efficient+and+scalable+analysis+framework+for+variant+extraction+and+refinement+from+population-scale+DNA+sequence+data&aulast=Kang&aufirst=Hyun+Min&rft_id=info%3Adoi%2F10.1101%2Fgr.176552.114&rft.language%5B0%5D=eng