Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals

Bibliographic details
Journal title:	Bioinformatics
Authors:	Cheng, Anthony Youzhi; Teo, Yik-Ying; Ong, Rick Twee-Hee
In:	Bioinformatics, vol. 30 (2014), no. 12, pp. 1707-1713
Format:	E-Article
Language:	English
Published by:	Oxford University Press (OUP)
Subjects:	Computational Mathematics; Computational Theory and Mathematics; Computer Science Applications; Molecular Biology; Biochemistry; Statistics and Probability
DOI:	10.1093/bioinformatics/btu067
ISSN:	1367-4811, 1367-4803
URL:	http://dx.doi.org/10.1093/bioinformatics/btu067

Abstract

Motivation: Whole-genome sequencing (WGS) is now routinely used to detect and identify genetic variants, particularly single nucleotide polymorphisms (SNPs) in humans, and has provided valuable new insights into human diversity, population histories and genetic association studies of traits and diseases. However, this relies on accurate detection and genotype calling of the polymorphisms present in the sequenced samples. To minimize cost, most current WGS studies, including the 1000 Genomes Project (1KGP), have adopted low-coverage sequencing of large numbers of samples, and such designs have inadvertently influenced the development of variant calling methods for WGS data. Variant accuracy is usually assessed on the same set of low-coverage individuals or on a smaller number of deeply sequenced individuals. It is thus unclear how these variant calling methods would fare on a dataset of ∼100 samples from a population not part of the 1KGP that have been sequenced at various coverage depths.

Results: Using down-sampling of the sequencing reads obtained from the Singapore Sequencing Malay Project (SSMP), together with SNP calls for the same individuals genotyped on the Illumina Omni1-Quad array, we assessed the sensitivity of SNP detection, the accuracy of the genotype calls and the variant accuracy of six commonly used variant calling methods: GATK, SAMtools, Consensus Assessment of Sequence and Variation (CASAVA), VarScan, glfTools and SOAPsnp. The results indicate that at 5× coverage depth, the multi-sample callers of GATK and SAMtools yield the best accuracy, particularly if the study samples are called together with a large number of individuals such as those from the 1000 Genomes Project. If study samples are sequenced at a high coverage depth such as 30×, CASAVA has the highest variant accuracy of the variant callers assessed.

Availability and implementation:

Contact: twee_hee_ong@nuhs.edu.sg

Supplementary information: Supplementary data are available at Bioinformatics online.
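
The Results paragraph compares sequencing-based SNP calls, made at several down-sampled coverage depths, against array genotypes for the same individuals. The following Python sketch illustrates that kind of comparison for a single individual. It is not the authors' code: the genotype encoding (non-reference allele counts 0/1/2, with None for a site the caller did not report), the exact metric definitions and the names compare_to_array and downsample_fraction are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed encoding: a genotype is the number of non-reference alleles (0, 1 or 2).
# None means the WGS caller reported nothing at that site.
Genotype = Optional[int]


@dataclass
class ConcordanceSummary:
    sensitivity: float           # array non-reference sites also called non-reference by WGS
    genotype_concordance: float  # sites genotyped by both platforms with identical genotypes
    n_array_variant: int         # array sites carrying at least one non-reference allele
    n_detected: int              # of those, sites called non-reference by WGS
    n_compared: int              # sites with a genotype from both platforms


def compare_to_array(array_gt: Dict[str, int],
                     wgs_gt: Dict[str, Genotype]) -> ConcordanceSummary:
    """Compare one individual's WGS genotype calls with array genotypes.

    Hypothetical definitions, not necessarily those used in the paper:
    - sensitivity: among array sites with genotype 1 or 2, the fraction that
      WGS also calls as 1 or 2;
    - genotype concordance: among sites genotyped by both platforms, the
      fraction with exactly the same genotype.
    """
    array_variant = [site for site, gt in array_gt.items() if gt > 0]
    # A missing WGS call (None) is treated as "not detected".
    detected = [site for site in array_variant if (wgs_gt.get(site) or 0) > 0]
    shared = [site for site in array_gt if wgs_gt.get(site) is not None]
    matches = sum(1 for site in shared if wgs_gt[site] == array_gt[site])
    return ConcordanceSummary(
        sensitivity=len(detected) / len(array_variant) if array_variant else float("nan"),
        genotype_concordance=matches / len(shared) if shared else float("nan"),
        n_array_variant=len(array_variant),
        n_detected=len(detected),
        n_compared=len(shared),
    )


def downsample_fraction(current_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep so mean coverage drops from current_depth to target_depth."""
    if not 0 < target_depth <= current_depth:
        raise ValueError("target depth must be positive and not exceed the current depth")
    return target_depth / current_depth


if __name__ == "__main__":
    array = {"rs1": 0, "rs2": 1, "rs3": 2, "rs4": 1}
    wgs = {"rs1": 0, "rs2": 1, "rs3": 1, "rs4": None}
    print(compare_to_array(array, wgs))
    print(downsample_fraction(30, 5))
```

In the toy example, three array sites carry a non-reference allele and WGS detects two of them (rs4 is missing), so sensitivity is 2/3. Of the three sites genotyped by both platforms, rs3 is called heterozygous by WGS but homozygous non-reference on the array, so genotype concordance is also 2/3. Reducing a 30× sample to 5× keeps about 5/30 ≈ 0.167 of the reads. The abstract does not say how the SSMP reads were down-sampled, but a random read sub-sampler such as `samtools view -s` takes a fraction of this kind as input.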