author_facet Cheng, Anthony Youzhi
Teo, Yik-Ying
Ong, Rick Twee-Hee
author Cheng, Anthony Youzhi
Teo, Yik-Ying
Ong, Rick Twee-Hee
spellingShingle Cheng, Anthony Youzhi
Teo, Yik-Ying
Ong, Rick Twee-Hee
Bioinformatics
Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals
Computational Mathematics
Computational Theory and Mathematics
Computer Science Applications
Molecular Biology
Biochemistry
Statistics and Probability
author_sort cheng, anthony youzhi
spelling Cheng, Anthony Youzhi Teo, Yik-Ying Ong, Rick Twee-Hee 1367-4811 1367-4803 Oxford University Press (OUP) Computational Mathematics Computational Theory and Mathematics Computer Science Applications Molecular Biology Biochemistry Statistics and Probability http://dx.doi.org/10.1093/bioinformatics/btu067 <jats:title>Abstract</jats:title> <jats:p>Motivation: Whole-genome sequencing (WGS) is now routinely used for the detection and identification of genetic variants, particularly single nucleotide polymorphisms (SNPs) in humans, and this has provided valuable new insights into human diversity, population histories and genetic association studies of traits and diseases. However, this relies on accurate detection and genotype calling of the polymorphisms present in the samples sequenced. To minimize cost, the majority of current WGS studies, including the 1000 Genomes Project (1 KGP), have adopted low-coverage sequencing of a large number of samples, and such designs have inadvertently influenced the development of variant calling methods for WGS data. Assessment of variant accuracy is usually performed on the same set of low-coverage individuals or on a smaller number of deeply sequenced individuals. It is thus unclear how these variant calling methods would fare on a dataset of ∼100 samples from a population not part of the 1 KGP that have been sequenced at various coverage depths.</jats:p> <jats:p>Results: Using down-sampling of the sequencing reads obtained from the Singapore Sequencing Malay Project (SSMP), and a set of SNP calls from the same individuals genotyped on the Illumina Omni1-Quad array, we assessed the sensitivity of SNP detection, the accuracy of genotype calls and the variant accuracy of six commonly used variant calling methods: GATK, SAMtools, Consensus Assessment of Sequence and Variation (CASAVA), VarScan, glfTools and SOAPsnp. The results indicate that at 5× coverage depth, the multi-sample callers of GATK and SAMtools yield the best accuracy, particularly if the study samples are called together with a large number of individuals such as those from the 1000 Genomes Project. If study samples are sequenced at a high coverage depth such as 30×, CASAVA has the highest variant accuracy of the variant callers assessed.</jats:p> <jats:p>Availability and implementation: </jats:p> <jats:p>Contact: twee_hee_ong@nuhs.edu.sg</jats:p> <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.</jats:p> Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals Bioinformatics
doi_str_mv 10.1093/bioinformatics/btu067
facet_avail Online
Free
finc_class_facet Mathematik
Informatik
Biologie
Chemie und Pharmazie
format ElectronicArticle
fullrecord blob:ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTA5My9iaW9pbmZvcm1hdGljcy9idHUwNjc
id ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTA5My9iaW9pbmZvcm1hdGljcy9idHUwNjc
institution DE-Ch1
DE-L229
DE-D275
DE-Bn3
DE-Brt1
DE-Zwi2
DE-D161
DE-Gla1
DE-Zi4
DE-15
DE-Pl11
DE-Rs1
DE-105
DE-14
imprint Oxford University Press (OUP), 2014
imprint_str_mv Oxford University Press (OUP), 2014
issn 1367-4811
1367-4803
issn_str_mv 1367-4811
1367-4803
language English
mega_collection Oxford University Press (OUP) (CrossRef)
match_str cheng2014assessingsinglenucleotidevariantdetectionandgenotypecallingonwholegenomesequencedindividuals
publishDateSort 2014
publisher Oxford University Press (OUP)
recordtype ai
record_format ai
series Bioinformatics
source_id 49
title Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals
title_unstemmed Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals
title_full Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals
title_fullStr Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals
title_full_unstemmed Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals
title_short Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals
title_sort assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals
topic Computational Mathematics
Computational Theory and Mathematics
Computer Science Applications
Molecular Biology
Biochemistry
Statistics and Probability
url http://dx.doi.org/10.1093/bioinformatics/btu067
publishDate 2014
physical 1707-1713
description <jats:title>Abstract</jats:title> <jats:p>Motivation: Whole-genome sequencing (WGS) is now routinely used for the detection and identification of genetic variants, particularly single nucleotide polymorphisms (SNPs) in humans, and this has provided valuable new insights into human diversity, population histories and genetic association studies of traits and diseases. However, this relies on accurate detection and genotype calling of the polymorphisms present in the samples sequenced. To minimize cost, the majority of current WGS studies, including the 1000 Genomes Project (1 KGP), have adopted low-coverage sequencing of a large number of samples, and such designs have inadvertently influenced the development of variant calling methods for WGS data. Assessment of variant accuracy is usually performed on the same set of low-coverage individuals or on a smaller number of deeply sequenced individuals. It is thus unclear how these variant calling methods would fare on a dataset of ∼100 samples from a population not part of the 1 KGP that have been sequenced at various coverage depths.</jats:p> <jats:p>Results: Using down-sampling of the sequencing reads obtained from the Singapore Sequencing Malay Project (SSMP), and a set of SNP calls from the same individuals genotyped on the Illumina Omni1-Quad array, we assessed the sensitivity of SNP detection, the accuracy of genotype calls and the variant accuracy of six commonly used variant calling methods: GATK, SAMtools, Consensus Assessment of Sequence and Variation (CASAVA), VarScan, glfTools and SOAPsnp. The results indicate that at 5× coverage depth, the multi-sample callers of GATK and SAMtools yield the best accuracy, particularly if the study samples are called together with a large number of individuals such as those from the 1000 Genomes Project. If study samples are sequenced at a high coverage depth such as 30×, CASAVA has the highest variant accuracy of the variant callers assessed.</jats:p> <jats:p>Availability and implementation: </jats:p> <jats:p>Contact: twee_hee_ong@nuhs.edu.sg</jats:p> <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.</jats:p>
container_issue 12
container_start_page 1707
container_title Bioinformatics
container_volume 30
format_de105 Article, E-Article
format_de14 Article, E-Article
format_de15 Article, E-Article
format_de520 Article, E-Article
format_de540 Article, E-Article
format_dech1 Article, E-Article
format_ded117 Article, E-Article
format_degla1 E-Article
format_del152 Buch
format_del189 Article, E-Article
format_dezi4 Article
format_dezwi2 Article, E-Article
format_finc Article, E-Article
format_nrw Article, E-Article
_version_ 1792341885809852418
geogr_code not assigned
last_indexed 2024-03-01T16:27:00.552Z
geogr_code_person not assigned
openURL url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=Assessing+single+nucleotide+variant+detection+and+genotype+calling+on+whole-genome+sequenced+individuals&rft.date=2014-06-15&genre=article&issn=1367-4803&volume=30&issue=12&spage=1707&epage=1713&pages=1707-1713&jtitle=Bioinformatics&atitle=Assessing+single+nucleotide+variant+detection+and+genotype+calling+on+whole-genome+sequenced+individuals&aulast=Ong&aufirst=Rick+Twee-Hee&rft_id=info%3Adoi%2F10.1093%2Fbioinformatics%2Fbtu067&rft.language%5B0%5D=eng
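The Results paragraph of the abstract names three evaluation steps: down-sampling reads to a lower coverage depth, measuring SNP detection sensitivity, and checking genotype accuracy against array calls. The sketch below illustrates one common way to compute such quantities; it is not the authors' pipeline, and the paper's exact metric definitions may differ. The site keys, the 0/1/2 genotype encoding, the function names and the rule that a site missing from the WGS calls counts as homozygous reference are all assumptions made for illustration. In practice the keep-fraction would be passed to a read subsampler such as `samtools view -s`.

```python
from typing import Dict, Tuple

# Hypothetical site key: (chromosome, position).
Site = Tuple[str, int]
# Genotypes are encoded as the number of non-reference alleles: 0, 1 or 2.
Calls = Dict[Site, int]


def downsample_fraction(current_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep so a sample sequenced at current_depth
    ends up at roughly target_depth (capped at 1.0: reads cannot be added)."""
    if current_depth <= 0 or target_depth <= 0:
        raise ValueError("coverage depths must be positive")
    return min(1.0, target_depth / current_depth)


def sensitivity(array_calls: Calls, wgs_calls: Calls) -> float:
    """Share of array non-reference genotypes that the WGS caller also
    reports as non-reference. Sites absent from wgs_calls are treated as
    homozygous reference (an assumption)."""
    non_ref_sites = [site for site, gt in array_calls.items() if gt > 0]
    if not non_ref_sites:
        return float("nan")
    detected = sum(1 for site in non_ref_sites if wgs_calls.get(site, 0) > 0)
    return detected / len(non_ref_sites)


def genotype_concordance(array_calls: Calls, wgs_calls: Calls) -> float:
    """Share of sites genotyped on both platforms whose genotypes agree."""
    shared_sites = [site for site in array_calls if site in wgs_calls]
    if not shared_sites:
        return float("nan")
    matches = sum(1 for site in shared_sites if array_calls[site] == wgs_calls[site])
    return matches / len(shared_sites)


if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions.
    array = {("1", 100): 1, ("1", 200): 2, ("1", 300): 0, ("1", 400): 1}
    wgs = {("1", 100): 1, ("1", 200): 1, ("1", 300): 0}
    print(downsample_fraction(30, 5))
    print(sensitivity(array, wgs))
    print(genotype_concordance(array, wgs))
```

With the toy calls, going from 30× to 5× keeps one sixth of the reads; two of the three array non-reference genotypes are detected and two of the three shared sites agree, so both metrics come out at 2/3. Separating sensitivity (detection) from concordance (genotype correctness) mirrors the abstract's distinction between detecting SNPs and calling their genotypes accurately.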