Only biallelic SNPs are considered. For each sample, the following statistics are calculated:
INDIV : Sample name
N_SITES : Total number of SNPs
N_HET : Number of SNPs with heterozygous call (0/1 or 0|1 or 1/0 or 1|0)
N_ALT : Number of SNPs with alternate homozygous call (1/1 or 1|1)
N_REF : Number of SNPs with reference homozygous call (0/0 or 0|0)
N_MISS : Number of SNPs with missing call (./. or .|.)
P_HET : Percentage of heterozygous calls
P_ALT : Percentage of alternate homozygous calls
P_REF : Percentage of reference homozygous calls
P_MISS : Percentage of missing calls (missing rate)
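The definitions above can be illustrated with a minimal base R sketch that tallies the same columns for one sample's genotype strings. This is not the package's implementation (the actual computation is done by the Java backend). The function name istats_sketch is hypothetical, and three details are assumptions: percentages are taken over N_SITES, they are expressed on a 0-100 scale, and half-missing calls such as ./1 are counted as missing.

```r
# Hypothetical helper, NOT part of fastreeR: classify the GT strings of one
# sample (biallelic SNPs only) and tally the statistics described above.
istats_sketch <- function(gt, indiv = "sample1") {
  # Split each genotype on the unphased ("/") or phased ("|") separator
  alleles <- strsplit(gt, "[/|]")
  call_type <- vapply(alleles, function(a) {
    if (length(a) != 2 || any(a == ".")) {
      "MISS"  # assumption: half-missing calls also count as missing
    } else if (a[1] != a[2]) {
      "HET"   # 0/1, 0|1, 1/0, 1|0
    } else if (a[1] == "0") {
      "REF"   # 0/0, 0|0
    } else {
      "ALT"   # 1/1, 1|1
    }
  }, character(1))

  n_sites <- length(gt)
  n_het   <- sum(call_type == "HET")
  n_alt   <- sum(call_type == "ALT")
  n_ref   <- sum(call_type == "REF")
  n_miss  <- sum(call_type == "MISS")

  # Assumption: percentages use N_SITES as the denominator, scaled to 0-100
  data.frame(
    INDIV = indiv, N_SITES = n_sites,
    N_HET = n_het, N_ALT = n_alt, N_REF = n_ref, N_MISS = n_miss,
    P_HET  = 100 * n_het  / n_sites,
    P_ALT  = 100 * n_alt  / n_sites,
    P_REF  = 100 * n_ref  / n_sites,
    P_MISS = 100 * n_miss / n_sites
  )
}

# Six made-up genotype calls for a single sample
gt <- c("0/1", "1|0", "1/1", "0|0", "./.", "0/0")
stats <- istats_sketch(gt)
```

Under these assumptions the four counts always sum to N_SITES, which gives a quick sanity check on any real output.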
Value
A data.frame with one row per sample and the statistics listed above as columns.
References
Java implementation: https://github.com/gkanogiannis/BioInfoJava-Utils
Author
Anestis Gkanogiannis, anestis@gkanogiannis.com
Examples
# Compute per-sample statistics for the example VCF shipped with fastreeR
my.istats <- vcf2istats(
  inputFile = system.file("extdata", "samples.vcf.gz", package = "fastreeR")
)