Performs hierarchical clustering on a distance matrix (e.g. one calculated with vcf2dist or fasta2dist) and generates a phylogenetic tree, as in dist2tree. Complete linkage is used by default; the Java backend supports single, complete, and average linkage. The phylogenetic tree is then pruned with cutreeDynamic to obtain clusters, as in tree2clusters.
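
The two steps described above (build a tree, then cut it into clusters) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: stats::hclust stands in for the Java backend, and a fixed-height stats::cutree stands in for cutreeDynamic, which uses a more adaptive cutting strategy. The distance values and the sample names S1 to S5 are made up.

```r
# Toy symmetric distance matrix for five hypothetical samples (made-up values).
m <- matrix(
  c(0.00, 0.01, 0.02, 0.09, 0.10,
    0.01, 0.00, 0.02, 0.09, 0.10,
    0.02, 0.02, 0.00, 0.08, 0.09,
    0.09, 0.09, 0.08, 0.00, 0.01,
    0.10, 0.10, 0.09, 0.01, 0.00),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("S", 1:5), paste0("S", 1:5))
)
d <- stats::as.dist(m)

# Step 1: agglomerative clustering with complete linkage.
hc <- stats::hclust(d, method = "complete")

# Step 2: cut the tree at a fixed height.
# NOTE: plain cutree is a simplified stand-in for cutreeDynamic.
groups <- stats::cutree(hc, h = 0.05)

# List the members of each cluster.
split(names(groups), groups)
```

With these toy distances, S1 to S3 fall into one cluster and S4 and S5 into another.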

Usage

dist2clusters(
  inputDist,
  cutHeight = NULL,
  minClusterSize = 1,
  extra = TRUE,
  verbose = FALSE
)

Arguments

inputDist

Location of the input distances file (generated with vcf2dist or fasta2dist), which can be gzip-compressed, or a dist distances object.

cutHeight

Height at which to cut the tree. If not given (NULL, the default), it is determined automatically.

minClusterSize

Minimum size of clusters. Default 1.

extra

Logical. Whether to use extra parameters for cutreeDynamic.

verbose

Logical. If TRUE, enables verbose output from the Java backend.

Value

A list containing:

  • character vector of the generated phylogenetic tree in Newick format

  • character vector of the clusters. Each row contains the data for one cluster, separated by spaces: the id of the cluster, the size of the cluster (number of elements), and the names of its elements. Cluster id 0 contains all the objects not assigned to a cluster (singletons). Example clusters output:

    0 3 Sample1 Sample2 Sample3
    1 3 Sample4 Sample5 Sample6
    2 2 Sample7 Sample8
    3 2 Sample9 Sample0
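
The clusters vector is plain text, so it usually has to be parsed before further analysis. The following is a minimal sketch in base R, assuming rows in the format described above (id, size, then member names); the variable names clusters, fields, and membership are illustrative and not part of the package. Splitting on any run of whitespace is a precaution in case the separator is a tab rather than a single space.

```r
# Example rows in the documented format: cluster id, cluster size, member names.
clusters <- c(
  "0 3 Sample1 Sample2 Sample3",
  "1 3 Sample4 Sample5 Sample6",
  "2 2 Sample7 Sample8",
  "3 2 Sample9 Sample0"
)

# Split each row into its whitespace-separated fields.
fields <- strsplit(trimws(clusters), "[[:space:]]+")

# Sanity check: the declared size matches the number of member names.
stopifnot(vapply(
  fields,
  function(f) length(f) - 2L == as.integer(f[2]),
  logical(1)
))

# Build a long-format table with one row per sample.
membership <- do.call(rbind, lapply(fields, function(f) {
  data.frame(
    cluster = as.integer(f[1]),
    size    = as.integer(f[2]),
    sample  = f[-(1:2)],
    stringsAsFactors = FALSE
  )
}))

# Samples in cluster id 0 were not assigned to any cluster.
unassigned <- membership$sample[membership$cluster == 0L]
```

A long table like this can be joined to sample metadata by the sample column.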

References

Java implementation: https://github.com/gkanogiannis/BioInfoJava-Utils

Author

Anestis Gkanogiannis, anestis@gkanogiannis.com

Examples

my.clust <- dist2clusters(
    inputDist =
        system.file("extdata", "samples.vcf.dist.gz", package = "fastreeR"),
    verbose = TRUE
)
#>  ..cutHeight not given, setting it to 0.0793  ===>  99% of the (truncated) height range in dendro.
#>  ..done.