Clustering::Density Namespace Reference

namespace for density-based clustering functions

Namespaces

 MPI
 MPI implementations of compute intensive functions.
 
 OpenCL
 OpenCL implementations of compute intensive functions.
 

Classes

struct  BoxGrid
 

Typedefs

using FreeEnergy = std::pair< std::size_t, float >
 matches frame id to free energy
 
using Neighbor = Clustering::Tools::Neighbor
 matches neighbor's frame id to distance
 
using Neighborhood = Clustering::Tools::Neighborhood
 map frame id to neighbors
 
using Box = std::array< int, 2 >
 encodes 2D box for box-assisted search algorithm
 
typedef std::map< float, std::vector< std::size_t > > Pops
 

Functions

BoxGrid compute_box_grid (const float *coords, const std::size_t n_rows, const std::size_t n_cols, const float radius)
 
constexpr Box neighbor_box (const Box center, const int i_neighbor)
 
bool is_valid_box (const Box box, const BoxGrid &grid)
 
std::vector< std::size_t > calculate_populations (const float *coords, const std::size_t n_rows, const std::size_t n_cols, const float radius)
 calculate population of n-dimensional hypersphere per frame for one fixed radius.
 
std::map< float, std::vector< std::size_t > > calculate_populations (const float *coords, const std::size_t n_rows, const std::size_t n_cols, std::vector< float > radii)
 
std::vector< float > calculate_free_energies (const std::vector< std::size_t > &pops)
 
std::vector< FreeEnergy > sorted_free_energies (const std::vector< float > &fe)
 
std::tuple< Neighborhood, Neighborhood > nearest_neighbors (const float *coords, const std::size_t n_rows, const std::size_t n_cols, const std::vector< float > &free_energy)
 
std::set< std::size_t > high_density_neighborhood (const float *coords, const std::size_t n_cols, const std::vector< FreeEnergy > &sorted_fe, const std::size_t i_frame, const std::size_t limit, const float max_dist)
 
double compute_sigma2 (const Neighborhood &nh)
 
std::vector< std::size_t > assign_low_density_frames (const std::vector< std::size_t > &initial_clustering, const Neighborhood &nh_high_dens, const std::vector< float > &free_energy)
 
void screening_log (const double sigma2, const std::size_t first_frame_above_threshold, const std::vector< FreeEnergy > &fe_sorted)
 log output for screening steps
 
std::tuple< std::vector< std::size_t >, std::size_t, double, std::vector< FreeEnergy >, std::set< std::size_t >, std::size_t > prepare_initial_clustering (const std::vector< float > &free_energy, const Neighborhood &nh, const float free_energy_threshold, const std::size_t n_rows, const std::vector< std::size_t > initial_clusters)
 
std::vector< std::size_t > normalized_cluster_names (std::size_t first_frame_above_threshold, std::vector< std::size_t > clustering, std::vector< FreeEnergy > &fe_sorted)
 return clustered trajectory with new, distinct cluster names.
 
std::vector< std::size_t > sorted_cluster_names (std::vector< std::size_t > clustering)
 sorts the clusters by decreasing population and renames them 1..N
 
bool compare2DVector (const std::pair< std::size_t, std::size_t > &p1, const std::pair< std::size_t, std::size_t > &p2)
 compare two pairs by their second entry
 
bool has2digits (float val)
 check if float has at most two digits
 
bool lump_initial_clusters (const std::set< std::size_t > &local_nh, std::size_t &distinct_name, std::vector< std::size_t > &clustering, const std::vector< FreeEnergy > &fe_sorted, std::size_t first_frame_above_threshold)
 lump clusters based on distance threshold in screening process
 
void main (boost::program_options::variables_map args)
 user interface and controlling function for density-based geometric clustering.
 
std::vector< std::size_t > screening (const std::vector< float > &free_energy, const Neighborhood &nh, const float free_energy_threshold, const float *coords, const std::size_t n_rows, const std::size_t n_cols, const std::vector< std::size_t > initial_clusters)
 

Variables

constexpr int BOX_DIFF [9][2]
 
const int N_NEIGHBOR_BOXES = 9
 number of neighbor boxes in 2D grid (including center box).
 

Detailed Description

namespace for density-based clustering functions

This module contains all functions for assigning a free energy and a nearest neighbor to each frame. It also identifies clusters.


Class Documentation

◆ Clustering::Density::BoxGrid

struct Clustering::Density::BoxGrid

the full grid constructed for box-assisted nearest neighbor search with fixed distance criterion.

Definition at line 85 of file density_clustering.hpp.

Class Members
vector< Box > assigned_box: matching frame id to the frame's assigned box
map< Box, vector< int > > boxes: the boxes with a list of assigned frame ids
vector< int > n_boxes: total number of boxes

Function Documentation

◆ assign_low_density_frames()

std::vector< std::size_t > Clustering::Density::assign_low_density_frames ( const std::vector< std::size_t > &  initial_clustering,
const Neighborhood &  nh_high_dens,
const std::vector< float > &  free_energy 
)

given an initial clustering computed from free energy cutoff screenings, assign all not yet clustered frames (those in 'state 0') to their geometrically nearest cluster. frames are processed in order of increasing free energy: the unassigned frame with the lowest free energy is assigned first, then the next lowest, etc. thus, all initial clusters are filled with growing free energy, effectively producing microstates separated close to the free energy barriers.

Definition at line 345 of file density_clustering.cpp.
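
The fill-up order described above can be sketched roughly as follows. This is a minimal illustration, not the code in density_clustering.cpp. The name assign_sketch and the local Neighbor / Neighborhood aliases (a map from frame id to a pair of neighbor id and distance) are assumptions, and nh_high_dens is assumed to hold, for each frame, its nearest neighbor with lower free energy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Assumed stand-ins for Clustering::Tools::Neighbor / Neighborhood.
using Neighbor = std::pair<std::size_t, float>;        // (neighbor id, distance)
using Neighborhood = std::map<std::size_t, Neighbor>;  // frame id -> neighbor

// Hypothetical sketch: fill 'state 0' frames in order of increasing free energy.
std::vector<std::size_t> assign_sketch(
    const std::vector<std::size_t>& initial_clustering,
    const Neighborhood& nh_high_dens,
    const std::vector<float>& free_energy) {
  assert(initial_clustering.size() == free_energy.size());
  std::vector<std::size_t> clustering(initial_clustering);
  // Collect all frames that are still unassigned (state 0).
  std::vector<std::size_t> unassigned;
  for (std::size_t i = 0; i < clustering.size(); ++i) {
    if (clustering[i] == 0) unassigned.push_back(i);
  }
  // Lowest free energy first.
  std::sort(unassigned.begin(), unassigned.end(),
            [&free_energy](std::size_t a, std::size_t b) {
              return free_energy[a] < free_energy[b];
            });
  // Each frame inherits the state of its lower-free-energy neighbor,
  // which has already been visited because of the sort order.
  for (std::size_t i : unassigned) {
    const auto it = nh_high_dens.find(i);
    if (it != nh_high_dens.end()) {
      clustering[i] = clustering[it->second.first];
    }
  }
  return clustering;
}
```

Because a frame's neighbor in nh_high_dens always has a lower free energy, it is either part of an initial cluster or was assigned earlier in the loop, so states propagate outward from the cluster cores.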

◆ calculate_free_energies()

std::vector< float > Clustering::Density::calculate_free_energies ( const std::vector< std::size_t > &  pops)

re-use populations to calculate local free energy estimate via $ G = -k_B T \ln(P)$.

Definition at line 197 of file density_clustering.cpp.
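
A minimal sketch of the population-to-free-energy conversion, under stated assumptions: the result is expressed in units of k_B T, and P is taken as the population relative to the largest population, so the densest frame gets G = 0. The normalization and the name free_energies_sketch are assumptions, not taken from density_clustering.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: G_i = -ln(pop_i / max_pop), in units of k_B T.
std::vector<float> free_energies_sketch(const std::vector<std::size_t>& pops) {
  assert(!pops.empty());
  const std::size_t max_pop = *std::max_element(pops.begin(), pops.end());
  assert(max_pop > 0);
  std::vector<float> fe(pops.size());
  for (std::size_t i = 0; i < pops.size(); ++i) {
    // A population of zero would yield +infinity here.
    fe[i] = -std::log(static_cast<float>(pops[i]) /
                      static_cast<float>(max_pop));
  }
  return fe;
}
```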

◆ calculate_populations()

std::map< float, std::vector< std::size_t > > Clustering::Density::calculate_populations ( const float *  coords,
const std::size_t  n_rows,
const std::size_t  n_cols,
const std::vector< float >  radii 
)

calculate populations of n-dimensional hypersphere per frame for different radii in one go. computationally much more efficient than running single-radius version for every radius.

Definition at line 126 of file density_clustering.cpp.
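
The efficiency gain comes from computing each pairwise distance once and testing it against every radius. A brute-force sketch of that idea follows. It assumes row-major coords (frame i starts at coords[i * n_cols]), counts each frame in its own hypersphere, and uses a strict comparison against the squared radius. These choices and the name populations_sketch are assumptions, and the box-assisted search of the actual implementation is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical sketch: populations for several radii from one distance pass.
std::map<float, std::vector<std::size_t>> populations_sketch(
    const float* coords, std::size_t n_rows, std::size_t n_cols,
    const std::vector<float>& radii) {
  assert(coords != nullptr);
  std::map<float, std::vector<std::size_t>> pops;
  for (float r : radii) {
    pops[r] = std::vector<std::size_t>(n_rows, 1);  // every frame counts itself
  }
  for (std::size_t i = 0; i < n_rows; ++i) {
    for (std::size_t j = i + 1; j < n_rows; ++j) {
      // Squared Euclidean distance, computed once per pair.
      float d2 = 0.0f;
      for (std::size_t c = 0; c < n_cols; ++c) {
        const float diff = coords[i * n_cols + c] - coords[j * n_cols + c];
        d2 += diff * diff;
      }
      // Reuse the same distance for all radii.
      for (auto& entry : pops) {
        if (d2 < entry.first * entry.first) {
          ++entry.second[i];
          ++entry.second[j];
        }
      }
    }
  }
  return pops;
}
```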

◆ compute_box_grid()

BoxGrid Clustering::Density::compute_box_grid ( const float *  coords,
const std::size_t  n_rows,
const std::size_t  n_cols,
const float  radius 
)

uses fixed radius to separate coordinate space in equally sized boxes for box-assisted nearest neighbor search.

Definition at line 41 of file density_clustering.cpp.
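
One way to build such a grid is sketched below. It assumes that only the first two coordinate columns define the boxes, that coords is row-major, and that the grid origin sits at the minimum coordinate. BoxGridSketch mirrors the members listed for BoxGrid, but the struct, the function name box_grid_sketch and the reading of n_boxes as one count per grid dimension are illustrative assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Box = std::array<int, 2>;

// Hypothetical mirror of the documented BoxGrid members.
struct BoxGridSketch {
  std::vector<int> n_boxes;               // assumed: boxes per grid dimension
  std::vector<Box> assigned_box;          // frame id -> box
  std::map<Box, std::vector<int>> boxes;  // box -> frame ids
};

BoxGridSketch box_grid_sketch(const float* coords, std::size_t n_rows,
                              std::size_t n_cols, float radius) {
  assert(coords != nullptr && n_rows > 0 && n_cols >= 2 && radius > 0.0f);
  // Find the extent of the first two dimensions.
  float min1 = coords[0], max1 = coords[0];
  float min2 = coords[1], max2 = coords[1];
  for (std::size_t i = 1; i < n_rows; ++i) {
    min1 = std::min(min1, coords[i * n_cols]);
    max1 = std::max(max1, coords[i * n_cols]);
    min2 = std::min(min2, coords[i * n_cols + 1]);
    max2 = std::max(max2, coords[i * n_cols + 1]);
  }
  BoxGridSketch grid;
  grid.n_boxes = {static_cast<int>((max1 - min1) / radius) + 1,
                  static_cast<int>((max2 - min2) / radius) + 1};
  grid.assigned_box.resize(n_rows);
  // Box edge length equals the radius, so all neighbors within the radius
  // lie in the frame's own box or one of the adjacent boxes.
  for (std::size_t i = 0; i < n_rows; ++i) {
    const Box box = {{static_cast<int>((coords[i * n_cols] - min1) / radius),
                      static_cast<int>((coords[i * n_cols + 1] - min2) / radius)}};
    grid.assigned_box[i] = box;
    grid.boxes[box].push_back(static_cast<int>(i));
  }
  return grid;
}
```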

◆ compute_sigma2()

double Clustering::Density::compute_sigma2 ( const Neighborhood &  nh)

compute sigma2 as the mean of the squared nearest-neighbor distances: sigma2 = E[x^2], which is an upper bound on Var(x) = E[x^2] - E[x]^2, with x being the distance between nearest neighbors.

Definition at line 334 of file density_clustering.cpp.
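
A small sketch of the E[x^2] estimate follows. It assumes that each Neighborhood entry stores the plain (not squared) distance to the nearest neighbor. If the library stores squared distances instead, the squaring step would have to be dropped. The aliases and the name sigma2_sketch are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Assumed stand-ins for Clustering::Tools::Neighbor / Neighborhood.
using Neighbor = std::pair<std::size_t, float>;        // (neighbor id, distance)
using Neighborhood = std::map<std::size_t, Neighbor>;  // frame id -> neighbor

// Hypothetical sketch: mean of squared nearest-neighbor distances.
double sigma2_sketch(const Neighborhood& nh) {
  assert(!nh.empty());
  double sum = 0.0;
  for (const auto& entry : nh) {
    const double d = entry.second.second;  // assumed: plain distance
    sum += d * d;
  }
  return sum / static_cast<double>(nh.size());
}
```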

◆ high_density_neighborhood()

std::set< std::size_t > Clustering::Density::high_density_neighborhood ( const float *  coords,
const std::size_t  n_cols,
const std::vector< FreeEnergy > &  sorted_fe,
const std::size_t  i_frame,
const std::size_t  limit,
const float  max_dist 
)

compute local neighborhood of a given frame. neighbor candidates are all frames below a given limit, effectively limiting the frames to the ones below a free energy cutoff.

Definition at line 292 of file density_clustering.cpp.
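
The candidate restriction can be illustrated with the sketch below. Several details are assumptions rather than documented behavior: i_frame is treated as a frame id (row index into coords), coords is row-major, the distance test is a strict comparison of squared distances, and the frame itself is part of its neighborhood whenever it is among the candidates. The name hd_neighborhood_sketch is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using FreeEnergy = std::pair<std::size_t, float>;  // (frame id, free energy)

// Hypothetical sketch: neighbors of i_frame among the first `limit`
// entries of the free-energy-sorted list.
std::set<std::size_t> hd_neighborhood_sketch(
    const float* coords, std::size_t n_cols,
    const std::vector<FreeEnergy>& sorted_fe, std::size_t i_frame,
    std::size_t limit, float max_dist) {
  assert(coords != nullptr && limit <= sorted_fe.size());
  std::set<std::size_t> nh;
  const float max_dist2 = max_dist * max_dist;
  for (std::size_t k = 0; k < limit; ++k) {
    const std::size_t j = sorted_fe[k].first;
    float d2 = 0.0f;
    for (std::size_t c = 0; c < n_cols; ++c) {
      const float diff = coords[i_frame * n_cols + c] - coords[j * n_cols + c];
      d2 += diff * diff;
    }
    if (d2 < max_dist2) nh.insert(j);
  }
  return nh;
}
```

Since sorted_fe is ordered by increasing free energy, choosing limit as the index of the first frame above the threshold restricts the candidates to frames below the cutoff, regardless of how close the excluded frames are.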

◆ is_valid_box()

bool Clustering::Density::is_valid_box ( const Box  box,
const BoxGrid &  grid 
)

returns true if the box is a valid box in the grid, and false if the box lies outside of the grid.

Definition at line 97 of file density_clustering.cpp.
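
A bounds check of this kind might look like the following sketch. It assumes that n_boxes holds one box count per grid dimension and that box indices start at zero. It takes n_boxes directly instead of a BoxGrid to stay self-contained, and the name is_valid_box_sketch is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Box = std::array<int, 2>;

// Hypothetical sketch: a box is valid if both indices are inside the grid.
bool is_valid_box_sketch(const Box& box, const std::vector<int>& n_boxes) {
  assert(n_boxes.size() == 2);
  return box[0] >= 0 && box[1] >= 0 &&
         box[0] < n_boxes[0] && box[1] < n_boxes[1];
}
```

Such a check is needed because stepping from a box at the grid border to its neighbors produces indices of -1 or n_boxes, which have no frames assigned.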

◆ main()

void Clustering::Density::main ( boost::program_options::variables_map  args)

user interface and controlling function for density-based geometric clustering.

Parameters
file: input file with coordinates
free-energy-input: previously computed free energies (input)
free-energy: computed free energies (output)
population: computed populations (output)
output: clustered trajectory
radii: list of radii for free energy / population computations (input)
radius: radius for clustering (input)
nearest-neighbors-input: previously computed nearest neighbor list (input)
nearest-neighbors: nearest neighbor list (output)
threshold-screening: option for automated free energy threshold screening (input)
threshold: threshold for single run with limited free energy (input)
only-initial: if true, do not fill microstates up to barriers, but keep initial clusters below free energy cutoff (bool flag)

Definition at line 559 of file density_clustering.cpp.

◆ nearest_neighbors()

std::tuple< Neighborhood, Neighborhood > Clustering::Density::nearest_neighbors ( const float *  coords,
const std::size_t  n_rows,
const std::size_t  n_cols,
const std::vector< float > &  free_energy 
)

for every frame: compute the nearest neighbor (first tuple field) and the nearest neighbor with lower free energy, i.e. higher density (second tuple field).

Definition at line 230 of file density_clustering.cpp.
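
A brute-force sketch of the two searches is given below. It assumes row-major coords, stores the Euclidean distance in each entry, and leaves frames without any lower-free-energy neighbor (such as the global minimum) out of the second map. These choices, the aliases and the name nearest_neighbors_sketch are assumptions and may differ from the actual implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Assumed stand-ins for Clustering::Tools::Neighbor / Neighborhood.
using Neighbor = std::pair<std::size_t, float>;        // (neighbor id, distance)
using Neighborhood = std::map<std::size_t, Neighbor>;  // frame id -> neighbor

std::tuple<Neighborhood, Neighborhood> nearest_neighbors_sketch(
    const float* coords, std::size_t n_rows, std::size_t n_cols,
    const std::vector<float>& free_energy) {
  assert(coords != nullptr && free_energy.size() == n_rows);
  Neighborhood nh, nh_high_dens;
  for (std::size_t i = 0; i < n_rows; ++i) {
    float best = std::numeric_limits<float>::infinity();
    float best_hd = best;
    std::size_t arg = i, arg_hd = i;
    for (std::size_t j = 0; j < n_rows; ++j) {
      if (j == i) continue;
      float d2 = 0.0f;
      for (std::size_t c = 0; c < n_cols; ++c) {
        const float diff = coords[i * n_cols + c] - coords[j * n_cols + c];
        d2 += diff * diff;
      }
      // Nearest neighbor overall.
      if (d2 < best) { best = d2; arg = j; }
      // Nearest neighbor with lower free energy, i.e. higher density.
      if (free_energy[j] < free_energy[i] && d2 < best_hd) {
        best_hd = d2;
        arg_hd = j;
      }
    }
    if (arg != i) nh[i] = Neighbor(arg, std::sqrt(best));
    if (arg_hd != i) nh_high_dens[i] = Neighbor(arg_hd, std::sqrt(best_hd));
  }
  return std::make_tuple(nh, nh_high_dens);
}
```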

◆ neighbor_box()

constexpr Box Clustering::Density::neighbor_box ( const Box  center,
const int  i_neighbor 
)

returns neighbor box given by neighbor index (in 2D: 9 different neighbors, including center itself) and the given center box.

Definition at line 91 of file density_clustering.cpp.
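
The index-to-offset lookup can be sketched as follows, reusing the documented BOX_DIFF table and N_NEIGHBOR_BOXES constant. The function body is an assumption about how the lookup could be written (C++14 or later is assumed for constexpr element access), and the name neighbor_box_sketch is hypothetical.

```cpp
#include <array>
#include <cassert>

using Box = std::array<int, 2>;

constexpr int N_NEIGHBOR_BOXES = 9;
// Offsets to the 9 boxes around (and including) the center box.
constexpr int BOX_DIFF[9][2] = {{-1, 1}, {0, 1}, {1, 1},
                                {-1, 0}, {0, 0}, {1, 0},
                                {-1, -1}, {0, -1}, {1, -1}};

// Hypothetical sketch: add the i-th offset to the center box.
constexpr Box neighbor_box_sketch(const Box center, const int i_neighbor) {
  return {{center[0] + BOX_DIFF[i_neighbor][0],
           center[1] + BOX_DIFF[i_neighbor][1]}};
}
```

Iterating i_neighbor from 0 to N_NEIGHBOR_BOXES - 1 visits the center box and its eight surrounding boxes exactly once.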

◆ prepare_initial_clustering()

std::tuple< std::vector< std::size_t >, std::size_t, double, std::vector< FreeEnergy >, std::set< std::size_t >, std::size_t > Clustering::Density::prepare_initial_clustering ( const std::vector< float > &  free_energy,
const Neighborhood &  nh,
const float  free_energy_threshold,
const std::size_t  n_rows,
const std::vector< std::size_t >  initial_clusters 
)

prepare data for initial density clustering as used in the screening process

Definition at line 387 of file density_clustering.cpp.

◆ screening()

std::vector< std::size_t > Clustering::Density::screening ( const std::vector< float > &  free_energy,
const Neighborhood &  nh,
const float  free_energy_threshold,
const float *  coords,
const std::size_t  n_rows,
const std::size_t  n_cols,
const std::vector< std::size_t >  initial_clusters 
)

returns state trajectory for clusters given by a free energy threshold. frames with a local free energy estimate higher than the given threshold will not be clustered and remain in 'state 0'.

Definition at line 38 of file density_clustering_common.cpp.

◆ sorted_free_energies()

std::vector< FreeEnergy > Clustering::Density::sorted_free_energies ( const std::vector< float > &  fe)

returns the given free energies sorted lowest to highest. original indices are retained.

Definition at line 214 of file density_clustering.cpp.
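
A minimal sketch of sorting while retaining the original indices follows, using the documented FreeEnergy pair of frame id and free energy. The use of a stable sort (ties keep their original frame order) and the name sorted_fe_sketch are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using FreeEnergy = std::pair<std::size_t, float>;  // (frame id, free energy)

// Hypothetical sketch: pair every value with its index, then sort by value.
std::vector<FreeEnergy> sorted_fe_sketch(const std::vector<float>& fe) {
  std::vector<FreeEnergy> sorted;
  sorted.reserve(fe.size());
  for (std::size_t i = 0; i < fe.size(); ++i) {
    sorted.emplace_back(i, fe[i]);
  }
  std::stable_sort(sorted.begin(), sorted.end(),
                   [](const FreeEnergy& a, const FreeEnergy& b) {
                     return a.second < b.second;
                   });
  return sorted;
}
```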

Variable Documentation

◆ BOX_DIFF

constexpr int Clustering::Density::BOX_DIFF[9][2]
Initial value:
= {{-1, 1}, { 0, 1}, { 1, 1}
, {-1, 0}, { 0, 0}, { 1, 0}
, {-1,-1}, { 0,-1}, { 1,-1}}

encodes box differences in 2D, i.e. if you are at the center box, the 9 different tuples hold the steppings to the 9 spatial neighbors (including the center box itself).

Definition at line 64 of file density_clustering.hpp.