namespace for density-based clustering functions
Namespaces | |
MPI | |
MPI implementations of compute intensive functions. | |
OpenCL | |
OpenCL implementations of compute intensive functions. | |
Classes | |
struct | BoxGrid |
Typedefs | |
using | FreeEnergy = std::pair< std::size_t, float > |
matches frame id to free energy | |
using | Neighbor = Clustering::Tools::Neighbor |
matches neighbor's frame id to distance | |
using | Neighborhood = Clustering::Tools::Neighborhood |
map frame id to neighbors | |
using | Box = std::array< int, 2 > |
encodes 2D box for box-assisted search algorithm | |
typedef std::map< float, std::vector< std::size_t > > | Pops |
Functions | |
BoxGrid | compute_box_grid (const float *coords, const std::size_t n_rows, const std::size_t n_cols, const float radius) |
constexpr Box | neighbor_box (const Box center, const int i_neighbor) |
bool | is_valid_box (const Box box, const BoxGrid &grid) |
std::vector< std::size_t > | calculate_populations (const float *coords, const std::size_t n_rows, const std::size_t n_cols, const float radius) |
calculate population of n-dimensional hypersphere per frame for one fixed radius. | |
std::map< float, std::vector< std::size_t > > | calculate_populations (const float *coords, const std::size_t n_rows, const std::size_t n_cols, std::vector< float > radii) |
std::vector< float > | calculate_free_energies (const std::vector< std::size_t > &pops) |
std::vector< FreeEnergy > | sorted_free_energies (const std::vector< float > &fe) |
std::tuple< Neighborhood, Neighborhood > | nearest_neighbors (const float *coords, const std::size_t n_rows, const std::size_t n_cols, const std::vector< float > &free_energy) |
std::set< std::size_t > | high_density_neighborhood (const float *coords, const std::size_t n_cols, const std::vector< FreeEnergy > &sorted_fe, const std::size_t i_frame, const std::size_t limit, const float max_dist) |
double | compute_sigma2 (const Neighborhood &nh) |
std::vector< std::size_t > | assign_low_density_frames (const std::vector< std::size_t > &initial_clustering, const Neighborhood &nh_high_dens, const std::vector< float > &free_energy) |
void | screening_log (const double sigma2, const std::size_t first_frame_above_threshold, const std::vector< FreeEnergy > &fe_sorted) |
log output for screening steps | |
std::tuple< std::vector< std::size_t >, std::size_t, double, std::vector< FreeEnergy >, std::set< std::size_t >, std::size_t > | prepare_initial_clustering (const std::vector< float > &free_energy, const Neighborhood &nh, const float free_energy_threshold, const std::size_t n_rows, const std::vector< std::size_t > initial_clusters) |
std::vector< std::size_t > | normalized_cluster_names (std::size_t first_frame_above_threshold, std::vector< std::size_t > clustering, std::vector< FreeEnergy > &fe_sorted) |
return clustered trajectory with new, distinct cluster names. | |
std::vector< std::size_t > | sorted_cluster_names (std::vector< std::size_t > clustering) |
sorts the clusters by decreasing population and renames them 1..N | |
bool | compare2DVector (const std::pair< std::size_t, std::size_t > &p1, const std::pair< std::size_t, std::size_t > &p2) |
compare two two-dimensional pairs by their second entry | |
bool | has2digits (float val) |
check if float has at most two digits | |
bool | lump_initial_clusters (const std::set< std::size_t > &local_nh, std::size_t &distinct_name, std::vector< std::size_t > &clustering, const std::vector< FreeEnergy > &fe_sorted, std::size_t first_frame_above_threshold) |
lump clusters based on distance threshold in screening process | |
void | main (boost::program_options::variables_map args) |
user interface and controlling function for density-based geometric clustering. | |
std::vector< std::size_t > | screening (const std::vector< float > &free_energy, const Neighborhood &nh, const float free_energy_threshold, const float *coords, const std::size_t n_rows, const std::size_t n_cols, const std::vector< std::size_t > initial_clusters) |
Variables | |
constexpr int | BOX_DIFF [9][2] |
const int | N_NEIGHBOR_BOXES = 9 |
number of neighbor boxes in 2D grid (including center box). | |
Detailed Description
namespace for density-based clustering functions
This module contains all functions for assigning each frame a free energy and a nearest neighbor. It also identifies clusters.
Class Documentation
◆ Clustering::Density::BoxGrid
struct Clustering::Density::BoxGrid |
the full grid constructed for box-assisted nearest neighbor search with fixed distance criterion.
Definition at line 85 of file density_clustering.hpp.
Class Members | ||
---|---|---|
vector< Box > | assigned_box | matching frame id to the frame's assigned box |
map< Box, vector< int > > | boxes | the boxes with a list of assigned frame ids |
vector< int > | n_boxes | total number of boxes |
Function Documentation
◆ assign_low_density_frames()
std::vector< std::size_t > Clustering::Density::assign_low_density_frames | ( | const std::vector< std::size_t > & | initial_clustering, |
const Neighborhood & | nh_high_dens, | ||
const std::vector< float > & | free_energy | ||
) |
given an initial clustering computed from free energy cutoff screenings, assign all yet unclustered frames (those in 'state 0') to their geometrically nearest cluster. do this by starting at the unassigned frame with the lowest free energy, then assigning the next lowest, etc. thus, all initial clusters are filled in order of increasing free energy, effectively producing microstates that are separated close to the free energy barriers.
Definition at line 345 of file density_clustering.cpp.
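The filling order described above can be sketched as follows. This is a minimal illustration, not the code in density_clustering.cpp: the name `fill_low_density_frames`, the local `Neighbor`/`Neighborhood` aliases, and the bounds check on the neighbor id are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

namespace sketch {

using Neighbor = std::pair<std::size_t, float>;        // (neighbor frame id, distance)
using Neighborhood = std::map<std::size_t, Neighbor>;  // frame id -> neighbor

// Hypothetical re-implementation for illustration only.
// Assumptions: nh_high_dens maps each frame to its nearest neighbor of lower
// free energy, and state 0 marks an unassigned frame.
std::vector<std::size_t> fill_low_density_frames(
    const std::vector<std::size_t>& initial_clustering,
    const Neighborhood& nh_high_dens,
    const std::vector<float>& free_energy) {
  assert(initial_clustering.size() == free_energy.size());
  // frame ids ordered by increasing free energy
  std::vector<std::size_t> order(free_energy.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return free_energy[a] < free_energy[b];
                   });
  std::vector<std::size_t> clustering(initial_clustering);
  for (std::size_t id : order) {
    if (clustering[id] != 0) continue;  // already part of an initial cluster
    const auto it = nh_high_dens.find(id);
    if (it == nh_high_dens.end()) continue;  // no denser neighbor known
    const std::size_t neighbor_id = it->second.first;
    // a neighbor with lower free energy was visited earlier in this loop
    if (neighbor_id < clustering.size()) clustering[id] = clustering[neighbor_id];
  }
  return clustering;
}

}  // namespace sketch
```

Because frames are visited by increasing free energy, a frame inherits the state its denser neighbor already received, so assignments propagate outward from the initial clusters.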
◆ calculate_free_energies()
std::vector< float > Clustering::Density::calculate_free_energies | ( | const std::vector< std::size_t > & | pops | ) |
re-use populations to calculate local free energy estimate via $ G = -k_B T \ln(P)$.
Definition at line 197 of file density_clustering.cpp.
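A minimal sketch of the estimate $ G = -k_B T \ln(P)$. The helper name `free_energies_from_pops` is hypothetical, and two choices are assumptions rather than facts from this page: energies are expressed in units of k_B T, and P is taken as the population relative to the largest population, so the densest frame gets a free energy of 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical helper, not the library's implementation.
// Assumption: every population is >= 1 (a frame counts itself).
std::vector<float> free_energies_from_pops(const std::vector<std::size_t>& pops) {
  assert(!pops.empty());
  const std::size_t max_pop = *std::max_element(pops.begin(), pops.end());
  std::vector<float> fe(pops.size());
  for (std::size_t i = 0; i < pops.size(); ++i) {
    // G / (k_B T) = -ln(P), with P = pops[i] / max_pop (assumed normalization)
    fe[i] = -std::log(static_cast<float>(pops[i]) / static_cast<float>(max_pop));
  }
  return fe;
}

}  // namespace sketch
```

With this normalization only free energy differences are meaningful, which is all a threshold-based screening needs.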
◆ calculate_populations()
std::map< float, std::vector< std::size_t > > Clustering::Density::calculate_populations | ( | const float * | coords, |
const std::size_t | n_rows, | ||
const std::size_t | n_cols, | ||
const std::vector< float > | radii | ||
) |
calculate populations of n-dimensional hypersphere per frame for different radii in one go. computationally much more efficient than running single-radius version for every radius.
Definition at line 126 of file density_clustering.cpp.
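Why one pass over several radii is cheaper can be seen in a brute-force sketch: each pairwise distance is computed once and then compared against every radius. The function name `populations` is hypothetical, and the row-major layout, the inclusive comparison, and counting a frame in its own hypersphere are assumptions. The box grid and any parallelization are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

namespace sketch {

// Brute-force illustration only, O(n_rows^2).
// Assumptions: coords is row-major (n_rows x n_cols) and each frame counts
// itself as a member of its own hypersphere.
std::map<float, std::vector<std::size_t>> populations(
    const float* coords, std::size_t n_rows, std::size_t n_cols,
    const std::vector<float>& radii) {
  assert(coords != nullptr && n_cols > 0);
  std::map<float, std::vector<std::size_t>> pops;
  for (float r : radii) pops[r] = std::vector<std::size_t>(n_rows, 0);
  for (std::size_t i = 0; i < n_rows; ++i) {
    for (std::size_t j = 0; j < n_rows; ++j) {
      // squared Euclidean distance, computed once per pair of frames
      float dist2 = 0.0f;
      for (std::size_t c = 0; c < n_cols; ++c) {
        const float d = coords[i * n_cols + c] - coords[j * n_cols + c];
        dist2 += d * d;
      }
      // the same distance is reused for every radius
      for (float r : radii) {
        if (dist2 <= r * r) ++pops[r][i];
      }
    }
  }
  return pops;
}

}  // namespace sketch
```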
◆ compute_box_grid()
BoxGrid Clustering::Density::compute_box_grid | ( | const float * | coords, |
const std::size_t | n_rows, | ||
const std::size_t | n_cols, | ||
const float | radius | ||
) |
uses fixed radius to separate coordinate space in equally sized boxes for box-assisted nearest neighbor search.
Definition at line 41 of file density_clustering.cpp.
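One way such a grid could be built is sketched below. The struct mirrors the BoxGrid members listed on this page, but several details are assumptions: the grid spans the first two coordinate columns, the box edge length equals the radius, `n_boxes` holds the box count per grid dimension, and the name `make_box_grid` is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

namespace sketch {

using Box = std::array<int, 2>;

// Mirrors the documented BoxGrid members; the interpretation is an assumption.
struct BoxGrid {
  std::vector<Box> assigned_box;          // frame id -> box
  std::map<Box, std::vector<int>> boxes;  // box -> frame ids
  std::vector<int> n_boxes;               // number of boxes per grid dimension
};

// Assumptions: row-major coords, n_cols >= 2, box edge length == radius.
BoxGrid make_box_grid(const float* coords, std::size_t n_rows,
                      std::size_t n_cols, float radius) {
  assert(coords != nullptr && n_rows > 0 && n_cols >= 2 && radius > 0.0f);
  float min0 = coords[0], max0 = coords[0];
  float min1 = coords[1], max1 = coords[1];
  for (std::size_t i = 1; i < n_rows; ++i) {
    const float x0 = coords[i * n_cols];
    const float x1 = coords[i * n_cols + 1];
    min0 = std::min(min0, x0);
    max0 = std::max(max0, x0);
    min1 = std::min(min1, x1);
    max1 = std::max(max1, x1);
  }
  BoxGrid grid;
  grid.n_boxes = {static_cast<int>((max0 - min0) / radius) + 1,
                  static_cast<int>((max1 - min1) / radius) + 1};
  grid.assigned_box.resize(n_rows);
  for (std::size_t i = 0; i < n_rows; ++i) {
    // integer box index along each of the two grid dimensions
    const Box box = {static_cast<int>((coords[i * n_cols] - min0) / radius),
                     static_cast<int>((coords[i * n_cols + 1] - min1) / radius)};
    grid.assigned_box[i] = box;
    grid.boxes[box].push_back(static_cast<int>(i));
  }
  return grid;
}

}  // namespace sketch
```

With boxes as wide as the radius, every frame within the radius of a query frame lies in the query's box or one of its direct neighbors, which is what makes the search box-assisted.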
◆ compute_sigma2()
double Clustering::Density::compute_sigma2 | ( | const Neighborhood & | nh | ) |
compute sigma2 as deviation of squared nearest-neighbor distances. sigma2 is given by E[x^2] (E[x^2] > Var(x) = E[x^2] - E[x]^2, with x being the distances between nearest neighbors).
Definition at line 334 of file density_clustering.cpp.
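Reading sigma2 as E[x^2], the computation reduces to averaging the squared nearest-neighbor distances. The sketch below assumes that the Neighborhood already stores squared distances; the aliases and the name `mean_squared_nn_distance` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

namespace sketch {

// Assumed layout: frame id -> (neighbor frame id, squared distance).
using Neighbor = std::pair<std::size_t, float>;
using Neighborhood = std::map<std::size_t, Neighbor>;

// sigma2 = E[x^2]: mean of the stored squared nearest-neighbor distances.
double mean_squared_nn_distance(const Neighborhood& nh) {
  assert(!nh.empty());
  double sum = 0.0;
  for (const auto& entry : nh) {
    sum += entry.second.second;  // squared distance to the nearest neighbor
  }
  return sum / static_cast<double>(nh.size());
}

}  // namespace sketch
```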
◆ high_density_neighborhood()
std::set< std::size_t > Clustering::Density::high_density_neighborhood | ( | const float * | coords, |
const std::size_t | n_cols, | ||
const std::vector< FreeEnergy > & | sorted_fe, | ||
const std::size_t | i_frame, | ||
const std::size_t | limit, | ||
const float | max_dist | ||
) |
compute local neighborhood of a given frame. neighbor candidates are all frames below a given limit, effectively limiting the frames to the ones below a free energy cutoff.
Definition at line 292 of file density_clustering.cpp.
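A sketch of the neighborhood search under a free energy cutoff. Several points are assumptions not confirmed by this page: `i_frame` and the returned ids are positions in `sorted_fe`, the distance bound is a squared distance (named `max_dist2` here), and the query frame always belongs to its own neighborhood. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

namespace sketch {

using FreeEnergy = std::pair<std::size_t, float>;  // (frame id, free energy)

// Candidates are the first `limit` entries of sorted_fe, i.e. the frames with
// the lowest free energies.
std::set<std::size_t> neighborhood_below_limit(
    const float* coords, std::size_t n_cols,
    const std::vector<FreeEnergy>& sorted_fe, std::size_t i_frame,
    std::size_t limit, float max_dist2) {
  assert(coords != nullptr && i_frame < sorted_fe.size() && limit <= sorted_fe.size());
  std::set<std::size_t> nh;
  const float* xi = coords + sorted_fe[i_frame].first * n_cols;
  for (std::size_t j = 0; j < limit; ++j) {
    const float* xj = coords + sorted_fe[j].first * n_cols;
    float dist2 = 0.0f;
    for (std::size_t c = 0; c < n_cols; ++c) {
      const float d = xi[c] - xj[c];
      dist2 += d * d;
    }
    if (dist2 < max_dist2) nh.insert(j);
  }
  nh.insert(i_frame);  // the query frame is always part of the result
  return nh;
}

}  // namespace sketch
```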
◆ is_valid_box()
returns true if the box is a valid box in the grid, and false if the box lies outside of the grid.
Definition at line 97 of file density_clustering.cpp.
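The validity check amounts to a bounds test on both box indices. The sketch below takes the per-dimension box counts directly instead of a BoxGrid; that simplification, the reading of `n_boxes` as one count per grid dimension, and the name `box_in_grid` are assumptions.

```cpp
#include <array>
#include <cassert>
#include <vector>

namespace sketch {

using Box = std::array<int, 2>;

// Assumption: n_boxes[d] is the number of boxes along grid dimension d.
bool box_in_grid(const Box& box, const std::vector<int>& n_boxes) {
  assert(n_boxes.size() == 2);
  return box[0] >= 0 && box[0] < n_boxes[0] &&
         box[1] >= 0 && box[1] < n_boxes[1];
}

}  // namespace sketch
```

Such a check is needed because stepping to a neighbor of a border box yields indices of -1 or n_boxes[d], which do not exist in the grid.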
◆ main()
void Clustering::Density::main | ( | boost::program_options::variables_map | args | ) |
user interface and controlling function for density-based geometric clustering.
- Parameters
-
file | input file with coordinates |
free-energy-input | previously computed free energies (input) |
free-energy | computed free energies (output) |
population | computed populations (output) |
output | clustered trajectory |
radii | list of radii for free energy / population computations (input) |
radius | radius for clustering (input) |
nearest-neighbors-input | previously computed nearest neighbor list (input) |
nearest-neighbors | nearest neighbor list (output) |
threshold-screening | option for automated free energy threshold screening (input) |
threshold | threshold for single run with limited free energy (input) |
only-initial | if true, do not fill microstates up to barriers, but keep initial clusters below free energy cutoff (bool flag) |
Definition at line 559 of file density_clustering.cpp.
◆ nearest_neighbors()
std::tuple< Neighborhood, Neighborhood > Clustering::Density::nearest_neighbors | ( | const float * | coords, |
const std::size_t | n_rows, | ||
const std::size_t | n_cols, | ||
const std::vector< float > & | free_energy | ||
) |
for every frame: compute the nearest neighbor (first tuple field) and the nearest neighbor with lower free energy, i.e. higher density (second tuple field).
Definition at line 230 of file density_clustering.cpp.
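The two tuple fields can be illustrated with a brute-force search. This is a sketch, not the library code: the name `nearest_neighbors_bruteforce`, storing squared distances, and the sentinel (frame id `n_rows + 1` with infinite distance) for frames that have no denser neighbor are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

namespace sketch {

using Neighbor = std::pair<std::size_t, float>;  // (frame id, squared distance)
using Neighborhood = std::map<std::size_t, Neighbor>;

// O(n_rows^2) illustration; coords is assumed to be row-major.
std::tuple<Neighborhood, Neighborhood> nearest_neighbors_bruteforce(
    const float* coords, std::size_t n_rows, std::size_t n_cols,
    const std::vector<float>& free_energy) {
  assert(coords != nullptr && free_energy.size() == n_rows);
  const float inf = std::numeric_limits<float>::infinity();
  Neighborhood nh, nh_high_dens;
  for (std::size_t i = 0; i < n_rows; ++i) {
    Neighbor best(n_rows + 1, inf);        // sentinel: nothing found yet
    Neighbor best_dense(n_rows + 1, inf);  // sentinel: no denser frame found
    for (std::size_t j = 0; j < n_rows; ++j) {
      if (i == j) continue;
      float dist2 = 0.0f;
      for (std::size_t c = 0; c < n_cols; ++c) {
        const float d = coords[i * n_cols + c] - coords[j * n_cols + c];
        dist2 += d * d;
      }
      if (dist2 < best.second) best = Neighbor(j, dist2);
      // lower free energy means higher density
      if (free_energy[j] < free_energy[i] && dist2 < best_dense.second) {
        best_dense = Neighbor(j, dist2);
      }
    }
    nh[i] = best;
    nh_high_dens[i] = best_dense;
  }
  return std::make_tuple(nh, nh_high_dens);
}

}  // namespace sketch
```

The frame with the globally lowest free energy has no denser neighbor, so its entry in the second field keeps the sentinel value in this sketch.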
◆ neighbor_box()
returns neighbor box given by neighbor index (in 2D: 9 different neighbors, including the center itself) and the given center box.
Definition at line 91 of file density_clustering.cpp.
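A sketch of the neighbor lookup via an offset table. The values and ordering of the offsets below are assumptions, since the contents of BOX_DIFF are not listed on this page; the name `neighbor_of` is hypothetical, and the function is left non-constexpr to keep the example simple.

```cpp
#include <array>
#include <cassert>

namespace sketch {

using Box = std::array<int, 2>;

constexpr int N_NEIGHBOR_BOXES = 9;

// Assumed offset table: one (dx, dy) step per neighbor, center included.
constexpr int BOX_DIFF[N_NEIGHBOR_BOXES][2] = {
    {-1, 1},  {0, 1},  {1, 1},
    {-1, 0},  {0, 0},  {1, 0},
    {-1, -1}, {0, -1}, {1, -1}};

// Returns the box reached from `center` by applying offset number i_neighbor.
Box neighbor_of(const Box center, const int i_neighbor) {
  assert(i_neighbor >= 0 && i_neighbor < N_NEIGHBOR_BOXES);
  return {center[0] + BOX_DIFF[i_neighbor][0],
          center[1] + BOX_DIFF[i_neighbor][1]};
}

}  // namespace sketch
```

Looping `i_neighbor` from 0 to `N_NEIGHBOR_BOXES - 1` visits the center box and its eight surrounding boxes exactly once each.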
◆ prepare_initial_clustering()
std::tuple< std::vector< std::size_t >, std::size_t, double, std::vector< FreeEnergy >, std::set< std::size_t >, std::size_t > Clustering::Density::prepare_initial_clustering | ( | const std::vector< float > & | free_energy, |
const Neighborhood & | nh, | ||
const float | free_energy_threshold, | ||
const std::size_t | n_rows, | ||
const std::vector< std::size_t > | initial_clusters | ||
) |
prepare data for initial density clustering as used in the screening process
Definition at line 387 of file density_clustering.cpp.
◆ screening()
std::vector< std::size_t > Clustering::Density::screening | ( | const std::vector< float > & | free_energy, |
const Neighborhood & | nh, | ||
const float | free_energy_threshold, | ||
const float * | coords, | ||
const std::size_t | n_rows, | ||
const std::size_t | n_cols, | ||
const std::vector< std::size_t > | initial_clusters | ||
) |
returns state trajectory for clusters given by a free energy threshold. frames with a local free energy estimate higher than the given threshold will not be clustered and remain in 'state 0'.
Definition at line 38 of file density_clustering_common.cpp.
◆ sorted_free_energies()
std::vector< FreeEnergy > Clustering::Density::sorted_free_energies | ( | const std::vector< float > & | fe | ) |
returns the given free energies sorted lowest to highest. original indices are retained.
Definition at line 214 of file density_clustering.cpp.
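Sorting while retaining the original indices can be sketched by pairing each value with its frame id first. The name `sort_free_energies` is hypothetical, and the use of a stable sort (ties keep their original frame order) is a choice made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

using FreeEnergy = std::pair<std::size_t, float>;  // (frame id, free energy)

std::vector<FreeEnergy> sort_free_energies(const std::vector<float>& fe) {
  std::vector<FreeEnergy> sorted;
  sorted.reserve(fe.size());
  // remember the original frame id next to each value
  for (std::size_t i = 0; i < fe.size(); ++i) sorted.emplace_back(i, fe[i]);
  // order by free energy, lowest first
  std::stable_sort(sorted.begin(), sorted.end(),
                   [](const FreeEnergy& a, const FreeEnergy& b) {
                     return a.second < b.second;
                   });
  return sorted;
}

}  // namespace sketch
```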
Variable Documentation
◆ BOX_DIFF
constexpr int Clustering::Density::BOX_DIFF[9][2] |
encodes box differences in 2D, i.e. if you are at the center box, the 9 different tuples hold the steppings to the 9 spatial neighbors (including the center box itself).
Definition at line 64 of file density_clustering.hpp.