clustering
Class for clustering the correlation matrices.
MIT License. Copyright © 2021-2024, Daniel Nagel, Georg Diez. All rights reserved.
Clustering(*, mode='CPM', weighted=True, n_neighbors=None, resolution_parameter=None, n_clusters=None, seed=None)
Bases: ClusterMixin, BaseEstimator
Class for clustering a correlation matrix.
Parameters:
- mode (str, default: 'CPM') – The clustering mode. It selects either the quality function optimized by the Leiden algorithm ('CPM' or 'modularity') or an alternative clustering method ('linkage' or 'kmedoids'):
  - 'CPM': uses the constant Potts model on the full, weighted graph
  - 'modularity': uses modularity on a knn-graph
  - 'linkage': uses complete-linkage clustering
  - 'kmedoids': uses k-medoids clustering
- weighted (bool, default: True) – If True, the underlying graph has weighted edges. Otherwise, the graph is constructed using the adjacency matrix.
- n_neighbors (int, default: None) – Specifies whether the whole matrix should be used or a knn-graph, which reduces the required memory. The default depends on the mode:
  - 'CPM': None uses the full graph
  - 'modularity': None uses the square root of the number of features
- resolution_parameter (float, default: None) – Required for modes 'CPM' and 'linkage'. If None, the resolution parameter is set to the third quartile of X for n_neighbors=None, and otherwise to the mean value of the knn graph.
- n_clusters (int, default: None) – Required for mode 'kmedoids'. The number of medoids, each of which defines one of the resulting clusters.
- seed (int, default: None) – Integer seed that makes the randomness of Leidenalg deterministic. If None, a random seed is used.
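The default described for resolution_parameter can be sketched with NumPy. This is only an illustration under stated assumptions, not the library's implementation: it assumes that the quartile is taken over the off-diagonal entries of X, that the knn graph keeps the n_neighbors largest off-diagonal entries of each row, and the helper name default_resolution is hypothetical.

```python
import numpy as np


def default_resolution(X, n_neighbors=None):
    """Hypothetical helper: guess a resolution parameter from X."""
    X = np.asarray(X, dtype=float)
    n_features = len(X)
    # Ignore the self-correlations on the diagonal (an assumption).
    off_diag = ~np.eye(n_features, dtype=bool)
    if n_neighbors is None:
        # Full graph: third quartile of the off-diagonal entries.
        return float(np.quantile(X[off_diag], 0.75))
    # knn graph: mean over the n_neighbors largest off-diagonal entries
    # of each row. Requires n_neighbors <= n_features - 1.
    rows = np.where(off_diag, X, -np.inf)
    top = np.sort(rows, axis=1)[:, -n_neighbors:]
    return float(top.mean())


mat = np.array([[1.0, 0.1, 0.9], [0.1, 1.0, 0.1], [0.9, 0.1, 1.0]])
full = default_resolution(mat)                 # full-graph branch
knn = default_resolution(mat, n_neighbors=1)   # knn-graph branch
```

For this small matrix the full-graph branch evaluates to approximately 0.7. Whether the library selects exactly these entries is not stated in this reference, so treat the numbers as illustrative.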
Attributes:
- clusters_ (ndarray of shape (n_clusters, )) – The result of the clustering process. A list of arrays, each containing the indices (features) belonging to one cluster.
- labels_ (ndarray of shape (n_features, )) – Labels of each feature.
- matrix_ (ndarray of shape (n_features, n_features)) – Input matrix permuted according to the determined clusters.
- ticks_ (ndarray of shape (n_clusters, )) – The cumulative number of features belonging to the clusters. May be used as ticks for plotting matrix_.
- permutation_ (ndarray of shape (n_features, )) – Permutation of the input features (corresponds to the flattened clusters_).
- n_neighbors_ (int) – Only available when using a knn graph. The number of nearest neighbors used for constructing the knn-graph.
- resolution_param_ (float) – Only for modes 'CPM' and 'linkage'. The resolution parameter used for the CPM-based Leiden clustering.
- linkage_matrix_ (ndarray of shape (n_clusters - 1, 4)) – Only for mode 'linkage'. Contains the hierarchical clustering encoded as a linkage matrix, see scipy.cluster.hierarchy.linkage.
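The relation between clusters_, permutation_, matrix_, ticks_, and labels_ can be illustrated with plain NumPy. The sketch below starts from a given grouping of three features and derives the other quantities as the descriptions above suggest. It assumes that each label is simply the index of the containing cluster; the variable names are illustrative and the code does not call the library.

```python
import numpy as np

mat = np.array([[1.0, 0.1, 0.9], [0.1, 1.0, 0.1], [0.9, 0.1, 1.0]])
clusters = [[2, 0], [1]]  # features 2 and 0 form one cluster, feature 1 another

# Flattening the clusters gives the permutation of the features.
permutation = np.concatenate(clusters)
# Reorder rows and columns so that members of a cluster are adjacent.
matrix = mat[np.ix_(permutation, permutation)]
# Cumulative cluster sizes mark the block boundaries in the permuted matrix.
ticks = np.cumsum([len(members) for members in clusters])
# One label per feature: the index of the cluster containing it (assumption).
labels = np.empty(len(mat), dtype=int)
for idx, members in enumerate(clusters):
    labels[members] = idx
```

After permutation the strongly correlated features sit in one diagonal block, which is why ticks is convenient for drawing cluster boundaries on a heatmap of the permuted matrix.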
Examples:
>>> import numpy as np
>>> import mosaic
>>> mat = np.array([[1.0, 0.1, 0.9], [0.1, 1.0, 0.1], [0.9, 0.1, 1.0]])
>>> clust = mosaic.Clustering()
>>> clust.fit(mat)
Clustering(resolution_parameter=0.7)
>>> clust.matrix_
array([[1. , 0.9, 0.1],
       [0.9, 1. , 0.1],
       [0.1, 0.1, 1. ]])
>>> clust.clusters_
array([list([2, 0]), list([1])], dtype=object)
Initialize Clustering class.
Source code in src/mosaic/clustering.py
fit(X, y=None)
Clusters the correlation matrix by Leiden clustering on a graph.
Parameters:
- X (ndarray of shape (n_features, n_features)) – Matrix containing the correlation metric to be clustered. The values should lie in [0, 1], where 1 means completely correlated and 0 means no correlation.
- y (Ignored, default: None) – Not used, present for scikit-learn API consistency by convention.
Returns:
- self (object) – Fitted estimator.
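Because X is expected to be a square matrix with values in [0, 1], it can help to check the input before fitting. The following is a minimal sketch of such a check. The helper check_similarity_matrix is hypothetical and not part of the documented API, the symmetry requirement is an assumption, and taking absolute values of a Pearson matrix is just one possible way to map correlations onto the expected range.

```python
import numpy as np


def check_similarity_matrix(X, atol=1e-8):
    """Hypothetical pre-check; not part of the documented API."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[0] != X.shape[1]:
        raise ValueError('X must be a square (n_features, n_features) matrix.')
    if not np.allclose(X, X.T, atol=atol):
        raise ValueError('X must be symmetric.')  # assumption, see lead-in
    if X.min() < 0 or X.max() > 1:
        raise ValueError('All values of X must lie in [0, 1].')
    return X


# A Pearson correlation matrix lies in [-1, 1]. Absolute values map it to
# [0, 1]; clipping removes tiny floating-point overshoots on the diagonal.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))  # 100 samples, 4 features
corr = np.abs(np.corrcoef(data, rowvar=False))
X = check_similarity_matrix(np.clip(corr, 0, 1))
```

The validated X then has the shape and value range described for the fit parameter above.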
Source code in src/mosaic/clustering.py
fit_predict(X, y=None)
Clusters the correlation matrix by Leiden clustering on a graph and returns the cluster labels.
Parameters:
- X (ndarray of shape (n_features, n_features)) – Matrix containing the correlation metric to be clustered. The values should lie in [0, 1], where 1 means completely correlated and 0 means no correlation.
- y (Ignored, default: None) – Not used, present for scikit-learn API consistency by convention.
Returns:
- labels (ndarray of shape (n_features,)) – Cluster label of each feature.
Source code in src/mosaic/clustering.py
score(X, y=None, sample_weight=None)
Estimate the silhouette score of a new correlation matrix.
Parameters:
- X (ndarray of shape (n_features, n_features)) – New matrix containing the correlation metric to score. The values should lie in [0, 1], where 1 means completely correlated and 0 means no correlation.
- y (Ignored, default: None) – Not used, present for scikit-learn API consistency by convention.
- sample_weight (Optional[ndarray], default: None) – Not used, present for scikit-learn API consistency by convention.
Returns:
- score (float) – Silhouette score of the new correlation matrix based on the fitted labels.
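To make the returned quantity concrete, the sketch below computes a mean silhouette coefficient from a correlation matrix and a set of labels using NumPy only. It assumes that correlations are turned into distances via 1 - X, which this reference does not confirm, and the function silhouette_precomputed is a hypothetical stand-in rather than the library's code.

```python
import numpy as np


def silhouette_precomputed(dist, labels):
    """Mean silhouette coefficient for a precomputed distance matrix.

    Expects a zero diagonal and at least two distinct labels.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    unique = np.unique(labels)
    scores = np.zeros(len(labels))
    for i, label in enumerate(labels):
        same = labels == label
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters get a score of 0 by convention
        # Mean distance to the other members of the own cluster.
        a = dist[i, same].sum() / (n_same - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(
            dist[i, labels == other].mean()
            for other in unique if other != label
        )
        if max(a, b) > 0:
            scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


mat = np.array([[1.0, 0.1, 0.9], [0.1, 1.0, 0.1], [0.9, 0.1, 1.0]])
labels = np.array([0, 1, 0])  # features 0 and 2 share a cluster
score = silhouette_precomputed(1 - mat, labels)
```

Values close to 1 indicate that features are much more similar to their own cluster than to the nearest other cluster, while values near 0 or below suggest that the fitted labels describe the new matrix poorly.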
Source code in src/mosaic/clustering.py