Contributing a new algorithm#
If you want to make a new clustering or dimensionality reduction algorithm available in the napari-clusters-plotter, please follow a few guidelines and specifications. First, fork the repository and create yourself a different branch to work on. AOnce this is set up, you can find all implemented allgorithms under src/napari_clusters_plotter/algorithms.py
, where you can add your algorithm, too.
New dimensionality reduction algorithm#
If you want to add your algorithm there, please make sure that it adheres to the following syntax:
def reduce_my_algorithm(
data: pd.DataFrame,
your_int_algorithm_parameter: int = 2,
your_float_algorithm_parameter: float = 0.1,
scale: bool = True
) -> FunctionWorker[pd.DataFrame]:
"""
Reduce the data using my algorithm
"""
@thread_worker(progress=True)
def _reduce_my_algorithm(
data: pd.DataFrame,
your_int_algorithm_parameter: int,
your_float_algorithm_parameter: float,
scale: bool
) -> FunctionWorker[pd.DataFrame]:
import your_module
from sklearn.preprocessing import StandardScaler
# Keep this code
non_nan_data = data.dropna()
if scale:
preprocessed = StandardScaler().fit_transform(non_nan_data.values)
else:
preprocessed = non_nan_data.values
# <<<Implement your algorithm here
...
reduced_data = your_module.fit_transform(preprocessed)
# <<<<
# Add NaN rows back - keep this part
result = pd.DataFrame(index=data.index, columns=range(n_components))
result.loc[non_nan_data.index] = reduced_data
return result
return _reduce_my_algorithm(
data,
your_int_algorithm_parameter,
your_float_algorithm_parameter,
scale)
Here’s a breakdown of what each part of this code does. The outer function reduce_my_algorithm
is what will later be visible to the napari clusters plotter. The inner function (_reduce_my_algorithm
) will be submitted to a napari threadworker, which allows for the algorithm to execute in a non-blocking fashion.
Once that is done, you need to make the method available for the clustering or dimensionality reduction widget, respectively. If your algorithm is a dimensionality reduction algorithm, you’ll find the relevant widget under src/napari_clusters_plotter/_dim_reduction_and_clustering.py
. You can add your algorithm to the Dimensionality reduction widget there as follows:
class DimensionalityReductionWidget(AlgorithmWidgetBase):
algorithms = {
"PCA": {
"callback": reduce_pca,
"column_string": "PC",
"doc_url": "https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html",
},
"t-SNE": {
"callback": reduce_tsne,
"column_string": "t-SNE",
"doc_url": "https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html",
},
"UMAP": {
"callback": reduce_umap,
"column_string": "UMAP",
"doc_url": "https://umap-learn.readthedocs.io/en/latest/",
},
"my-new-algorithm": {
"callback": reduce_my_algorithm,
"column_string": "acronym_of_my_algorithm",
"doc_url": "https://link-to-my-algorithm-that-explains-what-it-does.com"
}
}
def __init__(self, napari_viewer: napari.Viewer):
super().__init__(
napari_viewer,
DimensionalityReductionWidget.algorithms,
"Features to reduce:",
["PCA", "t-SNE", "UMAP", "some_acronym_for_your_algorithm"],
)
The relevant parts here are to add the above-implemented function as a callback to the widget. Secondly, you’ll need to provide an acronym for your algorithm. The reduced features will then appear in the list of features as ACRONYM_0
and ACRONYM_1
(e.g., PC_0
and PC_1
for PCA). Lastly, please add a link to a documentation page that describes how your algorithm works, what it does and what its parameters mean.
New clustering algorithm#
To add a new clustering algorithm, follow the steps above analogeously for the clustering widget. For the implementation of the algorithm itself, nothing changes.
Hint
Clustering algorithms are expected to return a single integer column!
Hint
In the napari-clusters-plotter convention, cluster ids start with 1 - the value 0 is reserved for unclustered data points. This means that if your algorithm returns a cluster id of 0, you should change it to 1 before returning the result.
This being said, a clustering algorithm should look like this:
def cluster_method(
data: pd.DataFrame, n_clusters: int = 3, scale: bool = True
) -> FunctionWorker[pd.Series]:
"""
Cluster the data using Spectral Clustering
"""
@thread_worker(progress=True)
def _cluster_method(
data: pd.DataFrame, some_parameter: int, scale: bool
) -> pd.Series:
from module import MyClusteringAlgorithm
# Remove NaN rows
non_nan_data = data.dropna()
if scale:
preprocessed = StandardScaler().fit_transform(non_nan_data)
else:
preprocessed = non_nan_data.values
# Perform Spectral Clustering (+1 to start clusters from 1)
clusterer = MyClusteringAlgorithm(some_parameter=some_parameter)
clusters = clusterer.fit_predict(preprocessed) + 1
# Add NaN rows back
result = pd.Series(index=data.index, dtype=int)
result.loc[non_nan_data.index] = clusters
return result
return _cluster_method(data, n_clusters, scale)