bblean.multiround

Multi-round BitBirch workflow for clustering huge datasets in parallel

Functions

run_multiround_bitbirch

Perform (possibly parallel) multi-round BitBirch clustering

bblean.multiround.run_multiround_bitbirch(input_files, out_dir, n_features=None, input_is_packed=True, num_initial_processes=10, num_midsection_processes=None, initial_merge_criterion='diameter', branching_factor=254, threshold=0.3, midsection_threshold_change=0.0, tolerance=0.05, num_midsection_rounds=1, bin_size=10, max_tasks_per_process=1, refinement_before_midsection='full', split_largest_after_each_midsection_round=False, midsection_merge_criterion='tolerance-diameter', final_merge_criterion=None, mp_context=None, save_tree=False, save_centroids=True, max_fps=None, verbose=False, cleanup=True)

Perform (possibly parallel) multi-round BitBirch clustering

Warning

The functionality provided by this function is stable, but its API (the arguments it takes and its return values) may change in the future.
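
The sketch below shows one way a call might be assembled from the signature above. It is illustrative only: `build_multiround_kwargs` is a hypothetical helper, the `fingerprints` and `bb_multiround_out` directory names are placeholders, and the assumptions that the inputs are `.npy` files and that `input_files` and `out_dir` accept `pathlib.Path` objects are not confirmed by this page. The keyword values repeat the defaults listed in the signature. The return value is not documented here, so the example does not use it.

```python
from pathlib import Path


def build_multiround_kwargs(input_dir, out_dir, pattern="*.npy"):
    """Hypothetical helper: gather keyword arguments for run_multiround_bitbirch.

    Assumes the fingerprint files live in ``input_dir`` and match ``pattern``;
    the ``.npy`` extension is an assumption, not something this page states.
    """
    # Sort so the file order is deterministic across runs.
    input_files = sorted(Path(input_dir).glob(pattern))
    return {
        "input_files": input_files,
        "out_dir": Path(out_dir),
        # The values below repeat the defaults from the documented signature.
        "input_is_packed": True,
        "num_initial_processes": 10,
        "branching_factor": 254,
        "threshold": 0.3,
        "tolerance": 0.05,
        "num_midsection_rounds": 1,
        "save_centroids": True,
        "verbose": False,
    }


if __name__ == "__main__":
    # Placeholder directory names; replace with real paths.
    kwargs = build_multiround_kwargs("fingerprints", "bb_multiround_out")
    try:
        from bblean.multiround import run_multiround_bitbirch
    except ImportError:
        # Fall back to showing the arguments when bblean is unavailable.
        print("bblean is not installed; arguments that would be passed:")
        print(kwargs)
    else:
        if kwargs["input_files"]:
            run_multiround_bitbirch(**kwargs)
        else:
            print("No input files found; nothing to cluster.")
```

Since the function may start worker processes, keeping the call under an `if __name__ == "__main__":` guard is the usual precaution for Python multiprocessing code. Because the warning above notes that the API may change, passing every option by keyword rather than by position makes a call like this easier to update.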