The cluster command reads a features.csv file produced by the detect command, groups the detected objects into clusters using K-Means, and chooses the number of clusters automatically with the elbow method.

Syntax

./run-cli cluster --input <path> [options]

Options

--input (string, required)
Path to the features.csv file produced by the detect command. Short alias: -i.

--output-dir (string, default: <input parent>/clusters)
Directory where clustering results are written. Short alias: -o. Defaults to a clusters/ subdirectory next to the input CSV file.

--images-dir (string, optional)
Directory containing the processed (annotated) images from the detection step. When provided, the command overlays cluster assignments on these images to produce the cluster visualizations clusters_visualization.png and cluster_groups.png.

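--max-k (number, default: 10)
Maximum number of clusters to evaluate with the elbow method. The command tests K values from 1 to this number, computes the WCSS (within-cluster sum of squares: the total squared distance from each object to its cluster centroid) for each, and selects the K at the elbow of the resulting curve, where adding clusters stops reducing WCSS substantially.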
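This page does not specify the exact criterion the command uses to locate the elbow. The sketch below illustrates one common knee heuristic and is an assumption, not the command's implementation. It scales K and WCSS to the range [0, 1] and then picks the K that lies farthest below the straight line joining the K=1 and K=max points. The function name pick_elbow and its input format (one WCSS value per line, starting at K=1) are hypothetical and not part of run-cli.

```shell
# pick_elbow (hypothetical): read WCSS values on stdin, one per line,
# where line number = K. Print the K farthest below the chord joining the
# first and last points after scaling both axes to [0, 1].
# This is an assumed knee heuristic, not necessarily the one run-cli uses.
pick_elbow() {
  awk '
    { w[NR] = $1 }
    END {
      n = NR
      # Too few points, or a flat curve: no meaningful elbow, fall back to K=1
      if (n < 3 || w[1] == w[n]) { print 1; exit }
      best_k = 1; best_d = -1
      for (k = 1; k <= n; k++) {
        x = (k - 1) / (n - 1)              # K scaled so K=1 -> 0, K=n -> 1
        y = (w[k] - w[n]) / (w[1] - w[n])  # WCSS scaled so K=1 -> 1, K=n -> 0
        d = 1 - x - y                      # how far the point sits below the chord y = 1 - x
        if (d > best_d) { best_d = d; best_k = k }
      }
      print best_k
    }'
}
```

For example, with the made-up WCSS values 1000 400 150 120 100 90 85 80 78 76 for K = 1 to 10, `printf '%s\n' 1000 400 150 120 100 90 85 80 78 76 | pick_elbow` prints 3. That is the point where the steep drop flattens out. Scaling both axes first keeps the choice from being dominated by the magnitude of the WCSS values.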

Output files

For each image represented in the input CSV, the command creates a subdirectory under --output-dir:
<output-dir>/
└── <image-name>/
    ├── <image-name>_clustered.csv     ← features with added cluster column
    ├── elbow_method.png               ← WCSS vs K curve with elbow marked
    ├── silhouette_analysis.png        ← silhouette score vs K (when enabled)
    ├── cluster_distribution.png       ← centroid scatter coloured by cluster
    ├── morphological_scatter.png      ← area vs circularity scatter by cluster
    ├── clusters_visualization.png     ← cluster overlays on detection image (if --images-dir set)
    └── cluster_groups.png             ← bounding rectangles per cluster group (if --images-dir set)
The <image-name>_clustered.csv file is the required input for the analyze command.
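Before passing a clustered CSV to analyze, a quick sanity check is to count how many objects landed in each cluster. The following is a minimal sketch under stated assumptions. It assumes the header row names the added column literally `cluster` and that fields are plain comma-separated values with no quoted commas. The helper name cluster_counts is hypothetical.

```shell
# cluster_counts (hypothetical): print "<cluster> <count>" for each cluster
# label in a clustered CSV, sorted by label.
# Assumes a header row, a column named exactly "cluster", and simple
# comma-separated fields (no quoted commas).
cluster_counts() {
  awk -F',' '
    { sub(/\r$/, "") }                      # tolerate CRLF line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "cluster") col = i
      if (!col) { print "no cluster column found" > "/dev/stderr"; exit 1 }
      next
    }
    { count[$col]++ }                       # tally rows per cluster label
    END { for (c in count) print c, count[c] }
  ' "$1" | sort -n
}
```

Usage: `cluster_counts ./results/clusters/<image-name>/<image-name>_clustered.csv`. To check every image at once, loop over `./results/clusters/*/*_clustered.csv` and call the helper on each file.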

Examples

# Cluster using defaults — output written next to features.csv
./run-cli cluster -i ./results/features.csv

# Specify an output directory explicitly
./run-cli cluster -i ./results/features.csv -o ./results/clusters

# Include detection images for cluster overlays
./run-cli cluster \
  -i ./results/features.csv \
  --images-dir ./results/detection \
  -o ./results/clusters

# Increase the maximum K for large, complex datasets
./run-cli cluster -i ./results/features.csv --max-k 20
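Scripts that call the command without -o may need to know where the results will land. The hypothetical helper below mirrors the documented --output-dir default, a clusters/ directory beside the input CSV. It shows that the first two examples above write to the same location. It is an illustration of the documented rule, not code taken from run-cli.

```shell
# default_output_dir (hypothetical): reproduce the documented default for
# --output-dir, "<input parent>/clusters", for a given input CSV path.
default_output_dir() {
  printf '%s/clusters\n' "$(dirname -- "$1")"
}
```

For example, `default_output_dir ./results/features.csv` prints `./results/clusters`, and `default_output_dir features.csv` prints `./clusters`.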

If you used the pipeline command or the session-aware detect command, the session directory already contains features.csv at the expected path. Run archeo-cluster sessions --latest to get the session path quickly.