This guide walks you through running Archeo-Cluster’s complete analysis pipeline on a set of images, from installation to viewing your results.

Prerequisites

  • Python 3.11 or later
  • uv package manager
  • A directory of archaeological images (JPEG, PNG)
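The first two prerequisites can be checked from Python before cloning. This is only a sketch: the helper names `meets_minimum` and `uv_available` are made up for this guide and are not part of Archeo-Cluster.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version listed in the prerequisites


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def uv_available():
    """Return True if a `uv` executable is found on PATH."""
    return shutil.which("uv") is not None


if __name__ == "__main__":
    print("Python OK:", meets_minimum())
    print("uv found:", uv_available())
```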

Run the pipeline

1. Clone and install

Clone the repository and install all dependencies with uv sync:
git clone https://github.com/keviingarciah/archeo-cluster.git
cd archeo-cluster
uv sync
Verify the installation:
./run-cli --help

2. Prepare your images

Place your archaeological images in a directory. The included sample dataset is a good starting point:
ls ./dataset/
Archeo-Cluster processes all .jpg, .jpeg, and .png files in the input directory.
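To preview which files would be picked up before running anything, a small standalone check like the one below can help. It is a sketch, not Archeo-Cluster's own file discovery: the function name `list_input_images` is hypothetical, and both the case-insensitive extension match and the non-recursive scan are assumptions.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # extensions named in this guide


def list_input_images(input_dir):
    """Return image files directly inside `input_dir`, sorted by path.

    Assumptions (not confirmed by the tool): extensions are matched
    case-insensitively and subdirectories are not searched.
    """
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

For example, `[p.name for p in list_input_images("./dataset")]` lists the candidate file names in the sample dataset directory.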

3. Run the complete pipeline

Use the pipeline command to run all three stages — detection, clustering, and spatial analysis — in one step:
./run-cli pipeline --input-dir ./dataset --color "#A98876"
The --color flag specifies the target color for artifact detection in hex format. The default #A98876 works well for ceramic fragments.
You should see output similar to the following:
Running complete analysis pipeline
Session: /home/user/.local/share/archeo-cluster/sessions/dataset_20250320_143022

── Step 1: Object Detection ──
Detected 47 objects in 3 images

── Step 2: K-Means Clustering ──
Created 12 clusters across 3 images

── Step 3: Spatial Analysis ──
Analyzed image_01
Analyzed image_02
Analyzed image_03

── Performance Summary ──
┌────────────┬──────────────┬──────────────────┐
│ Stage      │ Duration (s) │ Peak Memory (MB) │
├────────────┼──────────────┼──────────────────┤
│ Detection  │        2.341 │             84.2 │
│ Clustering │        0.812 │             52.1 │
│ Analysis   │        0.445 │             38.7 │
│ Total      │        3.598 │            124.9 │
└────────────┴──────────────┴──────────────────┘

Pipeline complete!
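The value passed to --color is an ordinary #RRGGBB hex triplet. If you are picking a target color from pixel values in an image editor, it can help to see how the hex string maps to RGB channels. The helper below is a hypothetical illustration and not part of the Archeo-Cluster API. How the tool itself matches pixels against this color (tolerance, color space) is not covered here.

```python
def hex_to_rgb(color):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints in 0-255."""
    value = color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #RRGGBB, got {color!r}")
    # Each pair of hex digits encodes one 8-bit channel
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#A98876"))  # (169, 136, 118)
```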

4. Review output files

Each session stores its results in a dedicated directory. The output includes:
File                             Description
features.csv                     Geometric features extracted from each detected object
<image>/<image>_clustered.csv    Features with cluster assignments
<image>/elbow_plot.png           K-selection elbow curve
<image>/cluster_scatter.png      Cluster scatter plot
<image>/ann_results.csv          Average Nearest Neighbor index per cluster
<image>/ann_results.png          Spatial distribution map
<image>/<image>.geojson          GeoJSON export for QGIS
The results folder opens automatically when the pipeline completes. Use --no-open to disable this behavior.
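Before loading the GeoJSON export into QGIS, you may want a quick sanity check of its contents. The sketch below assumes the file is a standard GeoJSON FeatureCollection, which this guide does not confirm. It makes no assumptions about property names, and `summarize_geojson` is a hypothetical helper rather than part of Archeo-Cluster.

```python
import json
from collections import Counter


def summarize_geojson(path):
    """Count features per geometry type in a GeoJSON FeatureCollection."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    # GeoJSON allows a null geometry; those features are counted under "None"
    return Counter(
        (feature.get("geometry") or {}).get("type", "None")
        for feature in data.get("features", [])
    )
```

Calling `summarize_geojson` on a session's `.geojson` file returns a `Counter` keyed by geometry type, such as `Point` or `Polygon`.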

5. View and manage sessions

List all past analysis sessions:
./run-cli sessions --list
Get the path to the latest session:
./run-cli sessions --latest
Use a named session to organize results:
./run-cli pipeline --input-dir ./dataset --session "site_A_excavation_2025"
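When processing several sites, named sessions are easier to manage from a small script. The sketch below shells out to the CLI using only flags that appear in this guide (--input-dir, --color, --session, --no-open). The function names are hypothetical, and it assumes that --no-open is accepted by the pipeline command and that the script runs from the repository root.

```python
import subprocess


def build_pipeline_command(input_dir, color=None, session=None, no_open=False):
    """Build the argument list for `./run-cli pipeline`."""
    cmd = ["./run-cli", "pipeline", "--input-dir", str(input_dir)]
    if color is not None:
        cmd += ["--color", color]
    if session is not None:
        cmd += ["--session", session]
    if no_open:
        cmd.append("--no-open")  # assumption: valid for the pipeline command
    return cmd


def run_pipeline(input_dir, **options):
    """Run the pipeline; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_pipeline_command(input_dir, **options), check=True)
```

For example, `run_pipeline("./dataset", session="site_A_excavation_2025", no_open=True)` would run the same command as above without opening the results folder.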

Run stages individually

You can also run each stage of the pipeline separately for more control:
# Stage 1: Detect objects and extract features
./run-cli detect --input-dir ./dataset --color "#A98876" --min-area 50 --max-area 5000

# Stage 2: Cluster detected features
./run-cli cluster --input ./results/features.csv --max-k 10

# Stage 3: Run spatial analysis
./run-cli analyze --input ./results/clusters/image_01/image_01_clustered.csv
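The clustering stage produces an elbow plot for choosing K. To show how an elbow is commonly read off such a curve, here is a minimal sketch of one widely used heuristic: pick the point farthest from the straight line joining the first and last inertia values. This is not necessarily the rule Archeo-Cluster applies. The function name `elbow_k` and the sample inertia values are invented for the example.

```python
def elbow_k(inertias, k_start=1):
    """Return the k whose point lies farthest from the chord of the curve.

    `inertias[i]` is the within-cluster sum of squares for k = k_start + i.
    The result depends on the relative scale of the two axes, so treat it
    as a rough guide rather than a definitive choice.
    """
    n = len(inertias)
    if n < 3:
        return k_start
    x1, y1 = 0, inertias[0]
    x2, y2 = n - 1, inertias[-1]
    best_index, best_distance = 0, -1.0
    for i, y in enumerate(inertias):
        # Unnormalized point-to-line distance; the denominator is constant
        distance = abs((y2 - y1) * i - (x2 - x1) * y + x2 * y1 - y2 * x1)
        if distance > best_distance:
            best_index, best_distance = i, distance
    return k_start + best_index


# Made-up inertia values for k = 1..6
print(elbow_k([1000, 400, 120, 100, 90, 85]))  # 3
```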

Python API

You can also drive the analysis programmatically from Python:
from archeo_cluster.core.detection import ObjectDetector
from archeo_cluster.core.clustering import KMeansAnalyzer
from archeo_cluster.core.spatial import run_spatial_analysis
from archeo_cluster.models import DetectionConfig, ClusteringConfig

# Stage 1: Detection
config = DetectionConfig(target_color="#A98876", min_area=50, max_area=5000)
detector = ObjectDetector(config)
results = detector.process_directory("./dataset", "./output/detection")

# Stage 2: Clustering
import pandas as pd
df = pd.DataFrame(results.to_feature_rows())
df.to_csv("./output/features.csv", index=False)

analyzer = KMeansAnalyzer(ClusteringConfig(max_k=10))
clusters = analyzer.process_features_csv("./output/features.csv", "./output/clusters")

# Stage 3: Spatial analysis
for result in clusters.results:
    clustered_csv = f"./output/clusters/{result.image_name}/{result.image_name}_clustered.csv"
    desc_stats, ann_results = run_spatial_analysis(clustered_csv, f"./output/analysis/{result.image_name}")
    for ann in ann_results:
        print(f"  Cluster {ann.cluster_id}: R={ann.r_index} ({ann.interpretation})")
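
To make the R value printed above easier to interpret, here is a minimal sketch of the classic Average Nearest Neighbor (Clark-Evans) ratio: the observed mean nearest-neighbor distance divided by the distance expected under complete spatial randomness. Values below 1 suggest clustering, values near 1 a random pattern, and values above 1 dispersion. Archeo-Cluster's own computation may differ, for example in how the study area is defined or whether edge corrections are applied. The function name `ann_r_index` and the `area` parameter are assumptions for this example.

```python
import math


def ann_r_index(points, area):
    """Clark-Evans ratio: observed mean NN distance / expected mean NN distance."""
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    # Observed: mean distance from each point to its nearest neighbor (brute force)
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean nearest-neighbor distance for a random pattern of density n / area
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Four points on a regular grid inside a 2 x 2 study area
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(ann_r_index(corners, area=4.0))  # 2.0
```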

Next steps

CLI Reference

Explore all commands, flags, and default values.

Configuration

Customize detection, clustering, and paths with a config file.

Analysis Pipeline Guide

Deep dive into each stage of the pipeline.

Python API

Full API reference for programmatic usage.