Quickstart

This guide walks you through running Archeo-Cluster’s complete analysis pipeline on a set of images, from installation to viewing your results.

Prerequisites

Python 3.11 or later
uv package manager
A directory of archaeological images (JPEG, PNG)

Run the pipeline

Clone and install

Clone the repository and install all dependencies with uv sync:

git clone https://github.com/keviingarciah/archeo-cluster.git
cd archeo-cluster
uv sync

Verify the installation:

./run-cli --help

Prepare your images

Place your archaeological images in a directory. The included sample dataset is a good starting point:

ls ./dataset/

Archeo-Cluster processes all .jpg, .jpeg, and .png files in the input directory.
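
To preview which files would be picked up before running anything, a small check like the following can help. It is only a sketch: the helper name list_input_images is hypothetical, and whether Archeo-Cluster matches extensions case-insensitively or descends into subdirectories is an assumption not confirmed by this guide.

```python
from pathlib import Path

# Extensions named in this guide. Case-insensitive matching is an assumption.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_input_images(input_dir: str) -> list[Path]:
    """Return image files directly inside input_dir, sorted by path."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Calling list_input_images("./dataset") returns the matching paths, so an empty list is an early sign that the input directory is wrong.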

Run the complete pipeline

Use the pipeline command to run all three stages — detection, clustering, and spatial analysis — in one step:

./run-cli pipeline --input-dir ./dataset --color "#A98876"

The --color flag specifies the target color for artifact detection in hex format. The default #A98876 works well for ceramic fragments.

You should see output like:

Running complete analysis pipeline
Session: /home/user/.local/share/archeo-cluster/sessions/dataset_20250320_143022

── Step 1: Object Detection ──
Detected 47 objects in 3 images

── Step 2: K-Means Clustering ──
Created 12 clusters across 3 images

── Step 3: Spatial Analysis ──
Analyzed image_01
Analyzed image_02
Analyzed image_03

── Performance Summary ──
┌────────────┬──────────────┬──────────────────┐
│ Stage      │ Duration (s) │ Peak Memory (MB) │
├────────────┼──────────────┼──────────────────┤
│ Detection  │        2.341 │             84.2 │
│ Clustering │        0.812 │             52.1 │
│ Analysis   │        0.445 │             38.7 │
│ Total      │        3.598 │            124.9 │
└────────────┴──────────────┴──────────────────┘

Pipeline complete!
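
The --color value passed to the pipeline command is an ordinary #RRGGBB hex triplet. When choosing a target color for a different material, it can be useful to see the red, green, and blue components it encodes. The helper below is a hypothetical illustration, not part of Archeo-Cluster, and it says nothing about how the tool matches colors internally.

```python
def hex_to_rgb(color: str) -> tuple[int, int, int]:
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of 0-255 integers."""
    value = color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #RRGGBB, got {color!r}")
    # Each pair of hex digits is one channel.
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#A98876"))  # (169, 136, 118)
```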

Review output files

Each session stores its results in a dedicated directory. The output includes:

File	Description
`features.csv`	Geometric features extracted from each detected object
`<image>/<image>_clustered.csv`	Features with cluster assignments
`<image>/elbow_plot.png`	K-selection elbow curve
`<image>/cluster_scatter.png`	Cluster scatter plot
`<image>/ann_results.csv`	Average Nearest Neighbor index per cluster
`<image>/ann_results.png`	Spatial distribution map
`<image>/<image>.geojson`	GeoJSON export for QGIS

The results folder opens automatically when the pipeline completes. Use --no-open to disable this behavior.
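
The GeoJSON export listed above can be checked without opening QGIS. The sketch below assumes the file is a standard GeoJSON FeatureCollection and only counts features by geometry type. The function name is hypothetical, and no property names are read because this guide does not document them.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_geojson(path: str) -> Counter:
    """Count features in a GeoJSON FeatureCollection by geometry type."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    features = data.get("features", [])
    # A feature may carry a null geometry; count those under "none".
    return Counter(
        (feature.get("geometry") or {}).get("type", "none")
        for feature in features
    )
```

Pass it the path to an `<image>/<image>.geojson` file from a session directory. A feature count that differs from the number of detected objects is worth a closer look.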

View and manage sessions

List all past analysis sessions:

./run-cli sessions --list

Get the path to the latest session:

./run-cli sessions --latest
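
To use that path from a script, the command can be wrapped with subprocess. This is a sketch under one assumption not stated in this guide: that sessions --latest prints the session path as the last non-empty line of standard output. The helper name is hypothetical.

```python
import subprocess
from pathlib import Path


def latest_session_dir(command: list[str]) -> Path:
    """Run a command and interpret its last non-empty stdout line as a path."""
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    lines = [ln.strip() for ln in completed.stdout.splitlines() if ln.strip()]
    if not lines:
        raise RuntimeError("command produced no output")
    return Path(lines[-1])
```

From the repository root, the call would be latest_session_dir(["./run-cli", "sessions", "--latest"]).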

Use a named session to organize results:

./run-cli pipeline --input-dir ./dataset --session "site_A_excavation_2025"

Run stages individually

You can also run each stage of the pipeline separately for more control:

# Stage 1: Detect objects and extract features
./run-cli detect --input-dir ./dataset --color "#A98876" --min-area 50 --max-area 5000

# Stage 2: Cluster detected features
./run-cli cluster --input ./results/features.csv --max-k 10

# Stage 3: Run spatial analysis
./run-cli analyze --input ./results/clusters/image_01/image_01_clustered.csv
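
The --max-k flag in stage 2 presumably caps how many cluster counts are tried when the elbow curve is built. How Archeo-Cluster picks k from that curve is not described here. One common heuristic, shown below as an illustration only, chooses the k where the inertia curve bends most sharply, scored by the discrete second difference. The function name and the sample inertia values are made up for the example.

```python
def elbow_k(inertias: list[float], first_k: int = 1) -> int:
    """Pick k at the sharpest bend of an inertia curve.

    inertias[i] is the within-cluster sum of squares for k = first_k + i.
    """
    if len(inertias) < 3:
        raise ValueError("need at least three inertia values")
    # Second difference at each interior point: large values mark a bend.
    scores = [
        inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        for i in range(1, len(inertias) - 1)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return first_k + best + 1


# Made-up inertias for k = 1..5; the curve flattens after k = 3.
print(elbow_k([1000.0, 900.0, 300.0, 280.0, 270.0]))  # 3
```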

Python API

You can also drive the analysis programmatically from Python:

import pandas as pd

from archeo_cluster.core.detection import ObjectDetector
from archeo_cluster.core.clustering import KMeansAnalyzer
from archeo_cluster.core.spatial import run_spatial_analysis
from archeo_cluster.models import DetectionConfig, ClusteringConfig

# Stage 1: Detection
config = DetectionConfig(target_color="#A98876", min_area=50, max_area=5000)
detector = ObjectDetector(config)
results = detector.process_directory("./dataset", "./output/detection")

# Stage 2: Clustering
# Write the detected features to CSV so the analyzer can read them back.
df = pd.DataFrame(results.to_feature_rows())
df.to_csv("./output/features.csv", index=False)

analyzer = KMeansAnalyzer(ClusteringConfig(max_k=10))
clusters = analyzer.process_features_csv("./output/features.csv", "./output/clusters")

# Stage 3: Spatial analysis
for result in clusters.results:
    clustered_csv = f"./output/clusters/{result.image_name}/{result.image_name}_clustered.csv"
    desc_stats, ann_results = run_spatial_analysis(clustered_csv, f"./output/analysis/{result.image_name}")
    for ann in ann_results:
        print(f"  Cluster {ann.cluster_id}: R={ann.r_index} — {ann.interpretation}")
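
The R value printed in the loop above is the Average Nearest Neighbor index. In its textbook form it is the observed mean distance from each point to its nearest neighbor, divided by the distance expected for a random pattern of the same density, 0.5 / sqrt(n / area). Values well below 1 suggest clustering, values near 1 a random pattern, and values above 1 dispersion. The sketch below implements that textbook ratio. It is not Archeo-Cluster's code, and the choice of study area and any edge correction the tool may apply are assumptions left open here.

```python
import math


def ann_index(points: list[tuple[float, float]], area: float) -> float:
    """Average Nearest Neighbor ratio: observed mean distance / expected."""
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    # Distance from each point to its closest other point (brute force).
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Four points spaced evenly in a 2 x 2 study area: a dispersed pattern.
print(ann_index([(0, 0), (0, 1), (1, 0), (1, 1)], area=4.0))  # 2.0
```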

Next steps

CLI Reference

Explore all commands, flags, and default values.

Configuration

Customize detection, clustering, and paths with a config file.

Analysis Pipeline Guide

Deep dive into each stage of the pipeline.

Python API

Full API reference for programmatic usage.
