Installation

Install Archeo-Cluster using uv and get your environment ready.

Quickstart

Run your first analysis pipeline in minutes.

CLI Reference

Complete reference for all CLI commands and options.

Python API

Use Archeo-Cluster programmatically in your Python scripts.

What is Archeo-Cluster?

Archeo-Cluster is a Python CLI tool for analyzing archaeological images. It implements a three-stage pipeline that automatically detects artifacts, groups them by similarity, and analyzes their spatial distribution patterns.

Images → Detection (OpenCV) → Clustering (K-Means) → Spatial Analysis (ANN)

Built for researchers who need to process large collections of archaeological photographs and extract quantitative data about artifact types and distribution without manual annotation.

How it works

1. Object Detection

Images are converted to HSV color space. Color-based thresholding isolates artifact regions. Morphological operations clean noise. OpenCV findContours identifies object boundaries and extracts geometric features: area, perimeter, centroid, circularity, and aspect ratio.
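
The geometric features listed above can be illustrated without OpenCV. The sketch below is not Archeo-Cluster's implementation: it assumes a contour is available as a list of (x, y) vertices, that circularity means 4πA/P², and that aspect ratio is the bounding-box width divided by its height. The function name `contour_features` is hypothetical.

```python
import math

def contour_features(points):
    """Compute shape features for a closed polygon given as (x, y) vertices."""
    n = len(points)
    cross_sum = cx = cy = perimeter = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0     # shoelace term for this edge
        cross_sum += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)

    signed_area = cross_sum / 2.0
    area = abs(signed_area)
    if area == 0:
        raise ValueError("degenerate contour: zero area")

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)

    return {
        "area": area,
        "perimeter": perimeter,
        "centroid": (cx / (6 * signed_area), cy / (6 * signed_area)),
        # Assumed definition: 1.0 for a perfect circle, lower for elongated shapes.
        "circularity": 4 * math.pi * area / perimeter ** 2,
        # Assumed definition: axis-aligned bounding box width over height.
        "aspect_ratio": width / height,
    }

features = contour_features([(0, 0), (4, 0), (4, 2), (0, 2)])
```

For the 4 × 2 rectangle in the usage line, the area is 8, the perimeter is 12, and the centroid is (2, 1).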

2. K-Means Clustering

Extracted features are normalized and fed into K-Means clustering. The elbow method automatically selects the number of clusters (K) by finding the point at which adding another cluster no longer substantially reduces the within-cluster sum of squares (WCSS). Each artifact is assigned to a cluster based on feature similarity.
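
The page does not say how the elbow is located on the WCSS curve, so the following is only one common heuristic, not necessarily what Archeo-Cluster does: pick the K whose point lies farthest below the straight line joining the first and last WCSS values. The names `wcss` and `pick_elbow` are hypothetical.

```python
import math

def wcss(points, labels, centroids):
    """Within-cluster sum of squares: squared distance of each point to its centroid."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

def pick_elbow(wcss_by_k):
    """Return the elbow K, where wcss_by_k[i] is the WCSS obtained with K = i + 1."""
    n = len(wcss_by_k)
    first, last = wcss_by_k[0], wcss_by_k[-1]
    span = first - last
    if n < 3 or span <= 0:
        return 1  # too few values, or no improvement, to define an elbow

    best_k, best_gap = 1, 0.0
    for i, value in enumerate(wcss_by_k):
        x = i / (n - 1)              # K rescaled to [0, 1]
        y = (value - last) / span    # WCSS rescaled to [0, 1]
        gap = 1.0 - x - y            # proportional to the distance below the chord
        if gap > best_gap:
            best_k, best_gap = i + 1, gap
    return best_k

# Hypothetical WCSS values for K = 1..6; the curve flattens after K = 3.
elbow = pick_elbow([1000, 400, 120, 100, 90, 85])
```

Rescaling both axes to [0, 1] keeps the result independent of the WCSS magnitude, which otherwise depends on how the features were normalized.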

3. Spatial Analysis

The Average Nearest Neighbor (ANN) index is the ratio of the observed mean nearest-neighbor distance to the mean distance expected under a random distribution. ANN < 1 indicates clustering; ANN > 1 indicates dispersion; ANN ≈ 1 indicates a random distribution. Results export to GeoJSON for use in QGIS.
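
As a minimal sketch of the ratio described above, the function below uses the standard expected distance for a random pattern, 0.5 / sqrt(n / A), where n is the number of points and A is the study area. How Archeo-Cluster determines the study area is not stated here, so `area` is passed explicitly, and the function name is hypothetical.

```python
import math

def average_nearest_neighbor(points, area):
    """ANN index: observed mean nearest-neighbor distance over the expected one."""
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive study area")

    # Brute-force nearest neighbor; adequate for a sketch, O(n^2) in the point count.
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)  # mean distance under complete spatial randomness
    return observed / expected

# Four points evenly spread over a 2 x 2 region: dispersed, so the index exceeds 1.
dispersed = average_nearest_neighbor([(0, 0), (1, 0), (0, 1), (1, 1)], area=4.0)

# The same four points packed into one corner of a 10 x 10 region: clustered.
clustered = average_nearest_neighbor([(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1)], area=100.0)
```

The index is sensitive to the choice of study area: the same points yield a smaller value when the area is enlarged, so comparisons are only meaningful when the area is defined consistently.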

Key features

Color segmentation

HSV-based segmentation isolates artifacts by color. Configure the target color with any hex value to match ceramic fragments, stone tools, or other materials.
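
To show how a hex value could map onto an HSV threshold, here is a sketch using only the standard library. It assumes OpenCV's 8-bit HSV convention (H in 0 to 179, S and V in 0 to 255). The tolerance values and function names are hypothetical and are not Archeo-Cluster defaults.

```python
import colorsys

def hex_to_opencv_hsv(hex_color):
    """Convert '#RRGGBB' to an (H, S, V) triple in OpenCV's 8-bit ranges."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a color like '#A98876'")
    r, g, b = (int(value[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # each component in [0, 1]
    # OpenCV stores hue as degrees / 2, so the full circle maps to 0..179.
    return round(h * 180) % 180, round(s * 255), round(v * 255)

def hsv_bounds(hex_color, h_tol=10, s_tol=60, v_tol=60):
    """Build lower and upper threshold bounds around the target color.

    Tolerances are illustrative. Hue wrap-around (for reds near 0/179)
    is not handled in this sketch; bounds are simply clamped.
    """
    h, s, v = hex_to_opencv_hsv(hex_color)
    lower = (max(h - h_tol, 0), max(s - s_tol, 0), max(v - v_tol, 0))
    upper = (min(h + h_tol, 179), min(s + s_tol, 255), min(v + v_tol, 255))
    return lower, upper

lower, upper = hsv_bounds("#A98876")
```

Bounds of this form are what a call such as OpenCV's `inRange` expects when building a binary mask from an HSV image.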

Automatic K selection

The elbow method selects the number of clusters from the WCSS curve, so no manual tuning is required.

Spatial statistics

The Average Nearest Neighbor (ANN) index quantifies whether artifacts are clustered, dispersed, or randomly distributed across an excavation site.

GeoJSON export

Results export as GeoJSON for direct import into QGIS and other GIS tools for further spatial analysis.
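
The exact schema of the exported file is not documented on this page. As an illustration only, the sketch below builds a GeoJSON FeatureCollection of point features from per-artifact records. The input keys (`x`, `y`, `cluster`, `area`) and the function name are hypothetical.

```python
import json

def to_geojson(artifacts):
    """Wrap artifact centroids as GeoJSON Point features in a FeatureCollection."""
    features = []
    for artifact in artifacts:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [x, y] (longitude, latitude).
                "coordinates": [artifact["x"], artifact["y"]],
            },
            # Hypothetical attributes; the real export may carry different fields.
            "properties": {"cluster": artifact["cluster"], "area": artifact["area"]},
        })
    return {"type": "FeatureCollection", "features": features}

collection = to_geojson([
    {"x": 12.5, "y": 40.0, "cluster": 0, "area": 310.0},
    {"x": 88.0, "y": 15.5, "cluster": 1, "area": 124.0},
])
text = json.dumps(collection, indent=2)
```

Centroids measured in image pixels are not geographic coordinates, so a file built this way would need georeferencing before it lines up with other layers in a GIS.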

Session management

Each analysis run is stored in a named session directory. Revisit, compare, and manage previous results without re-running the pipeline.

Python API

Every CLI command has a Python counterpart. Use ObjectDetector, KMeansAnalyzer, and run_spatial_analysis directly in your scripts.

Quick example

# Install
git clone https://github.com/keviingarciah/archeo-cluster.git
cd archeo-cluster
uv sync

# Run the full pipeline
./run-cli pipeline --input-dir ./dataset --color "#A98876"

# Or use the Python API
from archeo_cluster.core.detection import ObjectDetector
from archeo_cluster.core.clustering import KMeansAnalyzer
from archeo_cluster.models import DetectionConfig, ClusteringConfig

# Detect artifacts
config = DetectionConfig(target_color="#A98876", min_area=50, max_area=5000)
detector = ObjectDetector(config)
results = detector.process_directory("./dataset", "./output")

# Cluster by feature similarity
analyzer = KMeansAnalyzer(ClusteringConfig(max_k=10))
clusters = analyzer.process_features_csv("./output/features.csv", "./clusters")