STS 2026: The 4th Semi-supervised Teeth Segmentation Challenge on Metal Artifact Reduction and Beyond

MICCAI 2026, Abu Dhabi, United Arab Emirates

October 4-8, 2026

Evaluation Protocol

STS 2026 is organized into three tasks: teeth segmentation in cone-beam computed tomography (CBCT) scans affected by metal artifacts, registration of intraoral scans (IOS) to CBCT, and MMDental multimodal analysis. Together, they cover robust CBCT tooth analysis, cross-modal dental geometry fusion and multimodal clinical reasoning.

Evaluation Tasks

Task	Submission	Assessment
Task 1: Metal Artifact CBCT Teeth Segmentation	Tooth segmentation masks for CBCT scans affected by metal artifacts.	DSC and HD95 measure segmentation overlap and boundary accuracy.
Task 2: CBCT-IOS Registration	Rigid transformation matrix aligning the IOS crown surface to the CBCT volume.	MTE and MRE measure geometric alignment accuracy.
Task 3: MMDental Multimodal Analysis	Model outputs generated from tooth CBCT images and expert medical records.	Task-specific multimodal diagnosis, reporting and reasoning metrics will be released with the protocol.

Metrics

Dice Similarity Coefficient (DSC): segmentation overlap; higher is better.
95% Hausdorff Distance (HD95): segmentation boundary accuracy; lower is better.
Mean Translation Error (MTE): registration translation accuracy; lower is better.
Mean Rotation Error (MRE): registration rotation accuracy; lower is better.
Task 3 multimodal metrics will assess consistency between CBCT evidence and expert clinical records.
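
The protocol does not give formulas for the two registration metrics, so the following is only a minimal sketch under common conventions. It assumes each submission is a 4x4 rigid transform stored as nested lists, that translation error is the Euclidean distance between translation vectors, and that rotation error is the geodesic angle between rotation blocks, in degrees. The function names are hypothetical, and the official evaluation code may differ.

```python
import math

def translation_error(T_pred, T_gt):
    """Euclidean distance between the translation columns of two 4x4 transforms."""
    return math.sqrt(sum((T_pred[i][3] - T_gt[i][3]) ** 2 for i in range(3)))

def rotation_error_deg(T_pred, T_gt):
    """Geodesic angle in degrees between the 3x3 rotation blocks (assumed definition)."""
    # trace(R_gt^T R_pred) equals the element-wise product summed over the 3x3 block.
    trace = sum(T_gt[i][j] * T_pred[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def mean_errors(pairs):
    """Average both errors over a list of (prediction, ground truth) pairs."""
    n = len(pairs)
    mte = sum(translation_error(p, g) for p, g in pairs) / n
    mre = sum(rotation_error_deg(p, g) for p, g in pairs) / n
    return mte, mre

# Illustrative matrices only: identity ground truth versus a prediction that is
# rotated 90 degrees about z and shifted by (3, 4, 0).
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
ROT_Z_90_SHIFTED = [[0, -1, 0, 3], [1, 0, 0, 4], [0, 0, 1, 0], [0, 0, 0, 1]]
```

Under these assumed definitions, the illustrative pair gives a translation error of 5 and a rotation error of 90 degrees.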

For Task 1, DSC and HD95 are intentionally kept separate rather than merged into a single score. HD95 is emphasized because blooming from metal artifacts mainly corrupts anatomical boundaries, and boundary fidelity is clinically important when analyzing artifact-affected CBCT. DSC reports overall overlap quality.
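
To make the difference between the two segmentation metrics concrete, here is a small sketch that computes both on masks stored as sets of voxel coordinates. Several details are assumptions rather than the official implementation: surfaces are extracted with 6-connectivity, HD95 is taken as the larger of the two directed 95th-percentile surface distances, and an empty mask yields an infinite HD95. The brute-force distance search is only practical for toy inputs.

```python
import math

NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def dice(pred, gt):
    """DSC between two sets of foreground voxel coordinates."""
    if not pred and not gt:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * len(pred & gt) / (len(pred) + len(gt))

def surface(mask):
    """Voxels that have at least one 6-connected neighbour outside the mask."""
    return {v for v in mask
            if any((v[0] + dx, v[1] + dy, v[2] + dz) not in mask
                   for dx, dy, dz in NEIGHBOURS)}

def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def directed_distances(src, dst, spacing):
    """Distance from every point in src to its nearest point in dst (brute force)."""
    return [min(math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(p, q, spacing)))
                for q in dst)
            for p in src]

def hd95(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Symmetric 95th-percentile surface distance (assumed definition)."""
    if not pred or not gt:
        return math.inf  # assumption: an empty mask gets the worst score
    s_pred, s_gt = surface(pred), surface(gt)
    return max(percentile(directed_distances(s_pred, s_gt, spacing), 95),
               percentile(directed_distances(s_gt, s_pred, spacing), 95))

# Toy example: a 3x3x3 cube and the same cube shifted by one voxel along x.
gt_cube = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
pred_cube = {(x + 1, y, z) for (x, y, z) in gt_cube}
```

For the toy cubes, the overlap drops to a DSC of 2/3 while HD95 is exactly one voxel, which illustrates how the two metrics respond differently to the same boundary shift.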

Ranking Rules

Task 1 reports Rank_DSC and Rank_HD95 for metal artifact CBCT teeth segmentation.
Task 2 ranks teams by registration accuracy using MTE and MRE.
Task 3 will use a separate leaderboard based on the released MMDental multimodal protocol.
Missing or failed test-case results receive the worst possible score, such as DSC=0 or HD95=infinity.
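
The rules above do not say how per-case scores are aggregated into Rank_DSC and Rank_HD95. As one possible reading, the sketch below fills missing or failed cases with the worst score, averages per team, and ranks teams on each metric separately. The mean-then-rank aggregation, the tie handling and all names are assumptions.

```python
import math

WORST = {"DSC": 0.0, "HD95": math.inf}          # worst scores named in the rules
HIGHER_IS_BETTER = {"DSC": True, "HD95": False}

def team_mean(case_scores, case_ids, metric):
    """Mean over all test cases, with missing or failed (None) cases set to the worst score."""
    filled = []
    for case_id in case_ids:
        value = case_scores.get(case_id)
        filled.append(WORST[metric] if value is None else value)
    return sum(filled) / len(filled)

def rank_teams(results, case_ids, metric):
    """Competition ranking: a team's rank is 1 plus the number of strictly better teams."""
    means = {team: team_mean(scores[metric], case_ids, metric)
             for team, scores in results.items()}
    if HIGHER_IS_BETTER[metric]:
        better = lambda a, b: a > b
    else:
        better = lambda a, b: a < b
    return {team: 1 + sum(better(other, mean) for other in means.values())
            for team, mean in means.items()}

# Hypothetical results: team B failed on case "c2".
example_results = {
    "A": {"DSC": {"c1": 0.90, "c2": 0.80}, "HD95": {"c1": 1.0, "c2": 2.0}},
    "B": {"DSC": {"c1": 0.95}, "HD95": {"c1": 0.5, "c2": None}},
}
example_cases = ["c1", "c2"]
```

With this aggregation, a single missing HD95 result makes the team's mean infinite, so it ranks behind every team that completed all cases. A rank-then-average scheme would penalize a missing case less severely.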

Dataset Split

Split	Cases	Provided Data
Training (Labeled)	40	CBCT, IOS, segmentation masks and registration matrices.
Training (Unlabeled)	219	Raw CBCT and IOS data for semi-supervised learning.
Validation	20	Raw CBCT and IOS data; ground truth withheld for server evaluation.
Test	100	Raw CBCT and IOS data; hidden ground truth for final ranking.
Total	379	All cases include metallic restorations and metal artifacts.

The labeled training set and test set share the same metal artifact severity stratification: 30% mild, 40% moderate and 30% severe. This mirrored distribution keeps the task definition stable and evaluates robustness to metal artifacts rather than unexpected domain shift.
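
As a quick arithmetic check, the stated 30/40/30 stratification can be converted into case counts for the 40 labeled training cases and the 100 test cases. The helper below is a hypothetical illustration that uses integer arithmetic and largest-remainder rounding. It is not a description of how the organizers actually assigned cases.

```python
SEVERITY_PERCENT = {"mild": 30, "moderate": 40, "severe": 30}

def stratum_counts(n_cases, percents=SEVERITY_PERCENT):
    """Split n_cases by integer percentages; leftovers go to the largest remainders."""
    counts = {name: n_cases * pct // 100 for name, pct in percents.items()}
    remainders = {name: n_cases * pct % 100 for name, pct in percents.items()}
    leftover = n_cases - sum(counts.values())
    # Hand out any cases lost to floor division, largest remainder first.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[name] += 1
    return counts

labeled_counts = stratum_counts(40)   # 12 mild, 16 moderate, 12 severe
test_counts = stratum_counts(100)     # 30 mild, 40 moderate, 30 severe
```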

Statistical Analysis

The organizers will estimate 95% confidence intervals using bootstrap analysis, compare top teams with paired Wilcoxon signed-rank tests, and report variability with standard deviation, interquartile range and box-and-whisker plots. Additional analyses will include artifact severity stratification, semi-supervised learning efficacy and inter-rater reliability by artifact subgroup.
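
The bootstrap procedure is not specified further, so the following is a minimal sketch of a percentile bootstrap for the mean of one team's per-case scores. The number of resamples, the fixed seed and the percentile method are all assumptions, and the function name is hypothetical.

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-case scores."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    # Resample cases with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(scores, k=n))
                   for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper

# Hypothetical per-case DSC values for one team.
example_scores = [0.80, 0.85, 0.90, 0.92, 0.88, 0.79, 0.95, 0.91]
ci_low, ci_high = bootstrap_ci(example_scores)
```

For the paired comparison between two teams, one option is `scipy.stats.wilcoxon` applied to their per-case scores on the same test cases, although the organizers' exact test settings are not stated here.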

The latest public materials for STS 2026 will also be linked from the official GitHub repository.