ACM SIGMOD Availability & Reproducibility Initiative

ACM SIGMOD 2025 Badges and Reproducibility Results


Paper Title	Report	Badges Awarded
On the Feasibility and Benefits of Extensive Evaluation
A Lovász-Simonovits Theorem for Hypergraphs with Application to Local Clustering
Memento Filter: A Fast, Dynamic, and Robust Range Filter (Honorable Mention for Best Artifact)
Live Patching for Distributed In-Memory Key-Value Stores (Best Artifact)
An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQL
CRDV: Conflict-free Replicated Data Views (Honorable Mention for Best Artifact)
Deep Overlapping Community Search via Subspace Embedding
Disco: A Compact Index for LSM-trees
H-Rocks: CPU-GPU accelerated Heterogeneous RocksDB on Persistent Memory
InTime: Towards Performance Predictability In Byzantine Fault Tolerant Proof-of-Stake Consensus
Constant Optimization Driven Database System Testing
Tribase: A Vector Data Query Engine for Reliable and Lossless Pruning Compression using Triangle Inequalities
BCviz: A Linear-Space Index for Mining and Visualizing Cohesive Bipartite Subgraphs
T3: Accurate and Fast Performance Prediction for Relational Database Systems With Compiled Decision Trees
Parallel k-Core Decomposition: Theory and Practice
Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables
Self-Enhancing Video Data Management System for Compositional Events with Large Language Models
Galley: Modern Query Optimization for Sparse Tensor Programs
Are Database System Researchers Making Correct Assumptions about Transaction Workloads? (Honorable Mention for Best Artifact)
Alsatian: Optimizing Model Search for Deep Transfer Learning
RWalks: Random Walks as Attribute Diffusers for Filtered Vector Search
PLM4NDV: Minimizing Data Access for Number of Distinct Values Estimation with Pre-trained Language Models
Fast and Scalable Data Transfer Across Data Systems
SuSe: Summary Selection for Regular Expression Subsequence Aggregation over Streams
Zombie Hashing: Reanimating Tombstones in Graveyard
Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing
SBSC: A fast Self-tuned Bipartite proximity graph-based Spectral Clustering (Honorable Mention for Best Artifact)
PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees
LpBound: Pessimistic Cardinality Estimation Using ℓp-Norms of Degree Sequences
HotStuff-1: Linear Consensus with One-Phase Speculation (Honorable Mention for Best Artifact)
Approximating Opaque Top-k Queries (Honorable Mention for Best Artifact)
Dangers of List Processing in Querying Property Graphs
A Structured Study of Multivariate Time-Series Distance Measures
Mnemosyne: Dynamic Workload-Aware BF Tuning via Accurate Statistics in LSM trees
SPARTAN: Data-Adaptive Symbolic Time-Series Approximation
Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality
Efficient and Accurate PageRank Approximation on Large Graphs
SketchQL: Video Moment Querying with a Visual Query Interface
Transforming RDF Graphs to Property Graphs using Standardized Schemas
Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs
GOLAP: A GPU-in-Data-Path Architecture for High-Speed OLAP
iRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor Search
Disclosure-Compliant Query Answering
Understanding and Reusing Test Suites Across Database Systems
Online Marketplace: A Benchmark for Data Management in Microservices
Efficient Maximum s-Bundle Search via Local Vertex Connectivity
SNAILS: Schema Naming Assessments for Improved LLM-Based SQL Inference
Efficiently Counting Triangles in Large Temporal Graphs
DISCES: Systematic Discovery of Event Stream Queries
Boosting OLTP Performance with Per-Page Logging on NVDIMM
Computing Approximate Graph Edit Distance via Optimal Transport
SymphonyQG: Towards Symphonious Integration of Quantization and Graph for Approximate Nearest Neighbor Search
MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training
An Experimental Comparison of Tree-data Structures for Connectivity Queries on Fully-dynamic Undirected Graphs
Aster: Enhancing LSM-structures for Scalable Graph Database
Entity/Relationship Graphs: Principled Design, Modeling, and Data Integrity Management of Graph Databases
DEG: Efficient Hybrid Vector Search Using the Dynamic Edge Navigation Graph
Progressive entity resolution: a design space exploration
Data Chunk Compaction in Vectorized Execution
Efficiently Processing Joins and Grouped Aggregations on GPUs
Moving on From Group Commit: Autonomous Commit Enables High Throughput and Low Latency on NVMe SSDs
GTX: A Write-Optimized Latch-free Graph Data System with Transactional Support
Serf: Streaming Error-Bounded Floating-Point Compression
Accelerate Distributed Joins with Predicate Transfer
Automated Validating and Fixing of Text-to-SQL Translation with Execution Consistency
HoneyComb: A Parallel Worst-Case Optimal Join on Multicores
Computing Inconsistency Measures Under Differential Privacy
Athena: An Effective Learning-based Framework for Query Optimizer Performance Improvement
Rule-Based Graph Cleaning with GPUs on a Single Machine
A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces
FAAQP: Fast and Accurate Approximate Query Processing based on Bitmap-augmented Sum-Product Network
Finding Logic Bugs in Graph-processing Systems via Graph-cutting
cuMatch: A GPU-based Memory-Efficient Worst-case Optimal Join Processing Method for Subgraph Queries with Complex Patterns
Low Rank Learning for Offline Query Optimization
PQCache: Product Quantization-based KVCache for Long Context LLM Inference
Fast Maximum Common Subgraph Search: A Redundancy-Reduced Backtracking Approach
Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search
Faster and Efficient Density Decomposition via Proportional Response with Exponential Momentum
Privacy and Accuracy-Aware AI/ML Model Deduplication
SpareLLM: Automatically Selecting Task-Specific Minimum-Cost Large Language Models under Equivalence Constraint
Fast Hypertree Decompositions via Linear Programming: Fractional and Generalized
Divide-and-Conquer: Scalable Shortest Path Counting on Large Road Networks
SHIELD: Encrypting Persistent Data of LSM-KVS from Monolithic to Disaggregated Storage
Logical and Physical Optimizations for SQL Query Execution over Large Language Models
Cracking SQL Barriers: An LLM-based Dialect Translation System
Credible Intervals for Knowledge Graph Accuracy Estimation
Low-Latency Transaction Scheduling via Userspace Interrupts: Why Wait or Yield When You Can Preempt?
PDX: A Data Layout for Vector Similarity Search
Aero: Adaptive Query Processing of ML Queries
Automating Vectorized Distributed Graph Computation
BSX: Subgraph Matching with Batch Backtracking Search
DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-Core GNN Training
Dual-Hierarchy Labelling: Scaling Up Distance Queries on Dynamic Road Networks
Practical DB-OS Co-Design with Privileged Kernel Bypass
Multi-Level Graph Representation Learning Through Predictive Community-based Partitioning
Styx: Transactional Stateful Functions on Streaming Dataflows
AJOSC: Adaptive Join Order Selection for Continuous Queries
SPACE: Cardinality Estimation for Path Queries Using Cardinality-Aware Sequence-based Learning
Subgroup Discovery with Small and Alternative Feature Sets
How Good Are Learned Cost Models, Really? Insights from Query Optimization Tasks
NEXT: A New Secondary Index Framework for LSM-based Data Storage
Yannakakis+: Practical Acyclic Query Evaluation with Theoretical Guarantees
Mitigating the Impedance Mismatch between Prediction Query Execution and Database Engine
Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization
MIRAGE-ANNS: Mixed Approach Graph-based Indexing for Approximate Nearest Neighbor Search
Incremental Rule Discovery in Response to Parameter Updates
Revisiting Graph Analytics Benchmark

ACM SIGMOD 2024 Badges and Reproducibility Results

Paper Title	Report	Badges Awarded
Proximity Queries on Point Clouds using Rapid Construction Path Oracle
PimPam: Efficient Graph Pattern Matching on Real Processing-in-Memory Hardware
Optimizing Distributed Protocols with Query Rewrites
Hierarchical Cut Labelling - Scaling Up Distance Queries on Road Networks
Proving Query Equivalence Using Linear Integer Arithmetic
Faster Algorithms for Fair Max-Min Diversification in R^d
Can Learned Indexes be Built Efficiently? A Deep Dive into Sampling Trade-offs (Best Artifact)
Query Refinement for Diverse Top-k Selection
Grafite: Taming Adversarial Queries with Optimal Range Filters
FACET: Robust Counterfactual Explanation Analytics
CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models (Honorable Mention for Best Artifact)
Temporal JSON Keyword Search
DProvDB: Differentially Private Query Processing with Multi-Analyst Provenance
PECJ: Stream Window Join on Disorder Data Streams with Proactive Error Compensation
NOCAP: Near-Optimal Correlation-Aware Partitioning Joins
Dias: Dynamic Rewriting of Pandas Code (Honorable Mention for Best Artifact)
Fast Maximal Quasi-clique Enumeration: A Pruning and Branching Co-Design Approach
MorphStream: Adaptive Scheduling for Scalable Transactional Stream Processing on Multicores (presented in SIGMOD 2023)
PreVision: An Out-of-Core Matrix Computation System with Optimal Buffer Replacement
ALP: Adaptive lossless floating-point compression (Best Artifact)
A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations
On The Reasonable Effectiveness of Relational Diagrams: Explaining Relational Query Patterns and the Pattern Expressiveness of Relational Languages
StarfishDB: A Query Execution Engine for Relational Probabilistic Programming
On Efficient Large Sparse Matrix Chain Multiplication
Demystifying the QoS and QoE of Edge-hosted Video Streaming Applications in the Wild with SNESet
Revisiting B-tree Compression: An Experimental Study
Spruce: a Fast yet Space-saving Structure for Dynamic Graph Storage
Cabin: a Compressed Adaptive Binned Scan Index
NOC-NOC: Towards Performance-optimal Distributed Transactions
Anchor: A Library for Building Secure Persistent Memory Systems
Origin-Destination Travel Time Oracle for Map-based Services
Implementation Strategies for Views over Property Graphs
Correlation Joins over Time Series Data Streams Utilizing Complementary Dimension Reduction and Transformation
Homomorphic Compression: Making Text Processing on Compression Unlimited
RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search
Generation of Training Examples for Tabular Natural Language Inference
Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control
Approximate Sketches
SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search
Relative Keys: Putting Feature Explanation into Context
Memory-Efficient and Flexible Detection of Heavy Hitters in High-Speed Networks
AirIndex: Versatile Index Tuning Through Data and Storage
Counterfactual Explanation at Will, with Zero Privacy Leakage
Sub-optimal Join Order Identification with L1-error
CAVE: Concurrency-Aware Graph Processing on SSDs
Optimizing Time Series Queries with Versions
LPLM: A Neural Language Model for Cardinality Estimation of LIKE-Queries
Rethinking Learned Cost Models: Why Start from Scratch?
Efficient k-Clique Listing: An Edge-Oriented Branching Strategy
AS-Parser: Log Parsing Based on Adaptive Segmentation
High-performance Effective Scientific Error-bounded Lossy Compression with Auto-tuned Interpolation
View-based Explanations for Graph Neural Networks
High-Ratio Compression for Machine-Generated Data
Towards Metric DBSCAN: Exact, Approximate, and Streaming Algorithms
Modularity-based Hypergraph Clustering: Random Hypergraph Model, Hyperedge-cluster Relation, and Computation
CodeS: Towards Building Open-source Language Models for Text-to-SQL
HERO: A Hierarchical Set Partitioning and Join Framework for Speeding up the Set Intersection Over Graphs
SWIX: A Memory-efficient Sliding Window Learned Index
Understanding the Performance Implications of the Design Principles in Storage-Disaggregated Databases
ThalamusDB: Approximate Query Processing on Multi-Modal Data
A Comprehensive Survey and Experimental Study of Subgraph Matching: Trends, Unbiasedness, and Interaction
Scalable Distributed Inverted List Indexes in Disaggregated Memory
ADGNN: Towards Scalable GNN Training with Aggregation-Difference Aware Sampling
Selectivity Estimation for Queries Containing Predicates over Set-Valued Attributes
Practical Dynamic Extension for Sampling Indexes
Materialized View Selection & View-Based Query Planning for Regular Path Queries
LST-Bench: Benchmarking Log-Structured Tables in the Cloud
Efficient Core Maintenance in Large Bipartite Graphs
PLATON: Top-down R-tree Packing with Learned Partition Policy
In-Database Data Imputation
Saga: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications
CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure

ACM SIGMOD 2023 Badges and Reproducibility Results

Paper Title	Report	Badges Awarded
Polaris: Enabling Transaction Priority in Optimistic Concurrency Control
SafeBound: A Practical System for Generating Cardinality Bounds
Efficient Star-based Truss Maintenance on Dynamic Graphs
Efficient GPU-Accelerated Subgraph Matching
T-FSM: A Task-Based System for Massively Parallel Frequent Subgraph Pattern Mining from a Big Graph
Efficient and Effective Algorithms for Generalized Densest Subgraph Discovery
GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example
AWARE: Workload-aware, Redundancy-exploiting Linear Algebra
iFlipper: Label Flipping for Individual Fairness
ML2DAC: Meta-learning to Democratize AutoML for Clustering Analysis
LightCTS: A Lightweight Framework for Correlated Time Series Forecasting
DeltaBoost: Gradient Boosting Decision Trees with Efficient Machine Unlearning (Honorable Mention for Best Artifact)
A Universal Question-Answering Platform for Knowledge Graphs
InfiniFilter: Expanding Filters to Infinity and Beyond (Best Artifact)
EAR-Oracle: On Efficient Indexing for Distance Queries between Arbitrary Points on Terrain Surface
Efficiently Computing Join Orders with Heuristic Search
Ready to Leap (by Co-Design)? Join Order Optimisation on Quantum Hardware
A Step Toward Deep Online Aggregation (Honorable Mention for Best Artifact)
High-Dimensional Approximate Nearest Neighbor Search: with Reliable and Efficient Distance Comparison Operations
MRVs: Enforcing Numeric Invariants in Parallel Updates to Hotspots with Randomized Splitting
Using Cloud Functions as Accelerator for Elastic Data Analytics
Parallel Strong Connectivity Based on Faster Reachability
INEv: In-Network Evaluation for Event Stream Processing
Discovering Similarity Inclusion Dependencies
DAMR: Dynamic Adjacency Matrix Representation Learning for Multivariate Time Series Imputation
Learned Data-aware Image Representations of Line Charts for Similarity Search
Scapin: Scalable Graph Structure Perturbation by Augmented Influence Maximization
ForestTI: A Scalable Inverted-Index-Oriented Timeseries Management System with Flexible Memory Efficiency
FactorJoin: A New Cardinality Estimation Framework for Join Queries
Detock: High Performance Multi-region Transactions at Scale
Most Expected Winner: An Interpretation of Winners over Uncertain Voter Preferences
Dumpy: A Compact and Adaptive Index for Large Data Series Collections
ST4ML: Machine Learning Oriented Spatio-Temporal Data Processing at Scale
Generalizing Bulk-Synchronous Parallel Processing for Data Science: From Data to Threads and Agent-Based Simulations
Pea Hash: A Performant Extendible Adaptive Hashing Index
Prerequisite-driven Fair Clustering on Heterogeneous Information Networks
On Querying Connected Components in Large Temporal Graphs
Efficient and Effective Attributed Hypergraph Clustering via K-Nearest Neighbor Augmentation
LightTS: Lightweight Time Series Classification with Adaptive Ensemble Distillation
How To Optimize My Blockchain? A Multi-Level Recommendation Approach
dsJSON: A Distributed SQL JSON Processor
When Tree Meets Hash: Reducing Random Reads for Index Structures on Persistent Memories
CompressGraph: Efficient Parallel Graph Analytics with Rule-Based Compression
Query-Guided Resolution in Uncertain Databases
LightRW: FPGA Accelerated Graph Dynamic Random Walks
Exploiting Structure in Regular Expression Queries
Grep: A Graph Learning Based Database Partitioning System
Maximum k-Biplex Search on Bipartite Graphs: A Symmetric-BK Branching Approach

ACM SIGMOD 2022 Badges and Reproducibility Results

Paper Title	Report	Badges Awarded
JEDI: These aren't the JSON documents you're looking for ... (Best Artifact)
CoLES: Contrastive Learning for Event Sequences with Self-Supervision
Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines
AutoMon: Automatic Distributed Monitoring for Arbitrary Multivariate Functions
Conjunctive Queries with Comparisons
WeTune: Automatic Discovery and Verification of Query Rewrite Rules
Hybrid Deterministic and Nondeterministic Execution of Transactions in Actor Systems
Controlled Intentional Degradation in Analytical Video Systems
Cooperative Route Planning Framework for Multiple Distributed Assets in Maritime Applications
Explaining Link Prediction Systems based on Knowledge Graph Embeddings
dCAM: Dimension-wise Class Activation Map for Explaining Multivariate Data Series Classification
HypeR: Hypothetical Reasoning With What-If and How-To Queries Using a Probabilistic Causal Approach (Honorable Mention for Best Artifact)
DataPrism: Exposing Disconnect between Data and Systems
GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams
Rank Aggregation with Proportionate Fairness
Givens QR Decomposition over Relational Databases
TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data
Finding Label and Model Errors in Perception Data With Learned Observation Assertions
Secure and Policy-Compliant Query Processing on Heterogeneous Computational Storage Architecture
Interpretable Data-Based Explanations for Fairness Debugging
Efficient Algorithms for Maximal k-Biplex Enumeration
Efficient Massively Parallel Join Optimization for Large Queries
Redundancy Elimination in Distributed Matrix Computation
Efficient Answering of Historical What-if Queries
HET-GMP: a Graph-based System Approach to Scaling Large Embedding Model Training
SAM: Database Generation from Query Workload with Supervised Autoregressive Model
CompactWalks: Taming Knowledge-Graph Embeddings With Domain- and Task-Specific Pathways

ACM SIGMOD 2021 Badges and Reproducibility Results

Paper Title	Report	Badges Awarded
Parallelizing Intra-Window Join on Multicores: An Experimental Study
REDS: Rule Extraction for Discovering Scenarios
At-the-time and Back-in-time Persistent Sketches
LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems
One WITH RECURSIVE is Worth Many GOTOs
Conformance Constraint Discovery: Measuring Trust in Data-Driven Systems (Best Artifact)
ExDRa: Exploratory Data Science on Federated Raw Data
MxTasks: How to Make Efficient Synchronization and Prefetching Easy
Efficient Graph Summarization using Weighted LSH at Billion-Scale
Putting Things into Context: Rich Explanations for Query Answers using Join Graphs
TreeToaster: Towards an IVM-Optimized Compiler
EIRES: Efficient Integration of Remote Data in Event Stream Processing
SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging
Adaptive Compression for Fast Scans on String Columns
Tuplex: Data Science in Python at Native Code Speed
Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence
Parallel Index-based Structural Graph Clustering and its Approximation
Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds
COMPASS: Online Sketch-based Query Optimization for In-Memory Databases
Clonos: Consistent Causal Recovery for Highly-Available Streaming Dataflows
Maximizing Persistent Memory Bandwidth Utilization for OLAP Workloads
Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals
HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries
Efficient Exploration of Interesting Aggregates in RDF Graphs
AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment

ACM SIGMOD 2020 Badges and Reproducibility Results

Paper Title	Report	Badges Awarded
QueryVis: Logic-based Diagrams help Users Understand Complicated SQL Queries Faster (Best Artifact)
Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks
Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs
Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study
Factorized Graph Representations for Semi-Supervised Learning from Sparse Data (Best Artifact)
Functional-Style SQL UDFs With a Capital 'F' (Best Artifact)
Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring
BugDoc: Algorithms to Debug Computational Processes
Database Benchmarking for Supporting Real-Time Interactive Querying of Large Data
Automating Incremental and Asynchronous Evaluation for Recursive Aggregate Data Processing
Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects
LISA: A Learned Index Structure for Spatial Data
Timely Reporting of Heavy Hitters using External Memory
Theoretically-Efficient and Practical Parallel DBSCAN
Towards Interpretable and Learnable Risk Analysis for Entity Resolution
BinDex: A Two-Layered Index for Fast and Robust Scans