ACM SIGMOD Availability & Reproducibility Initiative (ARI)

SIGMOD 2026 ARI

Stay tuned! SIGMOD 2026 ARI is coming soon. Submission details will be announced here.

February 2026: SIGMOD 2025 ARI Results

SIGMOD 2025 ARI is now complete. The reviews have been finalized and the results communicated to the authors.
The awarded badges and the details of the SIGMOD 2025 papers that participated in SIGMOD ARI are available here.

Thank you to all authors and to the Reproducibility Committee for their hard work.

The Best Artifact award is also announced. Congratulations to the winners!

Please note that the artifacts and the reproducibility reports will soon be made available.

Ad hoc SIGMOD ARI submissions

Every year, shortly after the camera-ready (CR) submission of accepted SIGMOD papers, all authors are invited to submit their artifacts in a subsequent Call for Artifacts. Authors of papers published at past SIGMOD conferences who want to submit an availability or reproducibility package at a later time should contact the ACM SIGMOD ARI Chairs, who will facilitate the submission.

SIGMOD & PACMMOD Artifact Links

Quick links to reproduced and artifacts-available papers:

README

Quick guides for authors and reviewers.

What is SIGMOD Availability & Reproducibility?

SIGMOD Availability & Reproducibility has three goals:

Highlight the impact of database research papers.
Enable easy dissemination of research results.
Enable easy sharing of code and experimentation set-ups.

In short, the goal is to help build a culture where sharing the results, code, and scripts of database research is the norm rather than the exception. The challenge is to do this efficiently, which means building technical expertise in how to do better research by making it repeatable and shareable. The SIGMOD Availability & Reproducibility Committee is here to help you with this.

The SIGMOD Availability & Reproducibility effort works in coordination with the PVLDB Reproducibility effort to encourage the database community to develop a culture of sharing and cross-validation.

How does it work?

At a high level, the workflow of the SIGMOD Availability & Reproducibility process is shown below. For more details, keep scrolling to Criteria and Process and Packaging Guidelines.

Why should I be part of this?

You will make it easy for other researchers to compare against your work and to adopt and extend your research. This brings more recognition for your work, directly visible through ACM badges, and higher impact.

Taking part in the SIGMOD Availability & Reproducibility process allows you to (i) host your data, scripts, and code in the ACM Digital Library to make them available to a broad audience, which earns the ACM Artifacts Available and/or ACM Artifacts Evaluated - Reusable badges, and (ii) earn the ACM Results Reproduced badge once your results are independently reproduced. All three badges are embedded in your paper's PDF in the ACM Digital Library.

Successful papers will be advertised on DBworld, and the list of award winners is maintained on the main SIGMOD website. In addition, the official ACM Digital Library maintains lists of all reproduced SIGMOD papers and all SIGMOD papers with available artifacts.

ACM SIGMOD Best Artifact Award

The award recognizes the best papers in terms of reproducibility. Every year, up to three fully reproduced papers with high-quality artifacts receive the "Best Artifact" award, which is presented during the awards session of the following year's SIGMOD conference. The criteria are as follows:

Reproducibility (ideal: all results can be verified)
Ease of Reproducibility (ideal: just works)
Portability (ideal: Linux, macOS, Windows)
Replicability (ideal: can change workloads, queries, and data and still get behavior similar to the published results)

The awards are selected by the Reproducibility Awards Committee, chaired by Dennis Shasha. The committee is formed after all submissions are received so that there are no conflicts of interest. Decisions are based on the scores that reviewers assign to each paper for each of the criteria above.

How much overhead is it?

At first, making research shareable may seem like extra overhead for authors. You just had your paper accepted at a major conference; why should you spend more time on it? The answer: to have more impact!

Ask experienced researchers in academia or industry, and many will tell you that they already follow reproducibility principles on a daily basis, not as an afterthought but as a way of doing good research.

Maintaining easily reproducible experiments simply makes working on hard problems much easier, because you can repeat your analysis for different data sets, different hardware, different parameters, and so on. Like other leading system designers, you will save significant time because you minimize the setup and tuning effort for your experiments. Such practices also help bring new students up to speed after a project has lain dormant for a few months.

Ideally, making artifacts available and reproducible should take close to zero extra effort.

Criteria and Process

Artifact Availability & Evaluation

Each submission should contain: (1) a prototype system, provided either as a white box (source, configuration files, build environment) or as a fully specified black-box system; (2) input data: either the process that generates the input data or, when the data is not generated, the actual data itself or a link to it; (3) the set of experiments (system configuration and initialization, scripts, workload, measurement protocol) used to produce the raw experimental data; and (4) for full reproducibility submissions, the scripts needed to transform the raw data into the graphs included in the paper. Providing these artifacts earns the "Artifacts Available" badge. Providing artifacts whose quality significantly exceeds minimal functionality, and that are clearly documented, well structured, and easy to reuse, earns the "Artifacts Evaluated - Reusable" badge.
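As an illustration, a submission could ship a small self-check that confirms the four components are present before uploading. The sketch below assumes a hypothetical directory layout (src/, data/ or scripts/generate_data.py, experiments/, plots/); ARI does not prescribe any particular layout, so adapt the paths to your own package.

```python
"""Check that an artifact directory contains the components listed above.

The paths below are a hypothetical layout chosen for illustration only.
"""
from pathlib import Path

# Hypothetical mapping from required component to acceptable locations.
REQUIRED_COMPONENTS = {
    "prototype system (source, configs, build files)": ["src"],
    "input data or data-generation process": ["data", "scripts/generate_data.py"],
    "experiments (configs, scripts, workloads)": ["experiments"],
}
# Only required for full reproducibility submissions.
REPRO_COMPONENTS = {
    "scripts turning raw data into the paper's graphs": ["plots"],
}


def missing_components(root: str, full_reproducibility: bool = True) -> list[str]:
    """Return the names of components for which none of the expected paths exist."""
    base = Path(root)
    required = dict(REQUIRED_COMPONENTS)
    if full_reproducibility:
        required.update(REPRO_COMPONENTS)
    missing = []
    for component, candidates in required.items():
        if not any((base / c).exists() for c in candidates):
            missing.append(component)
    return missing


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    gaps = missing_components(root)
    for gap in gaps:
        print(f"missing: {gap}")
    sys.exit(1 if gaps else 0)
```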

Reproducibility

The central results and claims of the paper should be supported by the submitted experiments, meaning we can recreate result data and graphs that demonstrate behavior similar to that shown in the paper. When results concern response times, the exact numbers typically depend on the underlying hardware, so we do not expect results identical to the paper's unless we happen to have access to identical hardware. Instead, we expect the overall behavior to match the conclusions drawn in the paper, e.g., that a given algorithm is significantly faster than another, or that a given parameter affects the behavior of a system positively or negatively. Having all core results reproduced by an independent reproducibility reviewer earns the "Results Reproduced" badge.
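The sketch below illustrates the kind of comparison described above: it checks that a rerun preserves the paper's ordering of systems and a claimed speedup, while tolerating different absolute numbers. It assumes response times where lower is better; the 1.5x threshold, the system names, and the numbers in the usage block are hypothetical, not an official ARI criterion.

```python
"""Compare reproduced measurements with published ones by trend, not by value."""


def ranking(times: dict[str, float]) -> list[str]:
    """Systems ordered from fastest (lowest time) to slowest."""
    return sorted(times, key=times.__getitem__)


def same_conclusion(published: dict[str, float],
                    reproduced: dict[str, float],
                    fast: str, slow: str,
                    min_speedup: float = 1.5) -> bool:
    """True if the reproduced run supports the same qualitative claims.

    Absolute numbers may differ (different hardware); only the ordering of
    systems and the claimed speedup of `fast` over `slow` are checked.
    The 1.5x default is a hypothetical stand-in for "significantly faster".
    """
    if ranking(published) != ranking(reproduced):
        return False
    return reproduced[slow] / reproduced[fast] >= min_speedup


if __name__ == "__main__":
    # Hypothetical numbers (ms): the rerun machine is slower across the board.
    paper = {"new_index": 12.0, "btree": 40.0, "scan": 310.0}
    rerun = {"new_index": 20.0, "btree": 55.0, "scan": 480.0}
    print(same_conclusion(paper, rerun, fast="new_index", slow="btree"))  # True
```

Checking ratios and orderings rather than raw values is one simple way to encode "similar behavior"; papers whose claims concern scaling curves or crossover points would need a correspondingly different check.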

Process

Availability Review

The artifacts of each paper are reviewed by one artifact reviewer to ensure the quality of the submission. The artifact availability reviewer compiles, deploys, and tests the artifacts and communicates directly with the authors about any issues faced during this process. The ultimate goal is to resolve all issues so that all artifacts are made publicly available.

Reproducibility Review

After the initial availability review phase, each paper is reviewed by one database group. Authors and reviewers communicate throughout the process so that they can iron out any lingering technical issues. The end result is a short report describing the outcome of the process. For successful papers, the report is maintained on the Reproducibility Reports page.

Packaging Guidelines

Every case is slightly different. Sometimes the Availability & Reproducibility Committee can simply rerun software (e.g., rerun an existing benchmark). At other times, obtaining raw data may require special hardware (e.g., sensors in the Arctic). In the latter case, the committee will not be able to reproduce the acquisition of raw data; instead, you can provide the committee with a protocol, including detailed procedures for system setup, experiment setup, and measurements.

Whenever raw data acquisition can be reproduced, the following information should be provided.

Environment

Authors should explicitly specify the OS and the tools that must be installed to form the environment. The specification should include dependencies on specific hardware features (e.g., 25 GB of RAM is needed) and dependencies within the environment (e.g., the required compiler must be run on a specific OS version).
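A setup script can make these requirements checkable rather than only documented. The following sketch assumes hypothetical requirements (Linux, 25 GB of RAM, and gcc, cmake, and python3 on the PATH); the RAM query uses POSIX sysconf values that are available on Linux but may be missing on other platforms.

```python
import os
import platform
import shutil

# Hypothetical requirements; replace them with what your paper actually needs.
REQUIREMENTS = {
    "os": {"Linux"},                       # accepted values of platform.system()
    "min_ram_gb": 25,                      # set to 0 to skip the memory check
    "tools": ["gcc", "cmake", "python3"],  # executables that must be on PATH
}


def total_ram_gb() -> float:
    """Physical memory in GB (2**30 bytes) from POSIX sysconf (available on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def check_environment(req: dict) -> list[str]:
    """Return a list of problems; an empty list means the machine qualifies."""
    problems = []
    system = platform.system()
    if system not in req["os"]:
        problems.append(f"unsupported OS {system!r}, expected one of {sorted(req['os'])}")
    elif req["min_ram_gb"]:
        # Only query memory once we know we are on a supported OS.
        ram = total_ram_gb()
        if ram < req["min_ram_gb"]:
            problems.append(f"need {req['min_ram_gb']} GB RAM, found {ram:.1f} GB")
    for tool in req["tools"]:
        if shutil.which(tool) is None:
            problems.append(f"required tool not found on PATH: {tool}")
    return problems


if __name__ == "__main__":
    issues = check_environment(REQUIREMENTS)
    for issue in issues:
        print("ERROR:", issue)
    raise SystemExit(1 if issues else 0)
```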

System

System setup is one of the most challenging aspects when repeating experiments. System setup will be easier to conduct if it is automatic rather than manual. Authors should test that the system they distribute can actually be installed in a new environment. The documentation should detail every step in system setup:

How to obtain the system?
How to configure the environment if need be (e.g., environment variables, paths)?
How to compile the system? (existing compilation options should be mentioned)
How to use the system? (What are the configuration options and parameters to the system?)
How to make sure that the system is installed correctly?

These tasks should be achieved by executing a set of scripts provided by the authors that download the needed components (systems, libraries), initialize the environment, check that the software and hardware are compatible, and deploy the system.

Tools

The committee strongly suggests using one of the following tools to streamline the reproducibility process. These tools can capture the environment, the input files, the expected output files, and the required libraries in a container-like package, which helps both authors and evaluators rerun experiments seamlessly under specific environments and settings. If using these tools proves difficult for a particular paper, the committee will work with the authors to find a suitable solution based on the specifics of the paper and the environment needed. The tools recommended by the SIGMOD Reproducibility Committee are:

Docker containers
ReproZip
Jupyter Notebook
GitHub repositories with clearly outlined instructions in the README file

If your artifacts require cloud deployment, we strongly suggest creating a submission using one of the open tools for reproducible science.

More tools are available here: https://reproduciblescience.org/reproducibility-directory/.

Experiments

Given a system, the authors should provide the complete set of experiments to reproduce the paper's results. Typically, each experiment will consist of the following parts.

A setup phase where parameters are configured and data is loaded.
A running phase where a workload is applied and measurements are taken.
A clean-up phase where the system is prepared to avoid interference with the next round of experiments.

The authors should document (i) how to perform the setup, running, and clean-up phases, and (ii) how to check that these phases completed as they should. They should also document the expected effect of the setup phase (e.g., that a cold file cache is enforced) and the different steps of the running phase, e.g., by documenting the combination of command-line options used to run a given experiment script.
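One way to make the three phases and their checks explicit is to give each experiment separate setup, run, and cleanup methods, each of which validates its own outcome. In this minimal sketch the phase bodies are hypothetical stubs (building an in-memory table and timing a scan) standing in for real data loading and workloads.

```python
import time
from dataclasses import dataclass, field


def check(condition: bool, message: str) -> None:
    """Raise instead of using `assert`, which `python -O` would silently skip."""
    if not condition:
        raise RuntimeError(message)


@dataclass
class Experiment:
    name: str
    params: dict
    state: dict = field(default_factory=dict)

    def setup(self) -> None:
        # Stand-in for configuring parameters and loading data.
        self.state["table"] = list(range(self.params["rows"]))
        check(len(self.state["table"]) == self.params["rows"], "setup: data not fully loaded")

    def run(self) -> dict:
        # Stand-in for applying a workload and taking measurements.
        start = time.perf_counter()
        total = sum(self.state["table"])
        elapsed = time.perf_counter() - start
        n = self.params["rows"]
        check(total == n * (n - 1) // 2, "run: workload returned a wrong result")
        return {"experiment": self.name, **self.params, "seconds": elapsed}

    def cleanup(self) -> None:
        # Stand-in for dropping data or resetting caches before the next round.
        self.state.clear()
        check(not self.state, "cleanup: state would leak into the next run")


def execute(exp: Experiment) -> dict:
    """Run setup, then run; cleanup is attempted even if the run phase fails."""
    exp.setup()
    try:
        return exp.run()
    finally:
        exp.cleanup()


if __name__ == "__main__":
    for rows in (10_000, 100_000):
        print(execute(Experiment("full_scan", {"rows": rows})))
```

Putting cleanup in a `finally` clause is a design choice that keeps one failed run from contaminating the next round of experiments.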

Experiments should be automatic, e.g., via a script that takes a range of values for each experiment parameter as arguments, rather than manual, e.g., via a script that must be edited so that a constant takes the value of a given experiment parameter.
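For instance, an experiment driver can take the values to sweep as command-line arguments, so that reviewers never have to edit the source. The parameter names (--threads, --scale-factor, --repetitions) and the run_once() placeholder below are hypothetical.

```python
import argparse
import itertools


def parse_args(argv=None) -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Run an experiment over a parameter grid.")
    p.add_argument("--threads", type=int, nargs="+", default=[1, 2, 4, 8])
    p.add_argument("--scale-factor", type=float, nargs="+", default=[1.0])
    p.add_argument("--repetitions", type=int, default=3)
    return p.parse_args(argv)


def run_once(threads: int, scale_factor: float) -> float:
    """Hypothetical placeholder for one measured run; returns a dummy metric."""
    return scale_factor / threads


def sweep(args: argparse.Namespace) -> list[dict]:
    """Run every combination of parameter values, `repetitions` times each."""
    rows = []
    for threads, sf in itertools.product(args.threads, args.scale_factor):
        for rep in range(args.repetitions):
            rows.append({"threads": threads, "scale_factor": sf,
                         "rep": rep, "metric": run_once(threads, sf)})
    return rows


if __name__ == "__main__":
    for row in sweep(parse_args()):
        print(row)
```

A reviewer could then run, for example, `python sweep.py --threads 1 16 --scale-factor 10 100 --repetitions 5` (the file name is hypothetical) and get every combination of the listed values.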

Graphs and Plots

For each graph in the paper, the authors should describe how the graph is obtained from the experimental measurements. The submission should contain the scripts (or spreadsheets) used to generate the graphs; we strongly encourage authors to provide scripts for all their graphs. Authors are free to use their favorite plotting tool, such as Gnuplot, Matlab, Matplotlib, R, or Octave.
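The step from raw measurements to plotted points is easy to lose track of, so it helps to script it as well. This sketch assumes a hypothetical raw-results CSV with columns threads, rep, and seconds, reduces repeated runs to a median per x-value, and writes whitespace-separated pairs that Gnuplot (or Matplotlib, R, etc.) can plot directly. Whether to report the median, the mean, or another statistic should match what the paper describes.

```python
import csv
import io
import statistics
from collections import defaultdict


def aggregate(raw_csv: str, x: str = "threads", y: str = "seconds") -> list[tuple[float, float]]:
    """Group raw rows by column `x` and reduce each group's `y` values to a median."""
    groups: dict[float, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        groups[float(row[x])].append(float(row[y]))
    return [(xv, statistics.median(ys)) for xv, ys in sorted(groups.items())]


def write_plot_data(points: list[tuple[float, float]], path: str) -> None:
    """Write one 'x y' pair per line, a plain format Gnuplot reads by default."""
    with open(path, "w") as f:
        for xv, yv in points:
            f.write(f"{xv} {yv}\n")


if __name__ == "__main__":
    import sys

    # Usage: python make_plot_data.py raw_results.csv fig.dat  (hypothetical names)
    with open(sys.argv[1], newline="") as f:
        write_plot_data(aggregate(f.read()), sys.argv[2])
```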

Ideal Reproducibility Submission

At a minimum, the authors should provide a complete set of scripts to install the system, produce the data, run the experiments, and produce the resulting graphs, along with a detailed README file that describes the process step by step so that a reviewer can easily reproduce it.

The ideal reproducibility submission consists of a master script that:

installs all systems needed,
generates or fetches all needed input data,
reruns all experiments and generates all results,
generates all graphs and plots, and finally,
recompiles the sources of the paper

... to produce a new PDF for the paper that contains the new graphs. It is possible!
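A master script of this kind can be quite short. The sketch below runs the stages in the order listed above, stops at the first failing command, and lets a reviewer resume from a later stage with --from. All commands (scripts/install.sh, scripts/make_plots.py, latexmk on paper/main.tex, and so on) are hypothetical placeholders for your own scripts.

```python
import argparse
import subprocess
import sys
from typing import Optional

# Hypothetical commands, one per stage, in the order listed above.
STAGES = [
    ("install", ["bash", "scripts/install.sh"]),
    ("data", ["bash", "scripts/get_data.sh"]),
    ("experiments", ["bash", "scripts/run_experiments.sh"]),
    ("plots", [sys.executable, "scripts/make_plots.py"]),
    ("paper", ["latexmk", "-pdf", "-cd", "paper/main.tex"]),
]


def run_pipeline(stages, start: Optional[str] = None) -> None:
    """Run stages in order, beginning at `start`; stop at the first failing command."""
    names = [name for name, _ in stages]
    first = names.index(start) if start else 0  # ValueError for an unknown stage
    for name, cmd in stages[first:]:
        print(f"==> {name}: {' '.join(cmd)}", flush=True)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Reproduce the paper end to end.")
    parser.add_argument("--from", dest="start", choices=[n for n, _ in STAGES],
                        help="resume from this stage instead of starting over")
    args = parser.parse_args(argv)
    try:
        run_pipeline(STAGES, args.start)
    except subprocess.CalledProcessError as err:
        print(f"stage failed (exit code {err.returncode}): {err.cmd}", file=sys.stderr)
        return err.returncode
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Keeping each stage a separate command means a reviewer who hits a problem in, say, the plotting step can fix it and rerun with --from plots instead of repeating hours of experiments.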