In the high-stakes world of top-tier AI conferences like ICLR (International Conference on Learning Representations), getting a paper accepted is a career-defining moment. Authors spend months refining algorithms, running ablation studies, and polishing prose. They expect rigor. They expect tough questions.

They do not expect 40 weaknesses and 40 questions from a single reviewer.

A recent incident involving an ICLR 2026 submission has set the machine learning community ablaze, highlighting a growing crisis in academic peer review: the suspicion that AI is now reviewing AI, and doing a terrible job of it.

The Incident: “Reviewer #2” on Steroids

The drama unfolded when an author took to Reddit to share a bewildering experience. A reviewer had posted a critique containing an exhaustive, seemingly generated list of 40 distinct weaknesses followed by 40 distinct questions.

For context, a typical thorough review might list 3-5 major weaknesses and perhaps a handful of clarifying questions. Eighty individual points of contention is not just harsh; it is statistically improbable for a human reviewer to generate without significant redundancy or hallucination.

The “Hardware Hallucination”

The absurdity of the review became apparent in the specifics. Among the critiques was a demand that the authors demonstrate reproducibility across an exhaustive list of hardware architectures. The reviewer reportedly asked for experiments to be run on Volta, Ampere, Hopper, and Blackwell GPU clusters.

This kind of request betrays a fundamental lack of understanding of how academic labs operate (where budget is finite) and suggests a “completionist” logic often found in Large Language Models (LLMs) prompted to be “comprehensive.”

The “Inner Calm” Defense

When the authors pushed back against this impossible wall of text, the reviewer’s response turned from technical to strangely philosophical. Instead of addressing the scientific rebuttal, the reviewer allegedly replied with:

“In the current impetuous and intricate society, if one aspires to be a scholar, it is imperative to attain inner calm. Scientific research demands tranquility, particularly peace of mind.”

This phrase, described by community members as reading like a translated excerpt from a “Chinese cultivation novel,” was the smoking gun for many. It suggested the text was either fully generated by an LLM with a specific cultural training bias or heavily translated without context.

The Diagnosis: Is it AI?

The community consensus is stark: This is almost certainly an AI-generated review.

Several “tells” point to this conclusion (a rough detection sketch follows the list):

  • Structure: The perfect symmetry (40 weaknesses, 40 questions) suggests a prompt like “List as many weaknesses as possible” or “Generate a comprehensive list of 40 questions.”

  • Nitpicking: The review obsessed over administrative details, such as copyright licenses for open-source datasets (e.g., demanding specific license types like MIT vs. CC BY-NC for standard datasets), rather than the paper’s core scientific contribution.

  • Hallucinated Standards: The hardware requirements (testing on unreleased or prohibitively expensive chips) reflect an LLM’s tendency to associate “rigor” with “listing every related entity it knows,” regardless of feasibility.
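
To make these tells concrete, here is a minimal sketch of the kind of structural check a conference platform could run over a plain-text review before a human ever sees it. Everything in it is illustrative: the heading names, the threshold, and the flag_suspicious_review function are assumptions made for this example, not part of any existing ICLR tooling, and real screening would need far more nuance than counting bullet points.

```python
import re

def flag_suspicious_review(review_text: str, max_reasonable: int = 15) -> list[str]:
    """Return warnings about structural 'tells' of a possibly generated review.

    Purely heuristic: it counts enumerated items under the Weaknesses and
    Questions headings and reports perfect symmetry or implausible volume.
    """
    warnings = []
    counts = {}
    for heading in ("Weaknesses", "Questions"):
        # Capture everything from this heading up to the next known heading (or the end).
        pattern = rf"^{heading}\b(.*?)(?=^(?:Strengths|Weaknesses|Questions|Limitations)\b|\Z)"
        match = re.search(pattern, review_text, re.DOTALL | re.MULTILINE | re.IGNORECASE)
        body = match.group(1) if match else ""
        # Count enumerated items such as "12.", "12)", "- ", or "• ".
        counts[heading] = len(re.findall(r"^\s*(?:\d+[.)]|[-*•])\s+", body, re.MULTILINE))

    w, q = counts["Weaknesses"], counts["Questions"]
    if w and w == q:
        warnings.append(f"perfect symmetry: {w} weaknesses vs. {q} questions")
    if max(w, q) > max_reasonable:
        warnings.append(f"implausible volume: {max(w, q)} enumerated points in one section")
    return warnings


if __name__ == "__main__":
    # A toy review shaped like the one described above: 40 weaknesses, 40 questions.
    demo = ("Weaknesses\n" + "".join(f"{i}. issue\n" for i in range(1, 41))
            + "Questions\n" + "".join(f"{i}. question\n" for i in range(1, 41)))
    print(flag_suspicious_review(demo))
```

On this toy input the sketch reports both the perfect 40/40 symmetry and the implausible volume; a real screening pass would obviously have to tolerate legitimately long reviews and far messier formatting.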

The Broader Problem: The Ouroboros of AI

This incident is funny, but it signals a dangerous feedback loop. We have reached a point where:

  • Researchers use AI to write code and polish papers.

  • Reviewers, overwhelmed by the exponential growth of submissions, use AI to generate reviews.

  • Meta-reviewers (Area Chairs) might eventually use AI to summarize the AI-generated reviews.

If an AI writes the paper and an AI reviews it, does human understanding enter the picture at all?

The “40 Questions” incident at ICLR 2026 is a wake-up call. It exposes the fragility of the current peer review system, which relies on the good faith and unpaid labor of humans who are increasingly checking out. If conferences cannot detect and filter out low-effort, generated slop—especially slop that demands 40 impossible things before breakfast—the credibility of the field risks collapsing under its own weight.

For now, the community watches and waits to see if the Area Chairs will step in, or if we all need to start cultivating “inner calm” to survive the review cycle.
