Stella

Benchmarking the Unseen Universe of Mathematics

AI models are benchmarked on mathematical folklore, not frontier research. We believe they should be challenged by the real frontier of research: the vast, unseen "dark matter" of technical lemmas that form the logical backbone of every great result.

A Benchmark of Authentic Mathematics

The benchmarks that guide AI in mathematics are fundamentally broken. They measure performance on a narrow class of problems, failing to capture the actual intellectual labor of a research mathematician.

If we want to get closer to the real practice of research, aiming for Millennium Prize problems is a non-starter. Progress doesn't happen in giant leaps; it happens one clever idea at a time. That's why we focus on technical lemmas: they are the tangible, tactical steps of the actual, day-to-day work of creating new mathematics.

Stella, which stands for Set of Technical Lemmas, is an open, community-driven initiative. Our mission is to build the first benchmark curated by and for mathematicians that values the tangible, tactical steps behind every result.

What is a Technical Lemma?

Definition. (Technical Lemma)

A technical lemma is a statement that is part of a more complex theory, and has minimal theoretical overhead.

A graduate student in an adjacent field should be able to understand the question without needing hundreds of pages of prerequisites, making it a perfect, tractable challenge.

Some Examples of Technical Lemmas

Contiguity of Probability Measures

Fix a sequence of finite or countable measurable spaces \((\Omega_n)_{n\ge 1}\) and, for each \(n\), probability measures \(\eta_n\) and \(\tilde\eta_n\) on \(\Omega_n\) such that the sequences \((\eta_n)_{n\ge 1}\) and \((\tilde\eta_n)_{n\ge 1}\) are mutually contiguous. Fix a family \(\{a_{x,n}:n\ge 1,x\in \Omega_n\}\) of uniformly bounded non-negative real numbers. Then, \[\sum_{x\in \Omega_n} a_{x,n}\, \eta_n(x)\xrightarrow[n\to \infty]{} 0\iff \sum_{x\in \Omega_n} a_{x,n}\, \tilde\eta_n(x)\xrightarrow[n\to \infty]{} 0\,.\]

Graph Theory & Probability · Hollom et al., Monotonicity of Random Regular Graphs
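To show what "minimal theoretical overhead" means in practice, here is a short proof sketch of the contiguity lemma. It is our own illustration, not the argument of Hollom et al. Recall that \((\eta_n)_{n\ge 1}\) is contiguous with respect to \((\tilde\eta_n)_{n\ge 1}\) if, for every sequence of events \(A_n\subseteq\Omega_n\), \(\tilde\eta_n(A_n)\to 0\) implies \(\eta_n(A_n)\to 0\); the equivalence in the lemma uses this in both directions. Let \(M\) be a uniform bound on the \(a_{x,n}\), fix \(\varepsilon>0\), and set \(A_n=\{x\in\Omega_n : a_{x,n}>\varepsilon\}\). Since the \(a_{x,n}\) are non-negative,
\[\tilde\eta_n(A_n)\le \frac{1}{\varepsilon}\sum_{x\in \Omega_n} a_{x,n}\,\tilde\eta_n(x)\,,\qquad \sum_{x\in \Omega_n} a_{x,n}\,\eta_n(x)\le \varepsilon+M\,\eta_n(A_n)\,.\]
If \(\sum_{x} a_{x,n}\,\tilde\eta_n(x)\to 0\), the first inequality gives \(\tilde\eta_n(A_n)\to 0\), so \(\eta_n(A_n)\to 0\) by contiguity, and the second inequality gives \(\limsup_{n} \sum_{x} a_{x,n}\,\eta_n(x)\le\varepsilon\). As \(\varepsilon>0\) is arbitrary, the left-hand sum tends to \(0\); the reverse implication follows by exchanging the roles of \(\eta_n\) and \(\tilde\eta_n\).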

Bounds on a Perturbation Measure

Let \(\mu_\lambda\) be the perturbation measure. We have \begin{align} \label{mulambda0} & \mu_\lambda \preceq \begin{cases} \frac{1}{l} & |x| \leq 2l, \\ \frac{\sqrt{\lambda} l}{x^2 \sqrt{\lambda-|x|}} & |x| \geq 2l \end{cases}, \\ \label{mulambda1} & \mu_\lambda^{(1)} \preceq \begin{cases} \frac{1}{l^2} & |x| \leq 2l, \\ \frac{l}{|x| \sqrt{\lambda} (\lambda - |x|)^{3/2}} + \frac{\sqrt{\lambda} l}{|x|^3 \sqrt{\lambda-|x|}} & |x| \geq 2l \end{cases}. \end{align}

Random Matrix Theory · T. Leblé, CLT for the Sine-beta process

“Turnstile Labs' mission is to build the shared infrastructure the research community needs to thrive in an AI-native world. Our first step is the Stella initiative: a community-curated benchmark to create the common ground for human and machine reasoning.”

— Léo Dreyfus-Schmidt, Turnstile Labs

Scientific Council

To ensure its scientific integrity and guide its mission, Stella is advised by a founding council of world-renowned experts.

Thomas Leblé (Paris Cité)
Edouard Oyallon (Sorbonne University)
Julien Sabin (Rennes University)
and more to be announced...

Become a Contributor

Your insight is our most valuable asset. The dark matter of your work (the clever argument, the non-obvious step, the lemma that challenged you) can help guide the future of AI. Join us in building a benchmark that reflects the true nature of mathematical discovery.