A Benchmark of Authentic Mathematics
The benchmarks that guide AI in mathematics are fundamentally broken. They measure performance on a narrow class of problems, failing to capture the actual intellectual labor of a research mathematician.
To get closer to the real practice of research, aiming for Millennium Prize problems is a non-starter. Progress doesn't happen in giant leaps; it happens one clever idea at a time. That's why we focus on technical lemmas: they are the tangible, tactical steps that represent the actual, day-to-day work of creating new mathematics.
Stella, standing for Set of Technical Lemmas is an open, community-driven initiative. Our mission is to build the first benchmark curated by and for mathematicians that values the tangible, tactical steps that are behind every results.
What is a Technical Lemma?
A technical lemma is a statement that is part of a more complex theory, and has minimal theoretical overhead.
A graduate student in an adjacent field should be able to understand the question without needing hundreds of pages of prerequisites, making it a perfect, tractable challenge.
Some Examples of Technical Lemmas
Contiguity of Probability Measures
Fix a sequence of finite or countable measurable spaces \((\Omega_n)_{n\ge 1}\) and contiguous probability measures \(\eta_n\) and \(\tilde\eta_n\) on \((\Omega_n)_{n\geq 1}\). Fix a family \(\{a_{x,n}:n\ge 1,x\in \Omega_n\}\) of uniformly bounded non-negative real numbers. Then, \[\sum_{x\in \Omega_n} a_{x,n}\, \eta_n(x)\xrightarrow[n\to \infty]{} 0\iff \sum_{x\in \Omega_n} a_{x,n}\, \tilde\eta_n(x)\xrightarrow[n\to \infty]{} 0\,.\]
Bounds on a Perturbation Measure
Let \(\mulambda\) be the perturbation measure. We have \begin{align} \label{mulambda0} & \mulambda \preceq \begin{cases} \frac{1}{l} & |x| \leq 2l, \\ \frac{\sqrt{\lambda} l}{x^2 \sqrt{\lambda-|x|}} & |x| \geq 2l \end{cases}, \\ \label{mulambda1} & \mulambda^{(\1)} \preceq \begin{cases} \frac{1}{l^2} & |x| \leq 2l, \\ \frac{l}{|x| \sqrt{\lambda} (\lambda - |x|)^{3/2}} + \frac{\sqrt{\lambda} l}{|x|^3 \sqrt{\lambda-|x|}} & |x| \geq 2l \end{cases}. \end{align}
“Turnstile Lab's mission is to build the shared infrastructure the research community needs to thrive in an AI-native world. Our first step is the Stella initiative: a community-curated benchmark to create the common ground for human and machine reasoning.”
Scientific Council
To ensure its scientific integrity and guide its mission, Stella is advised by a founding council of world-renowned experts.
Become a Contributor
Your insight is our most valuable asset. The dark matter of your work, the clever argument, the non-obvious step, the lemma that challenged you, can help guide the future of AI. Join us in building a benchmark that reflects the true nature of mathematical discovery.