Decipher Deep Math: Numeric Rounding Behaviors in LLMs

DeepMath 2025

Abstract

This research investigates how language models process numerical rounding tasks, using linear probing techniques. We analyze the internal representations of several model architectures to determine how they encode proximity to multiples of 5 and 10. Our study implements streaming linear probes that process activations in batches rather than storing entire activation matrices, enabling memory-efficient analysis across Transformer-based models (Qwen, Dream) and State Space Models (Mamba). Through layer-wise analysis, we identify which layers in each architecture best encode numerical proximity information and reveal significant differences between "thinking" and "non-thinking" model variants.
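
The probing pipeline can be summarized with a short sketch. The code below is a minimal illustration, not the study's released implementation: it assumes a Hugging Face causal LM that exposes hidden states, trains scikit-learn's SGDClassifier incrementally with partial_fit so no full activation matrix is ever stored, and uses an assumed labeling rule (whether a number lies within 2 of a multiple of 10); the checkpoint name, probed layer, and batch size are likewise placeholders.

# Streaming linear probe over one layer's activations (illustrative sketch only).
# Assumptions: Hugging Face transformers + scikit-learn; the checkpoint, probed layer,
# batch size, and proximity labeling rule are placeholders, not the study's exact setup.
import numpy as np
import torch
from sklearn.linear_model import SGDClassifier
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen3-0.6B"   # assumed checkpoint; any LM exposing hidden states works
LAYER = 12                        # index of the probed hidden-state layer (arbitrary here)
BATCH = 64

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
tok.padding_side = "left"         # left-pad so position -1 is always the final real token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True).eval()

def proximity_label(n: int) -> int:
    """1 if n is within 2 of a multiple of 10, else 0 (assumed labeling rule)."""
    return int(min(n % 10, 10 - n % 10) <= 2)

probe = SGDClassifier(loss="log_loss")          # linear probe, updated online
numbers = np.arange(0, 1000)
classes = np.array([0, 1])

for start in range(0, len(numbers), BATCH):     # stream activations batch by batch
    batch = numbers[start:start + BATCH]
    enc = tok([str(int(n)) for n in batch], return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**enc)
    feats = out.hidden_states[LAYER][:, -1, :].float().numpy()   # (batch, hidden_dim)
    labels = np.array([proximity_label(int(n)) for n in batch])
    probe.partial_fit(feats, labels, classes=classes)            # no activation cache kept

In the full analysis, the same loop would be repeated for every layer and every model to produce the layer-wise comparison described above.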

Poster

Bibliography

Core Methodology

Approximation in Empirical Science

  • Generalizing Empirical Adequacy I: Multiplicity and Approximation – Sebastian Lutz – Proposes a broadened concept of empirical adequacy within constructive empiricism, emphasizing multiplicity and approximation in theoretical–observational relations. Synthese (2014)
  • Scientific Hypothesis Generation by Large Language Models: Laboratory Validation in Breast Cancer Treatment – A. Abdel-Rehim, H. Zenil, O. Orhobor, M. Fisher, R. J. Collins, E. Bourne, G. W. Fearnley, E. Tate, H. X. Smith, L. N. Soldatova, and R. D. King – Demonstrates LLMs generating novel, experimentally validated drug combination hypotheses for breast cancer treatment. Journal of the Royal Society Interface (2025)

Psychology Study on Approximation

  • Children's Number Line Estimation Strategies – M. Li, J. Yang, and X. Ye – Explores how children use different number line strategies in bounded and unbounded tasks, showing developmental shifts in reference-point use. Frontiers in Psychology (2024)

Linear Probe

  • Understanding Intermediate Layers Using Linear Classifier Probes – G. Alain & Y. Bengio – Introduces 'probe' classifiers to examine feature separability and diagnostic behavior across neural network layers. arXiv (2016)
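
To make the probing idea concrete, here is a minimal sketch in the spirit of Alain & Bengio: one independent linear classifier is fit on frozen per-layer features and the layers are compared by held-out accuracy. The feature arrays are synthetic stand-ins for real hidden states, so the printed numbers are illustrative only.

# Layer-wise linear probing in the style of Alain & Bengio (2016): one frozen-feature
# linear classifier per layer, compared by held-out accuracy. Features are synthetic
# stand-ins for real hidden states (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_examples, hidden_dim, n_layers = 2000, 64, 8
labels = rng.integers(0, 2, size=n_examples)

# Pretend deeper layers encode the label more linearly by mixing in a label-dependent shift.
layer_feats = [
    rng.normal(size=(n_examples, hidden_dim)) + (layer / n_layers) * labels[:, None]
    for layer in range(n_layers)
]

for layer, feats in enumerate(layer_feats):
    X_tr, X_te, y_tr, y_te = train_test_split(feats, labels, test_size=0.25, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"layer {layer}: probe accuracy = {acc:.3f}")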

Models

  • Qwen3 Technical Report – Qwen Team – Presents the Qwen3 family of LLMs featuring dense and Mixture-of-Experts (MoE) models with integrated "thinking" and "non-thinking" modes. arXiv (2025)
  • Dream 7B: Diffusion Large Language Models – J. Ye, Z. Xie, L. Zheng, J. Gao, Z. Wu, X. Jiang, Z. Li, and L. Kong – Introduces Dream 7B, a diffusion-based LLM with iterative denoising generation, excelling in math, coding, and planning tasks. arXiv (2025)
  • Mamba: Linear-Time Sequence Modeling with Selective State Spaces – A. Gu & T. Dao – Proposes the Mamba architecture, a state-space model that scales linearly and outperforms transformers in long-sequence modeling. arXiv (2023)

Models' Numerical Encoding Behaviors

  • Language Models Encode Numbers Using Digit Representations in Base 10 – A. A. Levy & M. Geva – Reveals that LLMs encode numbers via digit-wise base-10 representations, explaining systematic numeric errors. ACL Anthology (2025)
  • Language Model Probabilities are Not Calibrated in Numeric Contexts – C. Lovering, M. Krumdick, V. D. Lai, V. Reddy, S. Ebner, N. Kumar, R. Koncel-Kedziorski, and C. Tanner – Examines how LMs fail to calibrate probabilities in numeric contexts, even in simple reasoning tasks. ACL Anthology (2025)

Early Stopping Opportunities

  • BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks – S. Teerapittayanon, B. McDanel, and H. T. Kung – Proposes BranchyNet, enabling faster inference by allowing early exits from intermediate layers of neural networks. arXiv (2017)
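
As a rough illustration of the early-exit idea (not BranchyNet's actual architecture or training scheme), the sketch below attaches a side classifier to an intermediate layer of a toy PyTorch network and skips the remaining layers at inference time when that branch's softmax confidence exceeds a threshold; the threshold and layer sizes are assumptions.

# BranchyNet-style early exit on a toy PyTorch MLP (illustrative; thresholds and
# architecture are assumptions, not the paper's configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitMLP(nn.Module):
    def __init__(self, d_in=32, d_hidden=64, n_classes=2, exit_threshold=0.9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.branch1 = nn.Linear(d_hidden, n_classes)   # early-exit head
        self.block2 = nn.Sequential(nn.Linear(d_hidden, d_hidden), nn.ReLU())
        self.head = nn.Linear(d_hidden, n_classes)      # final head
        self.exit_threshold = exit_threshold

    def forward(self, x):
        h = self.block1(x)
        early_logits = self.branch1(h)
        early_conf = F.softmax(early_logits, dim=-1).max(dim=-1).values
        # At inference, exit early for the batch when every example is confident enough.
        if not self.training and bool((early_conf >= self.exit_threshold).all()):
            return early_logits, "early"
        return self.head(self.block2(h)), "final"

model = EarlyExitMLP().eval()
with torch.no_grad():
    logits, exit_point = model(torch.randn(4, 32))
print(exit_point, logits.shape)

In the context of this study, a layer whose probe already separates proximity classes well would be a natural candidate for such an exit branch.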

BibTeX

@article{decipher_deep_math_2025,
  title={Decipher Deep Math: Numeric Rounding Behaviors in LLMs},
  author={Anonymous authors - to be revealed later},
  journal={DeepMath},
  year={2025},
  note={Supplemental material for DeepMath 2025 Abstract}
}