Google AI developed a deep learning language model called Minerva, which can solve quantitative mathematical problems using step-by-step reasoning.
In the recently published paper on Minerva, researchers explained the model's development. They achieved state-of-the-art results by training a large language model on a training dataset rich in quantitative reasoning and symbolic expressions. The resulting model, Minerva, can solve quantitative mathematical problems across STEM reasoning tasks.
Minerva parses a question using natural language processing and mathematical-notation processing techniques, recalls relevant formulas and constants, and generates step-by-step solutions that combine symbolic manipulation with numerical computation, without relying on an external calculator to reach the final answer. Because it samples multiple candidate solutions for a problem, each with its own assigned probability, Minerva selects the definitive answer by majority voting. The following picture shows a sample of Minerva's output for a quantitative mathematical problem.
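The majority-voting step can be sketched as follows. This is a minimal illustration, not Minerva's actual implementation: `majority_vote` is a hypothetical helper, and the real system also normalizes mathematically equivalent answers (e.g. `1/2` vs `0.5`) before counting, which is omitted here for brevity.

```python
from collections import Counter

def majority_vote(final_answers):
    """Pick the most frequent final answer among sampled solutions.

    Each element of `final_answers` is the final answer extracted
    from one independently sampled step-by-step solution.
    """
    counts = Counter(final_answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Five sampled solutions to the same problem; "5" wins the vote.
samples = ["5", "5", "7", "5", "3"]
print(majority_vote(samples))  # -> 5
```

The intuition is that a model is more likely to reach the correct answer by many different reasoning paths than to repeat the same wrong answer, so voting over samples filters out sporadic errors.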
Minerva was built on the Pathways Language Model (PaLM, a 540-billion-parameter, densely activated transformer language model) and further trained on mathematical datasets such as arXiv papers and web text containing LaTeX, MathJax, or other mathematical formats. Symbolic mathematical notation is preserved in the training dataset so the model learns from symbolic data directly. This process is shown in the following diagram.
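Preserving notation during data preparation matters because typical web-text cleaning strips markup, and mathematical expressions can be lost along with it. The sketch below is an assumption-laden illustration of the idea (not Google's actual pipeline): the hypothetical `clean_keep_math` removes HTML tags from a crawled page while leaving inline LaTeX untouched.

```python
import re

def clean_keep_math(text):
    """Strip HTML tags from crawled text while leaving the content,
    including inline LaTeX like $x^2 - 4 = 0$, verbatim."""
    return re.sub(r"<[^>]+>", "", text)

page = "<p>Solve $x^2 - 4 = 0$ for <b>x</b>.</p>"
print(clean_keep_math(page))  # -> Solve $x^2 - 4 = 0$ for x.
```

A pipeline that instead rendered or discarded the math would leave the model with questions like "Solve for x." and no equation to reason about.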
To benchmark Minerva's performance, researchers used STEM benchmarks ranging from grade school to graduate level: MATH (high-school math competition problems), MMLU-STEM (the massive multitask language understanding benchmark focused on STEM, covering topics like engineering, chemistry, math, and physics at the high school and college level), and GSM8k (grade-school math word problems involving basic arithmetic, solvable by a talented middle school student).
One of the critical limitations of Minerva is that its answers cannot be verified automatically.
“Our approach to quantitative reasoning is not grounded in formal mathematics. Minerva parses questions and generates answers using a mix of natural language and LaTeX mathematical expressions, with no explicit underlying mathematical structure. This approach has an important limitation: the model’s answers cannot be automatically verified. Even when the final answer is known and verified, the model can arrive at a correct final answer using incorrect reasoning steps, which cannot be automatically detected. This limitation is not present in formal methods for theorem proving (e.g., Coq, Isabelle, HOL, Lean, Metamath, and Mizar),” said the company.
To promote NLP models for quantitative reasoning, Google AI released an interactive sample explorer that lets the public explore Minerva's capabilities.