#Autoformalization - social.coop

José A. Alonso @Jose_A_Alonso@mathstodon.xyz

Readings shared February 13, 2025. https://jaalonso.github.io/vestigium/posts/2025/02/13-readings_shared_02-13-25 #AI #Autoformalization #Coq #ITP #IsabelleHOL #LLMs #LeanProver #LogicProgramming #Math #Programming #Prolog #Rocq

Vestigium · Feb 13
Readings shared February 13, 2025
The readings shared in Bluesky on 13 February 2025 are Is mathematics obsolete? ~ Jeremy Avigad. #Math #ITP #LeanProver #AI #LLMs A Coq formalization of unification modulo exclusive-or. ~ Yichi Xu, D

José A. Alonso @Jose_A_Alonso@mathstodon.xyz

FOLIO: Natural language reasoning with first-order logic. ~ Simeng Han et al. https://arxiv.org/abs/2209.00840 #LLMs #Logic #Autoformalization #Reasoning

arXiv.org
FOLIO: Natural Language Reasoning with First-Order Logic
We present FOLIO, a human-annotated, open-domain, and logically complex and diverse dataset for reasoning in natural language (NL), equipped with first order logic (FOL) annotations. FOLIO consists of 1,435 examples (unique conclusions), each paired with one of 487 sets of premises which serve as rules to be used to deductively reason for the validity of each conclusion. The logical correctness of premises and conclusions is ensured by their parallel FOL annotations, which are automatically verified by our FOL inference engine. In addition to the main NL reasoning task, NL-FOL pairs in FOLIO automatically constitute a new NL-FOL translation dataset using FOL as the logical form. Our experiments on FOLIO systematically evaluate the FOL reasoning ability of supervised fine-tuning on medium-sized language models (BERT, RoBERTa) and few-shot prompting on large language models (GPT-NeoX, OPT, GPT-3, Codex). For NL-FOL translation, we experiment with GPT-3 and Codex. Our results show that one of the most capable Large Language Model (LLM) publicly available, GPT-3 davinci, achieves only slightly better than random results with few-shot prompting on a subset of FOLIO, and the model is especially bad at predicting the correct truth values for False and Unknown conclusions. Our dataset and code are available at https://github.com/Yale-LILY/FOLIO.

José A. Alonso @Jose_A_Alonso@mathstodon.xyz

Towards a mathematics formalisation assistant using Large Language Models. ~ Ayush Agrawal, Siddhartha Gadgil, Navin Goyal, Ashvni Narayanan, Anand Tadipatri. https://arxiv.org/abs/2211.07524 #AI #LLMs #ITP #LeanProver #Math #Autoformalization

arXiv.org
Towards a Mathematics Formalisation Assistant using Large Language Models
Mathematics formalisation is the task of writing mathematics (i.e., definitions, theorem statements, proofs) in natural language, as found in books and papers, into a formal language that can then be checked for correctness by a program. It is a thriving activity today, however formalisation remains cumbersome. In this paper, we explore the abilities of a large language model (Codex) to help with formalisation in the Lean theorem prover. We find that with careful input-dependent prompt selection and postprocessing, Codex is able to formalise short mathematical statements at undergrad level with nearly 75% accuracy for 120 theorem statements. For proofs quantitative analysis is infeasible and we undertake a detailed case study. We choose a diverse set of 13 theorems at undergrad level with proofs that fit in two-three paragraphs. We show that with a new prompting strategy Codex can formalise these proofs in natural language with at least one out of twelve Codex completion being easy to repair into a complete proof. This is surprising as essentially no aligned data exists for formalised mathematics, particularly for proofs. These results suggest that large language models are a promising avenue towards fully or partially automating formalisation.

José A. Alonso @Jose_A_Alonso@mathstodon.xyz

Using large language models for (de-)formalization and natural argumentation exercises for beginner's students. ~ Merlin Carl. https://arxiv.org/abs/2304.06186 #LLMs #Autoformalization #Logic #Education

arXiv.org
Using large language models for (de-)formalization and natural argumentation exercises for beginner's students
We describe two systems that use text-davinci-003, a large language model, for the automatized correction of (i) exercises in translating back and forth between natural language and the languages of propositional logic and first-order predicate logic and (ii) exercises in writing simple arguments in natural language in non-mathematical scenarios.
