Scrapes all the constants from any module in Lean!
You can #print names of constants in Lean 4 as follows:
#print Nat.add_le_addOutput:
theorem Nat.add_le_add : ∀ {a b c d : Nat}, a ≤ b → c ≤ d → a + c ≤ b + d :=
fun {a b c d} h₁ h₂ ↦ Nat.le_trans (Nat.add_le_add_right h₁ c) (Nat.add_le_add_left h₂ b)
The information associated with constant declarations is stored in the ConstantInfo type.
inductive ConstantInfo where
| axiomInfo (val : AxiomVal)
| defnInfo (val : DefinitionVal)
| thmInfo (val : TheoremVal)
| opaqueInfo (val : OpaqueVal)
| quotInfo (val : QuotVal)
| inductInfo (val : InductiveVal)
| ctorInfo (val : ConstructorVal)
| recInfo (val : RecursorVal)
deriving Inhabited- Extracts all the constants in the modules set by the user
- Filters out constants the user doesn't want (based on
ConstantInfo) - Finds the docstrings associated with the constants (if any)
- Takes the output of running
#printon those constants as a string - Constructs a Json object for each constant with all the above details (format of Json object given below)
- Writes all the Json objects in a
.jsonlfile
Format:
{
"docstring":"string",
"info":"string",
"name":"string",
"print":"string"
}Example:
{
"docstring":"The set of closed points. ",
"info":"defn",
"name":"closedPoints",
"print":"private theorem SimplexCategoryGenRel.simplicialInsert_length._proof_1_1 : ∀ (a : Nat),\n Eq (SimplexCategoryGenRel.simplicialInsert a List.nil).length (instHAdd.hAdd List.nil.length 1) :=\n⋯"
}(the example given above has setProofs set to false)
Clone this repo.
Go to LeanScraper/Inputs.lean and set your inputs:
- setModules (the modules you want to extract)
- setFilter (the filter on types of constants)
- setProofs (if you want proofs in the
"print"key or not) - setDataOutFilePath (the output
.jsonlfile path)
More details can be found in the comments above the inputs in LeanScraper/Inputs.lean.
(The defaults set in LeanScraper/Inputs.lean extract all constants from Lean and Mathlib, keep only the definitions, theorems and inductive types, and write their data into output.jsonl, with the proofs set to false in the "print" key output)
Run the following commands:
lake exe cache get
lake buildThen run the following command to start the scraping process:
lake exe ScraperIt could take anywhere between 2 min to 20 min to run, based on your system.
The data will be extracted into the output file you set.
Running lake exe Scraper with the same output file in LeanScraper/Inputs.lean will overwrite any data in that file.