Assessor

Assessor, an adaptive curriculum, generates challenging samples for training symbolic regression systems such as Boolformer. Assessor fills the role that self-play fills in AlphaZero, but is tailored to the asymmetry of symbolic regression: 👉 Crafting puzzles is simpler than solving them (post on X)

The setup runs like a GAN. Assessor, a transformer based on nanoGPT, generates Boolean formulas. The formulas are used to train a system, and each formula is labeled easy if the trained system solves it and hard otherwise. Assessor is in turn trained on the labeled formulas to generate more challenging samples.
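One round of this generate, train, label, update loop can be sketched as follows. This is a minimal schematic, not the repository's code: `adversarial_round` and its four callables (`generate`, `train_system`, `solved_by_system`, `update_generator`) are hypothetical stand-ins for Assessor, the trained system, and Assessor's own training step.

```python
from typing import Callable, List, Tuple

Formula = List[str]                  # tokens in Polish (prefix) notation
Labeled = List[Tuple[Formula, str]]  # (formula, "easy" | "hard")


def adversarial_round(
    generate: Callable[[int], List[Formula]],       # hypothetical: Assessor samples n formulas
    train_system: Callable[[List[Formula]], None],  # hypothetical: the system trains on them
    solved_by_system: Callable[[Formula], bool],    # hypothetical: does the trained system solve f?
    update_generator: Callable[[Labeled], None],    # hypothetical: Assessor trains on the labels
    n: int,
) -> float:
    """Run one generate -> train -> label -> update round; return the fraction labeled hard."""
    formulas = generate(n)
    train_system(formulas)
    labeled = [(f, "easy" if solved_by_system(f) else "hard") for f in formulas]
    update_generator(labeled)
    if not labeled:
        return 0.0
    return sum(label == "hard" for _, label in labeled) / len(labeled)
```

In the current simulated setup, `train_system` would do nothing and `solved_by_system` would be replaced by a depth check on the simplified formula.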

The project is young and moving quickly. Currently, the trained system is simulated by a script that labels a formula as easy if, after simplification, its depth is less than a given threshold, and hard if its depth is greater. As Assessor learns to generate deeper formulas, the threshold is increased.
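The simulated labeler and the rising threshold can be sketched like this, assuming the simplified depth has already been computed. The names `label_by_depth` and `next_threshold` are hypothetical, and the rule for raising the threshold (raise it by 1 once at least a `target` fraction of a batch is hard) is an illustrative assumption; the README does not say how or when the threshold is increased.

```python
from typing import List


def label_by_depth(simplified_depth: int, threshold: int) -> str:
    """Stand-in for the trained system: formulas deeper than the threshold are 'hard'."""
    # Assumption: a depth exactly equal to the threshold counts as easy.
    return "hard" if simplified_depth > threshold else "easy"


def next_threshold(threshold: int, depths: List[int], target: float = 0.5) -> int:
    """Hypothetical curriculum rule: raise the bar once enough samples clear it."""
    if not depths:
        return threshold
    hard = sum(label_by_depth(d, threshold) == "hard" for d in depths)
    return threshold + 1 if hard / len(depths) >= target else threshold
```

Raising the threshold only when Assessor reliably exceeds it keeps "hard" meaning "just beyond the current ability", which is the point of an adaptive curriculum.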

results

After just 330K iterations, 52.7% of the formulas Assessor generates have, after simplification, both depth >= 6 and >= 6 variables.

After simplification, the number of operators in Assessor's formulas follows a Gaussian-like distribution, which is superior to the decaying distribution of the handcrafted generator currently used by Boolformer. This holds despite Assessor's severe restrictions: it uses at most 200 tokens and 12 variables, while Boolformer allows up to 1000 binary operators.

[Figure: Assessor]

[Figure: Boolformer paper, Fig. 8]

run the model

First, navigate to the folder where you keep your projects and clone nanoGPT and this repository into it:

git clone https://github.com/karpathy/nanoGPT.git
git clone https://github.com/Majdoddin/assessor.git

Dependencies:

pip install torch matplotlib seaborn boolean.py

Then, open the repository folder:

cd assessor

Now, let's just run the trained Assessor. You need a model checkpoint. Download this 300M-parameter model, which I trained in just 300K iterations to generate formulas with depth >= 6:

wget -P cwd https://huggingface.co/majdoddin/assessor/resolve/main/state-depth-6-2.pt

And run it:

export PYTHONPATH="${PYTHONPATH}:path/to/nanoGPT" && cd cwd && python ../assessor.py

You'll see each generated formula in Polish (prefix) notation, followed by its depth, number of variables, and simplified form:

['or', 'and', 'and', 'or', 'and', 'or', 'and', 'or', 'and', 'or', 'and', 'or', 'and', 'x12', 'x4', 'x6', 'x12', 'x3', 'x4', 'x11', 'x5', 'x11', 'x7', 'x10', 'x4', 'x7', 'x10']
depth:12 var_num:8 simpified: x10|(x4&x7&(x10|(x7&(x11|(x5&(x11|(x4&(x3|(x12&(x6|(x12&x4)))))))))))
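To make the printed fields concrete, here is a minimal sketch, not the repository's code, that parses a Polish-notation token list like the one above into a tree, counts its distinct variables, and measures depth as the number of operator levels on the longest root-to-leaf path (leaves count as 0; this convention is an assumption). The operator arities, including a unary `not`, are also assumptions. `flatten` performs only one simplification step, merging nested `and`/`or` nodes of the same kind; full simplification, probably done with boolean.py (listed in the dependencies), may reduce depth further. On the example above, flattening alone happens to turn the raw depth of 13 into the printed depth of 12.

```python
from typing import List, Set, Tuple, Union

# A parsed formula: a variable name, or (operator, [children]).
Node = Union[str, Tuple[str, List["Node"]]]

ARITY = {"and": 2, "or": 2, "not": 1}  # assumption: 'not' is unary if it appears


def parse_prefix(tokens: List[str]) -> Node:
    """Parse a Polish-notation token list into a tree."""
    pos = 0

    def parse() -> Node:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in ARITY:
            # Children are parsed left to right, consuming tokens in order.
            return (tok, [parse() for _ in range(ARITY[tok])])
        return tok  # any other token is treated as a variable

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after a complete formula")
    return tree


def flatten(node: Node) -> Node:
    """Merge nested same-operator and/or nodes: and(and(a, b), c) -> and(a, b, c)."""
    if isinstance(node, str):
        return node
    op, kids = node
    merged: List[Node] = []
    for kid in map(flatten, kids):
        if op in ("and", "or") and not isinstance(kid, str) and kid[0] == op:
            merged.extend(kid[1])
        else:
            merged.append(kid)
    return (op, merged)


def depth(node: Node) -> int:
    """Number of operator levels on the longest root-to-leaf path."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(kid) for kid in node[1])


def variables(node: Node) -> Set[str]:
    """Set of distinct variable names in the formula."""
    if isinstance(node, str):
        return {node}
    return set().union(*(variables(kid) for kid in node[1]))
```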

analysis

python analysis.py cwd/output.txt

This generates graphical statistics of the output in cwd.
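A statistic like the 52.7% reported in the results can be computed with a short script. This is a hedged sketch, not what analysis.py does: it assumes output.txt contains lines in the same format as the console output shown earlier (including the key spelled `simpified`), and the names `parse_records` and `summarize` are hypothetical. Operators are counted by tallying the `&`, `|` and `~` symbols in the simplified string, so an n-ary `x4&x7&(...)` counts as two binary operators.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: lines look like "depth:12 var_num:8 simpified: x10|(x4&x7&...)".
LINE = re.compile(r"depth:(\d+)\s+var_num:(\d+)\s+simpified:(.*)")


def parse_records(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Extract (depth, var_num, simplified) from every matching line; skip the rest."""
    records = []
    for line in lines:
        m = LINE.search(line)
        if m:
            records.append((int(m.group(1)), int(m.group(2)), m.group(3).strip()))
    return records


def summarize(records: List[Tuple[int, int, str]], min_depth: int = 6, min_vars: int = 6):
    """Return the fraction meeting both thresholds and a histogram of operator counts."""
    if not records:
        return 0.0, Counter()
    hits = sum(d >= min_depth and v >= min_vars for d, v, _ in records)
    ops = Counter(sum(s.count(c) for c in "&|~") for _, _, s in records)
    return hits / len(records), ops
```

The operator-count histogram is the quantity whose shape (Gaussian-like versus decaying) is compared with Boolformer in the results.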

train the model

You can train from scratch or from a checkpoint. To train from a checkpoint, download one first. Then set the variables checkpoint, sec_round, min_depth, eval, start, end, and logf, and comment out the to_test_a_checkpoint lines in assessor.py.
