Retrieval/train #50

sdake · 2023-06-30T17:20:19Z

Train our custom retrieval transformer (based upon RETRO):

(venv) sdake@beast-06:~/repos/origin/retrieval$ python train.py
found to be previously processed at processed-stats.json
preprocessed knn found at chunks/train.chunks.knn.dat, faiss index reconstituted from .tmp/.index/knn.index
Artificial Wisdom™ Retreival Transformer Training
• retrieval_model=artificialwisdomai/retroformer • foundation_model=mosaicml/mpt30b •
Epoch 0  100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ • loss=3.47 • 0:06:46 • 0:00:00

sdake · 2023-06-30T17:21:20Z

I have not run inference yet, but I know how to do so. There is an example in our Google Drive.

I need to be able to save the model after training as a prior step.
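A minimal sketch of what saving after training could look like, with the epoch number in the file name. This is not the code in train.py: `checkpoint_path`, `save_checkpoint`, and the `retroformer-epoch-NNNN.pt` naming scheme are hypothetical, and `pickle_save` is only a stand-in so the sketch runs without PyTorch installed.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def checkpoint_path(output_dir: Path, model_name: str, epoch: int) -> Path:
    """Build an epoch-qualified file name, e.g. retroformer-epoch-0003.pt (hypothetical scheme)."""
    return output_dir / f"{model_name}-epoch-{epoch:04d}.pt"


def pickle_save(state: Any, path: Path) -> None:
    """Stand-in for torch.save so the sketch runs without PyTorch."""
    with path.open("wb") as handle:
        pickle.dump(state, handle)


def save_checkpoint(
    state: Any,
    output_dir: Path,
    model_name: str,
    epoch: int,
    save_fn: Callable[[Any, Path], None] = pickle_save,
) -> Path:
    """Write one checkpoint for the given epoch and return its path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = checkpoint_path(output_dir, model_name, epoch)
    save_fn(state, path)
    return path
```

With PyTorch available, the same helper could be called as `save_checkpoint(model.state_dict(), Path("checkpoints"), "retroformer", epoch, save_fn=torch.save)`, since `torch.save` accepts an object followed by a path.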

MostAwesomeDude

Nice work! I see a few cosmetic things and have some questions, but none of that is important.

retrieval/train.py

rstarmer

Even with the minor quibbles, I think this is great to get into the codebase as is. Can you address issue #54, please?

Documentation would also be useful, but I'm tracking that with issue #53.

retrieval/train.py

retrieval/requirements.txt

sdake · 2023-07-02T22:21:27Z

@artificialwisdomai/maintainers Hi gang. I think this PR is mergeable as is. This will enable future development. There are many gaps. The main problems with this PR are

Where are the test cases?

and then:

- Lack of an inferencing example.
- One epoch works well; additional epochs fail.
- Each epoch is not checkpointed; it should be.
- There is a complete lack of configurability(!)
- The hardcoded values may not be optimal.
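One way the configurability gap could be closed is a small command-line layer over the hardcoded values. The sketch below is an assumption, not the PR's code: `TrainConfig`, `parse_config`, the flag names, and every default are placeholders chosen for illustration.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TrainConfig:
    # All defaults are illustrative placeholders, not the values hardcoded in train.py.
    epochs: int = 1
    batch_size: int = 8
    learning_rate: float = 3e-4
    checkpoint_dir: str = "checkpoints"


def parse_config(argv: Optional[Sequence[str]] = None) -> TrainConfig:
    """Parse command-line flags into a TrainConfig, falling back to the defaults."""
    defaults = TrainConfig()
    parser = argparse.ArgumentParser(description="Retrieval transformer training (sketch)")
    parser.add_argument("--epochs", type=int, default=defaults.epochs)
    parser.add_argument("--batch-size", type=int, default=defaults.batch_size)
    parser.add_argument("--learning-rate", type=float, default=defaults.learning_rate)
    parser.add_argument("--checkpoint-dir", default=defaults.checkpoint_dir)
    args = parser.parse_args(argv)
    if args.epochs < 1:
        parser.error("--epochs must be at least 1")
    return TrainConfig(
        epochs=args.epochs,
        batch_size=args.batch_size,
        learning_rate=args.learning_rate,
        checkpoint_dir=args.checkpoint_dir,
    )
```

Under this sketch, `python train.py --epochs 3 --batch-size 16` would yield a config with those two values overridden and the rest left at their defaults.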

After merge, let's solve these problems. I will file issues for the problems not identified by Robert.

Intel oneAPI must be installed, and the linker must be told about the oneAPI libraries. I did this by dropping a file:

```
sdake@beast-06:/etc/ld.so.conf.d$ cat artificial_wisdom_intel_one_api.conf
/opt/intel/oneapi/mkl/2023.1.0/lib/intel64
```

then rebuild the linker cache:

```
sudo ldconfig
```

- You may have to build faiss on your local system with GPU support.
- The safety checks on the inputs could be better.
- 100G appears to break faiss.
- Render the loss rate in the train loop.
- Add the epoch to the saved name.
- Move the dataloader to within the train loop.
- Requirements files are independently named. Change the requirements file name.
- Remove the index_name parameter.
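For the note that the safety checks on the inputs could be better, a sketch of the kind of validation that could run before chunking and indexing. The function name `find_input_files` and the default glob `**/*.txt` are assumptions for illustration; the actual input location and glob are the subject of issue #51.

```python
from pathlib import Path
from typing import List


def find_input_files(input_dir: str, pattern: str = "**/*.txt") -> List[Path]:
    """Return the sorted input files, failing early with a clear message if unusable."""
    root = Path(input_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"input directory does not exist: {root}")
    # Keep regular files only; a glob can also match directories.
    files = sorted(p for p in root.glob(pattern) if p.is_file())
    if not files:
        raise ValueError(f"no files matching {pattern!r} under {root}")
    # Reject zero-byte files up front rather than partway through preprocessing.
    empty = [p for p in files if p.stat().st_size == 0]
    if empty:
        raise ValueError(f"{len(empty)} empty input file(s), first: {empty[0]}")
    return files
```

Failing before any preprocessing starts keeps a bad path or an empty corpus from surfacing later as an opaque faiss or dataloader error.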

sdake · 2023-07-05T06:56:31Z

Let's get this training loop merged.

Remaining work from this PR:

- Test cases
- Configuration
- Inferencing example

Longer term we need to decide how our natural language processing library will interface with the HuggingFace ecosystem. There is no future that doesn't involve some form of integration here.

configurability

rstarmer

LGTM

Developers spend a significant amount of time waiting for things to build. At least make the process visually pleasing.

rstarmer

LGTM

sdake requested a review from rstarmer as a code owner June 30, 2023 17:20

cla-bot bot added the CLA CLA signed? label Jun 30, 2023

sdake requested review from MostAwesomeDude and lsdake June 30, 2023 17:20

sdake marked this pull request as draft June 30, 2023 17:20

MostAwesomeDude previously approved these changes Jun 30, 2023

This was referenced Jun 30, 2023

Parameterize text input location and search glob #51

Open

Add epoch constraint to retrieval train sample code #54

Closed

rstarmer requested changes Jun 30, 2023

sdake dismissed MostAwesomeDude’s stale review via ae357a7 July 2, 2023 00:20

rstarmer reviewed Jul 2, 2023

sdake marked this pull request as ready for review July 2, 2023 22:22

rstarmer self-requested a review July 4, 2023 18:13

rstarmer previously approved these changes Jul 4, 2023

sdake mentioned this pull request Jul 4, 2023

Various Scale Things lucidrains/RETRO-pytorch#22

Closed

sdake dismissed rstarmer’s stale review via 3ca6986 July 4, 2023 18:48

sdake added 2 commits July 4, 2023 19:54

Save a pytorch model for each epoch.

5f1cf57

rstarmer self-requested a review July 5, 2023 07:19

rstarmer previously approved these changes Jul 5, 2023

Colorize the CLI

4310678

Developers spend a significant amount of time waiting for things to build. At least make the process visually pleasing.

sdake dismissed rstarmer’s stale review via 4310678 July 5, 2023 09:49

rstarmer self-requested a review July 5, 2023 14:37

rstarmer approved these changes Jul 5, 2023

sdake merged commit 4fa9200 into artificialwisdomai:main Jul 5, 2023

sdake deleted the retrieval/train branch July 5, 2023 17:35

rstarmer mentioned this pull request Jul 5, 2023

Platform/oci efi #43

Closed
