Retrieval/train #50

Merged: 3 commits, Jul 5, 2023

@sdake (Member) commented Jun 30, 2023

Train our custom retrieval transformer (based upon RETRO):

```
(venv) sdake@beast-06:~/repos/origin/retrieval$ python train.py
found to be previously processed at processed-stats.json
preprocessed knn found at chunks/train.chunks.knn.dat, faiss index reconstituted from .tmp/.index/knn.index
Artificial Wisdom™ Retreival Transformer Training
• retrieval_model=artificialwisdomai/retroformer • foundation_model=mosaicml/mpt30b •
Epoch 0  100%   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ • loss=3.47 • 0:06:46 • 0:00:00
```

@sdake requested a review from rstarmer as a code owner June 30, 2023 17:20
@cla-bot added the CLA (CLA signed?) label Jun 30, 2023
@sdake requested review from MostAwesomeDude and lsdake June 30, 2023 17:20
@sdake marked this pull request as draft June 30, 2023 17:20
@sdake (Member, Author) commented Jun 30, 2023

I have not run inference yet. I know how to do so; there is an example in our Google Drive.

As a prerequisite, I need to be able to save the model after training.

@MostAwesomeDude (Contributor) left a comment

Nice work! I see a few cosmetic things and have some questions, but none of that is important.

retrieval/train.py: 3 resolved review threads (2 outdated)

@rstarmer (Member) left a comment

Even with the minor quibbles, I think this is great to get into the codebase as is. Can you address issue #54, please?

Documentation would also be useful, but I'm tracking that in issue #53.

retrieval/train.py: 5 resolved review threads (3 outdated)

@sdake (Member, Author) commented Jul 2, 2023

@artificialwisdomai/maintainers Hi gang. I think this PR is mergeable as is, and merging it will enable future development. There are many gaps. The main problems with this PR are:

  1. Where are the test cases?

and then:

  1. Lack of an inferencing example.
  2. One epoch works well; additional epochs fail.
  3. Epochs are not checkpointed; they should be.
  4. There is a complete lack of configurability(!)
  5. The hardcoded values may not be optimal.
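Items 4 and 5 could be approached by lifting the hardcoded values into command-line options. A minimal sketch using the standard library's `argparse` follows. The option names and the numeric defaults (`--epochs`, `--batch-size`, `--learning-rate`, `--checkpoint-dir`) are hypothetical placeholders, not values taken from `train.py`; only the model names and file paths are copied from the training log above.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse training options (hypothetical; train.py currently hardcodes these)."""
    parser = argparse.ArgumentParser(description="Train the retrieval transformer.")
    # Model identifiers: defaults copied from the training log banner.
    parser.add_argument("--retrieval-model", default="artificialwisdomai/retroformer")
    parser.add_argument("--foundation-model", default="mosaicml/mpt30b")
    # Preprocessed artifacts: defaults copied from the paths the log reports.
    parser.add_argument("--knn-path", default="chunks/train.chunks.knn.dat")
    parser.add_argument("--index-path", default=".tmp/.index/knn.index")
    # Hyperparameters: placeholder defaults, not tuned or taken from train.py.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch-size", type=int, default=1)
    parser.add_argument("--learning-rate", type=float, default=1e-4)
    parser.add_argument("--checkpoint-dir", default="checkpoints")
    args = parser.parse_args(argv)
    # Reject obviously invalid values early instead of failing mid-training.
    if args.epochs < 1 or args.batch_size < 1:
        parser.error("--epochs and --batch-size must be at least 1")
    return args
```

With something like this in place, a run such as `python train.py --epochs 3 --learning-rate 3e-5` would override the placeholders without editing the script, while a bare `python train.py` keeps today's behavior for the model names and paths.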

After the merge, let's solve these problems. I will file issues for the problems Robert has not already identified.

@sdake marked this pull request as ready for review July 2, 2023 22:22
@rstarmer self-requested a review July 4, 2023 18:13
@rstarmer previously approved these changes Jul 4, 2023
@sdake added 2 commits July 4, 2023 19:54
Intel oneAPI must be installed, and the linker must be told about the
oneAPI libraries. I did this by dropping a file into /etc/ld.so.conf.d:
```
sdake@beast-06:/etc/ld.so.conf.d$ cat artificial_wisdom_intel_one_api.conf
/opt/intel/oneapi/mkl/2023.1.0/lib/intel64
```

Then rebuild the linker cache:
```
sudo ldconfig
```
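To confirm that the linker cache now resolves the oneAPI libraries, a small check can be run from Python. This is a sketch that assumes the library of interest is MKL's single dynamic library, `libmkl_rt.so`, from the `mkl/.../lib/intel64` directory listed in the conf file; the helper names `find_mkl` and `can_load` are hypothetical. On Linux, `ctypes.util.find_library` consults `ldconfig -p` first, so it reflects the new conf entry once `sudo ldconfig` has run.

```python
import ctypes
import ctypes.util
from typing import Optional


def find_mkl(name: str = "mkl_rt") -> Optional[str]:
    """Return the soname the linker resolves for `name`, or None if not found."""
    # On Linux this checks the ldconfig cache (then gcc/ld as fallbacks).
    return ctypes.util.find_library(name)


def can_load(name: str = "mkl_rt") -> bool:
    """Try to actually dlopen the library; True only if it loads."""
    soname = find_mkl(name)
    if soname is None:
        return False
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True
```

If `find_mkl()` still returns `None` after rebuilding the cache, one likely cause is that the versioned directory in the conf file (here `2023.1.0`) does not match the oneAPI version actually installed.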

- You may have to build faiss on your local system with GPU support.
- The safety checks on the inputs could be better.
- 100G appears to break faiss.
- Render the loss rate in the train loop.
- Add the epoch to the saved name.
- Move the dataloader into the train loop.
- Requirements files are independently named.
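The bullets about rendering the loss, adding the epoch to the saved name, and moving the dataloader into the train loop describe a particular loop shape. A sketch of that shape follows, under explicit assumptions: `make_dataloader`, `train_step`, and the dict-based `state` are hypothetical stand-ins for the real dataset, model, and optimizer, and the `retroformer-epochNNN.pkl` naming is illustrative. It uses `pickle` only to stay dependency-free; a PyTorch model would more naturally be saved with `torch.save(model.state_dict(), path)`. This is not the code in `train.py`.

```python
import os
import pickle
import random
from typing import Dict, Iterator, List


def make_dataloader(num_batches: int, seed: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for the real DataLoader, rebuilt once per epoch."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [rng.random() for _ in range(4)]


def train_step(state: Dict[str, int], batch: List[float]) -> float:
    """Hypothetical stand-in for forward/backward/optimizer step; returns a fake loss."""
    state["steps"] += 1
    return sum(batch) / (len(batch) * state["steps"])


def checkpoint_path(directory: str, name: str, epoch: int) -> str:
    # Put the epoch in the saved name so each epoch's checkpoint is kept.
    return os.path.join(directory, f"{name}-epoch{epoch:03d}.pkl")


def save_checkpoint(state: Dict[str, int], path: str) -> None:
    # Write to a temporary file, then rename, so an interrupted save
    # never leaves a truncated checkpoint under the final name.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def train(directory: str, epochs: int = 2, num_batches: int = 5) -> List[str]:
    os.makedirs(directory, exist_ok=True)
    state = {"steps": 0}
    saved = []
    for epoch in range(epochs):
        # The dataloader is created inside the epoch loop, so every epoch
        # starts from a fresh generator instead of an exhausted one.
        loader = make_dataloader(num_batches, seed=epoch)
        for i, batch in enumerate(loader, start=1):
            loss = train_step(state, batch)
            # Render the latest loss in place on one terminal line.
            print(f"\rEpoch {epoch} {i}/{num_batches} loss={loss:.2f}", end="", flush=True)
        print()
        path = checkpoint_path(directory, "retroformer", epoch)
        save_checkpoint(state, path)
        saved.append(path)
    return saved
```

Saving one file per epoch through a temporary file and `os.replace` means a crash during a save leaves the previous epochs' checkpoints intact, which also gives a natural point to resume from.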

Change requirements file name.

Remove the index_name parameter

@sdake (Member, Author) commented Jul 5, 2023

Let's get this training loop merged.

Remaining work from this PR:

Longer term, we need to decide how our natural language processing library will interface with the Hugging Face ecosystem. There is no future that doesn't involve some form of integration here.

@rstarmer self-requested a review July 5, 2023 07:19
@rstarmer previously approved these changes Jul 5, 2023

@rstarmer (Member) left a comment

LGTM

Developers spend a significant amount of time waiting for things to
build. At least make the process visually pleasing.

@rstarmer (Member) left a comment

LGTM

@sdake merged commit 4fa9200 into artificialwisdomai:main Jul 5, 2023
@sdake deleted the retrieval/train branch July 5, 2023 17:35
@rstarmer mentioned this pull request Jul 5, 2023
Labels: CLA (CLA signed?)
3 participants