Commit

Adding a constructor for DocTokenizer, and loading `NeuralAttentionLib` and `TextEncodeBase`.
codetalker7 committed May 31, 2024
1 parent 72404a8 commit 41f028b
Showing 2 changed files with 8 additions and 0 deletions.
2 changes: 2 additions & 0 deletions src/ColBERT.jl
@@ -2,6 +2,8 @@ module ColBERT
using CSV
using Dates
using Logging
using NeuralAttentionlib
using TextEncodeBase
using Transformers

# datasets
6 changes: 6 additions & 0 deletions src/modelling/tokenization/doc_tokenization.jl
@@ -1,5 +1,11 @@
using ...ColBERT: ColBERTConfig

struct DocTokenizer
    D_marker_token_id::Int
    config::ColBERTConfig
end

function DocTokenizer(tokenizer::Transformers.TextEncoders.AbstractTransformerTextEncoder, config::ColBERTConfig)
    D_marker_token_id = TextEncodeBase.lookup(tokenizer.vocab, config.tokenizer_settings.doc_token_id)
    DocTokenizer(D_marker_token_id, config)
end
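
The constructor added above follows a common Julia pattern: an outer constructor resolves a value (here, the document marker's token ID) and then delegates to the default field-wise constructor. The following is a minimal, dependency-free sketch of that pattern only. `ToyConfig`, `ToyVocab`, `ToyDocTokenizer`, the local `lookup` helper, and the token strings are all hypothetical stand-ins for illustration; they are not the ColBERT.jl, Transformers.jl, or TextEncodeBase.jl APIs.

```julia
# Hypothetical stand-ins that only mimic the shape of the types in the commit.
struct ToyConfig
    doc_token::String          # marker token, e.g. "[unused1]" (assumed value)
end

struct ToyVocab
    tokens::Vector{String}
end

# Local helper (an assumption, not TextEncodeBase.lookup):
# return the 1-based index of `token`, or 0 if it is absent.
lookup(vocab::ToyVocab, token::String) =
    something(findfirst(==(token), vocab.tokens), 0)

struct ToyDocTokenizer
    D_marker_token_id::Int
    config::ToyConfig
end

# Outer constructor: derive the marker ID from the vocabulary, then delegate
# to the default field-wise constructor generated for the struct.
function ToyDocTokenizer(vocab::ToyVocab, config::ToyConfig)
    D_marker_token_id = lookup(vocab, config.doc_token)
    ToyDocTokenizer(D_marker_token_id, config)
end

vocab = ToyVocab(["[PAD]", "[unused0]", "[unused1]"])
config = ToyConfig("[unused1]")
tokenizer = ToyDocTokenizer(vocab, config)
```

Because the outer method is typed on `(ToyVocab, ToyConfig)`, it does not clash with the default `(Int, ToyConfig)` constructor it calls; the same reasoning applies to the `DocTokenizer` method in the diff.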
