Commit

Planning chunk+sentence behavior
mkranzlein committed Sep 27, 2023
1 parent 4fbb71f commit dc03395
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions src/hipool/models.py
@@ -70,6 +70,16 @@ def forward(self, ids: list[Integer[Tensor, "_ d"]],
token_type_ids: A list of varied-length tensors token_type_ids.
All 0s.
"""

# Get hipool embedding

# Forward pass happens on one or more documents
# One is the minimum because hipool needs all of the document's chunks
# Pipeline: send document through bert sentence by sentence

# Chunking approaches: equal number of sentences, equal number of tokens,
# unequal number of sentences that approximates an equal number of tokens


# Pad such that each sequence has the same number of chunks
# Padding chunks are c-dim vectors, where all the input ids are 0, which is
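
The planning comments in this hunk describe two steps: grouping sentences into chunks of roughly equal token count, and padding documents so each has the same number of chunks, with padding chunks made of all-zero input ids. A minimal sketch of both steps follows, using plain Python lists instead of tensors. The function names `chunk_sentences` and `pad_documents`, the `max_tokens` and `chunk_len` parameters, and the truncation of over-long sentences are assumptions for illustration; none of them come from the hipool repository.

```python
def chunk_sentences(sentence_token_ids, max_tokens):
    """Greedily group whole sentences into chunks of at most max_tokens tokens.

    Hypothetical helper: each sentence is a list of token ids. Chunks hold an
    unequal number of sentences but approximate an equal number of tokens.
    """
    chunks, current = [], []
    for sent in sentence_token_ids:
        # Close the current chunk if this sentence would exceed the budget.
        if current and len(current) + len(sent) > max_tokens:
            chunks.append(current)
            current = []
        # Assumption: a single sentence longer than max_tokens is truncated.
        current.extend(sent[:max_tokens])
    if current:
        chunks.append(current)
    return chunks


def pad_documents(docs, chunk_len, pad_id=0):
    """Pad chunks to chunk_len tokens and documents to the same chunk count.

    Hypothetical helper: docs is a list of documents, each a list of chunks.
    Padding chunks consist entirely of pad_id (0), as the comments describe.
    """
    max_chunks = max(len(doc) for doc in docs)
    padded = []
    for doc in docs:
        # Right-pad every real chunk to the fixed chunk length.
        rows = [chunk + [pad_id] * (chunk_len - len(chunk)) for chunk in doc]
        # Append all-zero chunks until the document has max_chunks chunks.
        rows += [[pad_id] * chunk_len for _ in range(max_chunks - len(doc))]
        padded.append(rows)
    return padded


# Two toy documents, already split into sentences of token ids.
doc_a = chunk_sentences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_tokens=5)
doc_b = chunk_sentences([[1, 2]], max_tokens=5)
batch = pad_documents([doc_a, doc_b], chunk_len=5)
```

After padding, every document in `batch` has the same number of chunks and every chunk has the same length, so the batch could be stacked into a single tensor before being sent through BERT.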