[WIP] #107

Open
wants to merge 4 commits into
base: index_on_ssd

initial impl for lookup pids
xm-gui committed May 14, 2022
commit e6f8e52cd59450efa0fc512cee88cb77c01198bf
21 changes: 20 additions & 1 deletion colbert/indexing/codecs/residual_embeddings.py
@@ -144,7 +144,26 @@ def lookup_codes(self, pids):

     def lookup_pids(self, pids):
         assert self.mmap_index
-        pass
+        print(f"mei-test residuals shape {self.residuals.shape}")
+        packed_dim = self.residuals.shape[2]
+        residuals = torch.zeros(sum([self.pid_to_chunk_metadata[pid][1] for pid in pids]), packed_dim)
+
+        pids_per_chunk = defaultdict(list)
+        for pid in pids:
+            chunk_idx = self.pid_to_chunk_metadata[pid][0]
+            pids_per_chunk[chunk_idx].append(pid)
+
+        offset = 0
+        for chunk_idx in sorted(pids_per_chunk.keys()):
+            pids_ = pids_per_chunk[chunk_idx]
+            for pid in pids_:
+                pid_doclen = self.pid_to_chunk_metadata[pid][1]
+                pid_offset_in_chunk = self.pid_to_chunk_metadata[pid][2]
+                residuals[offset:offset + pid_doclen, :packed_dim] = \
+                    self.residuals[chunk_idx][pid_offset_in_chunk:pid_offset_in_chunk + pid_doclen, :packed_dim]
+                offset += pid_doclen
+
+        return residuals
 
     def __len__(self):
         return self.codes.size(0)
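
The gather in `lookup_pids` can be illustrated with a small standalone sketch. It is not the code from this commit: `gather_rows`, `chunks`, and the toy metadata are hypothetical names, plain Python lists stand in for the memory-mapped tensors, and each metadata entry is assumed to be `(chunk_idx, doclen, offset_in_chunk)`, which is how the diff indexes `pid_to_chunk_metadata`.

```python
from collections import defaultdict


def gather_rows(pids, pid_to_chunk_metadata, chunks):
    """Collect the rows of every requested pid, visiting chunks in sorted order.

    Assumption: pid_to_chunk_metadata[pid] == (chunk_idx, doclen, offset_in_chunk).
    `chunks` is a hypothetical stand-in for self.residuals: a list of chunks,
    each chunk being a list of rows.
    """
    # Group the requested pids by the chunk that stores them.
    pids_per_chunk = defaultdict(list)
    for pid in pids:
        chunk_idx = pid_to_chunk_metadata[pid][0]
        pids_per_chunk[chunk_idx].append(pid)

    # Walk chunks in ascending order and copy each pid's contiguous slice.
    rows = []
    for chunk_idx in sorted(pids_per_chunk):
        for pid in pids_per_chunk[chunk_idx]:
            _, doclen, offset = pid_to_chunk_metadata[pid]
            rows.extend(chunks[chunk_idx][offset:offset + doclen])
    return rows


# Toy index: chunk 0 holds pid 0 (2 rows) and pid 1 (1 row); chunk 1 holds pid 2 (2 rows).
chunks = [
    [[0, 0], [0, 1], [0, 2]],
    [[1, 0], [1, 1]],
]
metadata = {0: (0, 2, 0), 1: (0, 1, 2), 2: (1, 2, 0)}

# Requesting [2, 0] returns pid 0's rows first, because chunk 0 is visited before chunk 1.
result = gather_rows([2, 0], metadata, chunks)
```

As the toy call shows, rows come back grouped by chunk index rather than in the order of `pids`, so a caller that needs per-pid alignment would have to account for that. Separately, `torch.zeros` allocates float32 by default; if the stored residuals use a packed integer dtype (an assumption here, not something the diff shows), passing an explicit `dtype` might be worth considering.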