- Briefly introduce the history and trends in IR
- Traditional dense passage retrieval (DPR): retrieve, then rank
![image](https://private-user-images.githubusercontent.com/81730878/312683569-14bf607d-74b3-4e75-ac59-6d92accd68e9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyOTY3NjksIm5iZiI6MTczOTI5NjQ2OSwicGF0aCI6Ii84MTczMDg3OC8zMTI2ODM1NjktMTRiZjYwN2QtNzRiMy00ZTc1LWFjNTktNmQ5MmFjY2Q2OGU5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDE3NTQyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRiZGQyNTg1YTJjYTkwOWYzMmFmMzk4MTgxNTIxMGJhMzE3MTBmZTUxYzI4YzgzMjhhMTVlYWFhYTcwMjM0ODAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.0r_BNDckmo2OWGLGZQ7YnkRQ6jcsBDo-VmtXTcOVQKM)
- Propose an alternative IR architecture, the Differentiable Search Index (DSI), in which a seq2seq model maps a query directly to a document ID
![image](https://private-user-images.githubusercontent.com/81730878/312686495-4347b414-6673-4231-980a-cf1422907728.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyOTY3NjksIm5iZiI6MTczOTI5NjQ2OSwicGF0aCI6Ii84MTczMDg3OC8zMTI2ODY0OTUtNDM0N2I0MTQtNjY3My00MjMxLTk4MGEtY2YxNDIyOTA3NzI4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDE3NTQyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM0NDE0NDhiYjk0ZTI2Yzc2MmU3ZWRlZGRmOTg4OWNlZTE0NWJlNjE0MWJjYzJhMzU2N2Q5ODNmOGU0YmUwMjYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.CLChi2UshVVr8K2gapQwGXzFVuELGFF7yy-3DJS_pY8)
![image](https://private-user-images.githubusercontent.com/81730878/312683766-6850745b-3a90-4946-b57a-ed38e0055ce9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyOTY3NjksIm5iZiI6MTczOTI5NjQ2OSwicGF0aCI6Ii84MTczMDg3OC8zMTI2ODM3NjYtNjg1MDc0NWItM2E5MC00OTQ2LWI1N2EtZWQzOGUwMDU1Y2U5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDE3NTQyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTYwY2YxMzUwMjFlMjdiODQxZjNiNGMxODllNzgzZTM1ZTRlYzRlM2YyMGE4ZDA0YTIzNmQ2MDI2OWQ0MzhjMTEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.wAOx5EmML8WSAUXdIGUjrEg42JLB4T25cncXoeRZKJg)
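
To make the retrieve-then-rank pipeline concrete, here is a minimal sketch. It is a toy illustration and not DPR itself: `embed` is a hypothetical bag-of-words stand-in for a trained dense encoder, the reranker uses plain term overlap in place of a cross-encoder, and the corpus, docids, and function names are all assumptions made for this example.

```python
import math
from collections import Counter

# Toy corpus; docids and texts are made up for illustration.
CORPUS = {
    "d1": "transformers store knowledge in their parameters",
    "d2": "inverted indexes map terms to posting lists",
    "d3": "dense retrieval encodes queries and passages as vectors",
}

def embed(text):
    """Hypothetical stand-in for a dense encoder: L2-normalised term counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}

def dot(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())

# Offline step: encode every document once and keep the vectors as the index.
INDEX = {docid: embed(text) for docid, text in CORPUS.items()}

def retrieve(query, k=2):
    """Stage 1: nearest-neighbour search by inner product over the index."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda d: dot(q, INDEX[d]), reverse=True)
    return ranked[:k]

def rerank(query, candidates):
    """Stage 2: rescore only the retrieved candidates (term overlap here)."""
    q_terms = set(query.lower().split())

    def overlap(docid):
        return len(q_terms & set(CORPUS[docid].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)

def search(query, k=2):
    """Full pipeline: retrieve k candidates, then rerank them."""
    return rerank(query, retrieve(query, k))
```

The point of the sketch is the shape of the pipeline: an external index plus two separate scoring stages. In DSI, a single seq2seq model that emits the document ID directly takes the place of all three.
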
Two Major Components
- Indexing
- Retrieval
- Pros
- End-to-end training
- Cons
- DSI systems are hard to scale to large data volumes
- Transformer Memory as a Differentiable Search Index
- DSI++: Updating Transformer Memory with New Documents
- Learning to Tokenize for Generative Retrieval
- Autoregressive Search Engines: Generating Substrings as Document Identifiers
- Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies
- Learning to Rank in Generative Retrieval
- TOME: A Two-stage Approach for Model-based Retrieval
- Multiview Identifiers Enhanced Generative Retrieval
- Generating more training data
- Document representation as queries
- DSI
- Direct Indexing
- Set Indexing
- Inverted Index
- Bridging the Gap
- Queries as document representations
- Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation
- Multiview Identifiers Enhanced Generative Retrieval
- DSI++: Updating Transformer Memory with New Documents
- Continual Learning for Generative Retrieval over Dynamic Corpora
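
The indexing and retrieval components, and the three document representations listed under DSI (Direct Indexing, Set Indexing, Inverted Index), can be sketched as plain data preparation. This is a hedged illustration and not the original implementation: the stopword list, the length limits, the order-preserving de-duplication, and every function name are assumptions. Both tasks end up as `(input_text, target_docid)` pairs for the same seq2seq model.

```python
import random

# Tiny stopword list, assumed for illustration only.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def direct_indexing(tokens, max_len=8):
    """Keep the first max_len tokens, preserving their order."""
    return tokens[:max_len]

def set_indexing(tokens, max_len=8):
    """Drop stopwords and repeated terms, then truncate."""
    seen, kept = set(), []
    for tok in tokens:
        if tok not in STOPWORDS and tok not in seen:
            seen.add(tok)
            kept.append(tok)
    return kept[:max_len]

def inverted_index(tokens, chunk_len=4, rng=random):
    """Sample one random contiguous chunk to pair with the docid."""
    start = rng.randrange(max(1, len(tokens) - chunk_len + 1))
    return tokens[start:start + chunk_len]

def build_examples(docs, queries, represent=direct_indexing):
    """Return (input_text, target_docid) pairs for indexing and retrieval."""
    examples = []
    for docid, text in docs.items():
        tokens = text.lower().split()
        # Indexing task: document representation -> docid
        examples.append((" ".join(represent(tokens)), docid))
    for query, docid in queries:
        # Retrieval task: query -> docid
        examples.append((query, docid))
    return examples
```

Passing `represent=set_indexing` or `represent=inverted_index` swaps the document representation without touching the rest. Query-based representations would instead replace the document text with generated queries, which needs a query generation model that is not shown here.
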
Compare the performance of traditional IR systems with DSI-based IR systems
- Datasets
- IR tasks
- Knowledge-intensive tasks (e.g., QA)
- Metrics
- Challenges with DSI: Address potential issues
- Generating non-existent document IDs (FM-index)
- Scaling DSI systems to handle large data volumes
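
One common way to keep a decoder from emitting non-existent document IDs is to constrain each decoding step to tokens that extend a valid ID. The sketch below uses a prefix trie, which is a simpler substitute for the FM-index mentioned above and not the same data structure. The `score` callback is a hypothetical stand-in for the model's next-token score, and the sketch assumes at least one valid docid exists.

```python
def build_trie(docids):
    """Nested-dict prefix tree over docid tokens; the key None marks end-of-id."""
    root = {}
    for docid in docids:
        node = root
        for tok in docid:
            node = node.setdefault(tok, {})
        node[None] = {}  # terminator: a valid docid ends here
    return root

def allowed_next(trie, prefix):
    """Tokens that keep prefix + token a prefix of some valid docid."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return []  # the prefix itself is already invalid
        node = node[tok]
    return sorted(node, key=str)

def constrained_greedy(trie, score):
    """Greedy decoding restricted to the trie.

    score(prefix, tok) is a hypothetical model scorer; tok is None for
    the end-of-id decision.
    """
    prefix = []
    while True:
        options = allowed_next(trie, prefix)
        best = max(options, key=lambda tok: score(prefix, tok))
        if best is None:
            return "".join(prefix)
        prefix.append(best)
```

Because every step is limited to the children of the current trie node, the decoded string is always one of the indexed docids, whatever the scorer prefers.
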
This structure gives an overview of generative IR and the role of differentiable search indexes. It is meant to be accessible to newcomers while still covering the progress and open challenges in the field.