Explore and document how to use coli-ana for retrieval #24

Open
nichtich opened this issue Mar 24, 2021 · 1 comment
Labels
data issues related to decomposition data question Further discussion needed

Comments


Explore and document methods to use decomposed DDC numbers for indexing and/or query expansion in retrieval systems such as Solr/Elasticsearch/...

To give an example, a document with DDC number 700.90440747471 should be indexed with "Modern arts", "1940-1949", "Museums, collections, exhibits", and "New York Metropolitan Area" (plus synonyms for each of these classes) and (probably ranked lower) with all labels of classes in the hierarchy.

The use case could be split into two steps:

  1. Analyze DDC numbers and split them into their main components
  2. Expand index and/or query for each component based on class labels and hierarchy

Only the first step is a task of coli-ana, but the use case should still be documented as part of coli-ana.
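A minimal sketch of the second step, assuming the first step has already produced the components of a number. The `Component` class, the `index_terms` function, and the boost values are hypothetical names chosen for illustration, not part of coli-ana; the ancestor label in the usage example is a placeholder, not taken from actual decomposition data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """One component of a decomposed DDC number (hypothetical structure)."""
    label: str
    synonyms: List[str] = field(default_factory=list)
    ancestors: List[str] = field(default_factory=list)  # labels of broader classes


def index_terms(components: List[Component],
                component_boost: float = 1.0,
                ancestor_boost: float = 0.3) -> Dict[str, float]:
    """Map each label to a weight; hierarchy labels get the lower weight."""
    terms: Dict[str, float] = {}
    for c in components:
        for term in [c.label, *c.synonyms]:
            terms[term] = max(terms.get(term, 0.0), component_boost)
        for term in c.ancestors:
            # Keep the higher weight if the label is also a direct component.
            terms[term] = max(terms.get(term, 0.0), ancestor_boost)
    return terms


# Components of 700.90440747471 as named in the example above.
# The ancestor "Arts" is an illustrative placeholder.
components = [
    Component("Modern arts", ancestors=["Arts"]),
    Component("1940-1949"),
    Component("Museums, collections, exhibits"),
    Component("New York Metropolitan Area"),
]
terms = index_terms(components)
```

The resulting weights could be passed to the retrieval system as field boosts at index time or as term boosts during query expansion; which of the two works better is part of what needs to be explored.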

@nichtich nichtich added the question Further discussion needed label Mar 24, 2021
@nichtich nichtich added the data issues related to decomposition data label Nov 25, 2021

nichtich commented Jan 21, 2022

Some notations with titles in K10plus (found by iterating over related notations):

  • 641.5 - cooking

  • 641.509 - historic cooking

  • 641.50901 - prehistoric cooking

  • 641.50902 - medieval cooking

  • 641.509024 - cooking like at Martin Luther's time

  • 641.50904 - 20th century cooking (no titles)

  • 641.509041 - cooking during WW1

  • 641.509047 - cooking in the 1970s

  • 37 - education (no titles)

  • 370 - education

  • 370.09 - historic education

  • 370.0902 - medieval education (no titles)

  • 370.940902 - medieval education in Europe

It should be possible to jump from 641.509 - historic cooking to

This requires a database with the full DDC hierarchy, all decompositions and the number of titles for each class.
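As a rough sketch of such a lookup, the following keeps the cooking notations from the list above in a dictionary and finds narrower notations that actually have titles. The `CLASSES` table and `narrower_with_titles` are hypothetical names; the hierarchy is approximated by notation prefix, which is a simplifying assumption (a notation such as 370.940902 is not a prefix extension of 370.09, which is why the decompositions are needed as well). A real database would store title counts; here only the "(no titles)" marks from the list are used.

```python
from typing import Dict, List, Tuple

# Notation -> (label, has_titles). The flags follow the "(no titles)" marks above.
CLASSES: Dict[str, Tuple[str, bool]] = {
    "641.5": ("cooking", True),
    "641.509": ("historic cooking", True),
    "641.50901": ("prehistoric cooking", True),
    "641.50902": ("medieval cooking", True),
    "641.509024": ("cooking like at Martin Luther's time", True),
    "641.50904": ("20th century cooking", False),
    "641.509041": ("cooking during WW1", True),
    "641.509047": ("cooking in the 1970s", True),
}


def narrower_with_titles(notation: str,
                         classes: Dict[str, Tuple[str, bool]] = CLASSES) -> List[str]:
    """Return narrower notations (by prefix) that have at least one title."""
    return sorted(
        n for n, (_label, has_titles) in classes.items()
        if n != notation and n.startswith(notation) and has_titles
    )


targets = narrower_with_titles("641.509")
```

With this data, 641.50904 is skipped because it has no titles, while its narrower notations 641.509041 and 641.509047 remain reachable from 641.509.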
