This repository was archived by the owner on Dec 5, 2018. It is now read-only.

Outline basic structure of application #1

Open
Arithmomaniac opened this issue Jun 8, 2016 · 2 comments

Comments

@Arithmomaniac
Owner

I imagine the following logical components of the application:

  • Retrieving the Sefaria text and splitting it into lines and (where applicable) sentences
  • Setting an "anchor" that highlights prospective matches (e.g. a gematria number that may be a Seif or Siman)
  • The ability to select and tokenize text that contains either:
    • The book name
    • A specific level of the hierarchy
    • An exclusion (e.g. the string preceding the anchor shows that the word "siman" is not a book in this context)
  • Running all past patterns on each subsequent string that contains an anchor, in order to classify it automatically (compile tries with XRegExp?)
  • Submitting batches of tokenized results to Sefaria (user would have to supply own API key).
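
The anchor and pattern-matching steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual design: the function names, the `PATTERNS` list, and the labels `siman` / `seif` / `unclassified` are all hypothetical, final letter forms are treated as non-numeric, and Python's built-in `re` stands in for XRegExp. A token counts as a prospective anchor if it consists only of numeral letters in non-increasing value order.

```python
import re

# Standard gematria values. Final letter forms are deliberately left out
# (an assumption of this sketch), so words such as "siman" are not anchors.
GEMATRIA = {
    "א": 1, "ב": 2, "ג": 3, "ד": 4, "ה": 5, "ו": 6, "ז": 7, "ח": 8, "ט": 9,
    "י": 10, "כ": 20, "ל": 30, "מ": 40, "נ": 50, "ס": 60, "ע": 70, "פ": 80,
    "צ": 90, "ק": 100, "ר": 200, "ש": 300, "ת": 400,
}

# Geresh, gershayim, and the ASCII quotes often typed in their place.
MARKS = "\u05F3\u05F4'\""

# A run of Hebrew letters, possibly containing numeral punctuation.
TOKEN = re.compile("[\u05D0-\u05EA][\u05D0-\u05EA\u05F3\u05F4'\"]*")


def gematria_value(token):
    """Return the value of a Hebrew numeral, or None if it is not one."""
    letters = [ch for ch in token if ch not in MARKS]
    if not letters or any(ch not in GEMATRIA for ch in letters):
        return None
    values = [GEMATRIA[ch] for ch in letters]
    # Numerals are written with the largest value first.
    if any(a < b for a, b in zip(values, values[1:])):
        return None
    return sum(values)


def find_anchors(line):
    """Return (start, end, value) for every prospective numeric anchor."""
    anchors = []
    for match in TOKEN.finditer(line):
        value = gematria_value(match.group())
        if value is not None:
            anchors.append((match.start(), match.end(), value))
    return anchors


# Hypothetical "past patterns": each one inspects the text before an anchor.
PATTERNS = [
    ("siman", re.compile(r"סימן\s*$")),
    ("seif", re.compile(r"סעיף\s*$")),
]


def classify(line):
    """Label each anchor using the first pattern that matches its context."""
    results = []
    for start, _end, value in find_anchors(line):
        context = line[:start]
        label = next(
            (name for name, rx in PATTERNS if rx.search(context)),
            "unclassified",
        )
        results.append((label, value))
    return results
```

For a line such as `עיין סימן קכ"ח סעיף ג'`, `classify` should yield `[("siman", 128), ("seif", 3)]`. Ordinary words that happen to be valid numerals will also be flagged, which is why these are only prospective matches that the exclusion patterns would need to filter.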

I have not given much thought to the UI yet; frankly, I wouldn't mind if the first pass were a console application (though the tokenizing would be hard to implement in that version).
Thoughts?

@JonMosenkis

Hi!

My name is Jonathan Mosenkis and I am one of the content engineers at Sefaria. I really like this project, and I have been thinking about this issue for a while now.

Some issues to think about:

  1. Sefaria already has a built-in UI for adding links (references). I think the ideal solution would be an enhancement of the existing UI, rather than a stand-alone application.

  2. An important part of Sefaria's core code is designed to search for links within a text and add them automatically. As of this writing, the code is somewhat limited in what it can identify as a valid link. Feel free to take a look at the code for our linker utility and use it in any way that might help. Any suggestions on how to improve the auto linker would also be greatly appreciated. Keep in mind, though, that parts of this file are undergoing heavy refinement.
    https://github.com/Sefaria/Sefaria-Project/blob/master/sefaria/helper/link.py

Also, take a look at our wiki on text references:
https://github.com/Sefaria/Sefaria-Project/wiki/Text-References#classes-of-references-we-dont-yet-support

  3. Sefaria's backend has a large group of very helpful functions and classes (checking if a reference is valid comes to mind).

Best of luck!
Jonathan

@Arithmomaniac
Owner Author

Hi @JonMosenkis,

  1. I think that would be great, but it is beyond the original scope of this project, at least, and beyond my capabilities:
  • I haven't worked in Python in a while, and haven't worked with any layer of your stack (I'm an ASP.NET developer; I don't do a lot of client-side code, but I do have a foundation for that and could use it professionally near-term.) I unfortunately don't have the time to learn it, either.
  • I envisioned this, at first, as really a "toy" tool for linking the Mishnah Berurah. I expect to log and analyze the detected patterns at the end and report them manually. Generalizing it to any text, making it sufficiently robust, etc. would require a lot of extra work.

That being said, I would love it if the business logic I wrote were helpful upstream. I would also be glad to work on outlining and library classes if someone could work on the actual code.

  2. Thanks!

  3. How large? 😉 I imagine these will be helpful towards the end, when submitting the references.
