Plugins and project ideas collection #4338
Replies: 19 comments
-
I am translating the course into Chinese:
-
I've made a utility module to integrate pandas DataFrames with spaCy: https://github.com/yash1994/dframcy
-
@GoooIce Woooow, this is really cool! Let me know if you have questions or need help. If you give me the text, I can also make a Chinese version of the logo 😃
@yash1994 Nice, thanks for sharing! Do you want to submit it to the spaCy Universe (see here for details)? Also, one small suggestion: I think it'd be cleaner if your custom classes like
-
Thank you for your suggestions, @ines. I understand the point you made, and I will make the necessary changes in the code and then submit a pull request to add it to the spaCy Universe. Thanks again for your time.
-
@ines It's not a GitHub template repo, but it's a pretty great start with Cookiecutter. The API follows the rather opinionated request/response format of Azure Search Cognitive Skills, 'cause Microsoft. The PR to add it to the Universe is here:
-
If there's interest in a template repo, I can also contribute that pretty easily.
@ines I'm super interested in working on debugging pipeline components. I wrote a quick wrapper around the Language class to time pipeline steps, and I've found it really useful despite its hacky nature. I'd love to see the draft code you mentioned so I can start working on that.
-
@kabirkhan Thanks – just shared the cookiecutter template on Twitter! And here's one draft of a pipeline debugging wrapper:
```python
import copy
import datetime

from spacy.tokens import Doc


class SpacyDebugger(object):
    def __init__(self, nlp):
        self.orig_nlp = nlp
        self.nlp = self.wrap_pipeline(nlp)
        # force=True avoids an error if the extension was already registered.
        # Note: a mutable default like {} is shared by every Doc, so store a
        # fresh dict per Doc before writing timestamps into it.
        Doc.set_extension("debug_start_times", default={}, force=True)
        # TODO: add extension method that calculates execution time based on
        # start and end for given component, e.g. doc._.debug_exec_time("ner")
        # TODO: method on Doc that writes everything to a log file?

    def make_debug_component(self, name):
        def debug_component(doc):
            # TODO: add option to not print but store timestamp in extension
            # attributes on the Doc (for each component) in
            # doc._.debug_start_times
            # TODO: use logging module instead of print
            # TODO: option to generate visualization?
            print(f"Before '{name}'", datetime.datetime.now().timestamp())
            return doc

        return debug_component

    def wrap_pipeline(self, nlp):
        nlp = copy.deepcopy(nlp)
        # We don't want to modify this while we're looping over it
        pipeline = list(nlp.pipeline)
        for name, pipe in pipeline:
            debug_component = self.make_debug_component(name)
            # spaCy v2 API: add_pipe takes the component function itself
            nlp.add_pipe(debug_component, before=name, name=f"debug_{name}")
        # TODO: add component after and also log end times
        return nlp
```
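One of the TODOs in the draft is an execution-time helper along the lines of `doc._.debug_exec_time("ner")`. Below is a minimal, library-agnostic sketch of that idea, assuming a pipeline is just a list of `(name, component)` pairs where each component takes a doc and returns it (as in spaCy v2's `nlp.pipeline`). `PipelineTimer` and `exec_time` are hypothetical names, and the timestamps live on the wrapper rather than on the `Doc` so the sketch runs without spaCy.

```python
import time
from typing import Callable, Dict, List, Tuple

# Assumption: a component is any callable that takes a doc and returns it.
Component = Callable[[object], object]


class PipelineTimer:
    """Hypothetical helper: records start/end times for each component."""

    def __init__(self, pipeline: List[Tuple[str, Component]]):
        self.starts: Dict[str, float] = {}
        self.ends: Dict[str, float] = {}
        # Wrap every component once, keeping the original order.
        self.pipeline = [(name, self._wrap(name, comp)) for name, comp in pipeline]

    def _wrap(self, name: str, component: Component) -> Component:
        def timed(doc):
            # perf_counter is monotonic, so end - start is never negative.
            self.starts[name] = time.perf_counter()
            doc = component(doc)
            self.ends[name] = time.perf_counter()
            return doc

        return timed

    def __call__(self, doc):
        for _, component in self.pipeline:
            doc = component(doc)
        return doc

    def exec_time(self, name: str) -> float:
        """Seconds spent in the named component during the last call."""
        return self.ends[name] - self.starts[name]


# Usage with two toy "components" operating on plain strings.
timer = PipelineTimer([("lower", str.lower), ("split", str.split)])
print(timer("Hello World"))  # ['hello', 'world']
print(f"split took {timer.exec_time('split'):.6f}s")
```

With spaCy, the same wrapping could write these timestamps to a custom extension on each `Doc` instead, which is what the draft's TODO suggests.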
-
@ines I would also like to contribute to the debugging wrapper. Could I take that up?
-
Hi,
-
Hi all,
-
@ines I have some ideas for enhancing the PhraseMatcher and token Matcher, such as conditional matches and giving users control over whether nested matches are allowed. I also have fair experience in Python and would like to contribute. How do I take this forward?
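To make the "allow nested matches or not" idea concrete, here is a minimal sketch of a post-processing filter over matcher output, assuming matches are `(match_id, start, end)` tuples as returned when calling spaCy's `Matcher` on a `Doc`. `filter_matches` and its `allow_nested` flag are hypothetical and not part of spaCy's API; a match counts as nested when it lies entirely inside a strictly longer match.

```python
from typing import List, Tuple

# (match_id, start, end), the shape spaCy's Matcher returns
Match = Tuple[int, int, int]


def filter_matches(matches: List[Match], allow_nested: bool = True) -> List[Match]:
    """Hypothetical filter: optionally drop matches nested in a longer match."""
    if allow_nested:
        return list(matches)
    kept = []
    for match in matches:
        _, start, end = match
        # Nested = fully contained in another match that is strictly longer.
        nested = any(
            o_start <= start and end <= o_end and (o_end - o_start) > (end - start)
            for _, o_start, o_end in matches
        )
        if not nested:
            kept.append(match)
    return kept


matches = [(1, 0, 3), (2, 1, 2), (3, 4, 6), (4, 5, 7)]
print(filter_matches(matches, allow_nested=False))
# [(1, 0, 3), (3, 4, 6), (4, 5, 7)]
```

Partially overlapping matches such as `(4, 6)` and `(5, 7)` are both kept here, since neither contains the other. For removing overlaps as well, spaCy provides `spacy.util.filter_spans`, which keeps the longest non-overlapping `Span` objects.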
-
@ines
-
Hi @rohts-patil: issue #2466 has been closed and merged with this master thread, to keep a better overview of projects that people could work on. Nobody has taken that particular challenge up yet though, as far as I know.
@093093 We typically don't really "assign" issues to people. You can find various project ideas in this thread, and you can start working on a PR if you're interested in solving any of them :-)
-
Hello there, I am following up on #4441 and I am happy to share my working code. This seems to correctly extract the Thanks!
The annotated function is shown below:
-
@svlandeg Let me know if you have any questions. I know that for some convoluted examples the function can return both VPs and sub-VPs (much like the NP), and I think this could be fixed by using the start and end indices of the largest VP.
-
Hi @randomgambit! Would you feel like submitting a PR with this code? That is easier to review ;-) Perhaps run the algorithm on the example English sentences and show the output?
-
Hi @svlandeg, I would love to, but I have never submitted a PR. Perhaps you could create one, and I will be happy to run it on the example English sentences if you show me where they are!
-
There's a first time for everything! ;-) If you feel like figuring out how it works, there's some documentation on contributing here: https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md#contributing-to-the-code-base
In a nutshell, you'll want to first create your own fork of spaCy, create a new branch from
Ideally, you could also extend the unit tests to include some verb chunking; see for instance: https://github.com/explosion/spaCy/blob/master/spacy/tests/lang/en/test_noun_chunks.py
Then, when you've written all the code and it looks good to you, you can go to your repo and create a PR against spaCy's
If you're still a bit unsure whether everything is right, you can choose the option "Create draft pull request" instead of the default "Create pull request". As long as the PR is in draft, we'll assume you're working on it and let you finish that first.
I think it could be a nice learning opportunity, and it would make sure the contribution is properly attributed to you. But let us know if you don't have the time or means to create the PR right now ;-)
-
Hello there @svlandeg and the great
I have been thinking about this, and I think I need some help with the following issue. Consider the following simple sentence:
I would like to extract the following information:
This is
I think the idea would be to look for a
Any ideas? Thank you!!
-
I was going through the existing enhancement issues again and thought it'd be nice to collect ideas for spaCy plugins and related projects. There are always people in the community who are looking for new things to build, so here's some inspiration ✨ For existing plugins and projects, check out the spaCy universe.
If you have questions about the projects I suggested, or the spaCy plugin system in general, I should also be able to help. And if you're looking for collaborators or there's a plugin you'd love to see built, feel free to comment here as well.
Various ideas
Visual Studio Code extension (#2969)
Wrappers for debugging pipeline components
Inspired by this Stack Overflow question: https://stackoverflow.com/a/57964354/6400719. Could be a helper that wraps the `nlp` object and logs processing time and other useful details. I also have a bunch of draft code I'm happy to share if someone wants to work on this. (Also see #3943 for related functionality we want to ship in spaCy.)
spaCy + Apache Beam
Thread with notebook and discussion: https://twitter.com/swartchris8/status/1194192895244480512
A package could, for instance, wrap the boilerplate code so all the user has to do is pass in an `nlp` object and config options (what should be extracted).
Translations of the spaCy course
The spaCy course is open-source and on GitHub and the content is released under a CC BY-NC license. Translating it to other languages could be really cool, to make it easier for people to get started 🙂
Implemented
Pandas helpers and utilities (#3702)
✅ See: https://github.com/yash1994/dframcy
Project starter as GitHub repo template
GitHub now supports template repos, so it could be cool to have a "spaCy project starter" template that's set up as a Python package, includes some basic scaffolding around loading models and processing texts, and maybe exposes a small REST API using FastAPI.
microsoft/cookiecutter-spacy-fastapi
by @kabirkhan