spaCy custom transformer-based NER model training and Google Vertex AI #9829
Replies: 2 comments 3 replies
-
Hi @dave-espinosa! Curious as to which particular Vertex AI service you are integrating with. I think you're on the right track: create a custom-trained NER model using spaCy, then package it using a custom container.
The output of your training is a spaCy model that fits right into your application. The application you're building must now fit Vertex AI's specifications. Assuming you already have a spaCy model, you can load it up just like any other model in Python:

```python
import spacy

nlp = spacy.load("/path/to/custom-model/")
```
The next step, to make it compatible with Vertex AI, is to write an API layer on top of it. I believe this is already outside spaCy's core uses, but you can easily achieve it using libraries such as FastAPI or Flask. A crude skeleton might look like this:

```python
from fastapi import FastAPI
import spacy

app = FastAPI()
nlp = spacy.load("/path/to/custom-model/")

@app.get("/")
def get_entities(text: str):
    # Use your spaCy model to get the entities
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return {"entities": entities}
```

Basically, you need to create a REST API on top of your spaCy application. The container requirements for Vertex AI talk about how you should go about building that container :)

I am not sure whether Streamlit itself is compatible with Vertex AI (especially since we don't know which particular Vertex AI service you're integrating with); it also depends on what your use case is. If you just want to deploy a Streamlit app, perhaps you can do it via Google App Engine?

Vertex AI's pre-built containers are general-purpose enough to cover just the ones most people need. If I may guess, the reason spaCy isn't there is that not every Vertex customer will do NLP, and that's where custom containers enter. Hope it clears some confusion 🙂
-
[Insert time ⌛ bumper here] Hello everyone,

Vertex AI offers a nice tool to integrate MLOps into your workflow: Vertex AI Pipelines. After some research, I discovered that, if your code is Python-based, there is no need to Dockerize it: you can just put it inside components, and then merge all components into a pipeline. I encourage the reader to check this and this codelab, for an introductory journey into Vertex AI Pipelines.

Now, regarding a very basic example of how to train a spaCy model as part of a Vertex AI Pipeline, I suggest this NER model training, recently developed and tested by myself. From it, you can tweak your code to be as complex as you like.

This solves my original question, and I hope it helps anyone who might want to pair spaCy's power with Vertex AI infrastructure. Thank you.
-
Hello everyone.
Recently, my company has seen the need to adopt MLOps for a customer, with the options limited to Google products (as the company has some sort of agreement with Google [read: NO AWS-based solutions]). I set my eyes on Google's "Vertex AI". However, since that product was officially released just sometime this year, and since Vertex AI only seems to work "by default" with frameworks such as TensorFlow, scikit-learn, and XGBoost (thus leaving other frameworks outside, which sadly includes spaCy / Thinc), I was wondering whether any of you has used spaCy inside Vertex AI as part of an MLOps workflow.

In my case, I was aiming to build a "custom transformer-based NER model training pipeline". Using Vertex AI Notebooks only, I have managed to successfully train a benchmark model, which is stored in GCS. For some demos, I am using gcsfuse to serve it locally to my team (who also have access to those buckets), inspired by steps No. 1 & No. 2 in this thread. (In the same thread, we find Matthew Honnibal's reply; however, I think it cannot be of use in my case, as it does not seem to be compatible with Vertex AI Pipelines itself.)
The very basics involve making your model's training pipeline compatible with the rest of Vertex AI via custom containerization. Now, speaking of spaCy and Docker container creation, there are plenty of examples on the web (I, for one, and for prototyping purposes only, have followed a quick tutorial, split into first and second parts); however, I do not think the output is compatible with the very specific container requirements Vertex AI demands.
All in all, what I currently have is:
How to "migrate" from what I currently have, to make it compatible with Vertex AI?
PS: I think this post has more to do with Vertex AI / Docker than with spaCy (or at least as much), but I felt it would be interesting to have some insights from the Explosion team itself, as well as the community. After all, stating that spaCy has "Industrial-Strength" becomes a bit paradoxical if such strength cannot be reached ;) .
Thank you everyone.