
Releases: JohnSnowLabs/johnsnowlabs

John Snow Labs 5.1.8 Library Release

17 Nov 06:36

Johnsnowlabs Haystack Integrations

Johnsnowlabs provides the following nodes, which can be used inside the Haystack framework for scalable pre-processing and embedding on
Spark clusters. With these you can build easy-to-scale, production-grade LLM and RAG applications.
See the Haystack with Johnsnowlabs Tutorial Notebook
and the new Haystack + Johnsnowlabs Documentation.

JohnSnowLabsHaystackProcessor

Pre-process your documents in a scalable fashion in Haystack.
The processor is based on Spark NLP's DocumentCharacterTextSplitter and supports all of its parameters.

# Create Pre-Processor which is connected to spark-cluster
from johnsnowlabs.llm import embedding_retrieval
processor = embedding_retrieval.JohnSnowLabsHaystackProcessor(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)
# Process document distributed on a spark-cluster
processor.process(some_documents)
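
For illustration, some_documents above is simply a list of Haystack Document objects. A minimal sketch of building such a list and processing it, assuming the farm-haystack 1.x Document API:

from haystack import Document

# Hypothetical input documents; any list of Haystack Documents works here
some_documents = [
    Document(content="The quick brown fox jumps over the lazy dog."),
    Document(content="Spark clusters let you pre-process large document collections."),
]
# Split the documents on the spark-cluster; the processor returns the chunked Documents
processed_docs = processor.process(some_documents)
print(processed_docs[0].content)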

JohnSnowLabsHaystackEmbedder

Scalable embedding computation with any sentence embedding from John Snow Labs in Haystack.
You must provide the NLU reference of a sentence embedding model to load it.
If you want to use a GPU with the embedding model, set use_gpu=True; on localhost this starts a Spark session with GPU jars.
For clusters, you must set up the cluster environment correctly; using nlp.install_to_databricks() is recommended.

from johnsnowlabs.llm import embedding_retrieval
from haystack.document_stores import InMemoryDocumentStore

# Write some processed data to Doc store, so we can retrieve it later
document_store = InMemoryDocumentStore(embedding_dim=512)
document_store.write_documents(some_documents)

# Create Embedder which is connected to spark-cluster
retriever = embedding_retrieval.JohnSnowLabsHaystackEmbedder(
    embedding_model='en.embed_sentence.bert_base_uncased',
    document_store=document_store,
    use_gpu=False,
)

# Compute Embeddings distributed in a cluster
document_store.update_embeddings(retriever)
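
Once embeddings are written to the store, the embedder can be used like any other Haystack retriever. A minimal retrieval sketch, assuming the farm-haystack 1.x DocumentSearchPipeline API:

from haystack.pipelines import DocumentSearchPipeline

# Build a simple search pipeline around the embedding retriever
search_pipeline = DocumentSearchPipeline(retriever)
results = search_pipeline.run(
    query="What do the documents say about Spark?",
    params={"Retriever": {"top_k": 3}},
)
for doc in results["documents"]:
    print(doc.content)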

Johnsnowlabs Langchain Integrations

Johnsnowlabs provides the following components, which can be used inside the Langchain framework as agent tools and pipeline components for scalable pre-processing and embedding on
Spark clusters. With these you can build easy-to-scale, production-grade LLM and RAG applications.
See the Langchain with Johnsnowlabs Tutorial Notebook
and the new Langchain + Johnsnowlabs Documentation.

JohnSnowLabsLangChainCharSplitter

Pre-process your documents in a scalable fashion in Langchain.
The splitter is based on Spark NLP's DocumentCharacterTextSplitter and supports all of its parameters.

from langchain.document_loaders import TextLoader
from johnsnowlabs.llm import embedding_retrieval

loader = TextLoader('/content/state_of_the_union.txt')
documents = loader.load()



# Create Pre-Processor which is connected to spark-cluster
processor = embedding_retrieval.JohnSnowLabsLangChainCharSplitter(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)
# Process document distributed on a spark-cluster
pre_processed_docs = processor.split_documents(documents)

JohnSnowLabsLangChainEmbedder

Scalable embedding computation with any sentence embedding from John Snow Labs.
You must provide the NLU reference of a sentence embedding model to load it.
On localhost environments you can start a Spark session by setting hardware_target to one of cpu, gpu, apple_silicon, or aarch.
For clusters, you must set up the cluster environment correctly; using nlp.install_to_databricks() is recommended.

# Create Embedder which is connected to spark-cluster
from johnsnowlabs.llm import embedding_retrieval
embeddings = embedding_retrieval.JohnSnowLabsLangChainEmbedder('en.embed_sentence.bert_base_uncased', hardware_target='cpu')

# Compute Embeddings distributed
from langchain.vectorstores import FAISS
retriever = FAISS.from_documents(pre_processed_docs, embeddings).as_retriever()

# Create a retriever tool
from langchain.agents.agent_toolkits import create_retriever_tool
tool = create_retriever_tool(
  retriever,
  "search_state_of_union",
  "Searches and returns documents regarding the state-of-the-union."
)


# Create an LLM agent that uses the tool
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(openai_api_key='YOUR_API_KEY')
agent_executor = create_conversational_retrieval_agent(llm, [tool], verbose=True)
result = agent_executor({"input": "what did the president say about going to east of Columbus?"})
result['output']

>>>
> Entering new AgentExecutor chain...
Invoking: `search_state_of_union` with `{'query': 'going to east of Columbus'}`
[Document(page_content='miles east of', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='in America.', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='out of America.', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='upside down.', metadata={'source': '/content/state_of_the_union.txt'})]I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.
> Finished chain.
I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.

nlp.deploy_endpoint and nlp.query_endpoint

You can deploy and query John Snow Labs models as Databricks Model Serving endpoints with one line of code.
Data is passed to the predict() function and predictions are shaped accordingly.
You must create endpoints from a Databricks cluster created by nlp.install.

See the Cluster Creation Notebook
and the Databricks Endpoint Tutorial Notebook.
These functions deprecate nlp.query_and_deploy_if_missing, which will be dropped in John Snow Labs 5.2.0.

# You need `mlflow_by_johnsnowlabs` installed until the next mlflow release
! pip install mlflow_by_johnsnowlabs
from johnsnowlabs import nlp
nlp.deploy_endpoint('bert')
nlp.query_endpoint('bert_ENDPOINT', 'My String to embed')

nlp.deploy_endpoint will register an MLflow model in your registry and deploy an endpoint with a JSL license.
It has the following parameters:

  • model: Model to be deployed as an endpoint; it is converted into an NLUPipeline. Supported classes: a string reference to an NLU pipeline name like 'bert', NLUPipeline, List[Annotator], Pipeline, LightPipeline, PretrainedPipeline, PipelineModel. In case o...
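
As a minimal sketch of the model parameter described above, a loaded pipeline object can be passed instead of a string reference (using nlp.load to obtain such an object is an assumption):

from johnsnowlabs import nlp

# Assumption: load the model as an NLU pipeline first, then deploy the loaded object
pipe = nlp.load('bert')
nlp.deploy_endpoint(pipe)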

John Snow Labs 5.1.7 Library Release

19 Oct 21:55
  • enterprise nlp bump to 5.1.2
  • open source nlp bump to 5.1.2
  • nlu bump to 5.0.4rc2
  • support for deploying endpoints with GPU infrastructure in Databricks via the workload_type parameter in nlp.query_and_deploy (see the sketch below)
  • YARN mode support for EMR configs
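
A hedged sketch of the new workload_type parameter; the positional arguments and the 'GPU_SMALL' value are assumptions, since the exact signature is not shown in these notes:

from johnsnowlabs import nlp

# Deploy and query an endpoint backed by GPU infrastructure in Databricks
nlp.query_and_deploy(
    'bert',                      # model reference to deploy as an endpoint
    'My String to embed',        # data sent to the endpoint for prediction
    workload_type='GPU_SMALL',   # assumption: a Databricks GPU workload type
)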

John Snow Labs 5.1.6 Library Release

11 Oct 21:34
  • bump visual NLP to 5.0.2

John Snow Labs 5.1.5 Library Release

11 Oct 16:34

John Snow Labs 5.1.4 Library Release

08 Oct 13:03
  • upgrade NLU to 5.0.2
  • remove pandas >=2 downgrade for databricks clusters

John Snow Labs 5.1.3 Library Release

06 Oct 21:48
  • Fixed updating Databricks clusters

  • nlp.install(med_license=) should work without AWS keys for floating licenses

  • Added nlp.install_to_databricks and a deprecation warning for nlp.install() when creating a new Databricks cluster; this usage will be dropped in the next release

  • Pinned pandas to 1.5.3 for newly created Databricks clusters until NLU supports pandas>=2

  • New parameters parameter in nlp.run_in_databricks for parameterizing submitted Databricks jobs, plus new documentation (see the sketch below)

  • New extra_pip_installs parameter, which can be used to install additional PyPI dependencies when creating a Databricks cluster or installing to an existing cluster
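
A hedged sketch of the new parameters argument in nlp.run_in_databricks; the cluster-credential keyword names follow the extra_pip_installs example below, and how the values reach the submitted job is an assumption:

from johnsnowlabs import nlp

def my_job(input_path, output_path):
    # hypothetical job body that runs on the Databricks cluster
    ...

nlp.run_in_databricks(
    my_job,
    databricks_cluster_id=cluster_id,
    databricks_host=host,
    databricks_token=token,
    parameters=['/mnt/data/in.csv', '/mnt/data/out.csv'],  # assumption: forwarded to the submitted job
)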

Example of extra_pip_installs:

nlp.install_to_databricks(
    databricks_cluster_id=cluster_id,
    databricks_host=host,
    databricks_token=token,
    extra_pip_installs=["farm-haystack==1.21.2", "langchain"],
)

John Snow Labs 5.1.2 Library Release

06 Oct 21:38
  • bump Healthcare NLP to 5.1.1

John Snow Labs 5.1.1 Library Release

01 Oct 17:25

John Snow Labs 5.1.0 Library Release

25 Sep 17:05

John Snow Labs 5.0.8 Library Release

11 Sep 01:02

nlp.query_and_deploy_if_missing() has been upgraded with powerful new features!

The following parameters are now supported:
  • output_level: One of token, chunk, sentence, relation, document, to shape the outputs
  • positions: Set True/False to include or exclude the character index positions of predictions
  • metadata: Set True/False to include additional metadata
  • drop_irrelevant_cols: Set True/False to drop irrelevant columns
  • get_embeddings: Set True/False to include embeddings or not
  • keep_stranger_features: Set True/False to return columns not named "text", "image" or "file_type" from your input data
  • multithread: Set True/False to use multi-threading for inference; auto-inferred if not set
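
A hedged usage sketch combining these options; the positional model reference and query string are assumptions based on the deploy and query examples above:

from johnsnowlabs import nlp

# Deploy the model if missing, then query it with shaped output
nlp.query_and_deploy_if_missing(
    'bert',                    # model reference
    'My String to embed',      # data passed to predict()
    output_level='document',   # one of token, chunk, sentence, relation, document
    positions=False,           # exclude character index positions
    metadata=False,            # exclude additional metadata
    get_embeddings=True,       # include embeddings in the output
)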