Releases: JohnSnowLabs/johnsnowlabs
John Snow Labs 5.1.8 Library Release
Johnsnowlabs Haystack Integrations
Johnsnowlabs provides the following nodes, which can be used inside the Haystack framework for scalable pre-processing and embedding on Spark clusters. With them you can create easily scalable, production-grade LLM and RAG applications.
See the Haystack with Johnsnowlabs Tutorial Notebook
and the new Haystack+Johnsnowlabs Documentation
JohnSnowLabsHaystackProcessor
Pre-process your documents in a scalable fashion in Haystack.
It is based on Spark NLP's DocumentCharacterTextSplitter and supports all of its parameters.
```python
# Create a pre-processor which is connected to the Spark cluster
from johnsnowlabs.llm import embedding_retrieval

processor = embedding_retrieval.JohnSnowLabsHaystackProcessor(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)
# Process documents distributed on the Spark cluster
processor.process(some_documents)
```
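For reference, `some_documents` above is a list of Haystack `Document` objects. A minimal sketch, assuming farm-haystack v1's `Document` class; the sample texts are illustrative:

```python
from haystack import Document

# Illustrative input documents; any Haystack Documents work here
some_documents = [
    Document(content="Spark NLP is an open-source text processing library."),
    Document(content="John Snow Labs provides enterprise NLP libraries on top of Spark."),
]

# Each input document is split into smaller chunk documents on the Spark cluster
chunks = processor.process(some_documents)
print(len(chunks))
```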
JohnSnowLabsHaystackEmbedder
Scalable embedding computation with any sentence embedding from John Snow Labs in Haystack.
You must provide the NLU reference of a sentence embeddings model to load it.
If you want to use a GPU with the embedding model, set use_gpu=True on localhost; this starts a Spark session with GPU jars.
For clusters, you must set up the cluster environment correctly; using nlp.install_to_databricks() is recommended.
```python
from johnsnowlabs.llm import embedding_retrieval
from haystack.document_stores import InMemoryDocumentStore

# Write some processed data to the document store, so we can retrieve it later
document_store = InMemoryDocumentStore(embedding_dim=512)
document_store.write_documents(some_documents)

# Create an embedder which is connected to the Spark cluster
retriever = embedding_retrieval.JohnSnowLabsHaystackEmbedder(
    embedding_model='en.embed_sentence.bert_base_uncased',
    document_store=document_store,
    use_gpu=False,
)
# Compute embeddings distributed on the cluster
document_store.update_embeddings(retriever)
```
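Once embeddings are computed, the embedder can be used for semantic retrieval. A minimal sketch, assuming it exposes Haystack v1's standard BaseRetriever.retrieve interface (the query string is illustrative):

```python
# Find the documents closest to the query in embedding space
hits = retriever.retrieve(query="Which libraries run on Spark?", top_k=3)
for doc in hits:
    print(doc.score, doc.content[:80])
```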
Johnsnowlabs Langchain Integrations
Johnsnowlabs provides the following components, which can be used inside the Langchain framework as agent tools and pipeline components for scalable pre-processing and embedding on Spark clusters. With them you can create easily scalable, production-grade LLM and RAG applications.
See the Langchain with Johnsnowlabs Tutorial Notebook
and the new Langchain+Johnsnowlabs Documentation
JohnSnowLabsLangChainCharSplitter
Pre-process your documents in a scalable fashion in Langchain.
It is based on Spark NLP's DocumentCharacterTextSplitter and supports all of its parameters.
```python
from langchain.document_loaders import TextLoader
from johnsnowlabs.llm import embedding_retrieval

loader = TextLoader('/content/state_of_the_union.txt')
documents = loader.load()

# Create a pre-processor which is connected to the Spark cluster
jsl_splitter = embedding_retrieval.JohnSnowLabsLangChainCharSplitter(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)
# Process documents distributed on the Spark cluster
pre_processed_docs = jsl_splitter.split_documents(documents)
```
JohnSnowLabsLangChainEmbedder
Scalable embedding computation with any sentence embedding from John Snow Labs.
You must provide the NLU reference of a sentence embeddings model to load it.
On localhost environments you can start a Spark session by setting hardware_target to one of cpu, gpu, apple_silicon, or aarch.
For clusters, you must set up the cluster environment correctly; using nlp.install_to_databricks() is recommended.
```python
# Create an embedder which is connected to the Spark cluster
from johnsnowlabs.llm import embedding_retrieval
embeddings = embedding_retrieval.JohnSnowLabsLangChainEmbedder(
    'en.embed_sentence.bert_base_uncased',
    hardware_target='cpu',
)

# Compute embeddings distributed on the cluster
from langchain.vectorstores import FAISS
retriever = FAISS.from_documents(pre_processed_docs, embeddings).as_retriever()

# Create a retriever tool
from langchain.agents.agent_toolkits import create_retriever_tool
tool = create_retriever_tool(
    retriever,
    "search_state_of_union",
    "Searches and returns documents regarding the state-of-the-union.",
)

# Create an LLM agent with the tool
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(openai_api_key='YOUR_API_KEY')
agent_executor = create_conversational_retrieval_agent(llm, [tool], verbose=True)
result = agent_executor({"input": "what did the president say about going to east of Columbus?"})
result['output']
```
```
> Entering new AgentExecutor chain...
Invoking: `search_state_of_union` with `{'query': 'going to east of Columbus'}`
[Document(page_content='miles east of', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='in America.', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='out of America.', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='upside down.', metadata={'source': '/content/state_of_the_union.txt'})]I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.
> Finished chain.
I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.
```
nlp.deploy_endpoint and nlp.query_endpoint
You can deploy and query John Snow Labs models as Databricks Model Serving endpoints with one line of code each.
Data is passed to the model's predict() function and predictions are shaped accordingly.
You must create endpoints from a Databricks cluster created by nlp.install.
See Cluster Creation Notebook
and Databricks Endpoint Tutorial Notebook
These functions deprecate nlp.query_and_deploy_if_missing, which will be dropped in John Snow Labs 5.2.0.
```python
# You need `mlflow_by_johnsnowlabs` installed until the next mlflow release
! pip install mlflow_by_johnsnowlabs

from johnsnowlabs import nlp
nlp.deploy_endpoint('bert')
nlp.query_endpoint('bert_ENDPOINT', 'My String to embed')
```
nlp.deploy_endpoint registers an MLflow model in your registry and deploys an endpoint with a JSL license.
It has the following parameters:
Parameter | Description
---|---
`model` | Model to be deployed as an endpoint, which is converted into an NluPipeline. Supported classes are: a string reference to an NLU pipeline name like `'bert'`, `NLUPipeline`, `List[Annotator]`, `Pipeline`, `LightPipeline`, `PretrainedPipeline`, `PipelineModel`. In case o...
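Besides a string reference, any of the supported pipeline objects can be passed directly. A minimal sketch, assuming nlp.load returns an NLUPipeline (one of the supported types above) and that the endpoint name is derived from the model:

```python
from johnsnowlabs import nlp

# Load a sentence-embedding pipeline (an NLUPipeline) and deploy it as an endpoint
pipe = nlp.load('en.embed_sentence.bert_base_uncased')
nlp.deploy_endpoint(pipe)
```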
John Snow Labs 5.1.7 Library Release
- enterprise nlp bump to 5.1.2
- open source nlp bump to 5.1.2
- nlu bump to 5.0.4rc2
- support for deploying endpoints with GPU infrastructure in Databricks via the workload_type parameter in nlp.query_and_deploy (see the sketch below)
- YARN mode support for EMR configs
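A minimal sketch of GPU deployment, assuming the same positional call shape as nlp.deploy_endpoint/nlp.query_endpoint above; the 'GPU_SMALL' value follows Databricks' workload-type naming and is an assumption:

```python
from johnsnowlabs import nlp

# Deploy the endpoint on GPU serving infrastructure and query it.
# 'GPU_SMALL' is a hypothetical workload-type value.
nlp.query_and_deploy('bert', 'My String to embed', workload_type='GPU_SMALL')
```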
John Snow Labs 5.1.6 Library Release
- bump visual NLP to 5.0.2
John Snow Labs 5.1.5 Library Release
- bump NLU to 5.0.3
John Snow Labs 5.1.4 Library Release
- upgrade NLU to 5.0.2
- remove pandas >=2 downgrade for databricks clusters
John Snow Labs 5.1.3 Library Release
- Fix updating Databricks clusters
- nlp.install(med_license=) should work without AWS keys for floating licenses
- add nlp.install_to_databricks and a deprecation warning for nlp.install() when creating new Databricks clusters; will be dropped next release
- Fixed pandas to 1.5.3 for newly created Databricks clusters until NLU supports pandas>=2
- new `parameters` parameter in nlp.run_in_databricks for parameterizing submitted Databricks jobs, and new documentation (see the example below)
- new `extra_pip_installs` parameter, which can be used to install additional PyPI dependencies when creating a Databricks cluster or installing to an existing cluster (see the example below)
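Example of parameters (a minimal sketch; the script path and forwarded values are illustrative assumptions, following the connection-argument shape of nlp.install_to_databricks below):

```python
from johnsnowlabs import nlp

# Submit a script as a Databricks job and forward custom job parameters.
# The script path and parameter values are hypothetical.
nlp.run_in_databricks(
    'my_script.py',
    databricks_cluster_id=cluster_id,
    databricks_host=host,
    databricks_token=token,
    parameters=['--input_path', 'dbfs:/data/docs.csv'],
)
```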
Example of extra_pip_installs:

```python
nlp.install_to_databricks(
    databricks_cluster_id=cluster_id,
    databricks_host=host,
    databricks_token=token,
    extra_pip_installs=["farm-haystack==1.21.2", "langchain"],
)
```
John Snow Labs 5.1.2 Library Release
- bump Healthcare NLP to 5.1.1
John Snow Labs 5.1.1 Library Release
- bump Enterprise NLP to 5.1.1
- support for submitting Jupyter notebooks in nlp.run_in_databricks and new docs for notebook submission (see the sketch below)
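A minimal sketch of notebook submission, assuming nlp.run_in_databricks accepts a notebook path plus the same connection arguments as the other Databricks utilities (the notebook path is illustrative):

```python
from johnsnowlabs import nlp

# Submit a local Jupyter notebook as a Databricks job (path is hypothetical)
nlp.run_in_databricks(
    'my_notebook.ipynb',
    databricks_cluster_id=cluster_id,
    databricks_host=host,
    databricks_token=token,
)
```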
John Snow Labs 5.1.0 Library Release
- bump Enterprise NLP to 5.1.0
- bump Healthcare NLP to 5.1.0
- bump Visual NLP to 5.0.1
- AWS EMR auto install & utilities see EMR cluster creation notebook and EMR Workshop and John Snow Labs EMR Docs
- AWS GLUE auto install & utilities see GLUE cluster creation notebook and GLUE Workshop and John Snow Labs GLUE Docs
John Snow Labs 5.0.8 Library Release
nlp.query_and_deploy_if_missing() has been upgraded with powerful new features!
- support for `gpu` jar injection into endpoint containers
- support for all parameters of model.predict() (see the table and query sketch below)
Parameter | Description
---|---
`output_level` | One of `token`, `chunk`, `sentence`, `relation`, `document` to shape outputs
`positions` | Set `True`/`False` to include or exclude the character index positions of predictions
`metadata` | Set `True`/`False` to include additional metadata
`drop_irrelevant_cols` | Set `True`/`False` to drop irrelevant columns
`get_embeddings` | Set `True`/`False` to include embeddings or not
`keep_stranger_features` | Set `True`/`False` to return columns not named "text", "image" or "file_type" from your input data
`multithread` | Set `True`/`False` to use multi-threading for inference. Auto-inferred if not set
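A minimal sketch of passing these predict() parameters through a query, assuming nlp.query_and_deploy_if_missing forwards extra keyword arguments to model.predict() (the forwarding mechanism is an assumption):

```python
from johnsnowlabs import nlp

# Shape the endpoint output at document level and include embeddings
preds = nlp.query_and_deploy_if_missing(
    'bert',
    'My String to embed',
    output_level='document',
    get_embeddings=True,
)
```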