Updates references from SimpleNodeParser to SentenceSplitter. (#1129)
philnash authored Aug 30, 2024
1 parent 2afcbe6 commit be3e280
Showing 16 changed files with 47 additions and 51 deletions.
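
The migration is a one-for-one rename: each `SimpleNodeParser` becomes a `SentenceSplitter` with the same constructor options. A minimal before/after sketch, using options taken from the examples in this commit:

```ts
// Before this commit:
// import { SimpleNodeParser } from "llamaindex";
// const nodeParser = new SimpleNodeParser({ chunkSize: 1024, chunkOverlap: 20 });

// After this commit — same options, new class name:
import { SentenceSplitter } from "llamaindex";

const nodeParser = new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 });
```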
8 changes: 4 additions & 4 deletions apps/docs/docs/modules/ingestion_pipeline/index.md
@@ -16,7 +16,7 @@ import {
MetadataMode,
OpenAIEmbedding,
TitleExtractor,
SimpleNodeParser,
SentenceSplitter,
} from "llamaindex";

async function main() {
@@ -29,7 +29,7 @@ async function main() {
const document = new Document({ text: essay, id_: path });
const pipeline = new IngestionPipeline({
transformations: [
new SimpleNodeParser({ chunkSize: 1024, chunkOverlap: 20 }),
new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 }),
new TitleExtractor(),
new OpenAIEmbedding(),
],
@@ -62,7 +62,7 @@ import {
MetadataMode,
OpenAIEmbedding,
TitleExtractor,
SimpleNodeParser,
SentenceSplitter,
QdrantVectorStore,
VectorStoreIndex,
} from "llamaindex";
@@ -81,7 +81,7 @@ async function main() {
const document = new Document({ text: essay, id_: path });
const pipeline = new IngestionPipeline({
transformations: [
new SimpleNodeParser({ chunkSize: 1024, chunkOverlap: 20 }),
new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 }),
new TitleExtractor(),
new OpenAIEmbedding(),
],
16 changes: 8 additions & 8 deletions apps/docs/docs/modules/ingestion_pipeline/transformations.md
@@ -4,7 +4,7 @@ A transformation is something that takes a list of nodes as an input, and returns a list of nodes.

Currently, the following components are Transformation objects (combined in the pipeline sketch after this list):

- [SimpleNodeParser](../../api/classes/SimpleNodeParser.md)
- [SentenceSplitter](../../api/classes/SentenceSplitter.md)
- [MetadataExtractor](../documents_and_nodes/metadata_extraction.md)
- [Embeddings](../embeddings/index.md)
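
Taken together, these transformations are typically chained in an `IngestionPipeline`. A sketch adapted from the pipeline example in this commit — the sample text is illustrative, and it assumes the `pipeline.run({ documents })` API used in the ingestion pipeline docs:

```ts
import {
  Document,
  IngestionPipeline,
  OpenAIEmbedding,
  SentenceSplitter,
  TitleExtractor,
} from "llamaindex";

async function main() {
  // Chain all three transformation types: splitting, metadata extraction, embedding.
  // TitleExtractor and OpenAIEmbedding call OpenAI, so OPENAI_API_KEY must be set.
  const pipeline = new IngestionPipeline({
    transformations: [
      new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 }),
      new TitleExtractor(),
      new OpenAIEmbedding(),
    ],
  });

  const nodes = await pipeline.run({
    documents: [new Document({ text: "Sample text to ingest." })],
  });
  console.log(nodes.length);
}

main().catch(console.error);
```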

@@ -13,10 +13,10 @@ Currently, the following components are Transformation objects:
While transformations are best used with an IngestionPipeline, they can also be used directly.

```ts
import { SimpleNodeParser, TitleExtractor, Document } from "llamaindex";
import { SentenceSplitter, TitleExtractor, Document } from "llamaindex";

async function main() {
let nodes = new SimpleNodeParser().getNodesFromDocuments([
let nodes = new SentenceSplitter().getNodesFromDocuments([
new Document({ text: "I am 10 years old. John is 20 years old." }),
]);

@@ -34,15 +34,15 @@ main().catch(console.error);

## Custom Transformations

You can implement any transformation yourself by implementing the `TransformerComponent`.
You can implement any transformation yourself by implementing the `TransformComponent`.

The following custom transformation will remove any special characters or punctutaion in text.
The following custom transformation will remove any special characters or punctuation in text.

```ts
import { TransformerComponent, Node } from "llamaindex";
import { TransformComponent, TextNode } from "llamaindex";

class RemoveSpecialCharacters extends TransformerComponent {
async transform(nodes: Node[]): Promise<Node[]> {
export class RemoveSpecialCharacters extends TransformComponent {
async transform(nodes: TextNode[]): Promise<TextNode[]> {
for (const node of nodes) {
node.text = node.text.replace(/[^\w\s]/gi, "");
}
5 changes: 2 additions & 3 deletions apps/docs/docs/modules/node_parser.md
@@ -7,9 +7,9 @@ sidebar_position: 4
The `NodeParser` in LlamaIndex is responsible for splitting `Document` objects into more manageable `Node` objects. When you call `.fromDocuments()`, the `NodeParser` from the `Settings` is used to do this automatically for you. Alternatively, you can use it to split documents ahead of time.

```typescript
import { Document, SimpleNodeParser } from "llamaindex";
import { Document, Settings, SentenceSplitter } from "llamaindex";

const nodeParser = new SimpleNodeParser();
const nodeParser = new SentenceSplitter();

Settings.nodeParser = nodeParser;
```
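
The "split ahead of time" path mentioned above needs no index or global settings; a short sketch reusing the sample text from the notebook example later in this commit:

```typescript
import { Document, SentenceSplitter } from "llamaindex";

const nodeParser = new SentenceSplitter();

// Split a document into nodes directly, without building an index.
const nodes = nodeParser.getNodesFromDocuments([
  new Document({ text: "I am 10 years old. John is 20 years old." }),
]);

console.log(nodes);
```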
@@ -93,6 +93,5 @@ The output metadata will be something like:
## API Reference
- [SimpleNodeParser](../api/classes/SimpleNodeParser.md)
- [SentenceSplitter](../api/classes/SentenceSplitter.md)
- [MarkdownNodeParser](../api/classes/MarkdownNodeParser.md)
10 changes: 5 additions & 5 deletions apps/docs/docs/modules/query_engines/router_query_engine.md
@@ -15,7 +15,7 @@ import {
OpenAI,
RouterQueryEngine,
SimpleDirectoryReader,
SimpleNodeParser,
SentenceSplitter,
SummaryIndex,
VectorStoreIndex,
Settings,
@@ -34,11 +34,11 @@ const documents = await new SimpleDirectoryReader().loadData({

## Service Context

Next, we need to define some basic rules and parse the documents into nodes. We will use the `SimpleNodeParser` to parse the documents into nodes and `Settings` to define the rules (eg. LLM API key, chunk size, etc.):
Next, we need to define some basic rules and parse the documents into nodes. We will use the `SentenceSplitter` to parse the documents into nodes and `Settings` to define the rules (e.g. LLM API key, chunk size, etc.):

```ts
Settings.llm = new OpenAI();
Settings.nodeParser = new SimpleNodeParser({
Settings.nodeParser = new SentenceSplitter({
chunkSize: 1024,
});
```
@@ -104,14 +104,14 @@ import {
OpenAI,
RouterQueryEngine,
SimpleDirectoryReader,
SimpleNodeParser,
SentenceSplitter,
SummaryIndex,
VectorStoreIndex,
Settings,
} from "llamaindex";

Settings.llm = new OpenAI();
Settings.nodeParser = new SimpleNodeParser({
Settings.nodeParser = new SentenceSplitter({
chunkSize: 1024,
});

4 changes: 2 additions & 2 deletions examples/agent/multi_document_agent.ts
@@ -6,8 +6,8 @@ import {
OpenAI,
OpenAIAgent,
QueryEngineTool,
SentenceSplitter,
Settings,
SimpleNodeParser,
SimpleToolNodeMapping,
SummaryIndex,
VectorStoreIndex,
@@ -43,7 +43,7 @@ async function main() {
for (const title of wikiTitles) {
console.log(`Processing ${title}`);

const nodes = new SimpleNodeParser({
const nodes = new SentenceSplitter({
chunkSize: 200,
chunkOverlap: 20,
}).getNodesFromDocuments([countryDocs[title]]);
4 changes: 2 additions & 2 deletions examples/extractors/keywordExtractor.ts
@@ -2,13 +2,13 @@ import {
Document,
KeywordExtractor,
OpenAI,
SimpleNodeParser,
SentenceSplitter,
} from "llamaindex";

(async () => {
const openaiLLM = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0 });

const nodeParser = new SimpleNodeParser();
const nodeParser = new SentenceSplitter();

const nodes = nodeParser.getNodesFromDocuments([
new Document({ text: "banana apple orange pear peach watermelon" }),
4 changes: 2 additions & 2 deletions examples/extractors/questionsAnsweredExtractor.ts
@@ -2,13 +2,13 @@ import {
Document,
OpenAI,
QuestionsAnsweredExtractor,
SimpleNodeParser,
SentenceSplitter,
} from "llamaindex";

(async () => {
const openaiLLM = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0 });

const nodeParser = new SimpleNodeParser();
const nodeParser = new SentenceSplitter();

const nodes = nodeParser.getNodesFromDocuments([
new Document({
4 changes: 2 additions & 2 deletions examples/extractors/summaryExtractor.ts
@@ -1,14 +1,14 @@
import {
Document,
OpenAI,
SimpleNodeParser,
SentenceSplitter,
SummaryExtractor,
} from "llamaindex";

(async () => {
const openaiLLM = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0 });

const nodeParser = new SimpleNodeParser();
const nodeParser = new SentenceSplitter();

const nodes = nodeParser.getNodesFromDocuments([
new Document({
4 changes: 2 additions & 2 deletions examples/extractors/titleExtractor.ts
@@ -1,11 +1,11 @@
import { Document, OpenAI, SimpleNodeParser, TitleExtractor } from "llamaindex";
import { Document, OpenAI, SentenceSplitter, TitleExtractor } from "llamaindex";

import essay from "../essay";

(async () => {
const openaiLLM = new OpenAI({ model: "gpt-3.5-turbo-0125", temperature: 0 });

const nodeParser = new SimpleNodeParser({});
const nodeParser = new SentenceSplitter({});

const nodes = nodeParser.getNodesFromDocuments([
new Document({
7 changes: 2 additions & 5 deletions examples/jupyter/nodeparser.ipynb
@@ -7,10 +7,7 @@
"metadata": {},
"outputs": [],
"source": [
"import {\n",
" Document,\n",
" SimpleNodeParser\n",
"} from \"npm:llamaindex\";"
"import { Document, SentenceSplitter } from \"npm:llamaindex\";"
]
},
{
@@ -45,7 +42,7 @@
}
],
"source": [
"const nodeParser = new SimpleNodeParser();\n",
"const nodeParser = new SentenceSplitter();\n",
"const nodes = nodeParser.getNodesFromDocuments([\n",
" new Document({ text: \"I am 10 years old. John is 20 years old.\" }),\n",
"]);\n",
4 changes: 2 additions & 2 deletions examples/lowlevel.ts
@@ -2,12 +2,12 @@ import {
Document,
NodeWithScore,
ResponseSynthesizer,
SimpleNodeParser,
SentenceSplitter,
TextNode,
} from "llamaindex";

(async () => {
const nodeParser = new SimpleNodeParser();
const nodeParser = new SentenceSplitter();
const nodes = nodeParser.getNodesFromDocuments([
new Document({ text: "I am 10 years old. John is 20 years old." }),
]);
4 changes: 2 additions & 2 deletions examples/pipeline/ingestion.ts
@@ -5,7 +5,7 @@ import {
IngestionPipeline,
MetadataMode,
OpenAIEmbedding,
SimpleNodeParser,
SentenceSplitter,
} from "llamaindex";

async function main() {
@@ -18,7 +18,7 @@ async function main() {
const document = new Document({ text: essay, id_: path });
const pipeline = new IngestionPipeline({
transformations: [
new SimpleNodeParser({ chunkSize: 1024, chunkOverlap: 20 }),
new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 }),
new OpenAIEmbedding(),
],
});
4 changes: 2 additions & 2 deletions examples/routerQueryEngine.ts
@@ -1,9 +1,9 @@
import {
OpenAI,
RouterQueryEngine,
SentenceSplitter,
Settings,
SimpleDirectoryReader,
SimpleNodeParser,
SummaryIndex,
VectorStoreIndex,
} from "llamaindex";
@@ -12,7 +12,7 @@ import {
Settings.llm = new OpenAI();

// Update node parser
Settings.nodeParser = new SimpleNodeParser({
Settings.nodeParser = new SentenceSplitter({
chunkSize: 1024,
});

4 changes: 2 additions & 2 deletions examples/summaryIndex.ts
@@ -1,15 +1,15 @@
import {
Document,
SentenceSplitter,
Settings,
SimpleNodeParser,
SummaryIndex,
SummaryRetrieverMode,
} from "llamaindex";

import essay from "./essay";

// Update node parser
Settings.nodeParser = new SimpleNodeParser({
Settings.nodeParser = new SentenceSplitter({
chunkSize: 40,
});

10 changes: 5 additions & 5 deletions packages/llamaindex/tests/MetadataExtractors.test.ts
@@ -10,7 +10,7 @@ import {
TitleExtractor,
} from "llamaindex/extractors/index";
import { OpenAI } from "llamaindex/llm/openai";
import { SimpleNodeParser } from "llamaindex/nodeParsers/index";
import { SentenceSplitter } from "llamaindex/nodeParsers/index";
import { afterAll, beforeAll, describe, expect, test, vi } from "vitest";
import {
DEFAULT_LLM_TEXT_OUTPUT,
@@ -45,7 +45,7 @@ describe("[MetadataExtractor]: Extractors should populate the metadata", () => {
});

test("[MetadataExtractor] KeywordExtractor returns excerptKeywords metadata", async () => {
const nodeParser = new SimpleNodeParser();
const nodeParser = new SentenceSplitter();

const nodes = nodeParser.getNodesFromDocuments([
new Document({ text: DEFAULT_LLM_TEXT_OUTPUT }),
@@ -64,7 +64,7 @@ describe("[MetadataExtractor]: Extractors should populate the metadata", () => {
});

test("[MetadataExtractor] TitleExtractor returns documentTitle metadata", async () => {
const nodeParser = new SimpleNodeParser();
const nodeParser = new SentenceSplitter();

const nodes = nodeParser.getNodesFromDocuments([
new Document({ text: DEFAULT_LLM_TEXT_OUTPUT }),
@@ -83,7 +83,7 @@ describe("[MetadataExtractor]: Extractors should populate the metadata", () => {
});

test("[MetadataExtractor] QuestionsAnsweredExtractor returns questionsThisExcerptCanAnswer metadata", async () => {
const nodeParser = new SimpleNodeParser();
const nodeParser = new SentenceSplitter();

const nodes = nodeParser.getNodesFromDocuments([
new Document({ text: DEFAULT_LLM_TEXT_OUTPUT }),
@@ -103,7 +103,7 @@ describe("[MetadataExtractor]: Extractors should populate the metadata", () => {
});

test("[MetadataExtractor] SumamryExtractor returns sectionSummary metadata", async () => {
const nodeParser = new SimpleNodeParser();
const nodeParser = new SentenceSplitter();

const nodes = nodeParser.getNodesFromDocuments([
new Document({ text: DEFAULT_LLM_TEXT_OUTPUT }),
6 changes: 3 additions & 3 deletions packages/llamaindex/tests/ingestion/IngestionCache.test.ts
@@ -5,7 +5,7 @@ import {
IngestionCache,
getTransformationHash,
} from "llamaindex/ingestion/IngestionCache";
import { SimpleNodeParser } from "llamaindex/nodeParsers/index";
import { SentenceSplitter } from "llamaindex/nodeParsers/index";
import { beforeAll, describe, expect, test } from "vitest";

describe("IngestionCache", () => {
@@ -32,7 +32,7 @@ describe("getTransformationHash", () => {

beforeAll(() => {
nodes = [new TextNode({ text: "some text", id_: "some id" })];
transform = new SimpleNodeParser({
transform = new SentenceSplitter({
chunkOverlap: 10,
chunkSize: 1024,
});
@@ -66,7 +66,7 @@ describe("getTransformationHash", () => {
const result1 = getTransformationHash(nodes, transform);
const result2 = getTransformationHash(
nodes,
new SimpleNodeParser({
new SentenceSplitter({
chunkOverlap: 10,
chunkSize: 512,
}),
