Add Docs and Samples #5

Merged
merged 7 commits into from
Jul 30, 2024
42 changes: 42 additions & 0 deletions .devcontainer/base/configure.sh
@@ -0,0 +1,42 @@
#!/bin/bash

# Install .NET Aspire Workload
# See documentation for more details
# https://learn.microsoft.com/dotnet/aspire/fundamentals/setup-tooling?tabs=linux&pivots=vscode
if command -v dotnet &> /dev/null
then
echo "dotnet is installed."

# Specify the workload you want to install
WORKLOAD="aspire"

# Update workloads
sudo dotnet workload update

# Install the workload
sudo dotnet workload install "$WORKLOAD"

echo "Workload '$WORKLOAD' has been installed."
else
echo "dotnet is not installed. Please install dotnet first."
fi

# Download data files for examples
#!/bin/bash

# URL of the file to download
FILE_URL="https://arxiv.org/pdf/1706.03762"

# Directory where you want to place the downloaded file
TARGET_DIR="./samples/data"

# Name of the file after downloading
FILE_NAME="attention-is-all-you-need.pdf"

# Create the target directory if it doesn't exist
mkdir -p "$TARGET_DIR"

# Download the file (failing on HTTP errors and following redirects) and place it in the target directory
curl -fL -o "$TARGET_DIR/$FILE_NAME" "$FILE_URL"

echo "File downloaded to $TARGET_DIR/$FILE_NAME"
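
Review note: the download step above fetches the PDF on every container build. A minimal sketch of an idempotent variant follows. The function name `download_if_missing` is hypothetical and not part of this PR, and it assumes `curl` is available in the container image.

```shell
# Hypothetical helper (not part of the PR): download URL into DIR/NAME
# unless that file already exists.
download_if_missing() {
  local url="$1" dir="$2" name="$3"
  local target="$dir/$name"

  if [ -f "$target" ]; then
    echo "Already present: $target"
    return 0
  fi

  mkdir -p "$dir"
  # -f: fail on HTTP errors, -L: follow redirects, -sS: quiet but show errors
  curl -fsSL -o "$target" "$url"
  echo "File downloaded to $target"
}
```

It could be called as `download_if_missing "$FILE_URL" "$TARGET_DIR" "$FILE_NAME"` with the variables the script already defines.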
22 changes: 22 additions & 0 deletions .devcontainer/base/devcontainer.json
@@ -0,0 +1,22 @@
{
"name": "LlamaParse .NET DevContainer",
"image": "mcr.microsoft.com/devcontainers/base:debian",
"features": {
"ghcr.io/devcontainers/features/git:1": {},
"ghcr.io/devcontainers/features/docker-in-docker:2": {},
"ghcr.io/devcontainers/features/dotnet:2": {
"version": "8.0"
}
},
"customizations": {
"vscode": {
"extensions": [
"ms-vscode-remote.vscode-remote-extensionpack",
"ms-azuretools.vscode-docker",
"ms-dotnettools.csharp",
"ms-dotnettools.dotnet-interactive-vscode"
]
}
},
"postCreateCommand": "./.devcontainer/base/configure.sh"
}
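
Review note: `postCreateCommand` invokes the script by path, so the script needs its executable bit set. Whether this PR committed it that way is not visible in the diff. If it did not, something along these lines might help. The helper name `make_executable` is hypothetical.

```shell
# Hypothetical helper: mark a script as executable on disk and, when the path
# is tracked in a Git work tree, record the executable bit in the index too.
make_executable() {
  local script="$1"
  chmod +x "$script"
  if git ls-files --error-unmatch "$script" > /dev/null 2>&1; then
    git update-index --chmod=+x "$script"
  fi
}
```

From the repository root this would be run as `make_executable .devcontainer/base/configure.sh`.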
3 changes: 3 additions & 0 deletions .gitignore
@@ -396,3 +396,6 @@ FodyWeavers.xsd

# JetBrains Rider
*.sln.iml

# Don't commit samples data files
/samples/data/*
25 changes: 25 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,25 @@
# Contributing to LlamaIndex .NET

## Issues

Found bugs or have feature requests? File an issue.

## Documentation & Samples

We encourage community-submitted samples. All of our samples are in the [samples](./samples/README.md) directory.

To create new samples, submit a pull request.

## Development

### Project Structure

- `LlamaIndex.Core`: Core types and abstractions for LlamaIndex.
- `LlamaIndex.Core.Tests`: Unit tests for `LlamaIndex.Core`.
- `LlamaParse`: LlamaParse .NET client SDK.
- `LlamaParse.Test`: Unit tests for the LlamaParse .NET client SDK.

### Configuration

1. Install [.NET 8 SDK](https://dotnet.microsoft.com/download/dotnet/8.0)
1. Install [Visual Studio](https://visualstudio.microsoft.com/downloads/) or [Visual Studio Code](https://code.visualstudio.com/Download)
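
After installing the prerequisites, a quick check along these lines can confirm the SDK version before building. This is only a sketch: the helpers `is_dotnet8` and `build_and_test` are hypothetical, the solution file name `Llamaindex.sln` is taken from this repository, and building the Aspire sample may additionally need the `aspire` workload that the dev container script installs.

```shell
# Hypothetical helper: succeeds when a version string such as "8.0.303"
# belongs to the .NET 8 SDK.
is_dotnet8() {
  case "$1" in
    8.*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Hypothetical helper: build the solution and run the unit tests,
# but only when a .NET 8 SDK is on the PATH.
build_and_test() {
  if ! command -v dotnet > /dev/null 2>&1; then
    echo "dotnet is not installed. Please install the .NET 8 SDK first." >&2
    return 1
  fi

  local version
  version="$(dotnet --version)"
  if ! is_dotnet8 "$version"; then
    echo "Expected a .NET 8 SDK, found $version." >&2
    return 1
  fi

  dotnet build Llamaindex.sln && dotnet test Llamaindex.sln
}
```

Run `build_and_test` from the repository root.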
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2024 Diego Colombo
Copyright (c) 2024 LlamaIndex

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
7 changes: 7 additions & 0 deletions Llamaindex.sln
@@ -23,6 +23,8 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Llamaindex.ServiceDefaults"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "LlamaParseAspire", "samples\Aspire\LlamaParseAspire\LlamaParseAspire.csproj", "{F721E5DD-3C0E-41A3-B030-913E4AE187F5}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "ParseDocuments", "samples\GettingStarted\ParseDocuments\ParseDocuments.csproj", "{B0CD869E-5DAD-4EA7-AB5D-68A3516DF8E1}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -57,6 +59,10 @@ Global
{F721E5DD-3C0E-41A3-B030-913E4AE187F5}.Debug|Any CPU.Build.0 = Debug|Any CPU
{F721E5DD-3C0E-41A3-B030-913E4AE187F5}.Release|Any CPU.ActiveCfg = Release|Any CPU
{F721E5DD-3C0E-41A3-B030-913E4AE187F5}.Release|Any CPU.Build.0 = Release|Any CPU
{B0CD869E-5DAD-4EA7-AB5D-68A3516DF8E1}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{B0CD869E-5DAD-4EA7-AB5D-68A3516DF8E1}.Debug|Any CPU.Build.0 = Debug|Any CPU
{B0CD869E-5DAD-4EA7-AB5D-68A3516DF8E1}.Release|Any CPU.ActiveCfg = Release|Any CPU
{B0CD869E-5DAD-4EA7-AB5D-68A3516DF8E1}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
@@ -69,6 +75,7 @@ Global
{E6CAE94F-626F-4348-A062-EADA9CABDB7A} = {D41C4A39-8A5E-488C-A2AE-5B164F79B07C}
{D31D3C9F-5055-4914-90C6-566E0EA46877} = {D41C4A39-8A5E-488C-A2AE-5B164F79B07C}
{F721E5DD-3C0E-41A3-B030-913E4AE187F5} = {D41C4A39-8A5E-488C-A2AE-5B164F79B07C}
{B0CD869E-5DAD-4EA7-AB5D-68A3516DF8E1} = {D41C4A39-8A5E-488C-A2AE-5B164F79B07C}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {A523EAFD-F1D2-429A-97E2-F86406625D67}
69 changes: 67 additions & 2 deletions README.md
@@ -1,2 +1,67 @@
# llamaindex.net
llamaindex interfaces for .net
# LlamaIndex.NET

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/colombod/llamaindex.net)

[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://codespaces.new/colombod/llamaindex.net)

LlamaIndex.NET contains core types for working with LlamaIndex and client SDKs.

At this time, the following are supported:

- LlamaParse client SDK for .NET

## What is LlamaIndex?

[LlamaIndex](https://llamaindex.ai/) is a data framework for LLM applications.

[LlamaCloud](https://docs.llamaindex.ai/en/stable/llama_cloud/) is a managed platform for data parsing and ingestion. It consists of the following components:

- [**LlamaParse**](https://docs.llamaindex.ai/en/stable/llama_cloud/llama_parse/): self-serve document parsing API
- **Ingestion and Retrieval API**: Connect to 10+ data sources and sinks. Easily set up a data pipeline that can handle large volumes of data and incremental updates.
- **Evaluations and observability**: Run and track evaluations on your data and model.

## Important Links

- Documentation: [https://docs.llamaindex.ai/en/stable/](https://docs.llamaindex.ai/en/stable/)
- Twitter: [https://twitter.com/llama_index](https://twitter.com/llama_index)
- Discord: [https://discord.gg/dGcwcsnxhU](https://discord.gg/dGcwcsnxhU)

## Contributing

Interested in contributing? See our [Contribution Guide](./CONTRIBUTING.md) for more details.

## Example Usage

Install the LlamaParse .NET SDK.

You can find samples in the [samples directory](./samples/README.md).

### Parse documents using the LlamaParse .NET SDK

```csharp
using LlamaParse;

// Initialize LlamaParse client
var parseConfig = new Configuration
{
ApiKey = "YOUR-API-KEY"
};

var client = new LlamaParseClient(new HttpClient(), parseConfig);

// Get file info
var fileInfo = new FileInfo("attention-is-all-you-need.pdf");

// Parse document and format result as JSON
var documents = new List<RawResult>();
await foreach(var document in client.LoadDataRawAsync(fileInfo, ResultType.Json))
{
documents.Add(document);
}

// Output to console
foreach(var document in documents)
{
Console.WriteLine(document);
}
```
65 changes: 65 additions & 0 deletions samples/Aspire/README.md
@@ -18,3 +18,68 @@ This sample shows how to add a [LlamaParse](https://docs.llamaindex.ai/en/stable
"ApiKey": "ADD-YOUR-KEY-HERE"
}
```

## Guide

1. In your Web API project `LlamaParseAspire`, add the following code to register the `LlamaParseClient`.

```csharp
builder.AddLlamaParseClient(builder.Configuration.GetSection("LlamaParse").Get<Configuration>()!);
```

1. Use the `LlamaParseClient` just like you would in any other application. In this case, the `/parse` endpoint handler takes a file as input, uses LlamaParse to extract the data, and returns the parsed text to the caller for downstream processing.

```csharp
var fileUploadHandler = async (LlamaParseClient client, IFormFile file) =>
{
var fileName = file.FileName;

// Read the file into a byte array
using var ms = new MemoryStream();
await file.CopyToAsync(ms);

var inMemoryFile = new InMemoryFile(ms.ToArray(), fileName);

var sb = new StringBuilder();
await foreach (var doc in client.LoadDataAsync(inMemoryFile))
{
if(doc is ImageDocument)
{
continue;
}
else
{
sb.AppendLine(doc.Text);
}
}
return Results.Ok(sb.ToString());
};
```
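
To try the endpoint from a terminal, a request along the following lines may work. This is a sketch under several assumptions: the helper name `parse_file` is hypothetical, the base URL must be replaced with the address Aspire assigns to the Web API, the handler is assumed to be mapped for `POST`, and the form field is assumed to be named `file` to match the handler's `IFormFile file` parameter.

```shell
# Hypothetical helper: upload a local file to the sample's /parse endpoint.
# The base URL is an assumption; use the address that Aspire assigns to the API.
parse_file() {
  local base_url="$1" path="$2"

  if [ ! -f "$path" ]; then
    echo "File not found: $path" >&2
    return 1
  fi

  # -F sends a multipart/form-data POST; the field name "file" is assumed
  # to match the IFormFile parameter name in the handler.
  curl -fsS -F "file=@$path" "$base_url/parse"
}
```

For example, `parse_file "http://localhost:5000" "./samples/data/attention-is-all-you-need.pdf"`, where the port is a placeholder.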

## Enable telemetry

The LlamaParse .NET client SDK contains OpenTelemetry instrumentation to log traces and metrics related to LlamaParse jobs.

To enable it:

1. Add the following code to the `ConfigureOpenTelemetry` method in the `*.ServiceDefaults` project.

```csharp
builder.Services.AddOpenTelemetry()
.WithMetrics(metrics =>
{
//other metrics code...

// Add a meter for the LlamaParse namespace
metrics.AddMeter("LlamaParse");
})
.WithTracing(tracing =>
{
//other tracing code...

// Add a source for the LlamaParse namespace
tracing.AddSource("LlamaParse");
});
```

Now that this is configured, traces and metrics will begin to display in the Aspire dashboard. For more details, see the documentation on [Aspire telemetry](https://learn.microsoft.com/dotnet/aspire/fundamentals/telemetry) and the [dashboard](https://learn.microsoft.com/dotnet/aspire/fundamentals/dashboard/overview?tabs=bash).
18 changes: 18 additions & 0 deletions samples/GettingStarted/ParseDocuments/ParseDocuments.csproj
@@ -0,0 +1,18 @@
<Project Sdk="Microsoft.NET.Sdk">

<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>

<ItemGroup>
<PackageReference Include="Microsoft.Bcl.AsyncInterfaces" Version="8.0.0" />
</ItemGroup>

<ItemGroup>
<ProjectReference Include="..\..\..\src\LlamaParse\LlamaParse.csproj" />
</ItemGroup>

</Project>
45 changes: 45 additions & 0 deletions samples/GettingStarted/ParseDocuments/Program.cs
@@ -0,0 +1,45 @@
using LlamaParse;
using System.Diagnostics.Contracts;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

// Configure LlamaParse client
var apiKey = Environment.GetEnvironmentVariable("LLAMACLOUD_API_KEY");

var parseConfig = new Configuration()
{
ApiKey = apiKey ?? string.Empty
};

var llamaParseClient = new LlamaParseClient(new HttpClient(), parseConfig);

// Get document
var client = new HttpClient();
var documentData = await client.GetByteArrayAsync("https://arxiv.org/pdf/1706.03762");

// Parse documents
var document = new InMemoryFile(documentData, "attention-is-all-you-need.pdf");
var parsedDocs = llamaParseClient.LoadDataRawAsync(document, ResultType.Json);

// Output parse results
await foreach (var parsedDoc in parsedDocs)
{
var serializerOptions = new JsonSerializerOptions
{
PropertyNameCaseInsensitive = true
};

var result = JsonSerializer.Deserialize<ParseResult>(parsedDoc.Result, serializerOptions);

// Deserialize can return null, so fall back to an empty page list
foreach(var page in result?.Pages ?? Array.Empty<PageContent>())

{
Console.WriteLine($"Page {page.Page}");
Console.WriteLine("-------------------");
Console.WriteLine(page.Text);
Console.WriteLine("-------------------");
}
}

public record ParseResult(PageContent[] Pages);
public record PageContent(int Page, string Text);
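
Review note: `Program.cs` reads `LLAMACLOUD_API_KEY` from the environment and silently falls back to an empty key. A small wrapper like the sketch below could fail early instead. The helper names `require_api_key` and `run_parse_sample` are hypothetical; the project path is the one added in this PR.

```shell
# Hypothetical helper: fail early when the API key is missing, because
# Program.cs falls back to an empty key instead of reporting an error.
require_api_key() {
  if [ -z "${LLAMACLOUD_API_KEY:-}" ]; then
    echo "LLAMACLOUD_API_KEY is not set." >&2
    return 1
  fi
}

# Hypothetical helper: run the getting-started sample from the repository root.
run_parse_sample() {
  require_api_key || return 1
  dotnet run --project samples/GettingStarted/ParseDocuments
}
```

Export the key first (`export LLAMACLOUD_API_KEY=...`), then call `run_parse_sample`.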