Commit
fix(docs): set path style to github pages
HQarroum committed Feb 7, 2024
1 parent cb9bdda commit 9d04629
Showing 35 changed files with 63 additions and 62 deletions.
1 change: 1 addition & 0 deletions docs/astro.config.mjs
@@ -4,6 +4,7 @@ import starlight from '@astrojs/starlight';
 // https://astro.build/config
 export default defineConfig({
   site: process.env.ASTRO_SITE,
+  base: '/project-lakechain',
   markdown: {
     gfm: true
   },
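
For readers unfamiliar with Astro's `base` option: once it is set, every absolute link inside the docs must carry the `/project-lakechain` prefix, which is exactly what the remaining changes in this commit do. A minimal sketch of the effect, illustrative only (see Astro's configuration reference for the authoritative behavior):

```typescript
// With `base: '/project-lakechain'`, a docs page such as
// `docs/src/content/docs/triggers/s3-event-trigger.md` is served under
// `https://<site>/project-lakechain/triggers/s3-event-trigger`, so every
// absolute link must be prefixed accordingly.
const base = '/project-lakechain';
const link = (path: string): string => `${base}${path}`;

console.log(link('/triggers/s3-event-trigger'));
// -> '/project-lakechain/triggers/s3-event-trigger'
```
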
4 changes: 2 additions & 2 deletions docs/package-lock.json

(Generated file; diff not rendered by default.)

@@ -21,7 +21,7 @@ The TAR inflate processor makes it possible to extract, on-the-fly, the content

### 🗄️ Inflating Archives

-To use this middleware, you import it in your CDK stack and connect it to a data source that provides TAR archives, such as the [S3 Trigger](/triggers/s3-event-trigger) if your TAR archives are stored in S3.
+To use this middleware, you import it in your CDK stack and connect it to a data source that provides TAR archives, such as the [S3 Trigger](/project-lakechain/triggers/s3-event-trigger) if your TAR archives are stored in S3.

> ℹ️ The below example shows how to create a pipeline that inflates TAR archives uploaded to an S3 bucket.
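
To make the wiring concrete, below is a minimal sketch of such a pipeline. It follows the builder-and-`pipe` conventions used throughout these docs; the `TarInflateProcessor` package and class names are assumptions for illustration, not verbatim API.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { CacheStorage } from '@project-lakechain/core';
import { S3EventTrigger } from '@project-lakechain/s3-event-trigger';
// Assumed package and class name for the TAR inflate middleware.
import { TarInflateProcessor } from '@project-lakechain/tar-inflate-processor';

class Stack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    super(scope, id);

    // The bucket receiving TAR archives, and the internal cache
    // shared by middlewares.
    const bucket = new s3.Bucket(this, 'Bucket');
    const cache = new CacheStorage(this, 'Cache');

    // Monitor the bucket and emit a document event per upload.
    const trigger = new S3EventTrigger.Builder()
      .withScope(this)
      .withIdentifier('Trigger')
      .withCacheStorage(cache)
      .withBucket(bucket)
      .build();

    // Extract the content of each archive on-the-fly.
    trigger.pipe(new TarInflateProcessor.Builder()
      .withScope(this)
      .withIdentifier('TarInflate')
      .withCacheStorage(cache)
      .build());
  }
}
```
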
@@ -21,7 +21,7 @@ The Zip inflate processor makes it possible to extract the content of Zip archiv

### 🗄️ Inflating Archives

-To use this middleware, you import it in your CDK stack and connect it to a data source that provides Zip archives, such as the [S3 Trigger](/triggers/s3-event-trigger) if your Zip archives are stored in S3.
+To use this middleware, you import it in your CDK stack and connect it to a data source that provides Zip archives, such as the [S3 Trigger](/project-lakechain/triggers/s3-event-trigger) if your Zip archives are stored in S3.

> ℹ️ The below example shows how to create a pipeline that inflates Zip archives uploaded to an S3 bucket.
4 changes: 2 additions & 2 deletions docs/src/content/docs/audio-processing/bark-synthesizer.md
@@ -21,7 +21,7 @@ The Bark synthesizer middleware synthesizes input text documents into voices usi

### 🐶 Synthesizing Text

-To use this middleware, you import it in your CDK stack, instantiate it as part of a pipeline, and connect it to a data source that provides input documents, such as the [S3 Trigger](/triggers/s3-event-trigger).
+To use this middleware, you import it in your CDK stack, instantiate it as part of a pipeline, and connect it to a data source that provides input documents, such as the [S3 Trigger](/project-lakechain/triggers/s3-event-trigger).

```typescript
import { BarkSynthesizer } from '@project-lakechain/bark-synthesizer';
```

@@ -53,7 +53,7 @@ class Stack extends cdk.Stack {

#### Input Language

-The Bark synthesizer needs to know the source language of the text in order to select the appropriate voice for the text-to-speech synthesis. The first location used by the middleware to infer the source language is the document metadata. If a previous middleware, such as the [NLP Text Processor](/text-processing/nlp-text-processor), has already detected the language of the document, the synthesizer will use that information. If no language was specified, the Bark synthesizer will assume the input document language to be English.
+The Bark synthesizer needs to know the source language of the text in order to select the appropriate voice for the text-to-speech synthesis. The first location used by the middleware to infer the source language is the document metadata. If a previous middleware, such as the [NLP Text Processor](/project-lakechain/text-processing/nlp-text-processor), has already detected the language of the document, the synthesizer will use that information. If no language was specified, the Bark synthesizer will assume the input document language to be English.

> ℹ️ Below is an example showcasing how to use the NLP Text processor to detect the language of input text documents to enrich their metadata before the Bark synthesizer is invoked.
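
The NLP example itself is collapsed in this view. As a sketch of what such a chain could look like, inside the same stack as the TAR example above (the intent DSL and builder methods are assumptions):

```typescript
import { NlpTextProcessor, dsl as l } from '@project-lakechain/nlp-text-processor';
import { BarkSynthesizer } from '@project-lakechain/bark-synthesizer';

// Detect the language of input text documents and store it
// in the document metadata.
const nlp = new NlpTextProcessor.Builder()
  .withScope(this)
  .withIdentifier('NlpTextProcessor')
  .withCacheStorage(cache)
  .withIntent(l.nlp().language())
  .build();

// Synthesize the text using the language found in the metadata.
trigger
  .pipe(nlp)
  .pipe(new BarkSynthesizer.Builder()
    .withScope(this)
    .withIdentifier('BarkSynthesizer')
    .withCacheStorage(cache)
    .build());
```
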
4 changes: 2 additions & 2 deletions docs/src/content/docs/audio-processing/polly-synthesizer.md
@@ -23,7 +23,7 @@ The Polly Synthesizer allows to synthesize speech from text using the [Amazon Po

To use this middleware, you import it in your CDK stack and instantiate it as part of a pipeline.

-> 💁 In the example below, we use the [NLP Processor](/text-processing/nlp-text-processor) to detect the language of the text and then use the Polly synthesizer to convert the text to speech using the detected language.
+> 💁 In the example below, we use the [NLP Processor](/project-lakechain/text-processing/nlp-text-processor) to detect the language of the text and then use the Polly synthesizer to convert the text to speech using the detected language.
```typescript
import { PollySynthesizer } from '@project-lakechain/polly-synthesizer';
```

@@ -64,7 +64,7 @@ class Stack extends cdk.Stack {

#### Language Override

-Amazon Polly needs to know the source language of the text to be able to associate it with a voice fit for synthesizing it. In the previous example, we used the [NLP Processor](/text-processing/nlp-text-processor) to detect the language of the text, and then used the Polly synthesizer to convert the text to speech using the detected language.
+Amazon Polly needs to know the source language of the text to be able to associate it with a voice fit for synthesizing it. In the previous example, we used the [NLP Processor](/project-lakechain/text-processing/nlp-text-processor) to detect the language of the text, and then used the Polly synthesizer to convert the text to speech using the detected language.

You can however manually override the source language of the text if your source documents share a common known language that is [supported by Amazon Polly](https://docs.aws.amazon.com/polly/latest/dg/SupportedLanguage.html).

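
A sketch of such an override follows; `withLanguageOverride` is a hypothetical method name used only to illustrate the idea, not the documented API.

```typescript
import { PollySynthesizer } from '@project-lakechain/polly-synthesizer';

// Within the same stack: force a known source language instead of
// relying on detected metadata.
// NOTE: `withLanguageOverride` is a hypothetical name for illustration.
const synthesizer = new PollySynthesizer.Builder()
  .withScope(this)
  .withIdentifier('PollySynthesizer')
  .withCacheStorage(cache)
  .withLanguageOverride('fr')
  .build();
```
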
@@ -15,7 +15,7 @@ title: Firehose Connector

---

-The Firehose storage connector makes it possible to forward [CloudEvents](/general/events) emitted by one or multiple middlewares in a pipeline to a user-defined Kinesis Firehose delivery stream. This connector allows you to nicely decouple the processing of your documents from the third-party applications that consume processed documents from the delivery stream.
+The Firehose storage connector makes it possible to forward [CloudEvents](/project-lakechain/general/events) emitted by one or multiple middlewares in a pipeline to a user-defined Kinesis Firehose delivery stream. This connector allows you to nicely decouple the processing of your documents from the third-party applications that consume processed documents from the delivery stream.

> 💁 This connector only forwards the CloudEvents emitted by middlewares to the delivery stream, and not the documents themselves.
@@ -15,7 +15,7 @@ title: OpenSearch

---

-The OpenSearch storage connector enables developers to automatically push [CloudEvents](/general/events) to an [OpenSearch](https://opensearch.org/) domain, and index documents at scale within their pipelines. This connector uses [AWS Firehose](https://aws.amazon.com/firehose/) to buffer events and store them in batches in OpenSearch using a serverless architecture.
+The OpenSearch storage connector enables developers to automatically push [CloudEvents](/project-lakechain/general/events) to an [OpenSearch](https://opensearch.org/) domain, and index documents at scale within their pipelines. This connector uses [AWS Firehose](https://aws.amazon.com/firehose/) to buffer events and store them in batches in OpenSearch using a serverless architecture.

---

2 changes: 1 addition & 1 deletion docs/src/content/docs/connectors/s3-storage-connector.md
@@ -15,7 +15,7 @@ title: S3 Connector

---

-The S3 storage connector makes it possible to capture the result of one or multiple middlewares in a pipeline and store their results in a user-defined S3 bucket destination. This connector supports both storing the [CloudEvents](/general/events) emitted by middlewares and, optionally, copying the output document itself to the destination bucket.
+The S3 storage connector makes it possible to capture the result of one or multiple middlewares in a pipeline and store their results in a user-defined S3 bucket destination. This connector supports both storing the [CloudEvents](/project-lakechain/general/events) emitted by middlewares and, optionally, copying the output document itself to the destination bucket.

---

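
As an illustration, a sketch of terminating a pipeline with this connector (builder method names are assumptions):

```typescript
import * as s3 from 'aws-cdk-lib/aws-s3';
import { S3StorageConnector } from '@project-lakechain/s3-storage-connector';

// Within the same stack: store CloudEvents (and, optionally, copies
// of the output documents) in a destination bucket at the end of a
// pipeline. `lastMiddleware` stands for the final processing step.
const destination = new s3.Bucket(this, 'Destination');

lastMiddleware.pipe(new S3StorageConnector.Builder()
  .withScope(this)
  .withIdentifier('S3StorageConnector')
  .withCacheStorage(cache)
  .withDestinationBucket(destination)
  .build());
```
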
2 changes: 1 addition & 1 deletion docs/src/content/docs/connectors/sqs-storage-connector.md
@@ -17,7 +17,7 @@ title: SQS Connector

The SQS storage connector makes it possible to capture the result of one or multiple middlewares in a pipeline and store their results in a user-defined SQS queue. This connector allows you to nicely decouple the processing of your documents from the third-party applications that consume processed documents from the queue.

-> 💁 This connector only forwards the [CloudEvents](/general/events) emitted by middlewares to the SQS queue, and not the documents themselves.
+> 💁 This connector only forwards the [CloudEvents](/project-lakechain/general/events) emitted by middlewares to the SQS queue, and not the documents themselves.
---

@@ -173,7 +173,7 @@ The Bedrock embedding processor does not modify or alter source documents in any

Both the Titan and Cohere embedding models have limits on the number of input tokens they can process. For more information, you can consult the [Amazon Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/limits.html) to understand these limits.

-> 💁 To limit the size of upstream text documents, we recommend using a text splitter to chunk text documents before they are passed to this middleware, such as the [Recursive Character Text Splitter](/text-splitters/recursive-character-text-splitter).
+> 💁 To limit the size of upstream text documents, we recommend using a text splitter to chunk text documents before they are passed to this middleware, such as the [Recursive Character Text Splitter](/project-lakechain/text-splitters/recursive-character-text-splitter).

Furthermore, this middleware limits itself to 10 concurrently processed documents from its input queue to ensure that it does not exceed the limits of the embedding models it uses — see [Bedrock Quotas](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html) for more information.

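
A sketch of placing a splitter upstream of the embedding step, under the same builder-API assumptions (the chunking parameter is illustrative):

```typescript
import { RecursiveCharacterTextSplitter } from '@project-lakechain/recursive-character-text-splitter';
import { TitanEmbeddingProcessor } from '@project-lakechain/bedrock-embedding-processors';

// Within the same stack: chunk large text documents before computing
// embeddings, to stay within the model's input-token limits.
const splitter = new RecursiveCharacterTextSplitter.Builder()
  .withScope(this)
  .withIdentifier('TextSplitter')
  .withCacheStorage(cache)
  .withChunkSize(4096) // assumed parameter name and value
  .build();

trigger
  .pipe(splitter)
  .pipe(new TitanEmbeddingProcessor.Builder()
    .withScope(this)
    .withIdentifier('Embeddings')
    .withCacheStorage(cache)
    .build());
```
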
@@ -152,7 +152,7 @@ The Sentence Transformers middleware does not modify or alter source documents i

Sentence Transformer models have limits on the number of input tokens they can process. For more information, you can consult the documentation of the specific model you are using to understand these limits.

-> 💁 To limit the size of upstream text documents, we recommend using a text splitter to chunk text documents before they are passed to this middleware, such as the [Recursive Character Text Splitter](/text-splitters/recursive-character-text-splitter).
+> 💁 To limit the size of upstream text documents, we recommend using a text splitter to chunk text documents before they are passed to this middleware, such as the [Recursive Character Text Splitter](/project-lakechain/text-splitters/recursive-character-text-splitter).
<br>

6 changes: 3 additions & 3 deletions docs/src/content/docs/general/concepts.md
@@ -4,15 +4,15 @@ title: Concepts

## 🏗 Pipelines

-Project Lakechain revolves around the concept of *pipelines*, which are the unit of execution of any document processing job. Pipelines get executed by *triggers* based on document events emitted by a data source. For example, you can declare a pipeline that is triggered every time you upload a new document to an S3 bucket by using the [S3 Trigger](/triggers/s3-event-trigger).
+Project Lakechain revolves around the concept of *pipelines*, which are the unit of execution of any document processing job. Pipelines get executed by *triggers* based on document events emitted by a data source. For example, you can declare a pipeline that is triggered every time you upload a new document to an S3 bucket by using the [S3 Trigger](/project-lakechain/triggers/s3-event-trigger).

![Lakechain Pipeline](../../../assets//s3-trigger.png)

Each pipeline can be composed of one or many components that we call *middlewares*. Each middleware in a pipeline can perform actions on documents, such as analyzing a document to extract metadata, or applying transformations to it. Middlewares provide the foundation for composing complex pipelines, and give developers the freedom to interchange middlewares to quickly create experiments and A/B test different approaches to compare their performance and accuracy.

### Your First Pipeline

-Let's say you want to create an audio transcription pipeline that takes audio recordings as an input, and produces structured text transcriptions as an output. With Lakechain you can compose your pipeline by leveraging the [Transcribe Audio Processor](/audio-processing/transcribe-audio-processor) middleware to do just that.
+Let's say you want to create an audio transcription pipeline that takes audio recordings as an input, and produces structured text transcriptions as an output. With Lakechain you can compose your pipeline by leveraging the [Transcribe Audio Processor](/project-lakechain/audio-processing/transcribe-audio-processor) middleware to do just that.

![Transcribe Pipeline](../../../assets//transcribe-pipeline.png)
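
As a sketch, and assuming the same builder-and-`pipe` conventions as the earlier examples (with `trigger` and `cache` created as in the TAR sketch, and `storage` an S3 storage connector as sketched above), the pipeline could be wired as:

```typescript
import { TranscribeAudioProcessor } from '@project-lakechain/transcribe-audio-processor';

// Audio recordings in, structured text transcriptions out.
trigger
  .pipe(new TranscribeAudioProcessor.Builder()
    .withScope(this)
    .withIdentifier('Transcribe')
    .withCacheStorage(cache)
    .build())
  .pipe(storage);
```
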

@@ -44,7 +44,7 @@ Every middleware declares a set of supported input and output types expressed as

This allows Lakechain to raise deployment-time exceptions if you connect middlewares that don't have overlapping input and output types, the same way a typed programming language would prevent you from compiling your code if you tried to pass an unsupported type to a function or method.

-Some middlewares, such as the [S3 Trigger](/triggers/s3-event-trigger), may declare a *variant* as an output type, because customers can store *any* type of document in an S3 bucket. In those cases, only middlewares that support the concrete type of the document — known at runtime — will be triggered. This filtering makes your middlewares type-safe in any situation, thus preventing potential errors and unnecessary costs.
+Some middlewares, such as the [S3 Trigger](/project-lakechain/triggers/s3-event-trigger), may declare a *variant* as an output type, because customers can store *any* type of document in an S3 bucket. In those cases, only middlewares that support the concrete type of the document — known at runtime — will be triggered. This filtering makes your middlewares type-safe in any situation, thus preventing potential errors and unnecessary costs.

![Event Filtering](../../../assets//event-filtering.png)

2 changes: 1 addition & 1 deletion docs/src/content/docs/general/events.md
@@ -86,7 +86,7 @@ Name | Description | Format | Mandatory

## 📖 Metadata

-The `metadata` object contains additional information about the document. Metadata are enriched by middlewares throughout the lifecycle of a pipeline. For example, the [Image Metadata Extractor](/image-processing/image-metadata-extractor) enriches the metadata object with information such as image dimensions, EXIF tags, authors, camera model, etc.
+The `metadata` object contains additional information about the document. Metadata are enriched by middlewares throughout the lifecycle of a pipeline. For example, the [Image Metadata Extractor](/project-lakechain/image-processing/image-metadata-extractor) enriches the metadata object with information such as image dimensions, EXIF tags, authors, camera model, etc.

```json
{
```
8 changes: 4 additions & 4 deletions docs/src/content/docs/general/faq.md
@@ -6,17 +6,17 @@ title: FAQ

Project Lakechain is a framework allowing AWS customers to develop and deploy scalable and resilient document processing pipelines on AWS. Project Lakechain is built on top of the [AWS CDK](https://aws.amazon.com/cdk/), allowing customers to express their pipelines as infrastructure-as-code and follow the best practices of consistent, repeatable, auditable and versioned infrastructure.

-With Lakechain, developers can compose their pipelines using [middlewares](/general/concepts#-middlewares), and model them in the shape of a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph).
+With Lakechain, developers can compose their pipelines using [middlewares](/project-lakechain/general/concepts#-middlewares), and model them in the shape of a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph).

<br />

---

##### What's different about Project Lakechain?

-Project Lakechain has been built on top of a cloud-native architecture with scale, security and cost-efficiency in mind since the very beginning. It leverages a strong foundation for high-throughput message-passing based on AWS SQS and AWS SNS, and a [security model](/guides/security-model) based on AWS IAM to keep customer data secure and private.
+Project Lakechain has been built on top of a cloud-native architecture with scale, security and cost-efficiency in mind since the very beginning. It leverages a strong foundation for high-throughput message-passing based on AWS SQS and AWS SNS, and a [security model](/project-lakechain/guides/security-model) based on AWS IAM to keep customer data secure and private.

-> ℹ️ See the [Architecture Overview](/guides/architecture) section for more details on the architecture of Lakechain.
+> ℹ️ See the [Architecture Overview](/project-lakechain/guides/architecture) section for more details on the architecture of Lakechain.

By providing dozens of existing middlewares, built for the Cloud, and addressing the most common needs for processing documents using Machine-Learning, Generative AI, NLP, and Computer Vision, Project Lakechain provides an ideal blueprint for rapid prototyping and validation of ideas.

@@ -44,7 +44,7 @@ No, Project Lakechain is currently not intended for production-use. It is intend

##### What are the requirements to use Project Lakechain?

-You can find the technical requirements for using Project Lakechain in the [Pre-requisites](/general/pre-requisites) section of the documentation.
+You can find the technical requirements for using Project Lakechain in the [Pre-requisites](/project-lakechain/general/pre-requisites) section of the documentation.

<br />

10 changes: 5 additions & 5 deletions docs/src/content/docs/general/quickstart.md
@@ -4,7 +4,7 @@ title: Quickstart

To help you kickstart your journey with Project Lakechain, we are going to walk you through the step-by-step deployment of your first pipeline by deploying one of the [examples](https://github.com/awslabs/project-lakechain/tree/main/examples) we've built for you.

-> 💁 The [pre-requisites](/general/pre-requisites) section helps you ensure you have the necessary setup on your development environment and are ready to go!
+> 💁 The [pre-requisites](/project-lakechain/general/pre-requisites) section helps you ensure you have the necessary setup on your development environment and are ready to go!

---

@@ -28,13 +28,13 @@ This is how the pipeline we are going to deploy looks like.

![Face Blurring Pipeline](../../../assets//face-blurring-pipeline.png)

-1. The [S3 Trigger](/triggers/s3-event-trigger) monitors any uploaded document from the source S3 buckets, and translates the S3 event into a [Cloud Event](/general/events) that's understood by the rest of the middlewares.
+1. The [S3 Trigger](/project-lakechain/triggers/s3-event-trigger) monitors any uploaded document from the source S3 buckets, and translates the S3 event into a [Cloud Event](/project-lakechain/general/events) that's understood by the rest of the middlewares.

-2. The [Rekognition Image Processor](/image-processing/rekognition-image-processor) handles face detections, and enriches document metadata with detected faces information.
+2. The [Rekognition Image Processor](/project-lakechain/image-processing/rekognition-image-processor) handles face detections, and enriches document metadata with detected faces information.

-3. The [Image Layer Processor](/image-processing/image-layer-processor) uses face detection information to blur faces and highlight face landmarks in the image.
+3. The [Image Layer Processor](/project-lakechain/image-processing/image-layer-processor) uses face detection information to blur faces and highlight face landmarks in the image.

-4. At the end of the pipeline, the [S3 Storage Connector](/storage-connectors/s3-storage-connector) stores the transformed image in the destination S3 bucket.
+4. At the end of the pipeline, the [S3 Storage Connector](/project-lakechain/connectors/s3-storage-connector) stores the transformed image in the destination S3 bucket.
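
Expressed with the same conventions as the earlier sketches, the four steps above chain into a single pipeline (variable names are illustrative, standing for the constructs created in steps 1 through 4):

```typescript
// 1 → 2 → 3 → 4: trigger, face detection, blurring, storage.
trigger
  .pipe(rekognitionImageProcessor) // enrich metadata with detected faces
  .pipe(imageLayerProcessor)       // blur faces / highlight landmarks
  .pipe(s3StorageConnector);       // store the transformed image
```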

<br>
