Ingest CCSS FAQ pages as a single chunk #3

## **NOTE:** Don't assign yourself unless you have confirmed with Matthew that you have a working environment

## 🧠 Context

When ingesting FAQ pages from the Carleton Computer Science Society (CCSS) site (URLs matching `https://ccss.carleton.ca/resources/faq/questions/**`), the current logic splits each page into multiple chunks. One of those chunks contains only the footer text (`© 2025 Carleton Computer Science Society`), which pollutes the index.

These pages should be treated as structured, self-contained documents. Rather than splitting them up, we should ingest the entire page as a single chunk and explicitly exclude generic or boilerplate content like the footer.

---

## 🛠 Implementation Plan

1. In `WebpageIngestionService`, detect if the source URL starts with `https://ccss.carleton.ca/resources/faq/questions/`.
2. If it matches, bypass the default chunking logic and instead:

   * Strip out the footer and boilerplate content.
   * Store the entire cleaned-up page content as one chunk.
3. Add a unit test to ensure:

   * The page is ingested as a single chunk.
   * The chunk does not contain the `© 2025 Carleton Computer Science Society` text.
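
The steps above could be sketched roughly as follows. This is a minimal, language-agnostic illustration written in Python, not the actual `WebpageIngestionService` code: the function names (`is_ccss_faq_url`, `strip_boilerplate`, `chunk_page`), the paragraph-based `default_chunker` stand-in, and the footer regex are all assumptions made for the example. The regex matches any four-digit year so the check does not break when the footer year changes.

```python
import re

# Prefix taken from the issue; everything else below is hypothetical.
FAQ_URL_PREFIX = "https://ccss.carleton.ca/resources/faq/questions/"

# Assumed footer shape: a line consisting only of "© <year> Carleton Computer Science Society".
FOOTER_PATTERN = re.compile(
    r"^[ \t]*©[ \t]*\d{4}[ \t]+Carleton Computer Science Society[ \t]*$",
    re.MULTILINE,
)


def is_ccss_faq_url(url: str) -> bool:
    """Return True when the URL belongs to the CCSS FAQ question pages."""
    return url.startswith(FAQ_URL_PREFIX)


def strip_boilerplate(text: str) -> str:
    """Remove the footer line and tidy up the blank lines it leaves behind."""
    cleaned = FOOTER_PATTERN.sub("", text)
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)  # collapse runs of blank lines
    return cleaned.strip()


def default_chunker(text: str) -> list[str]:
    """Stand-in for the existing chunking logic: one chunk per paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def chunk_page(url: str, text: str) -> list[str]:
    """Return a single cleaned chunk for FAQ pages, default chunks otherwise."""
    if is_ccss_faq_url(url):
        cleaned = strip_boilerplate(text)
        return [cleaned] if cleaned else []
    return default_chunker(text)


if __name__ == "__main__":
    page = (
        "How do I join?\n\n"
        "Come to an event.\n\n"
        "© 2025 Carleton Computer Science Society\n"
    )
    print(chunk_page(FAQ_URL_PREFIX + "how-to-join", page))
```

Keeping the URL check and the boilerplate stripping in separate functions would let the unit test exercise each one independently of the rest of the ingestion pipeline.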

---

## ✅ Acceptance Criteria

* If the URL matches the pattern `https://ccss.carleton.ca/resources/faq/questions/**`, ingest the page as a single chunk.
* Do not split the content into multiple chunks.
* Exclude footer content such as `© 2025 Carleton Computer Science Society` from the chunk.
* The resulting chunk should contain only the meaningful FAQ content.
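
One way to make these criteria checkable in a test is a small helper that inspects whatever chunks the ingestion step produced. The sketch below is an assumption-laden illustration, not existing project code: `check_faq_chunks` is a hypothetical name, and it assumes chunks are available as plain strings.

```python
import re

# Assumed footer shape; matches any four-digit year.
FOOTER_RE = re.compile(r"©\s*\d{4}\s+Carleton Computer Science Society")


def check_faq_chunks(chunks: list[str]) -> list[str]:
    """Return the violated acceptance criteria; an empty list means all pass."""
    problems = []
    if len(chunks) != 1:
        problems.append(f"expected exactly 1 chunk, got {len(chunks)}")
    for index, chunk in enumerate(chunks):
        if FOOTER_RE.search(chunk):
            problems.append(f"chunk {index} contains footer text")
        if not chunk.strip():
            problems.append(f"chunk {index} is empty")
    return problems


if __name__ == "__main__":
    print(check_faq_chunks(["How do I join?\n\nCome to an event."]))
    print(check_faq_chunks(["How do I join?", "© 2025 Carleton Computer Science Society"]))
```

Returning a list of problems rather than raising on the first failure keeps test output informative when more than one criterion is violated.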

