
@kachihro
Contributor

Purpose

We needed to ingest email files (EML and MSG). This required a new parser, as well as a new component to display an email when it is selected.
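For reference, EML files can be flattened to text with Python's standard `email` package (the same `email`/`policy` imports appear in this PR's parser). The following is a minimal sketch, not the PR's actual parser. The function name `eml_to_text` and the choice of which headers to keep are assumptions.

```python
import email
from email import policy


def eml_to_text(raw: bytes) -> str:
    """Hypothetical helper: flatten an RFC 822 (.eml) message into plain text for indexing."""
    # policy.default makes message_from_bytes return an EmailMessage,
    # which decodes headers and provides get_body().
    msg = email.message_from_bytes(raw, policy=policy.default)
    headers = [f"{name}: {msg[name]}" for name in ("From", "To", "Date", "Subject") if msg[name]]
    # Prefer the text/plain alternative; fall back to text/html if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return "\n".join(headers) + "\n\n" + body.strip()
```

Using `policy.default` rather than the legacy `compat32` policy is what makes `get_body()` and decoded header values available.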

A second issue was the "#" character in some file names. These files were ingested fine, but the link (HREF) to them broke because a "#" in the file name/URL is interpreted as the start of a URL fragment (as in DOC.PDF#page1) rather than as part of the name.
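Instead of renaming such files, one option is to percent-encode the file name when building the link. A literal "#" then becomes %23, and only the appended page anchor is treated as a fragment. The following sketch is not taken from the PR. It assumes a hypothetical `citation_href` helper and a `#page=N` anchor (the PDF open-parameter form).

```python
from typing import Optional
from urllib.parse import quote


def citation_href(file_name: str, page: Optional[int] = None) -> str:
    """Hypothetical helper: build a link for a file name that may contain '#'."""
    # safe="" percent-encodes every reserved character, so '#' becomes %23
    # and can no longer be mistaken for the start of a URL fragment.
    href = quote(file_name, safe="")
    if page is not None:
        # Appended after encoding, so this is the only unencoded '#'.
        href += f"#page={page}"
    return href


print(citation_href("Q3#final.pdf", page=2))  # Q3%23final.pdf#page=2
```

Note that `safe=""` also encodes "/". If stored names include folder paths, `safe="/"` keeps the separators intact.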

This PR adds a dependency on the extract-msg package (used to parse MSG files), plus some CSS tweaks for displaying emails.
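For the MSG side, here is a minimal sketch of how extract-msg might be wired in. It assumes the package's `Message` class, with the `sender`, `to`, `date`, `subject` and `body` attributes and the `close()` method described in its README. The names `EMAIL_EXTENSIONS`, `is_email_file` and `msg_to_text` are hypothetical, not the PR's code.

```python
from pathlib import Path

# Assumed set of extensions routed to the new email parser.
EMAIL_EXTENSIONS = {".eml", ".msg"}


def is_email_file(file_name: str) -> bool:
    """Case-insensitive check on the file extension."""
    return Path(file_name).suffix.lower() in EMAIL_EXTENSIONS


def msg_to_text(path: str) -> str:
    """Hypothetical helper: flatten an Outlook .msg file into plain text using extract-msg."""
    # Imported lazily so this module still loads where extract-msg is not installed.
    import extract_msg

    msg = extract_msg.Message(path)
    try:
        fields = (("From", msg.sender), ("To", msg.to), ("Date", msg.date), ("Subject", msg.subject))
        headers = [f"{label}: {value}" for label, value in fields if value]
        return "\n".join(headers) + "\n\n" + (msg.body or "").strip()
    finally:
        # Release the underlying file handle even if reading a field fails.
        msg.close()
```

Importing `extract_msg` inside the function means the extra dependency is only needed when a .msg file is actually processed.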

Does this introduce a breaking change?

I was able to deploy with "azd deploy". EML and MSG files are processed correctly and can be viewed after a chat or ask response. ✅

[ X ] Yes
[ ] No


Does this require changes to learn.microsoft.com docs?

No changes to the deployment instructions are needed, but the docs should probably mention that EML and MSG are now supported file types.

[ X ] Yes
[ ] No

Type of change

[ X ] Bugfix: links to files whose names contain "#"
[ X ] Feature: ingest EML and MSG files
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:


$argumentList

Start-Process -FilePath $venvPythonPath -ArgumentList $argumentList -Wait -NoNewWindow
$process = Start-Process -FilePath $venvPythonPath -ArgumentList $argumentList -Wait -NoNewWindow -PassThru
Collaborator

Are these changes orthogonal to the EML change?

files = self.list_file_strategy.list()
async for file in files:
try:
# Check if filename contains # and rename on disk if it does
Collaborator

We are going to deprecate integratedvectorizerstrategy.py in favor of the cloud ingestion strategy, so this file should not need any changes.

import email
from email import policy
from typing import AsyncGenerator, IO, Union, Optional
import extract_msg
Collaborator

Please run pre-commit on all files to fix formatting; see CONTRIBUTING.md for instructions.

2 participants