@letonghan
Collaborator

Description

Fix the HTML content loading problem in dataprep.
Use LangChain's AsyncHtmlLoader to load and parse HTML content.

Issues

HTML content cannot be retrieved after HTML links are uploaded.

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

Add html2text to requirements.txt
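
For readers unfamiliar with the loader, here is a minimal sketch of the approach named in the description, not the code from this PR. It assumes the `langchain_community` package layout and assumes that html2text is consumed through `Html2TextTransformer`. The helper names `filter_http_links` and `load_html_as_text` are hypothetical.

```python
from typing import List
from urllib.parse import urlparse


def filter_http_links(links: List[str]) -> List[str]:
    """Keep only well-formed http(s) URLs (hypothetical helper, not from the PR)."""
    valid = []
    for link in links:
        candidate = link.strip()
        parsed = urlparse(candidate)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(candidate)
    return valid


def load_html_as_text(links: List[str]) -> List[str]:
    """Fetch each link and return its page content as plain text."""
    urls = filter_http_links(links)
    if not urls:
        return []  # nothing to fetch, so skip the loader entirely

    # Imported here so this module still imports where the optional
    # packages (langchain_community, html2text) are not installed.
    from langchain_community.document_loaders import AsyncHtmlLoader
    from langchain_community.document_transformers import Html2TextTransformer

    # AsyncHtmlLoader fetches the pages concurrently and returns Documents
    # whose page_content is the raw HTML.
    docs = AsyncHtmlLoader(urls).load()

    # Html2TextTransformer depends on the html2text package; it converts
    # the raw HTML in each Document to readable text.
    text_docs = Html2TextTransformer().transform_documents(docs)
    return [doc.page_content for doc in text_docs]
```

Calling `load_html_as_text(["https://example.com"])` performs a real network request. The returned strings would then typically be chunked and embedded by the dataprep service.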

Tests

Tested locally.

pre-commit-ci bot and others added 8 commits November 18, 2024 09:52
* Add outputs.

Signed-off-by: ZePan110 <[email protected]>

* Add empty list check

Signed-off-by: ZePan110 <[email protected]>

* test CI.

Signed-off-by: ZePan110 <[email protected]>

* Remove test files

Signed-off-by: ZePan110 <[email protected]>

* remove debug code

Signed-off-by: chensuyue <[email protected]>

---------

Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: chensuyue <[email protected]>
Signed-off-by: letonghan <[email protected]>
@chensuyue chensuyue added this to the v1.1 milestone Nov 19, 2024
@chensuyue
Collaborator

chensuyue commented Nov 19, 2024

The remaining issue for the vdms microservice will be tracked in another PR.
@srinarayan-srikanthan will submit the PR with the fix.

@lvliang-intel lvliang-intel merged commit 1bfc430 into opea-project:main Nov 19, 2024
@chensuyue chensuyue mentioned this pull request Nov 19, 2024
4 tasks
cameronmorin pushed a commit to opea-aws-proserve/GenAIComps that referenced this pull request Nov 22, 2024
* fix html content loading problem

Signed-off-by: letonghan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add empty list check (opea-project#914)

* Add outputs.

Signed-off-by: ZePan110 <[email protected]>

* Add empty list check

Signed-off-by: ZePan110 <[email protected]>

* test CI.

Signed-off-by: ZePan110 <[email protected]>

* Remove test files

Signed-off-by: ZePan110 <[email protected]>

* remove debug code

Signed-off-by: chensuyue <[email protected]>

---------

Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: chensuyue <[email protected]>

* Fix hardware tag retrieval issue (opea-project#916)

Signed-off-by: ZePan110 <[email protected]>

* fix html content loading problem

Signed-off-by: letonghan <[email protected]>

* fix milvus connection issue

Signed-off-by: letonghan <[email protected]>

* update parse_html function for all dbs

Signed-off-by: letonghan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: letonghan <[email protected]>
Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ZePan110 <[email protected]>
Co-authored-by: chensuyue <[email protected]>
cameronmorin pushed a commit to opea-aws-proserve/GenAIComps that referenced this pull request Nov 28, 2024
cameronmorin pushed a commit to opea-aws-proserve/GenAIComps that referenced this pull request Dec 2, 2024
Signed-off-by: Cameron Morin <[email protected]>
@letonghan letonghan deleted the dataprep/upload_link branch December 19, 2024 08:08
madison-evans pushed a commit to SAPD-Intel/GenAIComps that referenced this pull request May 12, 2025
4 participants