LLM PDF Parsing + LLM Guided Retrieval #255

Draft: wants to merge 45 commits into base: main


KastanDay
Member

Experimental: LLM-based extraction of structured metadata from PDFs, for categorizing them.

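from typing import Optional

import pydantic


class DocumentMetadata(pydantic.BaseModel):
    """Metadata fields the LLM is asked to extract from a PDF."""

    authors: list[str]
    journal_name: str
    # Kept as str rather than datetime.date, which caused JSON parsing errors
    publication_date: str
    keywords: list[str]
    doi: str
    title: str
    # Explicit default so the field may be omitted from the model output
    subtitle: Optional[str] = None
    visible_urls: list[str]
    field_of_science: str
    concise_summary: str
    questions_document_can_answer: list[str]

Example output: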
{'authors': ['Chieh Hubert Lin',
  'Changil Kim',
  'Jia-Bin Huang',
  'Qinbo Li',
  'Chih Yao Ma',
  'Johannes Kopf',
  'Ming-Hsuan Yang',
  'Hung-Yu Tseng'],
 'journal_name': 'eccv',
 'publication_date': '15 Apr 2024',
 'keywords': ['Neural Radiance Field',
  'NeRF inpainting',
  'latent diffusion model',
  'stochasticity',
  'textural shift',
  'per-scene customization',
  'masked adversarial training',
  'pixel and perceptual losses'],
 'doi': '2404.09995v1',
 'title': 'Taming Latent Diffusion Model for Neural Radiance Field Inpainting',
 'subtitle': 'Taming Latent Diffusion Model for Neural Radiance Field Inpainting',
 'visible_urls': ['https://arxiv.org/abs/2404.09995v1'],
 'field_of_science': 'Computer Science',
 'concise_summary': 'This paper proposes a framework for NeRF inpainting that addresses issues related to stochasticity and textural shift in the latent diffusion model. The framework uses per-scene customization and masked adversarial training to improve the quality of the inpainted NeRF. The authors also found that commonly used pixel and perceptual losses are not suitable for this task.',
 'questions_document_can_answer': ['What is the main focus of this paper?',
  'What issues does the proposed framework address in the latent diffusion model?',
  'What techniques are used to improve the quality of the inpainted NeRF?',
  'What did the authors find about the commonly used pixel and perceptual losses in this task?']}
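Because `publication_date` is kept as a plain string, a later step may still want a real date for sorting or filtering. The sketch below shows one way that could be done with the standard library only. The helper name `parse_publication_date` and the list of accepted formats are assumptions for illustration and are not part of this PR.

```python
import datetime
from typing import Optional

# Hypothetical list of formats; extend it as new date styles show up in model output.
_DATE_FORMATS = ("%d %b %Y", "%Y-%m-%d", "%B %d, %Y", "%b %Y", "%Y")


def parse_publication_date(raw: str) -> Optional[datetime.date]:
    """Return a date for a free-form publication_date string, or None if no format matches."""
    text = raw.strip()
    for fmt in _DATE_FORMATS:
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None


if __name__ == "__main__":
    print(parse_publication_date("15 Apr 2024"))
```

Returning `None` instead of raising keeps one oddly formatted date from failing the whole record, which appears to be the concern behind storing the field as `str` in the first place.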


KastanDay marked this pull request as draft on April 18, 2024 03:16
KastanDay changed the title from "LLM PDF Parsing" to "LLM PDF Parsing + LLM Guided Retrieval" on Jul 3, 2024

gitguardian bot commented Oct 11, 2024

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request

| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 14114433 | Triggered | Generic High Entropy Secret | 3b12dda | UIUC_Chat/pdf-parsing/qdrant.py |
| 14114433 | Triggered | Generic High Entropy Secret | 1607f98 | UIUC_Chat/pdf-parsing/qdrant.py |

🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely, following best practices for secret management.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
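As an illustration of step 2, the connection settings flagged in `qdrant.py` could be read from the environment instead of being written into the source. This is a minimal sketch, not the code in this PR: the function name `load_qdrant_settings` and the variable names `QDRANT_URL` and `QDRANT_API_KEY` are assumptions chosen for the example.

```python
import os
from typing import Mapping


def load_qdrant_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read Qdrant connection settings from environment variables.

    The variable names are hypothetical; use whatever names the deployment defines.
    """
    url = env.get("QDRANT_URL")
    api_key = env.get("QDRANT_API_KEY")
    # Collect the names of any settings that are unset or empty.
    missing = [
        name
        for name, value in (("QDRANT_URL", url), ("QDRANT_API_KEY", api_key))
        if not value
    ]
    if missing:
        # Fail early with the variable names only, never the secret values.
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {"url": url, "api_key": api_key}
```

The returned dictionary could then be passed to the client constructor, for example `QdrantClient(**load_qdrant_settings())`, assuming the `qdrant-client` package is what the script uses. Moving the key out of the source does not undo the exposure, so the already committed key still needs to be revoked and rotated as described in step 3.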

To avoid such incidents in the future consider



2 participants