
Data Gathering Methodology

emmanuelgjr edited this page Feb 15, 2024 · 21 revisions

Introduction to Data Gathering Methodology for OWASP Top 10 for LLM AI Applications

Welcome to the GitHub wiki for the data-gathering methodology behind OWASP's Top 10 for LLM AI Applications. Artificial intelligence and deep learning are evolving quickly, and securing these systems has to keep pace. This wiki is the central repository for the methodologies, strategies, and tools used to understand and prioritize LLM vulnerabilities based on real-world data.

Why this Wiki?

  1. Centralized Knowledge Base: LLM vulnerabilities are multifaceted, so developers, researchers, and security experts benefit from a single place to find and contribute to the most recent and relevant methodologies.

  2. Collaborative Environment: GitHub offers an interactive platform where community members can collaborate, providing insights, updates, and refinements to the existing methodology.

  3. Transparency & Open Source Spirit: In line with the ethos of OWASP and the open-source community, this wiki promotes transparency in the data-gathering process, ensuring everyone has access to the best practices in vulnerability assessment.

  4. Addressing the Dynamic Nature of Threats: The field of AI security is nascent but growing rapidly. This wiki is a living document that will keep evolving to capture the latest threats and vulnerabilities.

What to Expect?

Throughout this wiki, you'll find:

  • Detailed steps and guidelines for data collection related to OWASP vulnerabilities in LLMs.
  • Tools, scripts, and code snippets to aid the data-gathering process.
  • Expert contributions, reviews, and insights on refining the methodology.
  • A section dedicated to ethical considerations, ensuring data is gathered and used responsibly.
  • Community-driven surveys, discussions, and feedback mechanisms.

Whether you're a seasoned security expert, a researcher in AI, or just someone keen on understanding the landscape of LLM vulnerabilities, this wiki is for you. Dive in, explore, contribute, and let's work together to make our AI systems more secure!

Our Slack channel is #team-llm-datagathering-methodology

[Image: OWASP DG v2.0]

OWASP Top 10 for LLM - Literature Review Repository

This repository hosts a curated literature review focused on the OWASP Top 10 vulnerabilities in the context of Large Language Models (LLMs). It systematically categorizes research papers by method, focus area, topic, geography, and time frame to make the security literature in this field easier to navigate.

The categorized_papers.csv file is a compilation of research articles categorized using a custom Python script. The script categorizes articles based on the following criteria:

  • Research Methods:

    • Case Studies: In-depth analysis of individual or group subjects.
    • Interviews: Qualitative data through structured conversations.
    • Content Analysis: Quantitative and qualitative analysis of document content.
    • Surveys and Questionnaires: Structured data collection from samples.
    • Experiments: Empirical research to validate hypotheses.
    • Statistical Analysis: Quantitative analysis to explore data patterns.
    • Mixed Methods: Combination of qualitative and quantitative research methods.
  • Focus Areas:

    • Risk Assessment: Evaluating potential vulnerabilities or threats.
    • Expert Opinions: Insights from subject-matter experts.
    • Technological Assessments: Analysis of technology applications.
    • Policy and Regulation: Examination of governance and compliance issues.
  • Topics and Themes:

    • LLM Security: Security aspects related to Large Language Models.
    • Industry Applications: Practical applications of LLMs in industry settings.
    • Emerging Threats: Identification of new and evolving security threats.
    • Solutions and Mitigations: Strategies to address vulnerabilities and threats.
  • Geographical Focus:

    • Global: Concerns and studies with worldwide implications.
    • Regional: Research focused on specific areas such as Asia, Europe, the Americas, or Africa.
  • Temporal Focus:

    • Historical Analyses: Lessons from the past and their impact on the present.
    • Current Issues: Contemporary challenges in the field.
    • Future Predictions: Projections and foresight into future trends and concerns.
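
The criteria above could be applied with a simple keyword matcher. The following is a minimal sketch only, not the script shipped in the repository: the keyword lists, the `CATEGORY_KEYWORDS` name, and the `categorize` function are all assumptions made for illustration, and only a subset of the labels is shown.

```python
import re

# Hypothetical keyword lists (assumption): the repository's actual script
# may use different terms, more labels, or a different matching strategy.
CATEGORY_KEYWORDS = {
    "Research Methods": {
        "Case Studies": ["case study", "case studies"],
        "Interviews": ["interview"],
        "Surveys and Questionnaires": ["survey", "questionnaire"],
        "Experiments": ["experiment"],
    },
    "Focus Areas": {
        "Risk Assessment": ["risk assessment", "threat model"],
        "Policy and Regulation": ["policy", "regulation", "compliance"],
    },
    "Topics and Themes": {
        "LLM Security": ["prompt injection", "jailbreak", "llm security"],
        "Solutions and Mitigations": ["mitigation", "defense", "defence"],
    },
}


def categorize(text):
    """Return {criterion: [matching labels]} for one title or abstract."""
    lowered = text.lower()
    result = {}
    for criterion, labels in CATEGORY_KEYWORDS.items():
        # A label matches when any of its keywords starts at a word boundary,
        # so "mitigation" also matches "mitigations".
        result[criterion] = [
            label
            for label, keywords in labels.items()
            if any(re.search(r"\b" + re.escape(k), lowered) for k in keywords)
        ]
    return result


example = "A case study of prompt injection mitigations in enterprise chatbots"
print(categorize(example))
```

A paper can receive several labels under one criterion, or none at all. Keyword matching is crude, so results from a matcher like this would still need manual review.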

The dataset aims to provide researchers, practitioners, and enthusiasts with a structured overview of the existing literature, enabling more efficient knowledge discovery and gap analysis in the domain of cybersecurity for LLMs.

We welcome collaboration and invite contributions to improve the repository. To learn how to contribute, join the Slack channel mentioned above. For substantial contributions, fork the repository. For additional guidance and best practices, see our contributing guidelines.

Note: The categorization script and methodology are also included in the repository for transparency and reproducibility.


Access the full dataset and scripts: Literature Review Repository


Frameworks for Mapping OWASP Top 10 for Large Language Model Applications

This repository, owaspllmtop10mapping, provides mappings of the OWASP Top 10 for Large Language Model Applications to various cybersecurity frameworks and standards. These mappings offer guidance on aligning LLM security practices with established cybersecurity guidelines.

Mapped Frameworks and Standards

Explore each framework and standard for detailed insights:

  1. NIST Cybersecurity Framework - A foundational framework for managing cybersecurity risk.

  2. ISO/IEC 27001: Information Security Management and ISO/IEC 20547-4:2020: Big Data Reference Architecture Security and Privacy - Key international standards for global business compliance and security controls.

  3. MITRE ATT&CK - Detailed knowledge base for understanding and defending against cyber attacks.

  4. CIS Controls - Actionable controls developed by the Center for Internet Security.

  5. CVEs (Common Vulnerabilities and Exposures) and CWEs (Common Weakness Enumeration) - Cataloging vulnerabilities and weaknesses, crucial in vulnerability management.

  6. FAIR (Factor Analysis of Information Risk) - Framework for risk quantification and management.

  7. STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege) - Threat modeling methodology for identifying security threats.

  8. ENISA (European Union Agency for Cybersecurity, formerly the European Union Agency for Network and Information Security) - Offers a range of cybersecurity advice, especially for European contexts.

  9. ASVS (Application Security Verification Standard) - A standard for web application security.

  10. SAMM (Software Assurance Maturity Model) - A model for integrating security into software development.

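  11. MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) - Knowledge base of adversary tactics and techniques against AI systems, used in threat modeling and analysis.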

  12. BSIMM (Building Security In Maturity Model) - Measures and improves software security initiatives.

  13. OpenCRE (Open Common Requirement Enumeration) - Links related requirements across standards, which facilitates understanding and implementation of various cybersecurity controls.

  14. CycloneDX Machine Learning Software Bill of Materials (SBOM)

    • Standard that provides advanced supply chain capabilities for cyber risk reduction.
    • Standard capable of representing software, hardware, services, and other types of inventory.

These frameworks and standards provide comprehensive guidelines for enhancing the security of Large Language Model applications, aiding in understanding, preventing, and mitigating potential vulnerabilities.
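
One way to work with such mappings programmatically is to index them by Top 10 entry and by framework. The sketch below is illustrative only: the row layout, the `build_index` and `controls_for` helpers, and every control identifier (`EXAMPLE-A`, `EXAMPLE-B`) are placeholders, not actual mappings from the repository.

```python
from collections import defaultdict

# (top10_entry, framework, control_id) rows.
# All control IDs below are placeholders (assumption), not real mappings.
MAPPING_ROWS = [
    ("LLM01", "NIST CSF", "EXAMPLE-A"),
    ("LLM01", "MITRE ATLAS", "EXAMPLE-B"),
    ("LLM02", "NIST CSF", "EXAMPLE-A"),
]


def build_index(rows):
    """Group control IDs by Top 10 entry, then by framework."""
    index = defaultdict(lambda: defaultdict(list))
    for entry, framework, control in rows:
        index[entry][framework].append(control)
    return index


def controls_for(index, entry, framework):
    """Return the controls mapped to an entry in one framework, or []."""
    # .get() avoids creating empty entries in the defaultdict on a miss.
    return list(index.get(entry, {}).get(framework, []))


index = build_index(MAPPING_ROWS)
print(controls_for(index, "LLM01", "NIST CSF"))
print(controls_for(index, "LLM03", "NIST CSF"))
```

Keeping the source data as flat rows makes it straightforward to store in a CSV file and to build the reverse lookup (control to Top 10 entries) with the same approach.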
