Data Gathering Methodology
Welcome to our dedicated GitHub wiki for understanding and advancing the data-gathering methodology pertaining to OWASP's Top 10 for LLM AI Applications. As technology continues to evolve at an unprecedented pace, particularly in the domains of artificial intelligence and deep learning, securing these systems is of paramount importance. This wiki serves as a central repository for methodologies, strategies, and tools associated with understanding and prioritizing vulnerabilities in LLMs based on real-world data.
- Centralized Knowledge Base: With the multifaceted nature of LLM vulnerabilities, having a one-stop solution where developers, researchers, and security experts can find and contribute to the most recent and relevant methodologies is invaluable.
- Collaborative Environment: GitHub offers an interactive platform where community members can collaborate, providing insights, updates, and refinements to the existing methodology.
- Transparency & Open Source Spirit: In line with the ethos of OWASP and the open-source community, this wiki promotes transparency in the data-gathering process, ensuring everyone has access to the best practices in vulnerability assessment.
- Addressing the Dynamic Nature of Threats: The field of AI security is nascent but growing rapidly. This wiki will act as a live document, continuously evolving to capture the latest threats and vulnerabilities.
Throughout this wiki, you'll find:
- Detailed steps and guidelines for data collection related to OWASP vulnerabilities in LLMs.
- Tools, scripts, and code snippets to aid the data-gathering process.
- Expert contributions, reviews, and insights on refining the methodology.
- A section dedicated to ethical considerations, ensuring data is gathered and used responsibly.
- Community-driven surveys, discussions, and feedback mechanisms.
Whether you're a seasoned security expert, a researcher in AI, or just someone keen on understanding the landscape of LLM vulnerabilities, this wiki is for you. Dive in, explore, contribute, and let's work together to make our AI systems more secure!
Our Slack channel is #team-llm-datagathering-methodology
This repository hosts a curated literature review that focuses on the OWASP Top 10 vulnerabilities in the context of Large Language Models (LLMs). It systematically categorizes research papers by research method, focus area, topic, geography, and time frame to make the security literature on LLMs easier to navigate.
The categorized_papers.csv file is a compilation of research articles categorized using a custom Python script. The script categorizes articles based on the following criteria:
- Research Methods:
  - Case Studies: In-depth analysis of individual or group subjects.
  - Interviews: Qualitative data through structured conversations.
  - Content Analysis: Quantitative and qualitative analysis of document content.
  - Surveys and Questionnaires: Structured data collection from samples.
  - Experiments: Empirical research to validate hypotheses.
  - Statistical Analysis: Quantitative analysis to explore data patterns.
  - Mixed Methods: Combination of qualitative and quantitative research methods.
- Focus Areas:
  - Risk Assessment: Evaluating potential vulnerabilities or threats.
  - Expert Opinions: Insights from subject-matter experts.
  - Technological Assessments: Analysis of technology applications.
  - Policy and Regulation: Examination of governance and compliance issues.
- Topics and Themes:
  - LLM Security: Security aspects related to Large Language Models.
  - Industry Applications: Practical applications of LLMs in industry settings.
  - Emerging Threats: Identification of new and evolving security threats.
  - Solutions and Mitigations: Strategies to address vulnerabilities and threats.
- Geographical Focus:
  - Global: Concerns and studies with worldwide implications.
  - Regional: Research focused on specific areas such as Asia, Europe, the Americas, or Africa.
- Temporal Focus:
  - Historical Analyses: Lessons from the past and their impact on the present.
  - Current Issues: Contemporary challenges in the field.
  - Future Predictions: Projections and foresight into future trends and concerns.
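As a rough illustration of how criteria like these could be applied automatically, here is a minimal sketch of keyword-based categorization. It is not the repository's script: the keyword lists, the `categorize` function, and the sample abstract are all assumptions made up for this example, and the actual script may use different terms or a different technique altogether.

```python
# Hypothetical keyword lists for a few of the Research Methods categories.
# These are illustrative assumptions, not the terms used by the repository's script.
RESEARCH_METHOD_KEYWORDS = {
    "Case Studies": ["case study", "case studies"],
    "Interviews": ["interview"],
    "Surveys and Questionnaires": ["survey", "questionnaire"],
    "Experiments": ["experiment", "empirical evaluation"],
    "Statistical Analysis": ["regression", "statistical analysis"],
}


def categorize(text, keyword_map):
    """Return every category whose keywords appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [
        category
        for category, keywords in keyword_map.items()
        if any(keyword in lowered for keyword in keywords)
    ]


# Made-up abstract used only to exercise the function.
abstract = "We run an experiment on prompt injection and survey 40 practitioners."
matches = categorize(abstract, RESEARCH_METHOD_KEYWORDS)
print(matches)
```

Plain substring matching like this is easy to audit but will miss synonyms and can over-match (for example, "survey" in "survey of prior work"), so any real categorization would likely need manual review of the results.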
The dataset aims to provide researchers, practitioners, and enthusiasts with a structured overview of the existing literature, enabling more efficient knowledge discovery and gap analysis in the domain of cybersecurity for LLMs.
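To make the idea of gap analysis concrete, the sketch below counts how many papers fall under each topic and lists topics with no papers at all. It runs on an inline stand-in rather than the real file: the column names (`title`, `topics`), the `;` separator for multi-valued cells, and the sample rows are assumptions, so check the actual layout of categorized_papers.csv before adapting it.

```python
import csv
import io
from collections import Counter

# Inline stand-in for categorized_papers.csv.
# Column names and the ";" separator are assumptions, and the rows are made up.
SAMPLE_CSV = """title,topics
Paper A,LLM Security;Emerging Threats
Paper B,LLM Security
Paper C,Solutions and Mitigations
"""

# Topic labels taken from the "Topics and Themes" criteria.
EXPECTED_TOPICS = [
    "LLM Security",
    "Industry Applications",
    "Emerging Threats",
    "Solutions and Mitigations",
]


def count_topics(csv_file):
    """Count how many papers are tagged with each topic."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        for topic in row["topics"].split(";"):
            topic = topic.strip()
            if topic:
                counts[topic] += 1
    return counts


counts = count_topics(io.StringIO(SAMPLE_CSV))
# A topic with zero papers is a candidate gap in the literature.
gaps = [topic for topic in EXPECTED_TOPICS if counts[topic] == 0]

for topic in EXPECTED_TOPICS:
    print(f"{topic}: {counts[topic]}")
print("Gaps:", gaps)
```

To run it against a real file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) to `count_topics` instead of the `io.StringIO` wrapper.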
We encourage collaboration and contributions to continually enhance the repository. For more information on how you can contribute, please visit the contributing guidelines.
Note: The categorization script and methodology are also included in the repository for transparency and reproducibility.
Access the full dataset and scripts: Literature Review Repository