From f171188f1b29df555360cea0c2df2d1ae6a8f0a0 Mon Sep 17 00:00:00 2001 From: Ads Dawson <104169244+GangGreenTemperTatum@users.noreply.github.com> Date: Tue, 21 May 2024 13:27:47 -0400 Subject: [PATCH] chore: Ads/data gathering import v2 (#320) * feat: kickoff v2 0 dir and files * chore: import data gathering to central repo --- data_gathering/README.md | 1 + data_gathering/data_validation/ASVS.md | 54 ++ data_gathering/data_validation/BSIMM.md | 61 ++ .../data_validation/CIS-CONTROLS.md | 76 ++ .../data_validation/CONTRIBUTING.md | 58 ++ data_gathering/data_validation/CWE.md | 94 +++ .../data_validation/CycloneDX-SBOM.md | 132 +++ data_gathering/data_validation/ENISA.md | 166 ++++ data_gathering/data_validation/FAIR.md | 64 ++ data_gathering/data_validation/ISO27001.md | 70 ++ .../data_validation/ISOIEC20547-4:2020.md | 85 ++ data_gathering/data_validation/LICENSE | 21 + data_gathering/data_validation/MITREATLAS.md | 162 ++++ data_gathering/data_validation/MITREATT&CK.md | 88 ++ data_gathering/data_validation/NIST.md | 94 +++ data_gathering/data_validation/OPENCRE.md | 39 + data_gathering/data_validation/README.md | 41 + data_gathering/data_validation/SAMM.md | 75 ++ data_gathering/data_validation/SETUP.md | 61 ++ data_gathering/data_validation/STRIDE.md | 74 ++ data_gathering/literature/CONTRIBUTING.md | 52 ++ data_gathering/literature/LICENSE | 201 +++++ data_gathering/literature/README.md | 87 ++ .../literature/categorized_papers.csv | 799 ++++++++++++++++++ data_gathering/literature/formatted_papers.md | 0 data_gathering/literature/scripts.md | 111 +++ data_gathering/mappings/ASVS.md | 59 ++ data_gathering/mappings/BSIMM.md | 59 ++ data_gathering/mappings/CIS_Controls.md | 55 ++ data_gathering/mappings/CODE_OF_CONDUCT.md | 45 + data_gathering/mappings/CONTRIBUTING.md | 51 ++ data_gathering/mappings/CVE_CWE.md | 53 ++ ...loneDX_Software-Bill-of-Materials(SBOM).md | 169 ++++ data_gathering/mappings/ENISA.md | 59 ++ data_gathering/mappings/FAIR.md | 95 +++ data_gathering/mappings/ISO20547-4:2020.md | 49 ++ data_gathering/mappings/ISO27001.md | 59 ++ data_gathering/mappings/LICENSE | 21 + data_gathering/mappings/MITREATLAS.md | 94 +++ data_gathering/mappings/MITREATT&CK.md | 67 ++ data_gathering/mappings/NIST.md | 80 ++ data_gathering/mappings/OPEN_CRE.md | 49 ++ data_gathering/mappings/README.md | 79 ++ data_gathering/mappings/SAMM.md | 59 ++ data_gathering/mappings/STRIDE.md | 55 ++ 45 files changed, 4023 insertions(+) create mode 100644 data_gathering/README.md create mode 100644 data_gathering/data_validation/ASVS.md create mode 100644 data_gathering/data_validation/BSIMM.md create mode 100644 data_gathering/data_validation/CIS-CONTROLS.md create mode 100644 data_gathering/data_validation/CONTRIBUTING.md create mode 100644 data_gathering/data_validation/CWE.md create mode 100644 data_gathering/data_validation/CycloneDX-SBOM.md create mode 100644 data_gathering/data_validation/ENISA.md create mode 100644 data_gathering/data_validation/FAIR.md create mode 100644 data_gathering/data_validation/ISO27001.md create mode 100644 data_gathering/data_validation/ISOIEC20547-4:2020.md create mode 100644 data_gathering/data_validation/LICENSE create mode 100644 data_gathering/data_validation/MITREATLAS.md create mode 100644 data_gathering/data_validation/MITREATT&CK.md create mode 100644 data_gathering/data_validation/NIST.md create mode 100644 data_gathering/data_validation/OPENCRE.md create mode 100644 data_gathering/data_validation/README.md create mode 100644 data_gathering/data_validation/SAMM.md create mode 100644 data_gathering/data_validation/SETUP.md create mode 100644 data_gathering/data_validation/STRIDE.md create mode 100644 data_gathering/literature/CONTRIBUTING.md create mode 100644 data_gathering/literature/LICENSE create mode 100644 data_gathering/literature/README.md create mode 100644 data_gathering/literature/categorized_papers.csv create mode 100644 data_gathering/literature/formatted_papers.md create mode 100644 data_gathering/literature/scripts.md create mode 100644 data_gathering/mappings/ASVS.md create mode 100644 data_gathering/mappings/BSIMM.md create mode 100644 data_gathering/mappings/CIS_Controls.md create mode 100644 data_gathering/mappings/CODE_OF_CONDUCT.md create mode 100644 data_gathering/mappings/CONTRIBUTING.md create mode 100644 data_gathering/mappings/CVE_CWE.md create mode 100644 data_gathering/mappings/CycloneDX_Software-Bill-of-Materials(SBOM).md create mode 100644 data_gathering/mappings/ENISA.md create mode 100644 data_gathering/mappings/FAIR.md create mode 100644 data_gathering/mappings/ISO20547-4:2020.md create mode 100644 data_gathering/mappings/ISO27001.md create mode 100644 data_gathering/mappings/LICENSE create mode 100644 data_gathering/mappings/MITREATLAS.md create mode 100644 data_gathering/mappings/MITREATT&CK.md create mode 100644 data_gathering/mappings/NIST.md create mode 100644 data_gathering/mappings/OPEN_CRE.md create mode 100644 data_gathering/mappings/README.md create mode 100644 data_gathering/mappings/SAMM.md create mode 100644 data_gathering/mappings/STRIDE.md diff --git a/data_gathering/README.md b/data_gathering/README.md new file mode 100644 index 00000000..eab1b504 --- /dev/null +++ b/data_gathering/README.md @@ -0,0 +1 @@ +[Data Gathering Methodology Wiki](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/wiki/Data-Gathering-Methodology) diff --git a/data_gathering/data_validation/ASVS.md b/data_gathering/data_validation/ASVS.md new file mode 100644 index 00000000..9ee23d18 --- /dev/null +++ b/data_gathering/data_validation/ASVS.md @@ -0,0 +1,54 @@ +```python +# Python Data Validation Script for LLM Vulnerabilities (LLM01 - LLM10) + +import re +from cryptography.fernet import Fernet +import ratelimit +import requests + +# Example key, generate your own using Fernet.generate_key() +key = b'your_encryption_key_here' +cipher_suite = Fernet(key) + +# LLM01 & LLM02: Input Validation and Secure Output Handling +def validate_and_encode_input(input_data): + """Validate and sanitize input data, then return encoded for secure handling.""" + # Simple validation example, adapt regex to your needs + if re.match("^[a-zA-Z0-9 ]*$", input_data): + # Securely encode output to prevent injection attacks + encoded_data = cipher_suite.encrypt(input_data.encode('utf-8')) + return encoded_data + else: + raise ValueError("Invalid input") + +# LLM03: Secure Training Data +def encrypt_training_data(training_data): + """Encrypt training data to ensure integrity.""" + encrypted_data = cipher_suite.encrypt(training_data.encode('utf-8')) + return encrypted_data + +# LLM04: Implement Rate Limiting to prevent DoS +@ratelimit.limits(calls=100, period=ratelimit.HOUR) +def process_request(request_data): + """Process incoming request with rate limiting to prevent DoS attacks.""" + # Simulate request processing + return "Request processed" + +# LLM05, LLM06, LLM07, LLM08, LLM09, LLM10 are more conceptual and require organizational +# and architectural measures, including secure plugin design, API security, +# data protection methods, and more, which are beyond the scope of a simple script. +# These require comprehensive approaches involving multiple systems and practices. + + +### Automated Validation Tools + +For automating validation and ensuring adherence to ASVS standards, consider integrating the following tools into your development and deployment pipelines: + +- **OWASP ZAP (Zed Attack Proxy)**: For finding vulnerabilities in web applications. +- **SonarQube**: For continuous inspection of code quality to detect bugs, vulnerabilities, and code smells in your code. +- **OWASP Dependency-Check**: For detecting publicly disclosed vulnerabilities in project dependencies. +- **Bandit**: For finding common security issues in Python code. +- **cryptography**: Python library for encryption and decryption to secure data, as shown in the script. +- **Ratelimit**: Python library to implement rate limiting, as demonstrated in the script. + +These tools can automate aspects of security validation and complement the script's functionalities, focusing on specific LLM vulnerabilities and general application security concerns outlined in the ASVS. diff --git a/data_gathering/data_validation/BSIMM.md b/data_gathering/data_validation/BSIMM.md new file mode 100644 index 00000000..9964e97a --- /dev/null +++ b/data_gathering/data_validation/BSIMM.md @@ -0,0 +1,61 @@ +```python +# Python Data Validation Script Example + +import re +from cryptography.fernet import Fernet +import tensorflow_data_validation as tfdv +import pandas as pd + +# Generate a key for encryption/decryption +# In practice, store this key securely +key = Fernet.generate_key() +cipher_suite = Fernet(key) + +# Example data validators +def validate_prompt(prompt): + """Simple validation to avoid prompt injection.""" + if re.search(r"[^\w\s]", prompt): + raise ValueError("Invalid characters in prompt.") + return prompt + +def sanitize_output(output): + """Basic output sanitization to prevent insecure data exposure.""" + sanitized_output = re.sub(r"[^\w\s]", "", output) + return sanitized_output + +def validate_training_data(training_data_file): + """Check integrity of training data using TensorFlow Data Validation.""" + stats = tfdv.generate_statistics_from_csv(training_data_file) + anomalies = tfdv.validate_statistics(stats, tfdv.load_schema_text('schema.pbtxt')) + tfdv.display_anomalies(anomalies) + +def encrypt_sensitive_data(data): + """Encrypt sensitive data.""" + if isinstance(data, str): + data = data.encode() + encrypted_data = cipher_suite.encrypt(data) + return encrypted_data + +def decrypt_sensitive_data(encrypted_data): + """Decrypt sensitive data.""" + decrypted_data = cipher_suite.decrypt(encrypted_data) + return decrypted_data.decode() + +# Example usage +if __name__ == "__main__": + # Validate and sanitize inputs/outputs + prompt = validate_prompt("Example prompt with valid characters") + output = sanitize_output("Example output with and special characters!") + + # Validate training data + validate_training_data("training_data.csv") + + # Encrypt and decrypt sensitive information + sensitive_info = "Sensitive data example" + encrypted_info = encrypt_sensitive_data(sensitive_info) + decrypted_info = decrypt_sensitive_data(encrypted_info) + + print(f"Original: {sensitive_info}, Encrypted: {encrypted_info}, Decrypted: {decrypted_info}") + + +Remember, the effectiveness of security measures greatly depends on the specific context and how they're implemented within your overall security strategy. Continuously update and refine your validation techniques to adapt to new vulnerabilities and threats. diff --git a/data_gathering/data_validation/CIS-CONTROLS.md b/data_gathering/data_validation/CIS-CONTROLS.md new file mode 100644 index 00000000..4cda282d --- /dev/null +++ b/data_gathering/data_validation/CIS-CONTROLS.md @@ -0,0 +1,76 @@ +```python +import re +from typing import List + +# Mock functions to represent the validation checks for each LLM vulnerability +# These functions should be implemented with actual validation logic based on the system's architecture + +def validate_prompt_injection(input_data: str) -> bool: + """ + Validates input data to protect against prompt injection. + Implement custom validation logic based on the system's requirements. + """ + # Example: reject input containing scripting elements or unexpected operators + return not bool(re.search(r'[<>{}();]', input_data)) + +def validate_output_handling(output_data: str) -> bool: + """ + Checks for insecure output handling. + Implement checks to ensure output data does not contain sensitive information. + """ + # Example: ensure output does not contain API keys or personal data + return "API_KEY" not in output_data and "personal_info" not in output_data + +def validate_training_data(data: List[str]) -> bool: + """ + Ensures the integrity of training data to protect against data poisoning. + This could involve checksum verification, source validation, etc. + """ + # Placeholder for actual validation logic + return True + +def validate_dos_protection(system_config: dict) -> bool: + """ + Validates system configuration to minimize the risk of DoS attacks. + This could involve checking network configurations, rate limiting settings, etc. + """ + # Placeholder for actual validation logic + return True + +# Additional validation functions should be implemented for LLM04 to LLM10 + +# Example of using the validation functions +if __name__ == "__main__": + input_data = "" + output_data = "" + training_data = ["data1", "data2"] + system_config = {"config1": "value1"} + + # Perform the validations + if not validate_prompt_injection(input_data): + print("Prompt injection vulnerability detected.") + if not validate_output_handling(output_data): + print("Insecure output handling detected.") + if not validate_training_data(training_data): + print("Training data poisoning vulnerability detected.") + if not validate_dos_protection(system_config): + print("Model Denial of Service vulnerability detected.") + + # Add additional validation checks as necessary + + +### Automated Validation Tools + +For the automated validation of these controls and vulnerabilities, you can leverage several tools, depending on the nature of the system and its architecture: + +1. **Static Code Analysis Tools**: Tools like Bandit (for Python), FindBugs (for Java), and others can automatically detect common security issues in code. + +2. **Dynamic Analysis Tools (DAST)**: Tools like OWASP ZAP or Burp Suite can test running applications for vulnerabilities such as injection attacks, insecure server configurations, and more. + +3. **Dependency Checkers**: Tools like OWASP Dependency-Check can analyze project dependencies for known vulnerabilities, particularly useful for LLM05. + +4. **Security Linters**: Linters like ESLint with security plugin for JavaScript, or Brakeman for Ruby on Rails, can detect insecure coding patterns before they go into production. + +5. **Data Validation Libraries**: For Python, libraries like Pydantic or Cerberus can help in validating input data formats and types, assisting in preventing issues like LLM01. + +Each of these tools can be integrated into your CI/CD pipeline to automate the validation process. Ensure you configure each tool according to the specific needs and architecture of your LLM system to maximize their effectiveness. diff --git a/data_gathering/data_validation/CONTRIBUTING.md b/data_gathering/data_validation/CONTRIBUTING.md new file mode 100644 index 00000000..0705c910 --- /dev/null +++ b/data_gathering/data_validation/CONTRIBUTING.md @@ -0,0 +1,58 @@ +# Contributing to OWASP Top 10 for LLM AI Applications Data Validation + +We welcome contributions from the community! Whether you're looking to fix bugs, add new features, or improve documentation, your help is appreciated. Please follow these guidelines to contribute effectively. + +## Discussion and Collaboration +The main channel for discussion and collaboration is our OWASP Slack channel: `#team-llm-datagathering-methodology` + +We use this channel for regular discussions on the project's methodology, future enhancements, and any issues we're currently facing. It's a great place to ask questions, propose ideas, and collaborate with others who are working on similar problems. + +## Getting Started + +1. **Fork the Repository** + + Start by forking the project repository to your own GitHub account. + +2. **Clone Your Fork** + + Clone your forked repository to your local machine: + +3. **Create a New Branch** + +For each new contribution, create a new branch: + + +## Making Changes + +- Ensure your changes adhere to the project's coding standards and guidelines. +- Write clear, commented code that is easy to understand and maintain. +- If adding new features or scripts, update the documentation accordingly. + +## Testing + +Before submitting your changes, please test your code thoroughly to ensure it works as expected and does not introduce new issues. + +## Submitting Your Contribution + +1. **Commit Your Changes** + +Add your changes to the git staging area, commit them with a clear message describing the changes: + +2. **Push to Your Fork** + +Push your changes to your forked repository: + + +3. **Create a Pull Request** + +Go to the original project repository on GitHub. You should see a prompt to create a pull request from your new branch. Fill in the pull request form with a clear description of your changes and submit. + +## Review Process + +Once submitted, your pull request will be reviewed by the project maintainers. You may receive feedback and requests for changes to your submission. This is a normal part of the review process, and your cooperation and patience are appreciated. + +## Code of Conduct + +Please note that this project is released with a Contributor Code of Conduct. By participating in this project, you agree to abide by its terms. + +Thank you for contributing to improving the security and reliability of LLM AI applications! diff --git a/data_gathering/data_validation/CWE.md b/data_gathering/data_validation/CWE.md new file mode 100644 index 00000000..104dbc97 --- /dev/null +++ b/data_gathering/data_validation/CWE.md @@ -0,0 +1,94 @@ +```python +# Basic Data Validation Script for Large Language Models (LLMs) +# This script is designed to provide foundational checks for common vulnerabilities. + +import re + +# LLM01: Prompt Injection (CWE-77, CWE-94) +def validate_prompt(prompt): + """ + Validates the given prompt to prevent injection attacks. + """ + # Example validation to remove potentially dangerous characters or patterns + clean_prompt = re.sub(r'[^\w\s]', '', prompt) + return clean_prompt + +# LLM02: Insecure Output Handling (CWE-79, CWE-116) +def encode_output(output): + """ + Encodes the output to prevent XSS attacks or other output encoding issues. + """ + # Simple HTML encoding example + encoded_output = output.replace('<', '<').replace('>', '>') + return encoded_output + +# LLM03: Training Data Poisoning (CWE-506, CWE-915) +# Note: Validation should occur during the data collection and preparation phase. +def validate_training_data(data): + """ + Validates training data to ensure it's not maliciously crafted. + """ + # Example check for unexpected patterns or malicious content + if "unexpected_pattern" in data: + raise ValueError("Invalid training data detected.") + return True + +# LLM04: Model Denial of Service (CWE-400) +def check_query_limits(query): + """ + Checks if the query exceeds certain limits to prevent DoS attacks. + """ + MAX_LENGTH = 1000 # Example limit + if len(query) > MAX_LENGTH: + raise ValueError("Query exceeds maximum allowed length.") + return True + +# LLM05: Supply-Chain Vulnerabilities (CWE-829, CWE-937) +# Manual review and using trusted libraries are recommended. + +# LLM06: Sensitive Information Disclosure (CWE-200) +def redact_sensitive_info(text): + """ + Redacts sensitive information from the text. + """ + # Example redaction (simple and should be customized) + redacted_text = re.sub(r'\b(account_number|ssn)\b', '[REDACTED]', text, flags=re.IGNORECASE) + return redacted_text + +# LLM07: Insecure Plugin Design (CWE-749, CWE-1203) +# Ensure plugins do not expose dangerous methods or functions directly to end-users. + +# LLM08: Excessive Agency (CWE-807) +# Implement strict checks on inputs used for security decisions. + +# LLM09: Overreliance (CWE-1048) +# Ensure diversification in security mechanisms and checks. + +# LLM10: Model Theft (CWE-494, CWE-1241) +# Protect model artifacts using integrity checks and secure distribution methods. + +# Example usage +prompt = "Example prompt" +clean_prompt = validate_prompt(prompt) +print(f"Cleaned Prompt: {clean_prompt}") + +output = "

This is a header

" +encoded_output = encode_output(output) +print(f"Encoded Output: {encoded_output}") + + +### Recommended Automated Validation Tools + +For enhancing the security of your LLM application, integrating automated validation tools into your CI/CD pipeline is crucial. Here are some tools that can be particularly useful: + +1. **Bandit**: A tool designed to find common security issues in Python code. It's useful for static analysis and can help detect security issues related to the CWEs mentioned. + +2. **OWASP ZAP (Zed Attack Proxy)**: An open-source web application security scanner. While more web-focused, it can be useful for testing web interfaces to LLM applications for issues like XSS (CWE-79) and other web vulnerabilities. + +3. **SonarQube**: Offers comprehensive code quality and security scanning, including detection of vulnerabilities and code smells. + +4. **CodeQL**: GitHub's code scanning tool that uses queries to identify vulnerabilities across multiple languages, including Python. It can be used to automate security checks as part of your GitHub Actions workflows. + +5. **PyTorch/TensorFlow Security Advisories**: For LLMs built on these frameworks, staying updated with the latest security advisories is crucial. Though not tools per se, subscribing to these advisories can help mitigate supply-chain vulnerabilities (CWE-829, CWE-937). + +Each of these tools can be integrated into your development and deployment processes to automatically flag potential security issues, helping adhere to secure coding practices and mitigate vulnerabilities associated with LLMs. diff --git a/data_gathering/data_validation/CycloneDX-SBOM.md b/data_gathering/data_validation/CycloneDX-SBOM.md new file mode 100644 index 00000000..1439c29c --- /dev/null +++ b/data_gathering/data_validation/CycloneDX-SBOM.md @@ -0,0 +1,132 @@ +### Automated Validation Tools Suggestion + +For robust validation and security checks, it's recommended to integrate automated validation tools within your development and deployment pipelines. Here are some tools that can be useful for addressing the vulnerabilities outlined: + +1. **Bandit** - A tool designed to find common security issues in Python code. Useful for general code security checks, including some aspects related to LLM vulnerabilities. + +2. **OWASP ZAP** (Zed Attack Proxy) - For web applications using LLMs, ZAP can help find vulnerabilities in web services and APIs. + +3. **PyTorch/TensorFlow Data Validators** - For ML applications, using TensorFlow Data Validation (TFDV) or similar tools for PyTorch can help ensure the quality and integrity of training data (related to LLM03). + +4. **Sonatype Nexus or OWASP Dependency-Check** - These tools can help identify known vulnerabilities in third-party dependencies (LLM05). + +5. **Snyk** - A tool that can be integrated into the development workflow to monitor and protect against vulnerabilities in dependencies and containers, addressing supply-chain vulnerabilities. + +These tools can be incorporated into CI/CD pipelines to automate the detection of security issues and vulnerabilities, reducing the risk of deploying compromised or insecure LLM applications. + +```markdown +# Data Validation for LLM Applications + +This Python script provides basic structures and checks for addressing the OWASP Top 10 vulnerabilities for Large Language Model (LLM) applications, as outlined in the CycloneDX mapping markdown. + +```python +# Save this script as data_validation.py + +import re +from typing import Any, Dict, List +import rate_limiting_decorator +import pandas as pd + +# Example of a third-party library for rate limiting +# @rate_limiting_decorator.rate_limiter(calls=10, period=1) +def validate_input(input_text: str) -> bool: + """ + Validate input to prevent prompt injection (LLM01). + This function can be extended with more sophisticated checks as needed. + """ + if re.match(r'^[\w\s]+$', input_text): + return True + else: + return False + +def sanitize_output(output_text: str) -> str: + """ + Sanitize output to handle insecure output (LLM02). + Replace this simple example with more comprehensive sanitization as needed. + """ + safe_output = re.sub(r'[<>]', '', output_text) + return safe_output + +def verify_training_data(data_path: str) -> bool: + """ + Check the integrity of training data to mitigate training data poisoning (LLM03). + This is a placeholder function; implement specific checks based on your data. + """ + # Placeholder check: verify data source, structure, or content integrity. + return True + +def rate_limiter(func): + """ + Decorator for rate limiting to prevent model denial of service (LLM04). + Use a more sophisticated rate limiting mechanism for production. + """ + def wrapper(*args, **kwargs): + # Implement rate limiting logic here + print("Rate limiting check passed") + return func(*args, **kwargs) + return wrapper + +@rate_limiter +def model_request_handler(request_data: Any): + """ + Handle model requests with rate limiting. + """ + # Process the request + pass + +def validate_third_party_components(components: List[Dict[str, Any]]) -> bool: + """ + Validate third-party components to mitigate supply-chain vulnerabilities (LLM05). + This function should check the security of components; placeholder logic here. + """ + # Placeholder logic: verify components' sources and versions + return True + +def anonymize_data(data: Any) -> Any: + """ + Anonymize data to prevent sensitive information disclosure (LLM06). + Replace with actual data anonymization as appropriate. + """ + # Placeholder for data anonymization logic + return data + +def validate_plugins(plugins: List[str]) -> bool: + """ + Validate plugin security to address insecure plugin design (LLM07). + Placeholder function; implement specific plugin security validations. + """ + # Placeholder logic: check plugin sources, versions, and known vulnerabilities + return True + +def enforce_agency_limits(decision: Any) -> Any: + """ + Enforce limits on model's agency to prevent excessive agency (LLM08). + Placeholder logic; define and enforce decision-making boundaries. + """ + # Placeholder for enforcing agency limits + return decision + +def enforce_human_oversight(decisions: List[Any]) -> List[Any]: + """ + Ensure human oversight in decision-making to mitigate overreliance (LLM09). + Placeholder function; implement oversight mechanisms. + """ + # Placeholder for enforcing human oversight + return decisions + +def protect_model_access(): + """ + Implement protections against model theft (LLM10). + This function is a placeholder; use appropriate access controls and encryption. + """ + # Placeholder for model protection logic + print("Model access protected") + +# Example usage +if __name__ == "__main__": + input_text = "Example input" + if validate_input(input_text): + print("Input validated") + else: + print("Invalid input") + diff --git a/data_gathering/data_validation/ENISA.md b/data_gathering/data_validation/ENISA.md new file mode 100644 index 00000000..a1905f38 --- /dev/null +++ b/data_gathering/data_validation/ENISA.md @@ -0,0 +1,166 @@ +```python +import re +from typing import Any, Dict, List +import pandas as pd # For handling CSV file operations + +# Assuming a CSV file for LLM data inputs +DATA_CSV_FILE = 'path_to_your_data_file.csv' + +# LLM01: Prompt Injection - Input validation function +def validate_prompt_input(input_text: str) -> bool: + """Basic validation to prevent prompt injection.""" + # Define illegal characters or patterns here + pattern = re.compile(r'[<>{}]') + return not bool(pattern.search(input_text)) + +# LLM02: Insecure Output Handling - Output sanitization function +def sanitize_output(output_text: str) -> str: + """Sanitize output to prevent data leaks.""" + # Implement sanitization logic here + sanitized_text = output_text.replace('', '') + return sanitized_text + +# LLM03: Training Data Poisoning - Data integrity check +def check_data_integrity(data: Dict[str, Any]) -> bool: + """Check integrity of the training data.""" + # Implement integrity check logic (e.g., checksum, hash comparison) + return True + +# LLM04: Model Denial of Service - Resource allocation check +def resource_allocation_check() -> bool: + """Check if resource allocation for the model is within limits.""" + # Implement check for resource usage (e.g., CPU, memory limits) + return True + +# LLM05: Supply-Chain Vulnerabilities - Third-party library check +def third_party_library_check(library_name: str) -> bool: + """Check security of third-party libraries.""" + # Implement checks against a known vulnerabilities database + return True + +# LLM06: Sensitive Information Disclosure - Data encryption check +def data_encryption_check(data: str) -> bool: + """Check if sensitive data is encrypted.""" + # Implement encryption check logic here + return True + +# LLM07: Insecure Plugin Design - Plugin security check +def plugin_security_check(plugin_name: str) -> bool: + """Check security of plugins used by the LLM.""" + # Implement security checks for plugins + return True + +# LLM08: Excessive Agency - Decision-making capability check +def decision_making_capability_check() -> bool: + """Ensure LLM does not exceed intended agency.""" + # Implement checks to limit LLM's decision-making capabilities + return True + +# LLM09: Overreliance - User training verification +def user_training_verification(user_id: int) -> bool: + """Verify if a user has been trained on LLM limitations.""" + # Implement verification logic here + return True + +# LLM10: Model Theft - Access control check +def access_control_check(user_id: int) -> bool: + """Check if access controls are properly implemented.""" + # Implement access control checks here + return True + +# Load and validate data from a CSV file +def load_and_validate_data(csv_file: str) -> List[Dict[str, Any]]: + data = pd.read_csv(csv_file) + validated_data = [] + for index, row in data.iterrows(): + if validate_prompt_input(row['prompt']) and check_data_integrity(row.to_dict()): + validated_data.append(row.to_dict()) + return validated_data + +if __name__ == '__main__': + validated_data = load_and_validate_data(DATA_CSV_FILE) + print(f"Validated Data: {validated_data}") + + +### Recommended Automated Validation Tools + +For the aspects of data validation, protection, compliance, and security assessments highlighted in the script, the following tools and libraries can be highly effective: + +1. **Input Validation and Sanitization**: Use libraries like `cerberus`, `voluptuous`, or even Python's built-in `re` module for regex-based validations. +2. **Data Integrity Checks**: Utilize cryptographic hash libraries such as `hashlib` for generating and comparing checksums or hashes of data. +3. **Resource Allocation and Monitoring**: Tools like `psutil` can help monitor system resources to prevent DoS attacks due to resource exhaustion. +4. **Third-Party Library Security**: Use `Safety` and `Bandit` to check for known vulnerabilities and security issues in dependencies. +5. **Data Encryption**: For implementing encryption, the `cryptography` library offers both high-level and low-level cryptographic primitives. +6. ** illegal_patterns = ['") + print(f"Secured HTML: {secure_html}") + + # Anomaly Detection Example + training_data = np.random.rand(100, 5) # Dummy training data + outlier_indices = detect_anomalies(training_data) + print(f"Outlier Indices: {outlier_indices}") + + # Encrypting/Decrypting Data Example + secret_info = "Sensitive information" + encrypted_info = encrypt_data(secret_info) + print(f"Encrypted Info: {encrypted_info}") + decrypted_info = decrypt_data(encrypted_info) + print(f"Decrypted Info: {decrypted_info}") + +# Additional imports for extended functionalities +import subprocess +import os +import logging + +# LLM07: Secure Plugin Design +def load_secure_plugin(plugin_path): + # Validate and securely load a plugin (mock example) + if not os.path.exists(plugin_path) or not plugin_path.endswith('.py'): + logging.error("Invalid plugin path.") + return False + try: + # Example of securely loading a plugin with restricted capabilities + subprocess.run(['python', plugin_path], check=True, timeout=30) + return True + except Exception as e: + logging.error(f"Failed to load plugin securely: {e}") + return False + +# LLM08: Limiting Excessive Agency +def limit_decision_making(decision_function): + # Mock function to demonstrate limiting decision-making capabilities + def wrapper(*args, **kwargs): + # Insert logic here to limit the decision-making capabilities + # For example, check for user confirmation or implement additional oversight + decision = decision_function(*args, **kwargs) + logging.info(f"Decision made: {decision}. Confirm before proceeding.") + return decision + return wrapper + +@limit_decision_making +def make_decision(data): + # Dummy decision-making function + return "approve" if data else "deny" + +# LLM09: Educating Stakeholders on LLM Capabilities and Limitations +def educate_stakeholders(): + # Mock function to symbolize actions taken to educate stakeholders + print("Educating stakeholders on the capabilities and limitations of LLMs.") + +# LLM10: Secure Model Storage and Transmission +def secure_model_storage(model_data): + # Encrypt model data for secure storage + return encrypt_data(model_data) + +def secure_model_transmission(model_data): + # Encrypt model data for secure transmission + return encrypt_data(model_data) + +# Example Usage +if __name__ == "__main__": + # Secure Plugin Loading Example + if load_secure_plugin('secure_plugin.py'): + print("Plugin loaded securely.") + else: + print("Failed to load plugin securely.") + + # Limit Decision Making Example + decision = make_decision(True) + print(f"Decision: {decision}") + + # Educate Stakeholders + educate_stakeholders() + + # Secure Model Storage and Transmission Example + model_data = "Dummy model data" + secure_storage = secure_model_storage(model_data) + print(f"Model securely stored: {secure_storage}") + secure_transmission = secure_model_transmission(model_data) + print(f"Model securely transmitted: {secure_transmission}") + + +### Implementing the Code +For a practical implementation of the code above: + +- Ensure to replace dummy and mock functionality with actual logic suited to your application's architecture and security requirements. +- The secure plugin loading function (`load_secure_plugin`) and decision-limiting wrapper (`limit_decision_making`) provide basic templates; customize them based on the specific security considerations and operational requirements of your plugins and decision-making processes. +- Regularly review and update your security practices, including encryption methods and key management strategies, to adapt to evolving cybersecurity threats and standards. + +By integrating these practices into your development and security processes, you can address the identified vulnerabilities in LLM applications, enhancing both security and reliability. + + +### Recommended Automated Validation Tools + +For the vulnerabilities addressed, the following automated tools can be integrated into your development and deployment pipelines: + +- **OWASP ZAP**: Automate web application security testing. +- **Bandit**: Integrate into CI/CD pipelines for automatic scanning of Python code for security issues. +- **SQLMap**: Use in automated testing environments to check for SQL injection vulnerabilities. +- **PyCQA/flake8**: Integrate into your development process for continuous code quality and security checks. +- **TensorFlow Privacy**: Use for training machine learning models while preserving privacy, to mitigate the risk of training data poisoning. + +Incorporating these tools and practices will help enhance the security posture of applications using LLMs against the identified vulnerabilities and align with best practices in cybersecurity. diff --git a/data_gathering/data_validation/MITREATT&CK.md b/data_gathering/data_validation/MITREATT&CK.md new file mode 100644 index 00000000..010d1d77 --- /dev/null +++ b/data_gathering/data_validation/MITREATT&CK.md @@ -0,0 +1,88 @@ +```python +# Import necessary libraries +import pandas as pd +from sklearn.model_selection import train_test_split +from sklearn.ensemble import IsolationForest + +# Load your dataset +# Ensure to adjust the path to where your training data is located +data_path = 'path/to/your/training/data.csv' +data = pd.read_csv(data_path) + +# Data preprocessing steps +# These might include removing irrelevant columns, encoding categorical variables, etc. +# Example: data = data.drop(['irrelevant_column'], axis=1) + +# Splitting the dataset into training and testing set to simulate a realistic scenario +# where the model is validated on unseen data. +X_train, X_test = train_test_split(data, test_size=0.2, random_state=42) + +# Using Isolation Forest for anomaly detection in the training set +# This is useful for identifying potential poisoned data points +clf = IsolationForest(random_state=42, contamination='auto') +clf.fit(X_train) + +# Predictions on the training set +# -1 indicates an outlier, which in this context could be a poisoned data point +outliers_pred = clf.predict(X_train) + +# Filtering out the potential poisoned data points from the training set +X_train_filtered = X_train[outliers_pred == 1] + +print(f"Original training set size: {len(X_train)}") +print(f"Filtered training set size: {len(X_train_filtered)}") +print("Potential poisoned data points have been removed.") + +# Further steps would include retraining your model on X_train_filtered +# and then validating its performance on X_test. + +# Remember, this script is a starting point. Expand it with specific checks +# and balances relevant to your LLM application and data. + +This script is a foundational step towards securing LLMs against training data poisoning (LLM03). It can be adapted and expanded to include validations and mitigations for other vulnerabilities, such as LLM01 (Prompt Injection) by adding input sanitization checks, or LLM06 (Sensitive Information Disclosure) by ensuring sensitive data is encrypted or redacted in the dataset. + +For a comprehensive security posture, integrate this script within a larger pipeline that includes automated tools like Great Expectations for data quality checks and Scikit-learn for more sophisticated data preprocessing and anomaly detection strategies. + + +```python +import numpy as np +import pandas as pd +from sklearn.ensemble import IsolationForest +from sklearn.datasets import make_classification + +# Simulating a dataset with features relevant to LLM training data +# This example generates a dataset with 2 features and 1,000 instances +X, _ = make_classification(n_samples=1000, n_features=2, n_informative=2, + n_redundant=0, weights=[0.99], flip_y=0, random_state=42) + +# Converting the numpy array to a pandas DataFrame for easier manipulation +data = pd.DataFrame(X, columns=['Feature_1', 'Feature_2']) + +# Introducing missing values randomly for demonstration +for _ in range(10): + idx = np.random.choice(data.index) + col = np.random.choice(data.columns) + data.at[idx, col] = np.nan + +# Handling missing values by simple imputation (mean value) +data.fillna(data.mean(), inplace=True) + +# Using Isolation Forest for anomaly detection +# Isolation Forest is effective for identifying outliers in high-dimensional datasets +clf = IsolationForest(random_state=42, contamination=0.01) # Assuming 1% of data is anomalous +clf.fit(data) + +# Predicting outliers in the dataset +outliers_pred = clf.predict(data) + +# Filtering out the potential poisoned data points +filtered_data = data[outliers_pred == 1] + +print(f"Original dataset size: {data.shape[0]}") +print(f"Filtered dataset size: {filtered_data.shape[0]}") +print("Potential poisoned data points have been removed. The dataset is now cleaner and ready for further processing or training.") + + +This script can be directly copied and executed in a Python environment. It demonstrates a fundamental approach to data validation and preprocessing that is essential for maintaining the integrity of training data for LLMs. +This example focuses on anomaly detection and removal, which is crucial for mitigating the risk associated with training data poisoning (LLM03). +To address the full range of vulnerabilities outlined in the OWASP Top 10 for LLMs, consider integrating additional security measures and validations tailored to each specific threat. diff --git a/data_gathering/data_validation/NIST.md b/data_gathering/data_validation/NIST.md new file mode 100644 index 00000000..a4da599c --- /dev/null +++ b/data_gathering/data_validation/NIST.md @@ -0,0 +1,94 @@ +import re +from pydantic import BaseModel, ValidationError, validator +from cryptography.fernet import Fernet +import logging +import numpy as np +from ratelimit import limits, sleep_and_retry +import hashlib +import os + +# Basic setup for logging to monitor behaviors and potential attacks +logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') + +# LLM01: Prompt Injection - Enhanced input validation with Pydantic +class SafePrompt(BaseModel): + prompt: str + + @validator('prompt') + def check_for_injection(cls, v): + # Example of blocking suspicious patterns + if re.search(r"[{}]", v): # Basic check for code injection patterns + raise ValueError("Potential code injection detected.") + return v + +# LLM02: Insecure Output Handling - Enhanced output encoding +def secure_output_html(output): + # This function would be used to escape HTML entities to prevent XSS attacks + return re.sub(r'([<>"\'&])', lambda x: f"&#{ord(x.group(0))};", output) + +# LLM03: Training Data Poisoning - Anomaly Detection +def detect_anomalies_in_data(data): + # Placeholder for a more complex anomaly detection mechanism, possibly using statistical models or ML + logging.info("Anomaly detection placeholder - implement specific logic here.") + return True # Assuming anomaly detection passed for demonstration + +# LLM04: Model Denial of Service (DoS) - Implementing Rate Limiting +@sleep_and_retry +@limits(calls=100, period=60) # Example: Limit to 100 calls per minute +def process_input(input_data): + # Process input data with rate limiting to prevent DoS + logging.info(f"Processing input: {input_data}") + return True + +# LLM05: Supply-Chain Vulnerabilities - Basic Integrity Checks +def verify_third_party_integrity(): + # Placeholder for supply-chain validation logic + # Example: Check hashes of third-party libraries against known secure hashes + logging.info("Supply-chain integrity check placeholder - implement specific validation here.") + +# LLM06: Sensitive Information Disclosure - Implement Encryption for Data at Rest +def encrypt_data(data, key): + f = Fernet(key) + encrypted_data = f.encrypt(data.encode()) + return encrypted_data + +def decrypt_data(encrypted_data, key): + f = Fernet(key) + decrypted_data = f.decrypt(encrypted_data).decode() + return decrypted_data + +# Generate a new encryption key (Do this once and store securely) +# key = Fernet.generate_key() + +# LLM07: Insecure Plugin Design - Plugin Security Review +def plugin_security_review(plugin): + # Placeholder for a security review process for plugins + # This could involve static analysis, dependency checking, etc. + logging.info("Plugin security review placeholder - implement specific checks here.") + +# LLM08: Excessive Agency - Monitoring and Limiting Actions +def limit_agency(actions): + # Placeholder for limiting the agency of LLMs based on predefined rules + logging.info("Limit agency placeholder - implement action limitation logic here.") + +# LLM09: Overreliance - Checks and Balances +def check_overreliance(): + # Implement logic to detect and warn about overreliance on automated systems + logging.warning("Check for overreliance on LLMs.") + +# LLM10: Model Theft - Model Access Controls +def secure_model_access(): + # Placeholder for securing access to models, potentially using access control lists or encryption + logging.info("Model access control placeholder - implement specific security measures here.") + +if __name__ == "__main__": + # Example usage of some functions + try: + safe_prompt = SafePrompt(prompt="Hello, world! {malicious_code}") + except ValidationError as e: + logging.warning(f"Validation error: {e}") + + encrypted_message = encrypt_data("Secret Message", Fernet.generate_key()) + logging.info(f"Encrypted Message: {encrypted_message}") + +This script and the tools listed provide a starting point. Tailoring to your specific LLM's operational and security requirements is essential for effective vulnerability management. diff --git a/data_gathering/data_validation/OPENCRE.md b/data_gathering/data_validation/OPENCRE.md new file mode 100644 index 00000000..b0d95de5 --- /dev/null +++ b/data_gathering/data_validation/OPENCRE.md @@ -0,0 +1,39 @@ +```python +import re + +# Function to read markdown content from a file +def read_markdown_file(file_path): + with open(file_path, 'r', encoding='utf-8') as file: + return file.read() + +# Function to validate the presence of required sections in the markdown content +def validate_opencre_markdown(content): + # Define the regex pattern for required sections (LLM01 through LLM10) + section_pattern = re.compile(r'### LLM0[1-9]:|### LLM10:') + + # Find all matches + matches = section_pattern.findall(content) + + # Count of expected sections + expected_sections = 10 + found_sections = len(matches) + + # Check if all expected sections are found + if found_sections == expected_sections: + print("Success: All expected sections (LLM01 through LLM10) were found.") + else: + missing_count = expected_sections - found_sections + print(f"Warning: Expected {expected_sections} sections, found {found_sections}.") + print(f"{missing_count} sections are missing or not correctly labeled.") + +# Example usage +markdown_file_path = 'opencre_mapping.md' # Replace with your actual file path +markdown_content = read_markdown_file(markdown_file_path) +validate_opencre_markdown(markdown_content) + + + +### Note on Automated Validation Tools +For validating structured data derived from the markdown content or other aspects of your project where structured data validation is applicable, tools like Great Expectations, Pydantic, or Cerberus can be highly effective. They require structured data inputs, so additional preprocessing would be necessary to convert markdown sections into a format these tools can work with, such as JSON or a Python dictionary. + +Great Expectations is particularly well-suited for complex validation scenarios and can integrate with data pipelines for continuous data quality checks. Pydantic excels in data parsing and validation through Python type annotations, offering a robust solution for ensuring data types and formats meet expected criteria. Cerberus provides a flexible and lightweight option for validating Python data structures against a customizable schema, useful for a variety of validation tasks. diff --git a/data_gathering/data_validation/README.md b/data_gathering/data_validation/README.md new file mode 100644 index 00000000..6a61eaa7 --- /dev/null +++ b/data_gathering/data_validation/README.md @@ -0,0 +1,41 @@ +# Step 3: Data Validation and Quality Control Methodology + +## Overview +The intent of this phase is to establish a robust Data Validation and Quality Control (QC) framework for assessing and ensuring the integrity and accuracy of the research data on the OWASP Top 10 for LLM AI Applications. This step is crucial for maintaining the reliability of our findings and ensuring that the data reflects real-world vulnerabilities accurately. + +## Objectives +- **Define Key Performance Indicators (KPIs)**: Establish both qualitative and quantitative measures that will guide the validation process. +- **Develop Validation Tools**: Create scripts and utilize automated tools to systematically verify the data against our KPIs. +- **Ensure Data Quality**: Implement a QC process to identify and rectify any inconsistencies, inaccuracies, or biases in the dataset. + +## What to Expect with the Codes +The provided Python scripts serve as a baseline for conducting data validation and quality control. These scripts are designed to be adaptable to various datasets and environments, depending on the specific needs of the research. Below is a brief overview of what to expect: + +### Validation Scripts +- **Data Consistency Checks**: Scripts to verify that the data is consistent across different sources and timeframes. +- **Accuracy Tests**: Tools to compare sampled data against trusted benchmarks or manual checks to assess accuracy. +- **Completeness Checks**: Automated checks to ensure that the dataset is complete and all expected data points are present. + +### Quality Control Tools +- **Automated Anomaly Detection**: Scripts to identify outliers or anomalies in the data that may indicate errors or inconsistencies. +- **Bias Detection**: Tools to assess the dataset for any potential biases that could affect the research outcomes. + +### Adapting the Code +It is expected that the provided scripts will need to be adapted for each specific research environment and dataset. This adaptation may involve: + +- **Parameter Tuning**: Adjusting thresholds, weights, or other parameters within the scripts to better fit the specific data characteristics. +- **Custom Checks**: Adding or modifying checks and validations to address unique aspects of the data or research objectives. +- **Integration with Other Tools**: Modifying the scripts to work seamlessly with other tools or platforms used in the research process. + +## Getting Started +To begin using the validation and QC scripts, please ensure you have the following prerequisites installed: + +- Python 3.8 or higher +- Necessary Python libraries as listed in `each md file` + +Follow the setup instructions in `SETUP.md` to configure your environment and adapt the scripts to your dataset. For detailed documentation on each script and how to customize it, refer to the `md` files. + +## Contribution +Your contributions are welcome! If you have suggestions for improving the validation and QC methodology, please open an issue or submit a pull request with your proposed changes. For more information on contributing, please see `CONTRIBUTING.md`. + +Thank you for your interest in improving the security and reliability of LLM AI applications. diff --git a/data_gathering/data_validation/SAMM.md b/data_gathering/data_validation/SAMM.md new file mode 100644 index 00000000..50e379be --- /dev/null +++ b/data_gathering/data_validation/SAMM.md @@ -0,0 +1,75 @@ +```python +# Data Validation Script for LLMs based on OWASP Top 10 SAMM Mapping + +import re +from typing import List + +# LLM01: Prompt Injection +def validate_prompt(prompt: str) -> bool: + """ + Validate input prompt to prevent injection attacks. + """ + # Simple validation to check for malicious patterns; expand as needed + if re.search(r"[^\w\s]", prompt): + return False + return True + +# LLM02: Insecure Output Handling +def encode_output(output: str) -> str: + """ + Encode output to ensure safe handling. + """ + # Example: HTML encode to prevent XSS; adjust encoding as needed for your use case + return output.replace("<", "<").replace(">", ">") + +# LLM03: Training Data Poisoning +# Automated tool recommendation: Use data validation tools or custom scripts to verify data source integrity + +# LLM04: Model Denial of Service (DoS) +# Automated tool recommendation: Implement monitoring and alerting using tools like Prometheus or Datadog + +# LLM05: Supply-Chain Vulnerabilities +# Automated tool recommendation: Use dependency check tools like OWASP Dependency-Check + +# LLM06: Sensitive Information Disclosure +def classify_and_protect_data(data: str) -> str: + """ + Classify data and apply protection mechanisms like encryption. + """ + # Placeholder for data classification and encryption logic + return data # This should be encrypted based on classification + +# LLM07: Insecure Plugin Design +# Automated tool recommendation: Static code analysis tools like SonarQube for secure plugin development + +# LLM08: Excessive Agency +# Note: This is more about design and governance, less about direct code validation + +# LLM09: Overreliance +# Note: Addressed through education and proper use guidance, not directly through validation scripts + +# LLM10: Model Theft +# Automated tool recommendation: Implement robust access controls and use tools for auditing like AWS CloudTrail + +# Example usage +prompt = "Select * from users where user='admin';" +print("Prompt validation:", validate_prompt(prompt)) +print("Encoded output:", encode_output("")) + + +### Automated Validation Tools Recommendation + +For each vulnerability area, certain tools can enhance the automation and effectiveness of your validation processes: + +- **LLM01: Prompt Injection**: Input validation libraries or frameworks specific to your programming environment. +- **LLM02: Insecure Output Handling**: Libraries that automatically encode or sanitize output, such as OWASP's Java Encoder for Java applications. +- **LLM03: Training Data Poisoning**: Data validation tools, and machine learning datasets integrity verification tools. +- **LLM04: Model Denial of Service (DoS)**: Monitoring and alerting tools like Prometheus, Datadog, or CloudWatch. +- **LLM05: Supply-Chain Vulnerabilities**: OWASP Dependency-Check, Snyk, or Dependabot for identifying vulnerable dependencies. +- **LLM06: Sensitive Information Disclosure**: Data classification and encryption tools, such as AWS KMS or Azure Key Vault. +- **LLM07: Insecure Plugin Design**: Static code analysis tools like SonarQube, Coverity, or CodeQL to identify security flaws. +- **LLM08: Excessive Agency**: Not directly applicable for automated tools but requires design reviews and ethical guidelines. +- **LLM09: Overreliance**: Not directly applicable for automated tools; addressed through user education and system design. +- **LLM10: Model Theft**: Access control mechanisms and auditing tools like AWS CloudTrail or Google Cloud Audit Logs. + +This script and the tools listed provide a starting point. Tailoring to your specific LLM's operational and security requirements is essential for effective vulnerability management. diff --git a/data_gathering/data_validation/SETUP.md b/data_gathering/data_validation/SETUP.md new file mode 100644 index 00000000..bb118f0e --- /dev/null +++ b/data_gathering/data_validation/SETUP.md @@ -0,0 +1,61 @@ +# Setup Instructions + +This document provides detailed instructions for setting up your environment to use the data validation and quality control scripts for the OWASP Top 10 for LLM AI Applications project. Follow these steps to prepare your system for running the provided Python scripts. + +## Prerequisites + +Ensure you have the following installed on your system before proceeding: + +- Python 3.8 or higher: [Download Python](https://www.python.org/downloads/) +- pip: Should be installed with Python 3.4 or higher + +## Installation + +1. **Clone the Repository** + + First, clone the project repository to your local machine using Git: + +2. **Create a Virtual Environment** + +Navigate to the project directory and create a virtual environment: + + +3. **Activate the Virtual Environment** + +- On Windows: + + ``` + .\venv\Scripts\activate + ``` + +- On Unix or MacOS: + + ``` + source venv/bin/activate + ``` + +4. **Install Required Libraries** + +Install all the necessary Python libraries as specified in `requirements.txt`: + + +## Adapting the Scripts + +To adapt the validation and quality control scripts to your specific dataset and environment, follow these guidelines: + +- Review each script in the `src` directory to understand its functionality. +- Modify parameters, thresholds, and checks within each script as needed to suit your data characteristics. +- For adding custom validation checks, refer to the script documentation in the `docs` folder for guidance on extending script capabilities. + +## Running the Scripts + +To run a specific script, use the following command from the project root directory: + + +Replace `yourscriptname.py` with the name of the script you wish to run. Ensure your dataset files are placed in the designated input directories as per the script documentation. + +## Troubleshooting + +If you encounter any issues during setup or while running the scripts, please open an issue on the project GitHub page. + +Thank you for contributing to the security and reliability of LLM AI applications. diff --git a/data_gathering/data_validation/STRIDE.md b/data_gathering/data_validation/STRIDE.md new file mode 100644 index 00000000..8d20a920 --- /dev/null +++ b/data_gathering/data_validation/STRIDE.md @@ -0,0 +1,74 @@ +```python +# Import necessary libraries +import re +from typing import Any, Dict + +def validate_prompt(prompt: str) -> bool: + """Validate LLM01: Prompt Injection.""" + # Example: Simple check to avoid command-like inputs; customize as needed + if re.search(r"sudo|rm -rf|:", prompt): + return False + return True + +def validate_output(output: str) -> bool: + """Validate LLM02: Insecure Output Handling.""" + # Example: Ensure no sensitive info leaks; customize based on expected output + sensitive_keywords = ["password", "ssn"] + return not any(keyword in output for keyword in sensitive_keywords) + +def validate_training_data(data: Any) -> bool: + """Validate LLM03: Training Data Poisoning.""" + # Example validation: Check for anomalies or unexpected patterns + # This is highly model and data-specific + return True # Placeholder for actual validation logic + +def check_for_dos(input_data: Any) -> bool: + """Validate LLM04: Model Denial of Service.""" + # Example: Check for excessively large inputs or complex queries + if len(str(input_data)) > 10000: # Arbitrary limit + return False + return True + +def validate_supply_chain(component: Dict[str, Any]) -> bool: + """Validate LLM05: Supply-Chain Vulnerabilities.""" + # Example: Check component's integrity, e.g., via checksums or trusted sources + return True # Placeholder for actual validation logic + +def check_sensitive_info(output: str) -> bool: + """Validate LLM06: Sensitive Information Disclosure.""" + # Reuse LLM02's method as a starting point; refine as needed + return validate_output(output) + +def validate_plugin(plugin_config: Dict[str, Any]) -> bool: + """Validate LLM07: Insecure Plugin Design.""" + # Example: Ensure only allowed plugins can be loaded + allowed_plugins = ["safe_plugin", "trusted_analysis"] + return plugin_config.get("name") in allowed_plugins + +def check_excessive_agency(action: Dict[str, Any]) -> bool: + """Validate LLM08: Excessive Agency.""" + # Example: Verify actions are within predefined limits or scopes + return action.get("scope") in ["read", "write limited"] + +def validate_overreliance(use_case: str) -> bool: + """Validate LLM09: Overreliance.""" + # This validation is more about process and review than scriptable checks + return True + +def check_model_theft(model_details: Dict[str, Any]) -> bool: + """Validate LLM10: Model Theft.""" + # Example: Check if the model access is from unauthorized sources + return model_details.get("access") == "authorized" + + +### Recommended Automated Validation Tools + +For each STRIDE category, there are tools and libraries that can help automate the validation process: + +- **SAST (Static Application Security Testing)**: Tools like Bandit for Python can analyze code to find common security issues. +- **DAST (Dynamic Application Security Testing)**: OWASP ZAP can dynamically analyze running applications for vulnerabilities. +- **IAST (Interactive Application Security Testing)**: Tools like Contrast Security integrate with applications to detect vulnerabilities during runtime. +- **RASP (Runtime Application Self-Protection)**: RASP tools can protect applications from vulnerabilities in real-time, useful for mitigating risks like LLM04 and LLM07. +- **Input Validation Libraries**: Libraries like Cerberus or Marshmallow for Python can validate input data against a predefined schema, helpful for preventing issues like LLM01, LLM02, and LLM06. + +This script and the tools mentioned are starting points. Actual implementations should be tailored to specific use cases, data types, and security requirements. diff --git a/data_gathering/literature/CONTRIBUTING.md b/data_gathering/literature/CONTRIBUTING.md new file mode 100644 index 00000000..6773aae7 --- /dev/null +++ b/data_gathering/literature/CONTRIBUTING.md @@ -0,0 +1,52 @@ +# Contributing to OWASP Top 10 for LLM - Literature Review + +We welcome contributions from everyone who is interested in improving and expanding the literature review of the OWASP Top 10 vulnerabilities in Language Learning Models (LLMs). Here is how you can contribute. + +## Types of Contributions + +### Reporting Bugs or Issues + +If you encounter a problem with the dataset or scripts, or if you have a suggestion for improving them, please open an issue on this GitHub repository with a clear title and a detailed description. Tag the issue with either `bug`, `enhancement`, or `question` to help maintainers triage it appropriately. + +### Suggesting Enhancements + +This project is open to suggestions for enhancements. This can include new features, changes to existing functionalities, or improvements in the documentation. Open an issue to suggest enhancements, providing as much context and detail as possible. + +### Pull Requests + +Here is a quick guide on how to submit a pull request (PR): + +1. Fork the repository to your own GitHub account. +2. Clone the forked repository to your machine. +3. Create a new branch for your changes. +4. Make your changes on your branch. +5. Push your branch to your GitHub repository. +6. Submit a pull request to the main repository from your fork and branch. +7. Wait for a maintainer to review your PR, and be open to any further discussions or requests for changes. + +**Note:** Before submitting a pull request, please make sure to check that your changes do not break any existing functionality and that all code conforms to the project's coding standards. + +### Data Contributions + +If you have access to relevant literature or data that is not currently in the database, we encourage you to contribute. Please ensure that the data is reliable and appropriately sourced. + +### Documentation + +Improvements to documentation, whether it's a typo fix or an entirely new section, are greatly appreciated. Your documentation changes are more likely to be accepted quickly if they are clear, concise, and targeted. + +## Discussion and Collaboration + +The main channel for discussion and collaboration is on our Slack channel: [#team-llm-datagathering-methodology](https://owasp.slack.com/archives/C05P16PKD7W) + + +We use this channel for regular discussions on the project's methodology, future enhancements, and any issues we're currently facing. It's a great place to ask questions, propose ideas, and collaborate with others who are working on similar problems. + +## Code of Conduct + +We have a Code of Conduct that all contributors are expected to adhere to. This outlines our expectations for participant behavior as well as the consequences for unacceptable behavior. + +## Questions? + +If you have any questions, please feel free to ask on the GitHub issues or directly on the Slack channel ([#team-llm-datagathering-methodology](https://owasp.slack.com/archives/C05P16PKD7W)). + +Thank you for contributing to the OWASP Top 10 for LLM - Literature Review! diff --git a/data_gathering/literature/LICENSE b/data_gathering/literature/LICENSE new file mode 100644 index 00000000..261eeb9e --- /dev/null +++ b/data_gathering/literature/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/data_gathering/literature/README.md b/data_gathering/literature/README.md new file mode 100644 index 00000000..e9e070a4 --- /dev/null +++ b/data_gathering/literature/README.md @@ -0,0 +1,87 @@ +# OWASP Top 10 for LLM - Literature Review + +This repository is dedicated to the literature review of the OWASP (Open Web Application Security Project) Top 10 vulnerabilities as they pertain to Language Learning Models (LLMs). Our goal is to collect, categorize, and analyze academic papers, articles, and any form of literature that addresses these vulnerabilities in the context of LLMs. + +## Project Description + +The OWASP Top 10 is an awareness document that represents a broad consensus about the most critical security risks to web applications. Our focus extends this to the burgeoning field of LLMs, where security and reliability are of paramount importance. The literature review is a structured collection of all the relevant data to help researchers, developers, and security experts understand and address these risks. + +The repository is structured as follows: +- `categorized_papers.csv`: A CSV file containing categorized literature details. +- `scripts/`: This directory contains all the Python scripts used for categorization. + +## Categorization Scripts + +The literature is categorized by several aspects, such as research methods, focus areas, topics/themes, geographical focus, and temporal focus. The following Python code is used to categorize the literature based on its content. + +### Research Methods + +```python +# The code block for categorize_research_methods +def categorize_research_methods(text): + categories = [] + text = str(text).lower() if text is not None else '' + if 'case study' in text: + categories.append('Case Studies') + if 'interview' in text: + categories.append('Interviews') + if 'ethnography' in text: + categories.append('Ethnography') + if 'content analysis' in text: + categories.append('Content Analysis') + if 'survey' in text or 'questionnaire' in text: + categories.append('Surveys and Questionnaires') + if 'experiment' in text: + categories.append('Experiments') + if 'statistical analysis' in text: + categories.append('Statistical Analysis') + if any(method in categories for method in ['Case Studies', 'Interviews', 'Ethnography', 'Content Analysis']) and any(method in categories for method in ['Surveys and Questionnaires', 'Experiments', 'Statistical Analysis']): + categories.append('Mixed Methods') + return ', '.join(categories) + +def categorize_focus_areas(text): + categories = [] + text = str(text).lower() if text is not None else '' + if 'risk assessment' in text: + categories.append('Risk Assessment') + if 'expert opinion' in text: + categories.append('Expert Opinions') + if 'technology assessment' in text: + categories.append('Technological Assessments') + if 'policy' in text or 'regulation' in text: + categories.append('Policy and Regulation') + return ', '.join(categories) + +def categorize_topics_themes(text): + categories = [] + text = str(text).lower() if text is not None else '' + if 'llm security' in text: + categories.append('LLM Security') + if 'industry application' in text: + categories.append('Industry Applications') + if 'emerging threat' in text: + categories.append('Emerging Threats') + if 'solution' in text or 'mitigation' in text: + categories.append('Solutions and Mitigations') + return ', '.join(categories) + +def categorize_geographical_focus(text): + categories = [] + text = str(text).lower() if text is not None else '' + if 'global' in text: + categories.append('Global') + if 'regional' in text or any(region in text for region in ['asia', 'europe', 'america', 'africa']): + categories.append('Regional') + return ', '.join(categories) + +def categorize_temporal_focus(text): + categories = [] + text = str(text).lower() if text is not None else '' + if 'historical' in text: + categories.append('Historical Analyses') + if 'current' in text or 'present' in text: + categories.append('Current Issues') + if 'future' in text or 'prediction' in text: + categories.append('Future Predictions') + return ', '.join(categories) + diff --git a/data_gathering/literature/categorized_papers.csv b/data_gathering/literature/categorized_papers.csv new file mode 100644 index 00000000..6ee90bc1 --- /dev/null +++ b/data_gathering/literature/categorized_papers.csv @@ -0,0 +1,799 @@ +Position,Title,Author,Description,Article link,PDF Link,Research Methods,Focus Areas,Topics or Themes,Geographical Focus,Temporal Focus +1,Towards Automatic Mapping of Vulnerabilities to Attack Patterns using Large Language Models,"SS Das, A Dutta, S Purohit, E Serra… - … on Technologies for …, 2022 - ieeexplore.ieee.org","… To the best of our knowledge, this is the first work to propose a complete mapping of CVE-CWE-CAPEC +in an automated manner using large language models. We therefore believe …",https://ieeexplore.ieee.org/abstract/document/10025459/,,,,,, +2,Tracing capec attack patterns from cve vulnerability information using natural language processing technique,"K Kanakogi, H Washizaki, Y Fukazawa, S Ogata… - 2021 - scholarspace.manoa.hawaii.edu","… easily identify CAPEC-IDs that are mapping candidates and assist in the mapping process. +… of a topic model [11], [12]. Although we performed a simple natural language process, the …",https://scholarspace.manoa.hawaii.edu/handle/10125/71462,https://scholarspace.manoa.hawaii.edu/bitstreams/164f1948-02b1-425b-8001-cc0ff8eb926c/download,,,,, +3,Tracing cve vulnerability information to capec attack patterns using natural language processing techniques,"K Kanakogi, H Washizaki, Y Fukazawa, S Ogata… - Information, 2021 - mdpi.com","… mapped CVE to CAPEC and ATT&CK in order to find appropriate mitigation [6]. They created +a neural network model … When tracing the relationship between CVE–CWE and CWE–…",https://www.mdpi.com/2078-2489/12/8/298,https://www.mdpi.com/2078-2489/12/8/298/pdf,,,Solutions and Mitigations,, +4,V2w-bert: A framework for effective hierarchical multiclass classification of software vulnerabilities,"SS Das, E Serra, M Halappanavar… - 2021 IEEE 8th …, 2021 - ieeexplore.ieee.org","… Abstract—We consider the problem of automating the mapping … BERT language model is +further fine-tuned with CVE/CWE … For a single CVE, we create 34 CVE-CWE pairs and get the …",https://ieeexplore.ieee.org/abstract/document/9564227/,https://arxiv.org/pdf/2102.11498,,,,, +5,VWC-BERT: Scaling Vulnerability–Weakness–Exploit Mapping on Modern AI Accelerators,"SS Das, M Halappanavar, A Tumeo… - … on Big Data (Big …, 2022 - ieeexplore.ieee.org","… demonstrate higher accuracy using a larger language model, RoBERTa-Large. We show up +to 87% … Similar to the CVE-CWE mapping model, we map CWEs to CAPECs. CAPECs are …",https://ieeexplore.ieee.org/abstract/document/10020622/,,,,,, +6,On the Usage of NLP on CVE Descriptions for Calculating Risk,"T Giannakopoulos, K Maliatsos - conferences.ds.unipi.gr","… map and consequently train our model with the specific threats. … large enough set to use +for training and validation, was deemed substantial. Through the CitySCAPE project, a mapping …",https://conferences.ds.unipi.gr/cybericps2023/assets/papers/9.pdf,https://conferences.ds.unipi.gr/cybericps2023/assets/papers/9.pdf,,,,, +7,Not The End of Story: An Evaluation of ChatGPT-Driven Vulnerability Description Mappings,"X Liu, Y Tan, Z Xiao, J Zhuge… - Findings of the Association …, 2023 - aclanthology.org","… language processing (NLP) technology, large models have … ) is a closed-source large language +model (LLM), and it is … very few papers working on CVECWE mappings, we only use CVE…",https://aclanthology.org/2023.findings-acl.229/,https://aclanthology.org/2023.findings-acl.229.pdf,,,,, +8,Threat Categorization On CVE Descriptions Using Text Classification,G Thrasyvoulos - 2022 - search.proquest.com,"… threats in CVEs and annotating a large enough set to use for … mapping used for this thesis +was a snapshot of the mapping … The mapping uses a combination of CVE - CWE and CWE - …",https://search.proquest.com/openview/75d81820153562e945646cbd66092b40/1?pq-origsite=gscholar&cbl=2026366&diss=y,,,,,, +9,Threat categorization on CVE descriptions using text classification,T Giannakopoulos - 2022 - dione.lib.unipi.gr,"… threats in CVEs and annotating a large enough set to use for … mapping used for this thesis +was a snapshot of the mapping … The mapping uses a combination of CVE - CWE and CWE - …",https://dione.lib.unipi.gr/xmlui/handle/unipi/14541,https://dione.lib.unipi.gr/xmlui/bitstream/handle/unipi/14541/Giannakopoulos_Thrasyvoulos.pdf?sequence=1,,,,, +10,Threatzoom: neural network for automated vulnerability mitigation,"E Aghaei, E Al-Shaer - Proceedings of the 6th Annual Symposium on Hot …, 2019 - dl.acm.org","… These properties of the existing CVE-CWE dataset impose serious … Mapping a vulnerability +to CWE leads to understanding … Leveraging deep learning, we are mapping CWEs to CAPEC …",https://dl.acm.org/doi/abs/10.1145/3314058.3318167,,,,,, +11,The use of NLP techniques in static code analysis to detect weaknesses and vulnerabilities,"SA Mokhov, J Paquet, M Debbabi - … AI 2014, Montréal, QC, Canada, May …, 2014 - Springer","… We use language models to learn and classify such a code. … if our first estimate of a CVE/CWE +is incorrect, the next one in … directly (ie, no direct mapping from CWE to CVE exists unlike …",https://link.springer.com/chapter/10.1007/978-3-319-06483-3_33,,,,,, +12,A Cybersecurity Knowledge Graph Completion Method Based on Ensemble Learning and Adversarial Training,"P Wang, J Liu, D Hou, S Zhou - Applied Sciences, 2022 - mdpi.com","… to map models to different complex vector spaces for modeling in … language models and +the target is large, the experimental effect is limited. Other knowledge graph completion models […",https://www.mdpi.com/2076-3417/12/24/12947,https://www.mdpi.com/2076-3417/12/24/12947,Experiments,,,, +13,Clustering Software Vulnerabilities Using Self-Organizing Maps: Observations and Analysis,K Panchal - 2022 - search.proquest.com,"… language model. In the second step, also referred to as the Link prediction (LP) component, +the trained BERT model transforms CVE/CWE … The original vectored data set was huge (…",https://search.proquest.com/openview/12f7f44cebacd62b51019e88453d7ab1/1?pq-origsite=gscholar&cbl=18750&diss=y,https://rex.libraries.wsu.edu/view/pdfCoverPage?instCode=01ALLIANCE_WSU&filePid=13366119630001842&download=true,,,,,Future Predictions +14,Is github's copilot as bad as humans at introducing vulnerabilities in code?,"O Asare, M Nagappan, N Asokan - Empirical Software Engineering, 2023 - Springer","… language models as they relate to neural code completion. We first discuss a brief history +of language models … was based on samples obtained from the Big-Vul dataset curated and …",https://link.springer.com/article/10.1007/s10664-023-10380-1,https://arxiv.org/pdf/2204.04741,,,,, +15,MARFCAT: Fast code analysis for defects and vulnerabilities,"SA Mokhov, J Paquet, M Debbabi - 2015 IEEE 1st International …, 2015 - ieeexplore.ieee.org","… tracks comprising CVE-selected cases, stand-alone cases, and large … // Construct an index +mapping CVEs to files and locations … language models per default MARF specification ([14]) 3 …",https://ieeexplore.ieee.org/abstract/document/7070488/,,,,,, +16,Exploring the security awareness of the python and javascript open source communities,"G Antal, M Keleti, P Hegedŭs - … of the 17th International Conference on …, 2020 - dl.acm.org","… We found that there is a large intersection in the vulnerability … step we could not find any +CVE/CWE IDs in their commit messages… These revisions were mapped directly to the referenced …",https://dl.acm.org/doi/abs/10.1145/3379597.3387513,https://arxiv.org/pdf/2006.13652,,,,, +17,Global monitor using spatiotemporally correlated local monitors,"G Yadav, K Paul - 2021 IEEE 20th International Symposium on …, 2021 - ieeexplore.ieee.org","… In recent years, neural language models based on recurrent … ID, we use CVE-CWE-CAPEC +mapping, CAPEC description and … size decreases by a large extent. We show an example for …",https://ieeexplore.ieee.org/abstract/document/9685330/,,,,,,Current Issues +18,Towards Vulnerability Types Classification Using Pure Self-Attention: A Common Weakness Enumeration Based Approach,"T Wang, S Qin, KP Chow - 2021 IEEE 24th International …, 2021 - ieeexplore.ieee.org","… the English language model for both BERTBASE model with 12 … vastly alleviates the workload +of NVD staff on CVECWE … Lastly, due to data imbalance issue and large category amount, …",https://ieeexplore.ieee.org/abstract/document/9724608/,,,,,, +19,The SEPSES knowledge graph: An integrated resource for cybersecurity,"E Kiesling, A Ekelhart, K Kurniawan… - International Semantic …, 2019 - Springer","… mappings into the structure of the final ontology. Initially, we developed RDF Mapping +Language (RML) transformation mappings … Setting: We use a large data set collected during the …",https://link.springer.com/chapter/10.1007/978-3-030-30796-7_13,https://link.springer.com/chapter/10.1007/978-3-030-30796-7_13,,,,, +20,Constructing a “common cross site scripting vulnerabilities enumeration (cxe)” using cwe and cve,"K Sivakumar, K Garg - … : Third International Conference, ICISS 2007, Delhi …, 2007 - Springer","… By organizing these errors into a simple taxonomy and mapping CVE with CWE, we have +constructed … By using CVE-CWE-CXE-based relationships, a high quality collection of sample …",https://link.springer.com/chapter/10.1007/978-3-540-77086-2_25,,,,,, +21,Predicting entity relations across different security databases by using graph attention network,"L Yuan, Y Bai, Z Xing, S Chen, X Li… - 2021 IEEE 45th Annual …, 2021 - ieeexplore.ieee.org","… After activated by ReLU, we concatenate multiple feature maps of the same dimension by … +model with huge parameters. The DISTMULT [41], an improved version of RESCAL, adopted …",https://ieeexplore.ieee.org/abstract/document/9529668/,https://sen-chen.github.io/img_cs/pdf/compsac2021_predicting_entity_relation.pdf,,,,, +22,PenQuest: a gamified attacker/defender meta model for cyber security assessment and education,"R Luh, M Temper, S Tjoa, S Schrittwieser… - Journal of Computer …, 2020 - Springer","… , CVE/CWE and NIST SP 800-53. Attack patterns, vulnerabilities, and mitigating controls are +mapped … While this offers flexibility and is suitable for dedicated workshops comprising large …",https://link.springer.com/article/10.1007/s11416-019-00342-x,https://link.springer.com/article/10.1007/s11416-019-00342-x,,,,, +23,Comparative Evaluation of NLP-Based Approaches for Linking CAPEC Attack Patterns from CVE Vulnerability Information,"K Kanakogi, H Washizaki, Y Fukazawa, S Ogata… - Applied Sciences, 2022 - mdpi.com","… on mapping of ATT&CK [14] and CVE [15,16,17] has intensified. The approach in this study +should be applicable to map … This low accuracy is due to the CVE-CWE link. First, some of the …",https://www.mdpi.com/2076-3417/12/7/3400,https://www.mdpi.com/2076-3417/12/7/3400,,,,, +24,"Identification and Assessment of Security Attacks and Vulnerabilities, utilizing CVE, CWE and CAPEC",C Grigoriádis - 2019 - search.proquest.com,"… Furthermore, the CVE-CWE connection table provided by NIST is queried for connections. +If such a connection exists, the search goes on in the CWE table in order to provide a …",https://search.proquest.com/openview/076918e378eafe21f49a6781f9152d18/1?pq-origsite=gscholar&cbl=2026366&diss=y,,,,,, +25,Integrating Heterogeneous Security Knowledge Sources for Comprehensive Security Analysis,"G Wang, T Li, H Yue, Z Yang… - 2021 IEEE 45th Annual …, 2021 - ieeexplore.ieee.org","… difficult to comprehensively analyze security of such large-scale systems, which is a … +The third part is data mapping and linking, where we integrate relational structured data into …",https://ieeexplore.ieee.org/abstract/document/9529428/,,,,,, +26,Current challenges of cyber threat and vulnerability identification using public enumerations,"L Sadlek, P Čeleda, D Tovarňák - Proceedings of the 17th International …, 2022 - dl.acm.org","… Based on these results, we conclude that mapping of enumeration entries may not accurately +determine related parts of the information based on enumeration entries observed in the …",https://dl.acm.org/doi/abs/10.1145/3538969.3544458,https://arxiv.org/pdf/2206.14539,,,,, +27,Uncovering CWE-CVE-CPE Relations with Threat Knowledge Graphs,"Z Shi, N Matyunin, K Graffi, D Starobinski - arXiv preprint arXiv:2305.00632, 2023 - arxiv.org","… levels of a system, CVE-CWE associations assist in mapping … In order to make the results +comparable, we test a large … CVE-CWE triples, and we evaluate how well the embedding …",https://arxiv.org/abs/2305.00632,https://arxiv.org/pdf/2305.00632,,,,, +28,Automatic analysis and reasoning based on vulnerability knowledge graph,"S Qin, KP Chow - Cyberspace Data and Intelligence, and Cyber-Living …, 2019 - Springer","… For cybersecurity community, obtaining and managing a large amount of high-quality … is +valuable in describing security standards and mapping other security-domain ontologies. The …",https://link.springer.com/chapter/10.1007/978-981-15-1922-2_1,,,,,, +29,"Identification and assessment of security attacks and vulnerabilities, utilizing CVE, CWE and CAPEC",Χ Γρηγοριάδης - 2019 - dione.lib.unipi.gr,"… Furthermore, the CVE-CWE connection table provided by NIST is queried for connections. +If such a connection exists, the search goes on in the CWE table in order to provide a …",https://dione.lib.unipi.gr/xmlui/handle/unipi/12252,https://dione.lib.unipi.gr/xmlui/bitstream/handle/unipi/12252/Grigoriadis_mpsp17015.pdf?sequence=1&isAllowed=y,,,,, +30,Security automation and threat information-sharing options,"P Kampanakis - IEEE Security & Privacy, 2014 - ieeexplore.ieee.org","… IF-MAP is an example of a publish/subscribe model that defines messages and transport +between entities. IF-MAP messages containing data described by the IF-MAP data model, …",https://ieeexplore.ieee.org/abstract/document/6924671/,,,,,, +31,A mining approach to obtain the software vulnerability characteristics,"X Li, J Chen, Z Lin, L Zhang, Z Wang… - … Cloud and Big Data …, 2017 - ieeexplore.ieee.org","… The feature extraction technique employed in this study is the semantic model approach. +This model allows you to select features based on their plausibility and usefulness, meaning …",https://ieeexplore.ieee.org/abstract/document/8026953/,,,,,, +32,Towards Automatically Connecting IoT Devices with Vulnerabilities in the Wild,"J Song, S Wan, M Huang, J Liu, L Sun… - ACM Transactions on …, 2023 - dl.acm.org","… information sources, we build the mapping of report IDs cross diferent information sources. … +, large language models (LLMs) have achieved impressive results in natural language …",https://dl.acm.org/doi/abs/10.1145/3608951,https://dl.acm.org/doi/pdf/10.1145/3608951,,,,, +33,A Software Security Entity Relationships Prediction Framework Based on Knowledge Graph Embedding Using Sentence-Bert,"Y Wang, X Hou, X Ma, Q Lv - International Conference on Wireless …, 2022 - Springer","… We finally designed a large number of experiments to evaluate the effectiveness of our model +… In this way, we can preserve the mapping properties of the given relations. Meanwhile, the …",https://link.springer.com/chapter/10.1007/978-3-031-19214-2_42,,Experiments,,,, +34,Keyword Extraction From Specification Documents for Planning Security Mechanisms,"JJ Poozhithara, HU Asuncion… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org","… We evaluate VDocScan using an extensive dataset of CVE vulnerability reports mapped to +over 3600 … We skipped deep learning models due to the large datasets they require to obtain …",https://ieeexplore.ieee.org/abstract/document/10172752/,https://faculty.washington.edu/lagesse/publications/ICSE2023.pdf,,,,, +35,ML-FEED: Machine Learning Framework for Efficient Exploit Detection,"T Saha, T Al Rahat, N Aaraj, Y Tian… - 2022 IEEE 4th …, 2022 - ieeexplore.ieee.org","… the execution of a CVE/CWE vulnerability fingerprint. Prior ML-… , we map the entities to their +related programming language-… an NLP model that learns word associations from a large text …",https://ieeexplore.ieee.org/abstract/document/10063446/,https://www.researchgate.net/profile/Tamjid-Al-Rahat-2/publication/364777657_ML-FEED_Machine_Learning_Framework_for_Efficient_Exploit_Detection/links/6421e283a1b72772e42f701f/ML-FEED-Machine-Learning-Framework-for-Efficient-Exploit-Detection.pdf,,,,, +36,Vulnerability Clustering and other Machine Learning Applications of Semantic Vulnerability Embeddings,"MO Stehr, M Kim - arXiv preprint arXiv:2310.05935, 2023 - arxiv.org","… the relational nature of the CVE/CWE associations and their … model tries to be more +semantically accurate by mapping … is quite small for training a language model. Clusters of …",https://arxiv.org/abs/2310.05935,https://arxiv.org/pdf/2310.05935,,,,, +37,Instruction2vec: Efficient preprocessor of assembly code to detect software weakness with CNN,"Y Lee, H Kwon, SH Choi, SH Lim, SH Baek, KW Park - Applied Sciences, 2019 - mdpi.com","… The model learns to map each discrete word ID, which … from an unsupervised neural +language model. Word vectors are … Therefore, we require a large amount of software weakness …",https://www.mdpi.com/2076-3417/9/19/4086,https://www.mdpi.com/2076-3417/9/19/4086/pdf,,,,, +38,Data-driven vulnerability exploration for design phase system analysis,"G Bakirtzis, BJ Simon, AG Collins… - IEEE Systems …, 2019 - ieeexplore.ieee.org","… model Σ. It is through that extra design information that our solution CYBOK is able to take the +graph of a system model and map … that can—by their fidelity—immediately produce a large …",https://ieeexplore.ieee.org/abstract/document/8850328/,https://arxiv.org/pdf/1909.02923,,,Solutions and Mitigations,, +39,Attack prediction in Internet of Things using knowledge graph,"S Zhang, C Zhao, S Wang, S Li… - … Conference on Internet …, 2023 - spiedigitallibrary.org","… In this paper, we obtain the linkage of these knowledge repositories and map them … large +scale knowledge graphs like IOTEKG. For triple information, we directly use the TransE model …",https://www.spiedigitallibrary.org/conference-proceedings-of-spie/12708/127080K/Attack-prediction-in-Internet-of-Things-using-knowledge-graph/10.1117/12.2683915.short,,,,,, +40,Categorizing software vulnerabilities using overlapping self-organizing map,"S Hassanvand, M Ghasemzadeh - International Journal of Engineering … - academia.edu","… In this research work, by selecting the MoSCoW prioritization method and by combining it +with the SOM self-organizing mapping algorithm, we present a new categorization for the …",https://www.academia.edu/download/55415012/IJOER-DEC-2017-11.pdf,https://www.academia.edu/download/55415012/IJOER-DEC-2017-11.pdf,,,,,Current Issues +41,Identifying missing relationships of CAPEC attack patterns by transformer models and graph structure,"R Miyata, H Washizaki, K Sumoto… - 2023 IEEE/ACM 1st …, 2023 - ieeexplore.ieee.org","… Transformer-based models pre-trained with large amounts of … natural language processing +tasks. Fine-tuning these pre-trained models can create accurate and easily adaptable models …",https://ieeexplore.ieee.org/abstract/document/10190618/,,,,,, +42,OVM: an ontology for vulnerability management,"JA Wang, M Guo - Proceedings of the 5th Annual Workshop on Cyber …, 2009 - dl.acm.org","… Over the past a few decades, a significantly large amount of knowledge has been accumulated +… Each instance of IT_Product maps to an entity of a computing system in the real word. We …",https://dl.acm.org/doi/abs/10.1145/1558607.1558646,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=84141a28eff66c5cb78dd7090f3182605aad6c52,,,,, +43,Streamlining Attack Tree Generation: A Fragment-Based Approach,"I Pekaric, M Frick, JG Adigun, R Groner, T Witte… - arXiv preprint arXiv …, 2023 - arxiv.org","… to generate a large number of smaller ”tree” models that could be … models in the form of +attack graphs (Haque et al., 2017). These can be cyclic and acyclic graphs that outline a mapping …",https://arxiv.org/abs/2310.00654,https://arxiv.org/pdf/2310.00654,,,,, +44,A Cybersecurity Knowledge Graph Completion Method for Penetration Testing,"P Wang, J Liu, X Zhong, S Zhou - Electronics, 2023 - mdpi.com","… model, semantic matching model and neural network model. Most distance-based models +map … In the training process of the knowledge graph completion model, a large number of …",https://www.mdpi.com/2079-9292/12/8/1837,https://www.mdpi.com/2079-9292/12/8/1837,,,,, +45,Assessment of hypervisor vulnerabilities,"A Thongthua, S Ngamsuriyaroj - 2016 International conference …, 2016 - ieeexplore.ieee.org","… vulnerabilities which could lead to huge damage if they are exploited [3… All issues can map +to 33 CWEs, and two issues have … Thus, the 28 remaining issues can be mapped to CVEs and …",https://ieeexplore.ieee.org/abstract/document/7600180/,,,,,, +46,An ontological approach to computer system security,"J An Wang, MM Guo, J Camargo - Information Security Journal: A …, 2010 - Taylor & Francis","… attributes for each concept and its subconcepts as the diagram would be considerably big. … +the mapping relationships in NVD, where a type of vulnerability (CWE) can be mapped to an …",https://www.tandfonline.com/doi/abs/10.1080/19393550903404902,,,,,, +47,Selecting system specific cybersecurity attack patterns using topic modeling,"S Adams, B Carter, C Fleming… - … Big Data Science And …, 2018 - ieeexplore.ieee.org","… The proposed method uses topic modeling to extract hidden topics from the textual … a topic +model. The posterior distribution of topics for the system is estimated using the model and any …",https://ieeexplore.ieee.org/abstract/document/8455944/,,,,,, +48,Leveraging external data sources to enhance secure system design,"J Samuel, J Jaskolka, GOM Yee - … , Privacy, and Security: A Big …, 2021 - ieeexplore.ieee.org","Today's software systems are riddled with security vulnerabilities that invite attack. We +envisage a secure software design process at the architectural level, in which the security …",https://ieeexplore.ieee.org/abstract/document/9452029/,https://drive.google.com/file/d/1e2-RpXo6ybfE_GEPZ94-sPGw1mO1FZXf/view,,,,, +49,Using the SIEM Software vulnerability detection model proposed,"I Jeon, K Han, D Kim, J Choi - Journal of the Korea Institute of …, 2015 - koreascience.kr","… With the advancement of SIEM from ESM, it allows deep correlated analysis using huge … , +and respond to software’s vulnerabilities by analyzing big data. In the phase of monitoring and …",https://koreascience.kr/article/JAKO201527358961998.page,https://koreascience.kr/article/JAKO201527358961998.pdf,,,,, +50,A survey on data-driven software vulnerability assessment and prioritization,"THM Le, H Chen, MA Babar - ACM Computing Surveys, 2022 - dl.acm.org","… large data from multiple sources. Many studies in this area have proposed different Natural +Language … successfully used the feature activation maps in a CNN model [162] or leveraged …",https://dl.acm.org/doi/abs/10.1145/3529757,https://arxiv.org/pdf/2107.08364,,,,, +51,Coevolutionary modeling of cyber attack patterns and mitigations using public datasets,"M Shlapentokh-Rothman, J Kelly, A Baral… - Proceedings of the …, 2021 - dl.acm.org","… we want to model, we note that APTs frequently indiscriminately target large slices of net… +(BNF) context-free grammar and an intermediate interpreter to map from the “genome” to a “…",https://dl.acm.org/doi/abs/10.1145/3449639.3459351,https://dspace.mit.edu/bitstream/handle/1721.1/145921/3449639.3459351.pdf?sequence=1,,,,, +52,Model-based vulnerability testing for web applications,"F Lebeau, B Legeard, F Peureux… - 2013 IEEE Sixth …, 2013 - ieeexplore.ieee.org","This paper deals with an original approach to automate Model-Based Vulnerability Testing (MBVT) +for Web applications, which aims at improving the accuracy and precision of …",https://ieeexplore.ieee.org/abstract/document/6571669/,https://hal.science/hal-00935070/document,,,,, +53,Different Machine Learning Algorithms used for Secure Software Advance using Software Repositories,"K Chaudhary, S Singh - 2023 - researchgate.net","… the first project (manually mapping) and this project (automated mapping) resulted in … +mapping comparison between two large data sets was concluded as a success. This ML model …",https://www.researchgate.net/profile/Kanchan-Chaudhary/publication/370487530_Different_Machine_Learning_Algorithms_used_for_Secure_Software_Advance_using_Software_Repositories/links/64a919af95bbbe0c6e1ffa96/Different-Machine-Learning-Algorithms-used-for-Secure-Software-Advance-using-Software-Repositories.pdf,https://www.researchgate.net/profile/Kanchan-Chaudhary/publication/370487530_Different_Machine_Learning_Algorithms_used_for_Secure_Software_Advance_using_Software_Repositories/links/64a919af95bbbe0c6e1ffa96/Different-Machine-Learning-Algorithms-used-for-Secure-Software-Advance-using-Software-Repositories.pdf,,,,, +54,Ontology of metrics for cyber security assessment,"E Doynikova, A Fedorchenko, I Kotenko - Proceedings of the 14th …, 2019 - dl.acm.org","… The underlying model implies integration of user domain … ), and operations (the user maps +the actions to each expectation, … of all known attacks will take huge amount of resources. Thus, …",https://dl.acm.org/doi/abs/10.1145/3339252.3341496,,,,,, +55,MARFCAT: Transitioning to binary and larger data sets of SATE IV,"SA Mokhov, J Paquet, M Debbabi, Y Sun - arXiv preprint arXiv:1207.3718, 2012 - arxiv.org","… // Construct an index mapping CVEs to files and locations … language models per default +MARF specification ([Mok10b]) 3 … if our first estimate of a CVE/CWE is incorrect, the next one in …",https://arxiv.org/abs/1207.3718,https://arxiv.org/pdf/1207.3718,,,,, +56,Threat Management Based on Information About Vulnerabilities,BL Sadlek - is.muni.cz,"… The software performs routine work instead of a human person or processes a large amount +of data which could not be even processed by a human or could not be processed in such a …",https://is.muni.cz/th/vpeip/diploma_thesis.pdf,https://is.muni.cz/th/vpeip/diploma_thesis.pdf,,,,, +57,"Using Machine Learning for Description and Inference of Cyber Threats, Vulnerabilities, and Mitigations",A Srinivasan - 2022 - dspace.mit.edu,"… In [8], CVEs were attempted to be mapped directly to … Even though BERT was pretrained +on a large corpus and can yield … The model was finetuned on the masked language modeling …",https://dspace.mit.edu/handle/1721.1/143257,https://dspace.mit.edu/bitstream/handle/1721.1/143257/Srinivasan-ashwins-meng-eecs-2022-thesis.pdf?sequence=1&isAllowed=y,,,,, +58,Ontology for Cybersecurity Governance of ICT Systems,"F De Rosa, N Maunero, L Nicoletti, P Prinetto… - … –Italian Conference on …, 2022 - ceur-ws.org","… the management and use of these large and complex knowledge-… A large number of existing +standards and ontologies have … information and it has been mapped to all the most common …",https://ceur-ws.org/Vol-3260/paper4.pdf,https://ceur-ws.org/Vol-3260/paper4.pdf,,,,, +59,Virtual knowledge graphs for federated log analysis,"K Kurniawan, A Ekelhart, E Kiesling, D Winkler… - Proceedings of the 16th …, 2021 - dl.acm.org","… maps and parses the extracted log data into RDF. It uses the standard RDF mapping language +to map … improve log processing performance, we split large log files from the data set into …",https://dl.acm.org/doi/abs/10.1145/3465481.3465767,https://eprints.cs.univie.ac.at/6997/1/Virtual%20Knowledge%20Graphs%20for%20Federated%20Log%20Analysis.pdf,,,,, +60,Understanding and recommending security requirements from problem domain ontology: A cognitive three-layered approach,"BJ Kim, SW Lee - Journal of Systems and Software, 2020 - Elsevier","… The targets store large amounts of personal and customer information. Those data cannot +… In the “Threat Analysis” step, the result of threat modeling with the i* framework is mapped to …",https://www.sciencedirect.com/science/article/pii/S016412122030145X,,,,,, +61,A review of knowledge graph application scenarios in cyber security,"K Liu, F Wang, Z Ding, S Liang, Z Yu, Y Zhou - arXiv preprint arXiv …, 2022 - arxiv.org","… based on knowledge modeling have become new solutions under big data conditions. The +… IE models, even for the pre-training language model or the prompt-based language models. …",https://arxiv.org/abs/2204.04769,https://arxiv.org/pdf/2204.04769,,,Solutions and Mitigations,, +62,Improving Conceptual Domain Characterization in Ontology Networks,"BF Martins, JFR Román, O Pastor, M Hadad - International Conference on …, 2023 - Springer","… to describe complex domains should not be overly large or be used in isolation. Ontology … +concept present in the ontologies is being mapped through our team of domain specialists (…",https://link.springer.com/chapter/10.1007/978-3-031-33080-3_12,,,,,,Current Issues +63,Edge propagation for link prediction in requirement-cyber threat intelligence knowledge graph,"Y Zhang, J Chen, Z Cheng, X Shen, J Qin, Y Han… - Information Sciences, 2023 - Elsevier","… However, the RCTI graph is far from complete due to the large … model defines r as a +translation between the embeddings of h and t. The structural information is obtained by mapping …",https://www.sciencedirect.com/science/article/pii/S0020025523013555,,,,,, +64,A hybrid approach to detecting security defects in programs,"L Yu, J Zhou, Y Yi, J Fan, Q Wang - 2009 Ninth International …, 2009 - ieeexplore.ieee.org","… map them into reserved words or class names in the standard libraries of a program language. +… We apply the hybrid approach to other large sizes of programs and find out it can handle …",https://ieeexplore.ieee.org/abstract/document/5381537/,https://scholar.archive.org/work/dee7hgsqqjfjfm5lbvtmfjufe4/access/wayback/http://sei.pku.edu.cn/~wqx/publications/A%20Hybrid%20Approach%20to%20Detecting%20Security%20Defects%20in%20Programs.pdf,,,,, +65,Recent progress of using knowledge graph for cybersecurity,"K Liu, F Wang, Z Ding, S Liang, Z Yu, Y Zhou - Electronics, 2022 - mdpi.com","… have evolved into novel solutions in the context of big data. … or validating the IE models, +even for the pre-training language model or the prompt-based language models. However, …",https://www.mdpi.com/2079-9292/11/15/2287,https://www.mdpi.com/2079-9292/11/15/2287/pdf,,,Solutions and Mitigations,, +66,Automated Vulnerability Prediction in Software Systems and Lightweight Identification of Design Patterns in Source Code,JJ Poozhithara - 2021 - digital.lib.washington.edu,"… is a set of language modeling and feature learning techniques that map text (words or +phrases) from a vocabulary to high-dimensional vectors of real numbers. Such mapping can be …",https://digital.lib.washington.edu/researchworks/handle/1773/47191,https://digital.lib.washington.edu/researchworks/bitstream/handle/1773/47191/Poozhithara_washington_0250O_23223.pdf?sequence=2.txt,,,,, +67,Comparison and Evaluation on Static Application Security Testing (SAST) Tools for Java,"K Li, S Chen, L Fan, R Feng, H Liu, C Liu, Y Liu… - 2023 - sen-chen.github.io","… , as we aim to conduct a large-scale experiment in this study. … effectiveness of each tool by +mapping them to CWE. Meanwhile, … been mapped to CWE Weaknesses, we thereby mapped …",https://sen-chen.github.io/img_cs/pdf/fse2023-sast.pdf,https://sen-chen.github.io/img_cs/pdf/fse2023-sast.pdf,Experiments,,,, +68,"Enhancements to Threat, Vulnerability, and Mitigation Knowledge For Cyber Analytics, Hunting, and Simulations","E Hemberg, M Turner, N Rutar… - Digital Threats: Research …, 2023 - dl.acm.org","… We map diferent kinds of hypothetical threats, those that … To handle the large number of attack +pattern combinations, we … -shot learning with newer language models. Another example is …",https://dl.acm.org/doi/abs/10.1145/3615668,https://dl.acm.org/doi/pdf/10.1145/3615668,,,,, +69,Towards a Multidimensional Analysis of the National Vulnerability Database,"R Singla, N Reddy, R Bettati, H Alnuweiri - IEEE Access, 2023 - ieeexplore.ieee.org","… We observe that the number of CVEs that map to no … a big impact when vulnerabilities are +discovered in such products. Our analysis showed that several such cases have led to a large …",https://ieeexplore.ieee.org/abstract/document/10233875/,https://ieeexplore.ieee.org/iel7/6287639/6514899/10233875.pdf,,,,, +70,Demystifying the Mysteries of Security Vulnerability Discussions on Developer Q&A Sites,"THM Le, R Croft, D Hin, MA Babar - CoRR abs/2008.04176, 2020 - researchgate.net","… To demystify such mysteries, we conduct large-scale qualitative … Therefore, we use Latent +Dirichlet Allocation topic modeling to … We manually mapped the topics with the representative …",https://www.researchgate.net/profile/Triet_Le8/publication/343567999_Demystifying_the_Mysteries_of_Security_Vulnerability_Discussions_on_Developer_QA_Sites/links/5f34c7c3458515b7291beeaa/Demystifying-the-Mysteries-of-Security-Vulnerability-Discussions-on-Developer-Q-A-Sites.pdf,https://www.researchgate.net/profile/Triet_Le8/publication/343567999_Demystifying_the_Mysteries_of_Security_Vulnerability_Discussions_on_Developer_QA_Sites/links/5f34c7c3458515b7291beeaa/Demystifying-the-Mysteries-of-Security-Vulnerability-Discussions-on-Developer-Q-A-Sites.pdf,,,,,Current Issues +71,"Hardware vulnerability description, sharing and reporting: challenges and opportunities","J Bellay, D Forte, R Martin, C Taylor - GOMACTech, 2021 - par.nsf.gov","… By and large, these vulnerabilities were complicated to mitigate and incurred significant … +However, the space mapping hardware weaknesses to assurance tools and data is largely …",https://par.nsf.gov/servlets/purl/10237521,https://par.nsf.gov/servlets/purl/10237521,,,,, +72,A Framework for Modeling the Software Assurance Ecosystem: Insights from the Software Assurance Landscape Project,"L Brownsword, CC Woody, CJ Alberts… - 2010 - pstorage-cmu-348901238291901.s3 …","… [SEI]) as we developed the value mapping models. We would also like to … huge task. +Therefore, we devised an incremental approach to develop and apply the Assurance Modeling …",http://pstorage-cmu-348901238291901.s3.amazonaws.com/12056687/file.pdf,http://pstorage-cmu-348901238291901.s3.amazonaws.com/12056687/file.pdf,,,,, +73,ML-FEED: Machine Learning Framework for Efficient Exploit Detection (Extended version),"T Saha, T Al-Rahat, N Aaraj, Y Tian, NK Jha - arXiv preprint arXiv …, 2023 - arxiv.org","… the execution of a CVE/CWE vulnerability fingerprint. Prior ML-… , we map the entities to their +related programming language-… an NLP model that learns word associations from a large text …",https://arxiv.org/abs/2301.04314,https://arxiv.org/pdf/2301.04314,,,,, +74,Cross-site scripting guardian: A static XSS detector based on data stream input-output association mining,"C Li, Y Wang, C Miao, C Huang - Applied Sciences, 2020 - mdpi.com","… report of an XSS vulnerability in CVE/CWE [38], we suppose … Therefore, we need suitable +methods to map the discrete text … It integrates the CBOW, Skip-Gram two language models as …",https://www.mdpi.com/2076-3417/10/14/4740,https://www.mdpi.com/2076-3417/10/14/4740/pdf,,,,, +75,Cybersecurity knowledge graphs,"LF Sikos - Knowledge and Information Systems, 2023 - Springer","… This paper reviews the most prominent graph-based data models used in this domain, along +with knowledge organization systems that define concepts and properties utilized in formal …",https://link.springer.com/article/10.1007/s10115-023-01860-3,https://link.springer.com/article/10.1007/s10115-023-01860-3,,,,, +76,Developing Cross-Domain Host-Based Intrusion Detection,"O Ajayi, A Gangopadhyay, RF Erbacher, C Bursat - Electronics, 2022 - mdpi.com","… models evaluation, plotting the confusion matrix to see the actual performance of individual +attack class from experience exposes huge … not have a definite CVE/CWE id attached to it. To …",https://www.mdpi.com/2079-9292/11/21/3631,https://www.mdpi.com/2079-9292/11/21/3631/pdf,,,,, +77,Cybersecurity Entity Alignment via Masked Graph Attention Networks,"Y Qin, X Liao - arXiv preprint arXiv:2207.01434, 2022 - arxiv.org","… Integrating such vulnerability information is essential for an organization to gain a big picture +of … We use a set of nodes V to denote security entities with a mapping ψ : V→A to the entity …",https://arxiv.org/abs/2207.01434,https://arxiv.org/pdf/2207.01434,,,,, +78,An ontology for vulnerability lifecycle,"R Wita, N Jiamnapanon… - 2010 Third International …, 2010 - ieeexplore.ieee.org","… from reliable sources, classified and mapped to vulnerability lifecycle concept by domain … +public because they reflect the concern of the public at large. For example, a lot of hits on the …",https://ieeexplore.ieee.org/abstract/document/5453687/,,,,,, +79,Machine Learning Based Risk Classification of Vulnerabilities Incorporating Mitre Att&Ck Framework and Threat Intelligence,S Subbaratinam - 2022 - search.proquest.com,"This paper investigates the relationship between operational context and vulnerability +assessment effectiveness through a comparison of the context-agnostic Common Vulnerability …",https://search.proquest.com/openview/58cd98fd957ca272b041236bc97ae1ae/1?pq-origsite=gscholar&cbl=18750&diss=y,,,,,, +80,"Detection, avoidance, and attack pattern mechanisms in modern web application vulnerabilities: present and future challenges","S Gupta, BB Gupta - International Journal of Cloud Applications and …, 2017 - igi-global.com","In this paper, we present comprehensive survey of secured web application by identifying +numerous serious threats faced by several-related organizations. Based on this, we have …",https://www.igi-global.com/article/detection-avoidance-and-attack-pattern-mechanisms-in-modern-web-application-vulnerabilities/182251,https://dl.acm.org/doi/abs/10.4018/IJCAC.2017070101,Surveys and Questionnaires,,,,Current Issues +81,Automesc: Automatic framework for mining and classifying ethereum smart contract vulnerabilities and their fixes,"M Soud, I Qasse, G Liebel, M Hamdaqa - arXiv preprint arXiv:2212.10660, 2022 - arxiv.org","… languages for Ethereum smart contracts. AutoMESC places an emphasis on extracting a large +… We mapped the tools’ vulnerabilities to the proposed vulnerability type classification in [37…",https://arxiv.org/abs/2212.10660,https://arxiv.org/pdf/2212.10660,,,,, +82,Semi-Supervised Text Classification: Automated Weak Vulnerability Detection,"A Duppils, M Tullberg - LU-CS-EX, 2020 - lup.lub.lu.se","… is de ned as language modeling and feature learning techniques in NLP that map symbols +(… the text is phrased di erently than CVE/CWE descriptions. The results for Debricked Labeled …",https://lup.lub.lu.se/student-papers/record/9023542/file/9023544.pdf,https://lup.lub.lu.se/student-papers/record/9023542/file/9023544.pdf,,,,, +83,Predicting exploit likelihood for cyber vulnerabilities with machine learning,M Edkrantz - 2015 - odr.chalmers.se,"… large number of users, but also because there are large … mappings was scraped of a +database using cve-portal2, a tool using the cve-search project. In total, 421,000 CAPEC mappings …",http://odr.chalmers.se/items/b4485699-85b0-43e6-a5ea-812914ca2c76,http://odr.chalmers.se/bitstreams/de0fb5fc-b969-4338-89bb-1b48e42e69e2/download,,,,, +84,Utilizing public repositories to improve the decision process for security defect resolution and information reuse in the development environment,AF Salen - 2021 - bora.uib.no,"… Unlike other machine learning techniques, large neural networks tend to increase their … +learning model later on, as well as how we find the most similar security source mappings for …",https://bora.uib.no/bora-xmlui/handle/11250/2761762,https://bora.uib.no/bora-xmlui/bitstream/handle/11250/2761762/Master_Thesis_Anja_Fonn_Salen.pdf?sequence=1&isAllowed=y,,,,, +85,Linked data for software security concepts and vulnerability descriptions,AP Joshi - 2013 - search.proquest.com,"The Web is typically our first source of information about new software vulnerabilities, exploits +and cyber-attacks. Information is found in semi-structured vulnerability databases as well …",https://search.proquest.com/openview/6aa04c4a0d5feb819b753a116ca0e7a3/1?pq-origsite=gscholar&cbl=18750,https://ebiquity.umbc.edu/get/a/publication/683.pdf,,,,, +86,Measuring Software Security Using Improved CWE Base Scores,"SM Nourin, G Karabatis, FC Argiropoulos - UMBC Faculty Collection, 2021 - mdsoar.org","… because a large number of CWEs cannot be mapped with any … is not enough CVE/CWE +description data to train the model, and … to use a pre-trained model for sentence embedding. This …",https://mdsoar.org/handle/11603/24068,https://mdsoar.org/bitstream/handle/11603/24068/paper16.pdf?sequence=1&isAllowed=y,,,,, +87,A survey on vulnerability assessment tools and databases for cloud-based web applications,"K Kritikos, K Magoutis, M Papoutsakis, S Ioannidis - Array, 2019 - Elsevier","… For instance, if we consider OWASP, this organisation is both big and well-known for its +security … We suspect that they map vulnerability categories to certain risk levels to address this. …",https://www.sciencedirect.com/science/article/pii/S2590005619300116,https://www.sciencedirect.com/science/article/pii/S2590005619300116,,,,, +88,Standardizing cyber threat intelligence information with the structured threat information expression (stix),"S Barnum - Mitre Corporation, 2012 - stixproject.github.io","… They consist of one or more Observable patterns potentially mapped to a related TTP … , +and conversations on these topics by a large number of individuals from many organizations, …",http://stixproject.github.io/about/STIX_Whitepaper_v1.1.pdf,http://stixproject.github.io/about/STIX_Whitepaper_v1.1.pdf,,,,, +89,Semantic concept recognition from structured and unstructured inputs within cyber security domain,AG Hoşsucu - 2015 - open.metu.edu.tr,"… is statically mapped to the defined knowledge base model. Therefore a dynamic approach +should be considered within the scope of this problem which automatically maps the given …",https://open.metu.edu.tr/handle/11511/24502,https://open.metu.edu.tr/bitstream/handle/11511/24502/index.pdf,,,,, +90,Towards an Improved Understanding of Software Vulnerability Assessment Using Data-Driven Approaches,"THM Le - arXiv preprint arXiv:2207.11708, 2022 - arxiv.org","… such as Google Search, YouTube, and Google Maps, contain more than two million lines +of code [2]. Quality assurance of such large systems is a focal point for both researchers and …",https://arxiv.org/abs/2207.11708,https://arxiv.org/pdf/2207.11708,,,,, +91,Application of knowledge graph in software engineering field: A systematic literature review,"L Wang, C Sun, C Zhang, W Nie, K Huang - Information and Software …, 2023 - Elsevier","… a knowledge base involving various fields utilizing a large amount of data and focuses on +the … In summary, code knowledge mapping is a promising approach for achieving intelligent …",https://www.sciencedirect.com/science/article/pii/S0950584923001829,,,,,, +92,On prototype pollution and security risks of developing with third-party software components,A Johansson - 2022 - diva-portal.org,"… , that has had a big impact on projects … the largest publicly available sources of such +information [28]. A well-known database that provides information connected to CVEs, like mapping …",https://www.diva-portal.org/smash/record.jsf?pid=diva2:1668307,https://www.diva-portal.org/smash/get/diva2:1668307/FULLTEXT02,,,,, +93,The effect of Bellwether analysis on software vulnerability severity prediction models,"PK Kudjo, J Chen, S Mensah, R Amankwah… - Software Quality …, 2020 - Springer","… a huge impact on several users. This is why this study chose to validate the proposed model +using … The proposed model offers a possible mapping of the severity scores in open-source …",https://link.springer.com/article/10.1007/s11219-019-09490-1,https://www.academia.edu/download/95372880/s11219-019-09490-120221207-1-16gd53i.pdf,,,,, +94,A Survey on Cybersecurity Knowledge Graph Construction,"X Zhao, R Jiang, Y Han, A Li, Z Peng - Computers & Security, 2023 - Elsevier","… mapping relation between the sample entity and the attack behavior, and realizes the mapping +… [94] created (to the best of our knowledge) the world's largest publicly available malware …",https://www.sciencedirect.com/science/article/pii/S0167404823004340,,,,,, +95,Determining an Economic Value of High Assurance for Commodity Software Security,"V Gligor, A Perrig, D Basin - Cambridge International Workshop on …, 2023 - Springer","… for all other components of a large commodity software system, ie, a software “giant” [15]. +High … Unfortunately, although deterrence and assurance are separable, the mappings to their …",https://link.springer.com/chapter/10.1007/978-3-031-43033-6_23,https://www.kth.se/polopoly_fs/1.1260557.1686150226!/7%20Virgil%20Gligor.pdf,,,,, +96,Analysis and Aggregation of Vulnerability Databases with Code-Level Data,PL Galvão - 2022 - repositorio-aberto.up.pt,"… In figures 6.2 and 6.3 we show the same chart, but excluding the largest databases … a large +number of repositories in git.kernel.org. The presence of vulnerabilities to which the language …",https://repositorio-aberto.up.pt/bitstream/10216/144796/2/588886.pdf,https://repositorio-aberto.up.pt/bitstream/10216/144796/2/588886.pdf,,,,, +97,"Secu Wear: An open source, multi-component hardware/software platform for exploring wearable security","ML Hale, D Ellis, R Gamble, C Waler… - 2015 IEEE International …, 2015 - ieeexplore.ieee.org","… [15], respectively), and then simply map these vulnerabilities to specific devices. The goal is +… with SecuWear, distribute results using a CVE/CWE-like method and then vendors or other …",https://ieeexplore.ieee.org/abstract/document/7226677/,,,,,, +98,Secure coding through integration of public information security sources to eclipse development environment,S Lunde - 2022 - bora.uib.no,"The use of open source components in software development has been growing at a rapid +pace for a number of years. This increase in use of open source software is accompanied by …",https://bora.uib.no/bora-xmlui/handle/11250/3021969,https://bora.uib.no/bora-xmlui/bitstream/handle/11250/3021969/Master-5.pdf?sequence=1&isAllowed=y,,,,, +99,Automated Identification of Cyber Threat Scenarios,L SADLEK - is.muni.cz,"… Relevant CTI issues within our context are mainly a large amount of data, lack of relevant … +Common research approaches for the cyber key terrain mapping are crown jewels analysis, …",https://is.muni.cz/th/s4byh/RIGO_Sadlek.pdf,https://is.muni.cz/th/s4byh/RIGO_Sadlek.pdf,,,,, +100,The use of CVE-related databases in improving the cybersecurity of embedded systems,O Huuhtanen - 2021 - jyx.jyu.fi,"… The NVD CVE -dataset was the second large source of CVE -information used for this +research. Unlike the dataset retrieved from cve.mitre.org, the NVD separates their CVE -entries in …",https://jyx.jyu.fi/handle/123456789/76801,https://jyx.jyu.fi/bitstream/handle/123456789/76801/1/URN%3ANBN%3Afi%3Ajyu-202106233996.pdf,,,,, +101,Mapping language models to grounded conceptual spaces,"R Patel, E Pavlick - International Conference on Learning …, 2021 - openreview.net","… to which large language models, trained only on text can be taught to map previously learned +word forms onto conceptual worlds. We investigate several generative language models in …",https://openreview.net/forum?id=gJcEM8sxHK,https://openreview.net/pdf?id=gJcEM8sxHK,,,,, +102,Prompt Middleware: Mapping Prompts for Large Language Models to UI Affordances,"S MacNeil, A Tran, J Kim, Z Huang, S Bernstein… - arXiv preprint arXiv …, 2023 - arxiv.org","… With the recent introduction of large language models (LLMs), which can generate text in +response to a natural language prompt, there are new opportunities to consider how to …",https://arxiv.org/abs/2307.01142,https://arxiv.org/pdf/2307.01142,,,,, +103,MapperGPT: Large Language Models for Linking and Mapping Entities,"N Matentzoglu, JH Caufield, HB Hegde… - arXiv preprint arXiv …, 2023 - arxiv.org","… intensive manual mapping refinement through a human curator. Large Language Models +(… Here we present MapperGPT, an approach that uses LLMs to review and refine mapping …",https://arxiv.org/abs/2310.03666,https://arxiv.org/pdf/2310.03666,,,,,Current Issues +104,Mapping with chatgpt,"R Tao, J Xu - ISPRS International Journal of Geo-Information, 2023 - mdpi.com","… The emergence and rapid advancement of large language models (LLMs), represented +by OpenAI’s Generative Pre-trained Transformer (GPT), has brought up new opportunities …",https://www.mdpi.com/2220-9964/12/7/284,https://www.mdpi.com/2220-9964/12/7/284,,,,,Current Issues +105,Language-model optimization by mapping of corpora,"D Klakow - Proceedings of the 1998 IEEE International …, 1998 - ieeexplore.ieee.org","… - grouping frequent word sequences to phrases can improve language models. More … to +optimize the perplexity of n-gram language models. In tests on two large corpora (WSJ and BNA) …",https://ieeexplore.ieee.org/abstract/document/675361/,,,,,, +106,SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model,"J Singh, T Litfin, J Singh, K Paliwal, Y Zhou - Bioinformatics, 2022 - academic.oup.com","… As a result, many methods have been developed for protein contact map … map predictor +utilizing the output of a pre-trained language model ESM-1b as an input along with a large …",https://academic.oup.com/bioinformatics/article-abstract/38/7/1888/6519147,https://academic.oup.com/bioinformatics/article/38/7/1888/6519147,,,,, +107,Mining the web to create a language model for mapping between English names and phrases and Japanese,"G Grefenstette, Y Qu, DA Evans - IEEE/WIC/ACM International …, 2004 - ieeexplore.ieee.org","… largest, exploitable collection of language use. If we can mine the Web to build abstract models +of language use, these models … using the implicit intelligence of language use to solve an …",https://ieeexplore.ieee.org/abstract/document/1410791/,,,,,, +108,KenLM: Faster and smaller language model queries,"K Heafield - Proceedings of the sixth workshop on statistical …, 2011 - aclanthology.org","… of language model storage is therefore sparse mapping: … source toolkit for handling large +scale language models. In … Tightly packed tries: How to fit large models into memory, and make …",https://aclanthology.org/W11-2123.pdf,https://aclanthology.org/W11-2123.pdf,,,,, +109,Least-to-most prompting enables complex reasoning in large language models,"D Zhou, N Schärli, L Hou, J Wei, N Scales… - arXiv preprint arXiv …, 2022 - arxiv.org","… mapping prompt containing 14 examples to demonstrate how to map natural language … +The demonstration examples in the command-mapping are supposed to be able to fully …",https://arxiv.org/abs/2205.10625,https://arxiv.org/pdf/2205.10625.pdf?trk=public_post_comment-text,,,,, +110,"Topological Data Mapping of Online Hate Speech, Misinformation, and General Mental Health: A Large Language Model Based Study","A Alexander, H Wang - arXiv preprint arXiv:2309.13098, 2023 - arxiv.org","… Recent progresses in machine learning and large language models such as ChatGPT have +… to the embeddings to obtain a visual map connecting online hate speech, misinformation, …",https://arxiv.org/abs/2309.13098,https://arxiv.org/pdf/2309.13098,,,,, +111,Mapping the Latent Spaces of Culture,T Underwood - 2022 - hcommons.org,"… “large language models.”A paper from Stanford emphasizes applications: “foundation … +But to understand the risks language models pose, I think we will need to understand how they …",https://hcommons.org/deposits/item/hc:41973/,https://hcommons.org/deposits/download/hc:41974/CONTENT/mappingthelatentspacesofculture.pdf/,,,,, +112,Mapping ESG trends by distant supervision of neural language models,"N Raman, G Bang, A Nourbakhsh - Machine Learning and Knowledge …, 2020 - mdpi.com","… The preferred solution to this problem was to use word representations from a neural +language model that is pre-trained on a large scale unlabeled text corpus with a general-purpose …",https://www.mdpi.com/2504-4990/2/4/25,https://www.mdpi.com/2504-4990/2/4/25/pdf,,,Solutions and Mitigations,,Current Issues +113,P2V-MAP: Mapping market structures for large retail assortments,"S Gabel, D Guhl, D Klapper - Journal of Marketing Research, 2019 - journals.sagepub.com","… two recent methodological advances in natural language processing and machine learning. +They customize a neural network language model to derive latent product attributes by …",https://journals.sagepub.com/doi/abs/10.1177/0022243719833631,,,,,, +114,Linearly mapping from image to text space,"J Merullo, L Castricato, C Eickhoff, E Pavlick - arXiv preprint arXiv …, 2022 - arxiv.org","… We find that all three encoders perform equally well at transferring visual property information +to the language model (eg, whether an animal is large or small), but that image encoders …",https://arxiv.org/abs/2209.15162,https://arxiv.org/pdf/2209.15162,,,,, +115,Bridging the gap: Mapping layperson narratives to legal issues with language models,"H Westermann, S Meeùs, M Godet, A Troussel, J Tan… - 2023 - ceur-ws.org","… of language models, which have been trained on large … language models and seed questions +can overcome the cold-start problem. This seems to be the case, as the language models …",https://ceur-ws.org/Vol-3441/paper5.pdf,https://ceur-ws.org/Vol-3441/paper5.pdf,,,,, +116,Does GPT-3 Grasp Metaphors? Identifying Metaphor Mappings with Generative Language Models,"L Wachowiak, D Gromann - … of the 61st Annual Meeting of the …, 2023 - aclanthology.org","… We believe future iterations of large language models like GPT-3 will become important … +In the future, we want to experiment with using large language models to generate complete …",https://aclanthology.org/2023.acl-long.58/,https://aclanthology.org/2023.acl-long.58.pdf,Experiments,,,,Future Predictions +117,Chain-of-thought prompting elicits reasoning in large language models,"J Wei, X Wang, D Schuurmans… - Advances in …, 2022 - proceedings.neurips.cc","… the ability of large language models to perform complex … emerge naturally in sufficiently +large language models via a simple … Experiments on three large language models show that …",https://proceedings.neurips.cc/paper_files/paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html,https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf,Experiments,,,, +118,Task adaptation using MAP estimation in n-gram language modeling,"H Masataki, Y Sagisaka, K Hisaki… - … on Acoustics, Speech …, 1997 - ieeexplore.ieee.org","… language models have been widely used as effective linguistic constraints to reduce search +efforts [1][2]. However, large … However, fairly large amounts of text data are needed if these …",https://ieeexplore.ieee.org/abstract/document/596042/,,,,,, +119,On structuring probabilistic dependences in stochastic language modelling,"H Ney, U Essen, R Kneser - Computer Speech & Language, 1994 - Elsevier","… by the language model, the perplexity will be infinitely large. This is one of the real challenges +for the language model: … Now we describe a procedure by which such mappings can be …",https://www.sciencedirect.com/science/article/pii/S0885230884710011,,,,,, +120,Improved and scalable online learning of spatial concepts and language models with mapping,"A Taniguchi, Y Hagiwara, T Taniguchi, T Inamura - Autonomous Robots, 2020 - Springer","We propose a novel online learning algorithm, called SpCoSLAM 2.0, for spatial concepts +and lexical acquisition with high accuracy and scalability. Previously, we proposed …",https://link.springer.com/article/10.1007/s10514-020-09905-0,https://link.springer.com/article/10.1007/s10514-020-09905-0,,,,, +121,GamMa: Efficient Fine-Tuning of Pre-Trained Language Models Using Gradient Activation Mapping Masking,"A Gui, J Ye, H Xiao - 2023 International Joint Conference on …, 2023 - ieeexplore.ieee.org","… Since the introduction of Transformer [12], a number of large-scale language models have +… -trained language model consists of L Transformer encoders. The weights of the model can …",https://ieeexplore.ieee.org/abstract/document/10191351/,,,,,, +122,Not The End of Story: An Evaluation of ChatGPT-Driven Vulnerability Description Mappings,"X Liu, Y Tan, Z Xiao, J Zhuge… - Findings of the Association …, 2023 - aclanthology.org","… However, the cost of mapping through manual methods is … language processing (NLP) +technology, large models have … ) is a closed-source large language model (LLM), and it is …",https://aclanthology.org/2023.findings-acl.229/,https://aclanthology.org/2023.findings-acl.229.pdf,,,,, +123,"Using stochastic language models (SLM) to map lexical, syntactic, and phonological information processing in the brain","A Lopopolo, SL Frank, A Van den Bosch, RM Willems - PloS one, 2017 - journals.plos.org","… measures derived from computational language models to detect neural correlates of … +Probabilistic language models have proven to be useful tools for studying how language is …",https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0177794,https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0177794,,,,, +124,TeaBERT: An Efficient Knowledge Infused Cross-lingual Language Model for Mapping Chinese Medical Entities to the Unified Medical Language System,"L Chen, Y Qi, A Wu, L Deng… - IEEE Journal of …, 2023 - ieeexplore.ieee.org","… models can be applied to map non-English medical entities to the UMLS. In this study, we +compared our TeaBERT language model with the following baseline models. … of large memory …",https://ieeexplore.ieee.org/abstract/document/10250933/,,,,,, +125,Automatic chain of thought prompting in large language models,"Z Zhang, A Zhang, M Li, A Smola - arXiv preprint arXiv:2210.03493, 2022 - arxiv.org","… Large language models (LLMs) can perform complex reasoning by … > mappings. For example, +mistakes in either the mapping or the mapping will …",https://arxiv.org/abs/2210.03493,https://arxiv.org/pdf/2210.03493,,,,, +126,BERT-EMD: Many-to-many layer mapping for BERT compression with earth mover's distance,"J Li, X Liu, H Zhao, R Xu, M Yang, Y Jin - arXiv preprint arXiv:2010.06133, 2020 - arxiv.org","… , the pretrained language models (eg, BERT) usually have a large number of parameters, … +the computation overhead and model storage of pre-trained language models without perfor…",https://arxiv.org/abs/2010.06133,https://arxiv.org/pdf/2010.06133,,,,, +127,Latent semantic mapping [information retrieval],"JR Bellegarda - IEEE signal processing magazine, 2005 - ieeexplore.ieee.org","… in other areas of natural language processing, including word clustering, document/topic +clustering, large-vocabulary speech recognition language modeling, automated call routing, …",https://ieeexplore.ieee.org/abstract/document/1511825/,,,,,, +128,Placing flickr photos on a map,"P Serdyukov, V Murdock, R Van Zwol - Proceedings of the 32nd …, 2009 - dl.acm.org","… language model based entirely on the annotations provided by users. We define extensions +to improve over the language model … incorporate GeoNames1, a large external database of …",https://dl.acm.org/doi/abs/10.1145/1571941.1572025,https://www.researchgate.net/profile/Roelof-Zwol/publication/221299053_Placing_Flickr_Photos_on_a_Map/links/0a85e53821a59d2da8000000/Placing-Flickr-Photos-on-a-Map.pdf,,,,, +129,Vision and Language Navigation in the Real World via Online Visual Language Mapping,"C Xu, HT Nguyen, C Amato, LLS Wong - arXiv preprint arXiv:2310.10822, 2023 - arxiv.org","… , to ground the unstructured language instructions, we utilize a large language model (LLM) … +visual-language map using a large visual-language model (VLM). With the latest map and …",https://arxiv.org/abs/2310.10822,https://arxiv.org/pdf/2310.10822,,,,, +130,Exploring the limits of language modeling,"R Jozefowicz, O Vinyals, M Schuster… - arXiv preprint arXiv …, 2016 - arxiv.org","… and working on large data sets and models with clear benchmarks will help advance +Language Modeling. … and we hope it will enable similar gains when used to map onto words. …",https://arxiv.org/abs/1602.02410,https://arxiv.org/pdf/1602.02410.pdf%3C/p%3E%3Cp%3E,,,,, +131,A neural probabilistic language model,"Y Bengio, R Ducharme… - Advances in neural …, 2000 - proceedings.neurips.cc","… language modeling is not new either, eg [8]. In contrast, here we push this idea to a large +scale, and concentrate on learning a statistical model … A mapping C from any element of V to a …",https://proceedings.neurips.cc/paper_files/paper/2000/hash/728f206c2a01bf572b5940d7d9a8fa4c-Abstract.html,https://proceedings.neurips.cc/paper_files/paper/2000/file/728f206c2a01bf572b5940d7d9a8fa4c-Paper.pdf,,,,, +132,A cache-based natural language model for speech recognition,"R Kuhn, R De Mori - IEEE transactions on pattern analysis and …, 1990 - ieeexplore.ieee.org","… Whatever solution we adopt will consist of mapping the set of … Let us denote this many to +one mapping by M. Thus, M(< W¡, … on a large scale, it could lead to pseudosemantic language …",https://ieeexplore.ieee.org/abstract/document/56193/,https://escholarship.mcgill.ca/downloads/nc580n69w,,,Solutions and Mitigations,, +133,Multi-mask label mapping for prompt-based learning,"J Qi, R Zhang, J Kim, J Chen, W Qin… - Proceedings of the AAAI …, 2023 - ojs.aaai.org","… for cloze-type language models to satisfy the conditions of few-shot learning. Further, +because lexical cues are proven to play a significant role in large language models like BERT/…",https://ojs.aaai.org/index.php/AAAI/article/view/26579,https://ojs.aaai.org/index.php/AAAI/article/download/26579/26351,,,,, +134,Palm: Scaling language modeling with pathways,"A Chowdhery, S Narang, J Devlin, M Bosma… - arXiv preprint arXiv …, 2022 - arxiv.org","… Large language models have been shown to achieve remarkable performance across a +variety of natural language … large language models and discuss potential mitigation strategies. …",https://arxiv.org/abs/2204.02311,https://arxiv.org/pdf/2204.02311,,,Solutions and Mitigations,, +135,,"H Sak, M Shannon, K Rao, F Beaufays - Interspeech, 2017",,,,,,,, +136,"Document language models, query models, and risk minimization for information retrieval","J Lafferty, C Zhai - Proceedings of the 24th annual international ACM …, 2001 - dl.acm.org","… As the models are highly lexical, it is unlikely that a sufficiently large collection … for future +work will be to go beyond the use of a single document and query model (as in MAP estimation). …",https://dl.acm.org/doi/abs/10.1145/383952.383970,https://www.academia.edu/download/30739205/10.1.1.69.116.pdf,,,,,Future Predictions +137,Making pre-trained language models better few-shot learners,"T Gao, A Fisch, D Chen - arXiv preprint arXiv:2012.15723, 2020 - arxiv.org","… 5More generally, we can consider a one-to-many mapping M: Y → 2|Y| in which we map … +For automatically template generation, we take the T5-3B13 model, which is the largest publicly …",https://arxiv.org/abs/2012.15723,https://arxiv.org/pdf/2012.15723,,,,, +138,"What do we mean by GenAI? A systematic mapping of the evolution, trends, and techniques involved in Generative AI","F García-Peñalvo, A Vázquez-Ingelmo - 2023 - reunir.unir.net","… Tools like ChatGPT, Dall-E, or Midjourney have democratized access to Large Language +Models, enabling the creation of human-like content. However, the concept 'Generative …",https://reunir.unir.net/handle/123456789/15134,https://reunir.unir.net/bitstream/handle/123456789/15134/ip2023_07_006.pdf?sequence=1&isAllowed=y,,,,, +139,"Machine-learning-based evidence and attribution mapping of 100,000 climate impact studies","M Callaghan, CF Schleussner, S Nath… - Nature climate …, 2021 - nature.com","… language model BERT to identify and classify studies on observed climate impacts, producing +a comprehensive machine-learning-assisted evidence map… , large language models such …",https://www.nature.com/articles/s41558-021-01168-6,https://www.researchsquare.com/article/rs-783398/latest.pdf,,,,, +140,Web mining for innovation ecosystem mapping: a framework and a large-scale pilot study,"J Kinne, J Axenbeck - Scientometrics, 2020 - Springer","… the field of natural language processing (NLP) (eg Mikolov et al. 2011, 2013a; Mikolov et al. +2013b), especially the ones involving artificial neural network language models, resulted an …",https://link.springer.com/article/10.1007/s11192-020-03726-9,https://link.springer.com/article/10.1007/s11192-020-03726-9,,,,, +141,A balanced data approach for evaluating cross-lingual transfer: Mapping the linguistic blood bank,"D Malkin, T Limisiewicz, G Stanovsky - arXiv preprint arXiv:2205.04086, 2022 - arxiv.org","… developers of large-scale multilingual language models in … ) combinations of bilingual +masked language models over our … language l2, based only on language modeling performance. …",https://arxiv.org/abs/2205.04086,https://arxiv.org/pdf/2205.04086,,,,, +142,Language model adaptation with MAP estimation and the perceptron algorithm,"M Bacchiani, B Roark, M Saraclar - Proceedings of HLT-NAACL …, 2004 - aclanthology.org","… language model adaptation approaches: MAP estimation and the perceptron algorithm. Used +in isolation, we show that MAP … Corrective language modeling for large vocabulary ASR …",https://aclanthology.org/N04-4006.pdf,https://aclanthology.org/N04-4006.pdf,,,,, +143,Dataset geography: Mapping language data to language users,"F Faisal, Y Wang, A Anastasopoulos - arXiv preprint arXiv:2112.03497, 2021 - arxiv.org","… Training large language models require huge amount of … trained language model as well +as the fine-tuned models often … other language side, using the respective language model for …",https://arxiv.org/abs/2112.03497,https://arxiv.org/pdf/2112.03497,,,,, +144,Amalgam: Automatic mapping among lexicogrammatical annotation models,"ES Atwell, J Hughes, DC Souter - … Approaches to Language …, 1994 - eprints.whiterose.ac.uk","… training sets into a large unified multicorpus. Our architecture combines standard statistical +language modelling and a rule-base derived from linguists' analyses of tagset-mappings, in a …",https://eprints.whiterose.ac.uk/81160/,https://eprints.whiterose.ac.uk/81160/1/AMALGAM.pdf,,,,, +145,Self-organized language modeling for speech recognition,"F Jelinek - Readings in speech recognition, 1990 - books.google.com","… language modeling for speech recognition, and to suggest some lines of possible solution +related to selforganized statistical information extraction from large … this unique mapping of w …",https://books.google.com/books?hl=en&lr=&id=iDHgboYRzmgC&oi=fnd&pg=PA450&dq=Large+Language+Model+Mapping&ots=jdeRGVPqmO&sig=xliTgM9Sk_3mXtRXEzqJO3YiSq4,https://www.academia.edu/download/30444460/10.1177_1470412911430581.pdf,,,Solutions and Mitigations,, +146,SciBERT: A pretrained language model for scientific text,"I Beltagy, K Lo, A Cohan - arXiv preprint arXiv:1903.10676, 2019 - arxiv.org","… version of SCIBERT analogous to BERT-Large, as well as experiment with different proportions +of papers from each domain. Because these language models are costly to train, we aim …",https://arxiv.org/abs/1903.10676,https://arxiv.org/pdf/1903.10676,Experiments,,,, +147,From captions to visual concepts and back,"H Fang, S Gupta, F Iandola… - Proceedings of the …, 2015 - openaccess.thecvf.com","… , language models, and multimodal similarity models learnt … Image model: We map images +to semantic vectors using the … In fact, we observe the largest improvement for nouns and …",http://openaccess.thecvf.com/content_cvpr_2015/html/Fang_From_Captions_to_2015_CVPR_paper.html,http://openaccess.thecvf.com/content_cvpr_2015/papers/Fang_From_Captions_to_2015_CVPR_paper.pdf,,,,, +148,Language models are unsupervised multitask learners,"A Radford, J Wu, R Child, D Luan… - OpenAI …, 2019 - insightcivic.s3.us-east-1.amazonaws …","… Preliminary experiments confirmed that sufficiently large language models are able to … +While it is a large step from the well-posed setup described above to the messiness of “language …",https://insightcivic.s3.us-east-1.amazonaws.com/language-models.pdf,https://insightcivic.s3.us-east-1.amazonaws.com/language-models.pdf,Experiments,,,, +149,Quasi-compositional mapping from form to meaning: A neural network-based approach to capturing neural responses during human language comprehension,"M Rabovsky, JL McClelland - … Transactions of the …, 2020 - royalsocietypublishing.org","… language models bear long-term promise to capture how … does not fully capture the mapping +from language to meaning, and in … in large-scale machine approaches to natural language …",https://royalsocietypublishing.org/doi/abs/10.1098/rstb.2019.0313,https://royalsocietypublishing.org/doi/pdf/10.1098/rstb.2019.0313?download=true,,,,, +150,Don't stop pretraining: Adapt language models to domains and tasks,"S Gururangan, A Marasović, S Swayamdipta… - arXiv preprint arXiv …, 2020 - arxiv.org","… We map unlabeled CHEMPROT and 1M BIOMED sentences to a shared vector space … +large pretrained language models to distant domains, and building reusable language models …",https://arxiv.org/abs/2004.10964,https://arxiv.org/pdf/2004.10964.pdf](https://arxiv.org/pdf/2004.10964.pdf),,,,, +151,Rail-kd: Random intermediate layer mapping for knowledge distillation,"MA Haidar, N Anchuri, M Rezagholizadeh… - arXiv preprint arXiv …, 2021 - arxiv.org","… output of teacher and student models) especially over large pre-trained language models. … +efforts required for setting up a proper layer mapping. To address these problems, we propose …",https://arxiv.org/abs/2109.10164,https://arxiv.org/pdf/2109.10164,,,,, +152,Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps,"Y Wei, T Zhang, H Zhang, T Zhong, L Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org","… as semantic redundancy and ambiguity, resulting in an inaccurate mapping to brain images. +On the other hand, large language models (LLMs) like ChatGPT have shown great potential …",https://arxiv.org/abs/2309.05021,https://arxiv.org/pdf/2309.05021,,,,, +153,The importance of query-concept-mapping for automatic video retrieval,"D Wang, X Li, J Li, B Zhang - Proceedings of the 15th ACM international …, 2007 - dl.acm.org","… a large lexicon of 311 learned semantic concept detectors. … based on query language model +gets an MAP of 0.051 (not … within 85% of 0.060, the MAP of the oracle which selects the best …",https://dl.acm.org/doi/abs/10.1145/1291233.1291293,https://www.researchgate.net/profile/Xirong-Li-2/publication/221572213_The_importance_of_query-concept-mapping_for_automatic_video_retrieval/links/0046351dec08db376b000000/The-importance-of-query-concept-mapping-for-automatic-video-retrieval.pdf,,,,, +154,Mapping the design-space of textual variability modeling languages: a refined analysis,"H Eichelberger, K Schmid - International Journal on Software Tools for …, 2015 - Springer","… languages, were introduced. We consider the recent trend in product line engineering towards +textual variability modeling languages as … specialized capabilities for large-scale models. …",https://link.springer.com/article/10.1007/s10009-014-0362-x,,,,,, +155,Map-based transparent persistence for very large models,"A Gómez, M Tisi, G Sunyé, J Cabot - … Conference, FASE 2015, Held as Part …, 2015 - Springer","… We argue that this strategy improves execution of model-driven tools on large models in real… +a map-based persistence model for MDE tools, arguing that persisting model graphs directly …",https://link.springer.com/chapter/10.1007/978-3-662-46675-9_2,https://inria.hal.science/hal-01140776/document,,,,, +156,Unifying visual-semantic embeddings with multimodal neural language models,"R Kiros, R Salakhutdinov, RS Zemel - arXiv preprint arXiv:1411.2539, 2014 - arxiv.org","… Given an image, we first map it into the multimodal space. From this embedding, we define +… We trained a Kneser-Ney trigram model on a large corpus and compute the logprobability of …",https://arxiv.org/abs/1411.2539,https://arxiv.org/pdf/1411.2539,,,,, +157,Building end-to-end dialogue systems using generative hierarchical neural network models,"I Serban, A Sordoni, Y Bengio, A Courville… - Proceedings of the AAAI …, 2016 - ojs.aaai.org","… on models which can be trained efficiently on large datasets … model can be performed as +in standard language modeling: … Finally, our analysis of the model MAP outputs suggests that …",https://ojs.aaai.org/index.php/AAAI/article/view/9883,https://ojs.aaai.org/index.php/AAAI/article/view/9883/9742,,,,, +158,Learning feature mapping using deep neural network bottleneck features for distant large vocabulary speech recognition,"I Himawan, P Motlicek, D Imseng… - … , Speech and Signal …, 2015 - ieeexplore.ieee.org","… The AMI pronunciation dictionary of approximately 23K words is used in the experiments, +and the Viterbi decoding is performed using a 2-gram language model, previously built for …",https://ieeexplore.ieee.org/abstract/document/7178830/,https://infoscience.epfl.ch/record/207946/files/Himawan_ICASSP2015_2015.pdf,Experiments,,,, +159,Large scale classification in deep neural network with label mapping,"Q Zhang, KC Lee, H Bao, Y You, W Li… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org","… large, the classification problem will become infeasible because the required resources for +model … huge number of classes, such as language model of word level, image recognition of …",https://ieeexplore.ieee.org/abstract/document/8637580/,https://arxiv.org/pdf/1806.02507,,,,, +160,"Cortical language localization in left, dominant hemisphere: an electrical stimulation mapping investigation in 117 patients","G Ojemann, J Ojemann, E Lettich, M Berger - Journal of neurosurgery, 1989 - thejns.org","… Prior to language mapping, rolandic cortex was identified by stimulation and the threshold +… to be sampled with language mapping. Language mapping used the largest current that did …",https://thejns.org/view/journals/j-neurosurg/71/3/article-p316.xml,,,,,,Current Issues +161,Cross-lingual phone mapping for large vocabulary speech recognition of under-resourced languages,"V Hai, X Xiao, ES Chng, H Li - IEICE TRANSACTIONS on …, 2014 - search.ieice.org","… a novel acoustic modeling technique of large vocabulary … trained acoustic models of other +languages (called source languages). … We assume that the language model and pronunciation …",https://search.ieice.org/bin/summary.php?id=e97-d_2_285,https://www.jstage.jst.go.jp/article/transinf/E97.D/2/E97.D_285/_pdf,,,,, +162,Mapping Brand Territories Using ChatGPT,"LF Rodriguez-Sarmiento, I Galpin… - … Conference on Applied …, 2024 - Springer","… Traditionally, mapping a brand territory is a largely manual, … a more efficient way of mapping +brand territory. Our approach … customer reviews using this Large Language Model, we show …",https://link.springer.com/chapter/10.1007/978-3-031-46813-1_3,,,,,, +163,Tinybert: Distilling bert for natural language understanding,"X Jiao, Y Yin, L Shang, X Jiang, X Chen, L Li… - arXiv preprint arXiv …, 2019 - arxiv.org","… However, pre-trained language models are usually … 4-layer student models demonstrate that: +1) There is a large perfor… We also investigate the effects of different mapping functions n = g(…",https://arxiv.org/abs/1909.10351,https://arxiv.org/pdf/1909.10351.pdf?trk=public_post_comment-text,,,,, +164,Character-aware neural language models,"Y Kim, Y Jernite, D Sontag, A Rush - … of the AAAI conference on artificial …, 2016 - ojs.aaai.org","… to obtain a feature map fk ∈ Rl−w+1 . Specifically, the i-th element of fk is given by: … +2005)—a common strategy for training language models with very large |V|—instead of the usual …",https://ojs.aaai.org/index.php/AAAI/article/view/10362,https://ojs.aaai.org/index.php/AAAI/article/download/10362/10221,,,,, +165,Mapping rules for building a Tunisian dialect lexicon and generating corpora,"R Boujelbane, ME Khemekhem… - … on natural language …, 2013 - aclanthology.org","… language modeling: studying large amounts of text to learn about patterns of words in a +language… However, a limited vocabulary is a problem if we want to model a language model for a …",https://aclanthology.org/I13-1048.pdf,https://aclanthology.org/I13-1048.pdf,,,,, +166,Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks,"M Plappert, C Mandery, T Asfour - Robotics and Autonomous Systems, 2018 - Elsevier","… -language model and language-to-motion model. … large variety of different human whole-body +motions. Conversely, we showed that our model is capable of generating a similarly large …",https://www.sciencedirect.com/science/article/pii/S0921889017306280,https://arxiv.org/pdf/1705.06400,,,,, +167,Mapping languages: The corpus of global language use,"J Dunn - Language Resources and Evaluation, 2020 - Springer","… It also avoids biasing the model toward language-domain pairs with a large number of samples. +For … Supervised text-based geolocation using language models on an adaptive grid. In …",https://link.springer.com/article/10.1007/s10579-020-09489-2,https://arxiv.org/pdf/2004.00798,,,,, +168,EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding,"Y Miao, M Gowayyed, F Metze - 2015 IEEE workshop on …, 2015 - ieeexplore.ieee.org","… A lexicon WFST encodes the mapping from sequences of … We apply the WSJ standard pruned +trigram language model in … trained models outperform the existing hybrid systems on large…",https://ieeexplore.ieee.org/abstract/document/7404790/,https://arxiv.org/pdf/1507.08240,,,,, +169,Self-organizing map models of language acquisition,"P Li, X Zhao - Frontiers in psychology, 2013 - frontiersin.org","… Language as a hallmark of human behavior thus received in-depth treatment in the original +PDP volumes, and connectionist language models … in large areas of the map, that is, large …",https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00828/full,https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00828/full,,,,, +170,Mapping the origins and expansion of the Indo-European language family,"R Bouckaert, P Lemey, M Dunn, SJ Greenhill… - Science, 2012 - science.org","… for the Anatolian hypothesis under a RRW model. This model allows large variation in rates +of … Further, the geographic centroid of the languages considered here falls within the broader …",https://www.science.org/doi/abs/10.1126/science.1219669,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4112997/,,,,, +171,End-to-end learning of driving models from large-scale video datasets,"H Xu, Y Gao, F Yu, T Darrell - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com","… model is akin to a language model, which scores the likelihood of character or word sequences +given certain corpora; our model … to map from pixels to actuation, [2] proposed mapping …",http://openaccess.thecvf.com/content_cvpr_2017/html/Xu_End-To-End_Learning_of_CVPR_2017_paper.html,https://openaccess.thecvf.com/content_cvpr_2017/papers/Xu_End-To-End_Learning_of_CVPR_2017_paper.pdf,,,,, +172,High-gamma modulation language mapping with stereo-EEG: a novel analytic approach and diagnostic validation,"B Ervin, J Buroker, L Rozhkov, T Holloway… - Clinical …, 2020 - Elsevier","… Time to largest power change reliably localized … of language mapping with SEEG and +subdural grids. In addition, our HGM-model shows consistent localization of Neurosynth language …",https://www.sciencedirect.com/science/article/pii/S1388245720304995,,,,,, +173,Hierarchical probabilistic neural network language model,"F Morin, Y Bengio - International workshop on artificial …, 2005 - proceedings.mlr.press","… large number of models would be required and the whole … class-based statistical language +model by using the following … is a deterministic function c(.) mapping Y to C), so as to write …",http://proceedings.mlr.press/r5/morin05a/morin05a.pdf,http://proceedings.mlr.press/r5/morin05a/morin05a.pdf,,,,, +174,Understanding linguistic evolution by visualizing the emergence of topographic mappings,"H Brighton, S Kirby - Artificial life, 2006 - ieeexplore.ieee.org","… , we can consider it a topographic mapping [21]. Given the language model presented here, +… In other words, given a finite subset of some infinitely large language, the learning algorithm …",https://ieeexplore.ieee.org/abstract/document/6791988/,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=6a54db36afce548b5e4deefb3a4df0428a13a813,,,,,Current Issues +175,Latent semantic mapping: Principles and applications,JR Bellegarda - 2022 - books.google.com,"… language processing, including word clustering, document/topic clustering, large vocabulary +speech recognition language modeling… language modeling, where multispan language mod…",https://books.google.com/books?hl=en&lr=&id=1YByEAAAQBAJ&oi=fnd&pg=PP1&dq=Large+Language+Model+Mapping&ots=3p6U0xbwoU&sig=U_iyGqNdWKvaQ_b53dcJjbbj6Vc,,,,,, +176,,"P Fung, CY Ma, WK Liu - … European Conference on Speech Communication and …, 1999",,,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=8a8a103e00ff540018684fd7dc28281d38d819ed,,,,, +177,… examination of neural basis of language processing: proposal of a dynamic hodotopical model from data provided by brain stimulation mapping during picture naming,"H Duffau, S Moritz-Gasser, E Mandonnet - Brain and language, 2014 - Elsevier","… test which involves a large network, thus adapted for … model for visual language processing +in humans (after the first step of visual recognition): a ventral stream is involved in mapping …",https://www.sciencedirect.com/science/article/pii/S0093934X13001090,https://www.researchgate.net/profile/Emmanuel-Mandonnet/publication/250918798_A_re-examination_of_neural_basis_of_language_processing_Proposal_of_a_dynamic_hodotopical_model_from_data_provided_by_brain_stimulation_mapping_during_picture_naming/links/5e92d09c4585150839d643a8/A-re-examination-of-neural-basis-of-language-processing-Proposal-of-a-dynamic-hodotopical-model-from-data-provided-by-brain-stimulation-mapping-during-picture-naming.pdf,,,,, +178,Mapping language to code in programmatic context,"S Iyer, I Konstas, A Cheung, L Zettlemoyer - arXiv preprint arXiv …, 2018 - arxiv.org","… , a new large dataset with over 100,000 examples consisting of Java classes from online +code repositories, and develop a new encoder-decoder architecture that models the interaction …",https://arxiv.org/abs/1808.09588,https://arxiv.org/pdf/1808.09588,,,,, +179,Cross-language phoneme mapping for phonetic search keyword spotting in continuous speech of under-resourced languages.,"E Tetariy, Y Bar-Yosef, V Silber-Varod, M Gishri… - Artif. Intell. Res., 2015 - Citeseer","… acoustic and language models, in addition to compiling a large … , a language model estimated +from target language data … even a language model estimated from the source language is …",https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=d10efc528d426aa17e5ab36849a7c5005f7ac6bc,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=d10efc528d426aa17e5ab36849a7c5005f7ac6bc,,,,, +180,A bit of progress in language modeling,"JT Goodman - Computer Speech & Language, 2001 - Elsevier","… language model directly into the search, or rescoring large … In order to deal with data sparsity, +they first map each word to a … The mapping is learned by backpropagation in the same way …",https://www.sciencedirect.com/science/article/pii/S0885230801901743,https://arxiv.org/pdf/cs/0108005.pdf%C3%AF%C2%BC%E2%80%B0_%C3%A5%C2%BE%C2%AE%C3%A8%C2%BD%C2%AF%C3%A7%C2%A0%E2%80%9D%C3%A7%C2%A9%C2%B6%C3%A9%E2%84%A2%C2%A2%C3%AF%C2%BC%C5%A1%C3%A5%20%C5%BD%C3%A7%E2%80%BA%E2%80%BA%C3%A9%C2%A1%C2%BF%C3%A5%C2%B7%C5%BE%C3%A9%E2%80%BA%C2%B7%C3%A5%C2%BE%C2%B7%C3%A8%E2%80%99%E2%84%A2%C3%A5%C2%BE%C2%B7%C3%AF%C2%BC%CB%86%C3%A7%C2%BE%C5%BD%C3%A5%E2%80%BA%C2%BD%C3%AF%C2%BC%E2%80%B0%C3%A3%E2%82%AC%E2%80%9A%C3%A6%C5%A0%E2%82%AC%C3%A6%C5%93%C2%AF%C3%A6%C5%A0%C2%A5%C3%A5%E2%80%98%C5%A0MSR-TR-2001-72%C3%A3%E2%82%AC%E2%80%9A,,,,, +181,Mapping between levels in the metamodel architecture,"JM Alvarez, A Evans, P Sammut - … Language. Modeling Languages …, 2001 - Springer","… languages. This paper argues that although MML takes a metamodeling approach to language +… In this nested architecture, the transformation of any model between its representations …",https://link.springer.com/chapter/10.1007/3-540-45441-1_4,https://www.researchgate.net/profile/Andy-Evans-2/publication/220868476_Mapping_between_Levels_in_the_Metamodel_Architecture/links/0deec53a5d6d7ac69e000000/Mapping-between-Levels-in-the-Metamodel-Architecture.pdf,,,,,Current Issues +182,Integrating and evaluating neural word embeddings in information retrieval,"G Zuccon, B Koopman, P Bruza… - Proceedings of the 20th …, 2015 - dl.acm.org","… translation language model for information retrieval. This … with MAP as objective measure5. +For example, for the Dirichlet … of neural translation language models and the large number of …",https://dl.acm.org/doi/abs/10.1145/2838931.2838936,https://eprints.qut.edu.au/91418/1/adcs2015_neural_translation_lm.pdf,,,,, +183,Context-aware cross-lingual mapping,"H Aldarmaki, M Diab - arXiv preprint arXiv:1903.03243, 2019 - arxiv.org","… from Language Models) is a recently-proposed deep model for … a bi-LSTM network trained +as a language model (Peters et al., … Since this can result in a very large dictionary, we capped …",https://arxiv.org/abs/1903.03243,https://arxiv.org/pdf/1903.03243,,,,, +184,Dataset cartography: Mapping and diagnosing datasets with training dynamics,"S Swayamdipta, R Schwartz, N Lourie, Y Wang… - arXiv preprint arXiv …, 2020 - arxiv.org","… unable to overfit to data) than those involving representations from large, pretrained +language models. Each data map plots 25K instances, for clarity, and are best viewed enlarged. …",https://arxiv.org/abs/2009.10795,https://arxiv.org/pdf/2009.10795,,,,,Current Issues +185,MAP adaptation of stochastic grammars,"M Bacchiani, M Riley, B Roark, R Sproat - Computer speech & language, 2006 - Elsevier","… a large amount of attention. For large vocabulary systems, an effective acoustic model will +… We then constructed adapted language models using the Scanmail model counts, the 25% …",https://www.sciencedirect.com/science/article/pii/S0885230804000798,https://www.academia.edu/download/39987529/MAP_adaptation_of_stochastic_grammars20151113-18899-shxf9n.pdf,,,,, +186,Conceptfusion: Open-set multimodal 3d mapping,"KM Jatavallabhula, A Kuwajerwala, Q Gu… - arXiv preprint arXiv …, 2023 - arxiv.org","… 6, we optionally adopt a large language model [55] for parsing language queries to an +appropriate composition of 3DSCs. For instance, the query “what is the distance between the …",https://arxiv.org/abs/2302.07241,https://arxiv.org/pdf/2302.07241,,,,, +187,ChatGPT as a mapping assistant: A novel method to enrich maps with generative AI and,"L Juhász, P Mooney, HH Hochmair - 2023 - researchgate.net","… AI as a mapping assistant for enhancing the efficiency of collaborative mapping. We present +… geographic information (VGI) and large language models (LLMs). Three analysts described …",https://www.researchgate.net/profile/Levente-Juhasz/publication/373512618_ChatGPT_as_a_mapping_assistant_A_novel_method_to_enrich_maps_with_generative_AI_and_content_derived_from_street-level_photographs/links/64ef644cf850d430c36a88a0/ChatGPT-as-a-mapping-assistant-A-novel-method-to-enrich-maps-with-generative-AI-and-content-derived-from-street-level-photographs.pdf,https://www.researchgate.net/profile/Levente-Juhasz/publication/373512618_ChatGPT_as_a_mapping_assistant_A_novel_method_to_enrich_maps_with_generative_AI_and_content_derived_from_street-level_photographs/links/64ef644cf850d430c36a88a0/ChatGPT-as-a-mapping-assistant-A-novel-method-to-enrich-maps-with-generative-AI-and-content-derived-from-street-level-photographs.pdf,,,,,Current Issues +188,"What Do We Mean by GenAI? A Systematic Mapping of The Evolution, Trends, and Techniques Involved in Generative AI","A Vázquez-Ingelmo, FJ García-Peñalvo - 2023 - repositorio.grial.eu","… Tools like ChatGPT, Dall-E, or Midjourney have democratized access to Large Language +Models, enabling the creation of human-like content. However, the concept 'Generative …",https://repositorio.grial.eu/handle/grial/2934,https://repositorio.grial.eu/bitstream/grial/2934/1/ip2023_07_006.pdf,,,,, +189,A common language for physical mapping of the human genome,"M Olson, L Hood, C Cantor, D Botstein - Science, 1989 - science.org","… This will solve the problem of merging data from many sources, eliminate the need for large +clone archives, and define a physical map that can evolve smoothly and naturally toward the …",https://www.science.org/doi/pdf/10.1126/science.2781285,https://www.academia.edu/download/66625714/57defdd02ea23777900ae497d122e487341a.pdf,,,,, +190,Bertmcn: Mapping colloquial phrases to standard medical concepts using bert and highway network,"KS Kalyan, S Sangeetha - Artificial Intelligence in Medicine, 2021 - Elsevier","… language models achieved promising results in many NLP tasks. Some of the popular deep +language representation models … language models over large volumes of text. Further these …",https://www.sciencedirect.com/science/article/pii/S0933365721000014,,,,,,Current Issues +191,Recurrent continuous translation models,"N Kalchbrenner, P Blunsom - … methods in natural language …, 2013 - aclanthology.org","… (RCTM) that map without loss of generality a sentence from … recurrent language model +underlying the generative aspect. … the model robustly applicable to a large number of languages …",https://aclanthology.org/D13-1176.pdf,https://aclanthology.org/D13-1176.pdf,,,,,Current Issues +192,History of awake mapping and speech and language localization: from modules to networks,"S Rahimpour, MM Haglund, AH Friedman… - Neurosurgical focus, 2019 - thejns.org","… The classic Broca-Wernicke model of cortical speech and language organization underwent +a paradigm shift in large part due to advances in brain mapping techniques. This initially …",https://thejns.org/focus/view/journals/neurosurg-focus/47/3/article-pE4.xml,https://thejns.org/focus/view/journals/neurosurg-focus/47/3/article-pE4.xml?tab_body=pdf,,,,, +193,Natural language processing for requirements engineering: A systematic mapping study,"L Zhao, W Alhoshan, A Ferrari, KJ Letsholo… - ACM Computing …, 2021 - dl.acm.org","… presents a large-scale systematic mapping study of the field. The mapping study reviews 404 +… several solutions are proposed to address a large spectrum of RE phases, tasks and types …",https://dl.acm.org/doi/abs/10.1145/3444689,https://arxiv.org/pdf/2004.01099,,,Solutions and Mitigations,,Current Issues +194,Automated speech generation from UN General Assembly statements: Mapping risks in AI generated texts,"J Bullock, M Luengo-Oroz - arXiv preprint arXiv:1906.01946, 2019 - arxiv.org","… and rapidly, and used for propaganda, disinformation and personal harm on a large scale. … +In training the language model we follow the methodology as laid out by Howard and Ruder …",https://arxiv.org/abs/1906.01946,https://arxiv.org/pdf/1906.01946,,,,, +195,DiviML: A Module-based Heuristic for Mapping Neural Networks onto Heterogeneous Platforms,"Y Ghannane, MS Abdelfattah - arXiv preprint arXiv:2308.00127, 2023 - arxiv.org","… In this case study, we investigate the use of our scheduler for a large language model (LLM), +GPT-3 [40], on a distributed heterogeneous platform as shown in Figure 8. This model …",https://arxiv.org/abs/2308.00127,https://arxiv.org/pdf/2308.00127,Case Studies,,,, +196,Geospatial and semantic mapping platform for massive COVID-19 scientific publication search,"X Ye, J Du, X Gong, S Na, W Li, S Kudva - Journal of Geovisualization and …, 2021 - Springer","… mapping platform to search and organize these large and unmapped digital collections. +The semantic map … Our system uses state-of-the-art language modeling methods to analyze …",https://link.springer.com/article/10.1007/s41651-021-00073-y,https://link.springer.com/article/10.1007/s41651-021-00073-y,,,,, +197,How multilingual is multilingual BERT?,"T Pires, E Schlinger, D Garrette - arXiv preprint arXiv:1906.01502, 2019 - arxiv.org","… These models can be pre-trained on large corpora of readily … data, relying on the induced +language model structure to facilitate … representation ability, mapping structures onto new …",https://arxiv.org/abs/1906.01502,https://arxiv.org/pdf/1906.01502,,,,,Current Issues +198,Sequence to sequence learning with neural networks,"I Sutskever, O Vinyals, QV Le - Advances in neural …, 2014 - proceedings.neurips.cc","… whenever large labeled training sets are available, they cannot be used to map sequences +… As typical neural language models rely on a vector representation for each word, we used a …",https://proceedings.neurips.cc/paper/2014/hash/a14ac55a4f27472c5d894ec1c3c743d2-Abstract.html,https://proceedings.neurips.cc/paper/2014/file/a14ac55a4f27472c5d894ec1c3c743d2-Paper.pdf,,,,,Current Issues +199,Latent semantic mapping: Dimensionality reduction via globally optimal continuous parameter modeling,"JR Bellegarda - IEEE Workshop on Automatic Speech …, 2005 - ieeexplore.ieee.org","… to other areas of natural language processing, including word clustering, document/topic +clustering, large vocabulary speech recognition language modeling, automated call routing, …",https://ieeexplore.ieee.org/abstract/document/1566490/,,,,,, +200,Mapping queries to the Linking Open Data cloud: A case study using DBpedia,"E Meij, M Bron, L Hollink, B Huurnink… - Journal of Web …, 2011 - Elsevier","… In order to use the concept descriptions, we adopt a language modeling for information … +In future work, we intend to perform a large-scale post-hoc evaluation in which we directly …",https://www.sciencedirect.com/science/article/pii/S1570826811000187,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=5d42da23d389adb9628c36d8d4abab5d5139ef90,,,,,Future Predictions +201,Extracting training data from large language models,"N Carlini, F Tramer, E Wallace, M Jagielski… - 30th USENIX Security …, 2021 - usenix.org","… It has become common to publish large (billion parameter) language models that have … +querying the language model. We demonstrate our attack on GPT-2, a language model trained …",https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting,https://www.usenix.org/system/files/sec21-carlini-extracting.pdf,,,,, +202,Montage: A neural network language {Model-Guided}{JavaScript} engine fuzzer,"S Lee, HS Han, SK Cha, S Son - … Security Symposium (USENIX Security …, 2020 - usenix.org","… language model from training instances so that the model can … This way, the model considers +each fragment as a lexicon, … [16] trained a language model from a large number of PDF …",https://www.usenix.org/conference/usenixsecurity20/presentation/lee-suyoung,https://www.usenix.org/system/files/sec20-lee-suyoung.pdf,,,,, +203,Privacy risks of general-purpose language models,"X Pan, M Zhang, S Ji, M Yang - … IEEE Symposium on Security …, 2020 - ieeexplore.ieee.org","… over 90% for each language model when the shadow corpus size is larger than 100. +Moreover, we interestingly observe, a larger language model (GPT-2Large) is less robust than a …",https://ieeexplore.ieee.org/abstract/document/9152761/,https://secsys.fudan.edu.cn/_upload/article/files/83/cf/30cf2162490d965e57d40c5690df/33a28df5-f9d1-45d3-bfa1-971de54513b3.pdf,,,,, +204,Automated big text security classification,"K Alzhrani, EM Rudd, TE Boult… - … Intelligence and Security …, 2016 - ieeexplore.ieee.org","… large text corpus into smaller groups of similar paragraphs wherein multiple similarity-based +classification models can be built to predict a paragraph’s security … text security detection as …",https://ieeexplore.ieee.org/abstract/document/7745451/,https://arxiv.org/pdf/1610.06856,,,,, +205,S-gram: towards semantic-aware security auditing for ethereum smart contracts,"H Liu, C Liu, W Zhao, Y Jiang, J Sun - Proceedings of the 33rd ACM/IEEE …, 2018 - dl.acm.org","… Then, based on a S-gram language model, we can predict potential … large sequence of +generated tokens, S-gram leverages well-designed Ngram toolkits to build the language model …",https://dl.acm.org/doi/abs/10.1145/3238147.3240728,http://www.wingtecher.com/themes/WingTecherResearch/assets/papers/ase18-sgram.pdf,,,,, +206,Palm: Scaling language modeling with pathways,"A Chowdhery, S Narang, J Devlin, M Bosma… - arXiv preprint arXiv …, 2022 - arxiv.org","… Finally, we discuss the ethical considerations related to large language models and discuss +… In this work, we continue the scaling line of language modeling improvements and train a …",https://arxiv.org/abs/2204.02311,https://arxiv.org/pdf/2204.02311,,,,, +207,Palmtree: Learning an assembly language model for instruction embedding,"X Li, Y Qu, H Yin - … on Computer and Communications Security, 2021 - dl.acm.org","… pre-train an assembly language model called PalmTree for generating general-purpose +instruction embeddings by conducting self-supervised training on large-scale unlabeled binary …",https://dl.acm.org/doi/abs/10.1145/3460120.3484587,https://dl.acm.org/doi/pdf/10.1145/3460120.3484587,,,,, +208,Identifying security bug reports via text mining: An industrial case study,"M Gegick, P Rotella, T Xie - 2010 7th IEEE Working Conference …, 2010 - ieeexplore.ieee.org","… large number of BRs in text mining, the term-by-document frequency matrix can become +large… Our natural-language model has moderate success in classifying SBRs that bug reporters …",https://ieeexplore.ieee.org/abstract/document/5463340/,,,,,, +209,Asleep at the keyboard? assessing the security of github copilot's code contributions,"H Pearce, B Ahmad, B Tan… - … on Security and …, 2022 - ieeexplore.ieee.org","… , in which large models originally designed for natural language … code in a variety of languages +given some context such as … Copilot is built on a large language model that is trained on …",https://ieeexplore.ieee.org/abstract/document/9833571/,https://arxiv.org/pdf/2108.09293.pdf?trk=article-ssr-frontend-pulse_x-social-details_comments-action_comment-text,,,,, +210,Large language models for code: Security hardening and adversarial testing,"J He, M Vechev - 2023 - openreview.net","… the security of LMs for code in two complementary directions. First, we introduce security +hard… Second, we explore the potential of degrading LM’s security level from an adversarial …",https://openreview.net/forum?id=Km1XyJJVpS,https://openreview.net/pdf?id=Km1XyJJVpS,,,,, +211,"Unveiling Security, Privacy, and Ethical Concerns of ChatGPT","X Wu, R Duan, J Ni - arXiv preprint arXiv:2307.14192, 2023 - arxiv.org","… ], a highly capable large language model for natural language processing. GPT has exhibited +exceptional performance across a wide range of complex language tasks, positioning it as …",https://arxiv.org/abs/2307.14192,https://arxiv.org/pdf/2307.14192,,,,, +212,Sensec: Mobile security through passive sensing,"J Zhu, P Wu, X Wang, J Zhang - 2013 international conference …, 2013 - ieeexplore.ieee.org","… the language. We then train a continuos n-gram language model on those traces and use the +trained model for … our hypothesis and improve our model in the presence of large data sets. …",https://ieeexplore.ieee.org/abstract/document/6504251/,https://www.cic.ipn.mx/~pescamilla/MS/papers_2014/Zhuetal2013.pdf,,,,, +213,Language models are unsupervised multitask learners,"A Radford, J Wu, R Child, D Luan… - OpenAI …, 2019 - insightcivic.s3.us-east-1.amazonaws …","… At the core of our approach is language modeling. Language … is the head of the department +of homeland security 2017? … When a large language model is trained on a sufficiently large …",https://insightcivic.s3.us-east-1.amazonaws.com/language-models.pdf,https://insightcivic.s3.us-east-1.amazonaws.com/language-models.pdf,,,,, +214,Examining zero-shot vulnerability repair with large language models,"H Pearce, B Tan, B Ahmad, R Karri… - … on Security and …, 2023 - ieeexplore.ieee.org","… also generated by the language model without bugs, it is also worth noting that the LLMs +are capable of generating bug-free code even without additional context—assuming that they ‘…",https://ieeexplore.ieee.org/abstract/document/10179324/,https://arxiv.org/pdf/2112.02125,,,,, +215,Unlocking Hardware Security Assurance: The Potential of LLMs,"X Meng, A Srivastava, A Arunachalam, A Ray… - arXiv preprint arXiv …, 2023 - arxiv.org","… language model by enhancing and identifying the security property-related context in each +sentence. Further details about the security … by Wordnet, a large lexical database of English, …",https://arxiv.org/abs/2308.11042,https://arxiv.org/pdf/2308.11042,,,,, +216,VAE-Stega: linguistic steganography based on variational auto-encoder,"ZL Yang, SY Zhang, YT Hu, ZW Hu… - … Forensics and Security, 2020 - ieeexplore.ieee.org","… to learn the statistical language model of a large number of normal sentences, and then +automatically generate sentences based on the learned language model, and finally realize the …",https://ieeexplore.ieee.org/abstract/document/9193914/,https://www.researchgate.net/profile/Zhongliang-Yang-2/publication/343840385_VAE-Stega_Linguistic_Steganography_Based_on_Variational_Auto-Encoder/links/5f44616e299bf13404efa79b/VAE-Stega-Linguistic-Steganography-Based-on-Variational-Auto-Encoder.pdf,,,,, +217,Language models are few-shot learners,"T Brown, B Mann, N Ryder… - Advances in neural …, 2020 - proceedings.neurips.cc","… GPT-3, an autoregressive language model with 175 billion … When the test set is private, our +model is often too large to fit … To help with this, we can think in terms of traditional security risk …",https://proceedings.neurips.cc/paper_files/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html?utm_medium=email&utm_source=transaction,https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf,,,,, +218,SecureFalcon: The Next Cyber Reasoning System for Cyber Security,"MA Ferrag, A Battah, N Tihanyi, M Debbah… - arXiv preprint arXiv …, 2023 - arxiv.org","… SecureBERT [23] is a language model designed for cybersecurity … Penedo, “Falcon-40B: +an open large language model with … cyber threat detection with large language models,” arXiv …",https://arxiv.org/abs/2307.06616,https://arxiv.org/pdf/2307.06616,,,,, +219,Large language models in cryptocurrency securities cases: Can chatgpt replace lawyers?,"A Trozze, T Davies, B Kleinberg - arXiv preprint arXiv:2308.06032, 2023 - arxiv.org","… It is not appropriate to rely solely on an AI language model to generate such a document, +as it may not be accurate, up-to-date, or tailored to the specific legal context. Legal documents …",https://arxiv.org/abs/2308.06032,https://arxiv.org/pdf/2308.06032,,,,, +220,"Document language models, query models, and risk minimization for information retrieval","J Lafferty, C Zhai - Proceedings of the 24th annual international ACM …, 2001 - dl.acm.org","… A language model for each document is estimated, as well as a language model for each … +The query language model can be exploited to model user preferences, the context of a query, …",https://dl.acm.org/doi/abs/10.1145/383952.383970,https://www.academia.edu/download/30739205/10.1.1.69.116.pdf,,,,, +221,ChatGPT for good? On opportunities and challenges of large language models for education,"E Kasneci, K Seßler, S Küchemann, M Bannert… - Learning and individual …, 2023 - Elsevier","… large language models in education that were published since the release of the first large +language model … and security concerns when using large language models in education are: …",https://www.sciencedirect.com/science/article/pii/S1041608023000195,https://osf.io/preprints/edarxiv/5er8f/download,,,,, +222,Growing a pattern language (for security),"M Hafiz, P Adamczyk, RE Johnson - … on New ideas, new paradigms, and …, 2012 - dl.acm.org","… But the next step–building pattern languages–has proven much more difficult. This paper … +large pattern language for security: an approach that can be used to create pattern languages …",https://dl.acm.org/doi/abs/10.1145/2384592.2384607,https://hillside.net/plop/2011/papers/A-39-Hafiz.pdf,,,,, +223,Training language models to follow instructions with human feedback,"L Ouyang, J Wu, X Jiang, D Almeida… - Advances in …, 2022 - proceedings.neurips.cc","… a language model API, we collect a dataset of labeler demonstrations of the desired model +behavior, … Overall, our results indicate that fine-tuning large language models using human …",https://proceedings.neurips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html,https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf,,,,, +224,Security applications of formal language theory,"L Sassaman, ML Patterson, S Bratus… - IEEE Systems …, 2013 - ieeexplore.ieee.org","… a large share of modern computing systems’ insecurity. We show how our approach leads +to advances in input validation, security modeling, … adornments to a language model will not …",https://ieeexplore.ieee.org/abstract/document/6553401/,https://digitalcommons.dartmouth.edu/cgi/viewcontent.cgi?article=1337&context=cs_tr,,,,, +225,Harnessing GPT-4 for generation of cybersecurity GRC policies: A focus on ransomware attack mitigation,"T McIntosh, T Liu, T Susnjak, H Alavizadeh, A Ng… - … & Security, 2023 - Elsevier","… ), a state-of-the-art large language model, in generating cybersecurity policies to deter and … +policies, with those from established security vendors and government cybersecurity agencies…",https://www.sciencedirect.com/science/article/pii/S0167404823003346,https://www.sciencedirect.com/science/article/pii/S0167404823003346,,,,, +226,A study of probabilistic password models,"J Ma, W Yang, M Luo, N Li - 2014 IEEE Symposium on Security …, 2014 - ieeexplore.ieee.org","… We also observe that research in password modeling can benefit from the extensive literature +in statistical language modeling. We conduct a systematic evaluation of a large number of …",https://ieeexplore.ieee.org/abstract/document/6956595/,https://www.ieee-security.org/TC/SP2014/papers/AStudyofProbabilisticPasswordModels.pdf,,,,, +227,Pointer sentinel mixture models,"S Merity, C Xiong, J Bradbury, R Socher - arXiv preprint arXiv:1609.07843, 2016 - arxiv.org","… that allows for learning long range dependencies, we also introduce a new benchmark +dataset for language modeling called WikiText. … We compare against the large model configu- …",https://arxiv.org/abs/1609.07843,https://arxiv.org/pdf/1609.07843,,,,, +228,Integrating and evaluating neural word embeddings in information retrieval,"G Zuccon, B Koopman, P Bruza… - Proceedings of the 20th …, 2015 - dl.acm.org","… well known translation language model for information retrieval. This language model captures +… our implementation of neural translation language models and the large number of word …",https://dl.acm.org/doi/abs/10.1145/2838931.2838936,https://eprints.qut.edu.au/91418/1/adcs2015_neural_translation_lm.pdf,,,,, +229,RNN-stega: Linguistic steganography based on recurrent neural networks,"ZL Yang, XQ Guo, ZM Chen, YF Huang… - … and Security, 2018 - ieeexplore.ieee.org","… model with a large number of artificially generated samples and obtained a good estimate of +the statistical language model. … According to the well-trained statistical language model, we …",https://ieeexplore.ieee.org/abstract/document/8470163/,https://www.researchgate.net/profile/Zhongliang-Yang-2/publication/327852903_RNN-Stega_Linguistic_Steganography_Based_on_Recurrent_Neural_Networks/links/5bec313b92851c6b27be06d9/RNN-Stega-Linguistic-Steganography-Based-on-Recurrent-Neural-Networks.pdf,,,,, +230,Federated learning for mobile keyboard prediction,"A Hard, K Rao, R Mathews, S Ramaswamy… - arXiv preprint arXiv …, 2018 - arxiv.org","… Training a prediction model requires a large data sample that … We show that a CIFG language +model trained from scratch … Federated learning offers security and privacy advantages for …",https://arxiv.org/abs/1811.03604,https://arxiv.org/pdf/1811.03604,,,,,Future Predictions +231,A meta language for threat modeling and attack simulations,"P Johnson, R Lagerström, M Ekstedt - … , Reliability and Security, 2018 - dl.acm.org","… model-driven security engineering we also find domain-specific languages intended for security +… The field of model-driven security engineering includes quite a large number of domain-…",https://dl.acm.org/doi/abs/10.1145/3230833.3232799,https://www.researchgate.net/profile/Pontus-Johnson/publication/327005933_A_Meta_Language_for_Threat_Modeling_and_Attack_Simulations/links/5bfe8cb445851523d151b167/A-Meta-Language-for-Threat-Modeling-and-Attack-Simulations.pdf,,,,, +232,An investigative design based statistical approach for determining Bangla sentence validity,"MR Rahman, MT Habib, MS Rahman… - … and Network Security, 2016 - researchgate.net","… language model combined with Witten-Bell smoothing and Backoff language modeling … +score produced by a language model (LM) learned from a large corpus of correct sentences…",https://www.researchgate.net/profile/Riazur-Rahman/publication/311693706_An_Investigative_Design_Based_Statistical_Approach_for_Determining_Bangla_Sentence_Validity/links/5854ce5b08ae8f695553d724/An-Investigative-Design-Based-Statistical-Approach-for-Determining-Bangla-Sentence-Validity.pdf,https://www.researchgate.net/profile/Riazur-Rahman/publication/311693706_An_Investigative_Design_Based_Statistical_Approach_for_Determining_Bangla_Sentence_Validity/links/5854ce5b08ae8f695553d724/An-Investigative-Design-Based-Statistical-Approach-for-Determining-Bangla-Sentence-Validity.pdf,,,,, +233,Malware detection by analysing network traffic with neural networks,"P Prasse, L Machlica, T Pevný… - 2017 IEEE Security …, 2017 - ieeexplore.ieee.org","… method based on a neural language model and a long short-term memory (LSTM) +network. We study the method’s ability to detect new malware in a largescale empirical study. …",https://ieeexplore.ieee.org/abstract/document/8227308/,https://www.researchgate.net/profile/Tomas-Pevny/publication/322002101_Malware_Detection_by_Analysing_Network_Traffic_with_Neural_Networks/links/5bc85c67a6fdcc03c78f58e4/Malware-Detection-by-Analysing-Network-Traffic-with-Neural-Networks.pdf,,,,, +234,Attacks on lexical natural language steganography systems,"CM Taskiran, U Topkara, M Topkara… - Security …, 2006 - spiedigitallibrary.org","… Section 4 introduces the language modeling scheme used by our system. In Section 5 we +… piece of text according to the model. SRILM provides a large number of parameters for LM …",https://www.spiedigitallibrary.org/conference-proceedings-of-spie/6072/607209/Attacks-on-lexical-natural-language-steganography-systems/10.1117/12.649551.short,https://www.researchgate.net/profile/Umut-Topkara/publication/249915757_Attacks_on_Lexical_Natural_Language_Steganography_Systems/links/55d35ad808aec1b0429f36bc/Attacks-on-Lexical-Natural-Language-Steganography-Systems.pdf,,,,, +235,A domain-specific language for modelling security objectives in a business process models of soa applications,"M Saleem, J Jaafar, M Hassan - AISS, 2012 - researchgate.net","… We have presented a DSL, to model the security requirements along the business process +model. We … It is very clumsy to add domain-specific restrictions in large languages like UML; …",https://www.researchgate.net/profile/Jafreezal-Jaafar/publication/265026009_A_Domain-Specific_Language_for_Modelling_Security_Objectives_in_a_Business_Process_Models_of_SOA_Applications/links/5440743b0cf2fd72f99dd9b8/A-Domain-Specific-Language-for-Modelling-Security-Objectives-in-a-Business-Process-Models-of-SOA-Applications.pdf,https://www.researchgate.net/profile/Jafreezal-Jaafar/publication/265026009_A_Domain-Specific_Language_for_Modelling_Security_Objectives_in_a_Business_Process_Models_of_SOA_Applications/links/5440743b0cf2fd72f99dd9b8/A-Domain-Specific-Language-for-Modelling-Security-Objectives-in-a-Business-Process-Models-of-SOA-Applications.pdf,,,,,Current Issues +236,Prefix-tuning: Optimizing continuous prompts for generation,"XL Li, P Liang - arXiv preprint arXiv:2101.00190, 2021 - arxiv.org","… large pretrained language models to perform downstream tasks. However, it modifies all the +language model … ural language generation tasks, which keeps language model parameters …",https://arxiv.org/abs/2101.00190,https://arxiv.org/pdf/2101.00190,,,,, +237,On challenges of AI to cognitive security and safety,"R Huang, X Zheng, Y Shang, X Xue - Security and Safety, 2023 - sands.edpsciences.org","… large language models (LLMs). As with any emerging technology, it is a two-sided coin, bringing +not only vast social impacts but also significant security … to explore the security concerns …",https://sands.edpsciences.org/articles/sands/abs/2023/01/sands20230010/sands20230010.html,https://sands.edpsciences.org/articles/sands/full_html/2023/01/sands20230010/sands20230010.html?utm_source=TrendMD&utm_medium=cpc&utm_campaign=Security_and_Safety_(S%2526S)_TrendMD_0,,,,, +238,A deep learning-based RNNs model for automatic security audit of short messages,"L You, Y Li, Y Wang, J Zhang… - 2016 16th International …, 2016 - ieeexplore.ieee.org","… model will provide support for police’s manual review, relieve their working pressure to a large +degree, and improve prison security … excellent, especially in Language modeling, speech …",https://ieeexplore.ieee.org/abstract/document/7751626/,,,,,, +239,"PCySeMoL: Predictive, Probabilistic Cyber Security Modeling Language","H Holm, K Shahzad, M Buschle… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org","… state of the physical process can have severe effects for the society at large. As a result, +various common cyber security tools that can have a negative impact on availability are not used …",https://ieeexplore.ieee.org/abstract/document/6990572/,https://www.diva-portal.org/smash/record.jsf?pid=diva2:690780,,,,, +240,Web spam detection: new classification features based on qualified link analysis and language models,"L Araujo, J Martinez-Romo - … Information Forensics and Security, 2010 - ieeexplore.ieee.org","… Thus, we apply a language model approach to different … fewer features, on two large and +public datasets such as … This information is very useful because we will be able to detect a large …",https://ieeexplore.ieee.org/abstract/document/5475235/,http://e-spacio.uned.es/fez/eserv/bibliuned:DptoLSI-ETSI-MA2VICMR-1080/Documento.pdf,,,,, +241,Topic modeling: beyond bag-of-words,"HM Wallach - Proceedings of the 23rd international conference on …, 2006 - dl.acm.org","… Given these counts, the aim of bigram language modeling is to … estimator has too large +a variance to be used by itself. … language model and Blei et al.’s latent Dirichlet allocation. …",https://dl.acm.org/doi/abs/10.1145/1143844.1143967,https://people.cs.umass.edu/~wallach/talks/beyond_bag-of-words.pdf,,,,, +242,Gender bias in neural natural language processing,"K Lu, P Mardziel, F Wu, P Amancharla… - … , Language, and Security …, 2020 - Springer","… In this section we briefly summarize requisite elements of neural coreference resolution +and language modeling systems: scoring layers and loss evaluation, performance measures, …",https://link.springer.com/chapter/10.1007/978-3-030-62077-6_14,https://arxiv.org/pdf/1807.11714,,,Solutions and Mitigations,, +243,Identifying and mitigating the security risks of generative ai,"C Barrett, B Boyd, E Burzstein, N Carlini… - arXiv preprint arXiv …, 2023 - arxiv.org","… Generative AI (GenAI) techniques, such as large language models (LLMs) and diffusion +models… However, the language modeling objective used for training – predicting the next token – …",https://arxiv.org/abs/2308.14840,https://arxiv.org/pdf/2308.14840,,,,, +244,{A4NT}: Author attribute anonymity by adversarial training of neural machine translation,"R Shetty, B Schiele, M Fritz - … Security Symposium (USENIX Security 18), 2018 - usenix.org","… with large … language models are good at producing grammatically correct text. The likelihood +of the sentence produced by our A4NT model s under an unconditional language model, My…",https://www.usenix.org/conference/usenixsecurity18/presentation/shetty,https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-shetty.pdf,,,,, +245,Language-based information-flow security,"A Sabelfeld, AC Myers - IEEE Journal on selected areas in …, 2003 - ieeexplore.ieee.org","… It is unrealistic to assume that all the programs in a large computing system are trustworthy; +security mechanisms such as signature verification and antivirus scanning do not provide …",https://ieeexplore.ieee.org/abstract/document/1159651/,https://www.cs.cornell.edu/andru/papers/jsac/sm-jsac03.pdf,,,,, +246,Backdoor attacks and countermeasures in natural language processing models: A comprehensive security review,"P Cheng, Z Wu, W Du, G Liu - arXiv preprint arXiv:2309.06055, 2023 - arxiv.org","… Model with Fine-tuning The question of whether downloading untrusted pre-trained weights +can pose a security hazard is raised in light of the fact that large … crafted language model that …",https://arxiv.org/abs/2309.06055,https://arxiv.org/pdf/2309.06055,,,,, +247,Learning differentially private recurrent language models,"HB McMahan, D Ramage, K Talwar… - arXiv preprint arXiv …, 2017 - arxiv.org","… updates together, enabling large-step model updates. … language model trained with strong +privacy guarantees in §3, showing no significant decrease in model accuracy given a large …",https://arxiv.org/abs/1710.06963,https://arxiv.org/pdf/1710.06963,,,,, +248,Malware detection on highly imbalanced data through sequence modeling,"R Oak, M Du, D Yan, H Takawale, I Amit - … intelligence and security, 2019 - dl.acm.org","… We achieve a near-perfect F1 score of almost 0.985 on a large dataset having 183, 000 +samples, … Using the state-of-the-art language model BERT, we achieve an F1 score of 0.919 on a …",https://dl.acm.org/doi/abs/10.1145/3338501.3357374,https://dl.acm.org/doi/pdf/10.1145/3338501.3357374,,,,, +249,The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification,"N Tihanyi, T Bisztray, R Jain, MA Ferrag… - arXiv preprint arXiv …, 2023 - arxiv.org","… This paper presents the FormAI dataset, a large collection of … diverse set of programs utilizing +Large Language Models (LLMs). … Bounded Model Checker (ESBMC), which exploits model …",https://arxiv.org/abs/2307.02192,https://arxiv.org/pdf/2307.02192,,,,,Current Issues +250,Automated extraction of security policies from natural-language software documents,"X Xiao, A Paradkar, S Thummalapenta… - Proceedings of the ACM …, 2012 - dl.acm.org","… of ACPs is crucial to prevent security vulnerabilities. However, in practice, ACPs are +commonly written in Natural Language (NL) and buried in large documents such as requirements …",https://dl.acm.org/doi/abs/10.1145/2393596.2393608,https://repository.lib.ncsu.edu/bitstream/handle/1840.4/4295/TR-2011-7.pdf?sequence=1,,,,, +251,Analysis of security data from a large computing organization,"A Sharma, Z Kalbarczyk, J Barlow… - 2011 IEEE/IFIP 41st …, 2011 - ieeexplore.ieee.org","… This conclusion clearly does not represent the state of security in 2010 as indicated by our +study of data on security attacks, which occurred in a large computing organization over the …",https://ieeexplore.ieee.org/abstract/document/5958263/,https://www.academia.edu/download/74326863/Analysis_of_security_data_from_a_large_c20211107-7243-1gteu0x.pdf,,,,,Current Issues +252,SecureUML: A UML-based modeling language for model-driven security,"T Lodderstedt, D Basin, J Doser - … on the Unified Modeling Language, 2002 - Springer","… It is an industry standard with strong security support, which is implemented by a large +number of application servers. Due to lack of space, we only describe the basic concepts of EJB, …",https://link.springer.com/chapter/10.1007/3-540-45800-X_33,https://ethz.ch/content/dam/ethz/special-interest/infk/inst-infsec/information-security-group-dam/research/publications/pub2002/SecureUML.pdf,,,,, +253,A survey of text data augmentation,"P Liu, X Wang, C Xiang, W Meng - … and Network Security  …, 2020 - ieeexplore.ieee.org","… However, the pre-trained model is different from this. If we can add tag … the model, we can +get good results by using a large number of unlabeled data for pre-training of language model. …",https://ieeexplore.ieee.org/abstract/document/9240734/,,,,,, +254,Modeling and verifying security policies in business processes,"M Salnitri, F Dalpiaz, P Giorgini - … on Business Process Modeling …, 2014 - Springer","… These systems manage a large amount of private and … , to model security policies using +the security annotations in Table 2. Our query language permits to graphically model security …",https://link.springer.com/chapter/10.1007/978-3-662-43745-2_14,http://www.disi.unitn.it/~pgiorgio/papers/BPMDS-14.pdf,,,,, +255,Model driven security: From UML models to access control infrastructures,"D Basin, J Doser, T Lodderstedt - ACM Transactions on Software …, 2006 - dl.acm.org","… modeling language for this process, we propose a general schema for constructing such +languages that combines languages for modeling … as a large step towards integrating security …",https://dl.acm.org/doi/abs/10.1145/1125808.1125810,https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/69242/1/eth-4475-01.pdf,,,,, +256,The AVANTSSAR platform for the automated validation of trust and security of service-oriented architectures,"A Armando, W Arsac, T Avanesov, M Barletta… - … 2012, Held as Part of the …, 2012 - Springer","… We have applied the platform to a large number of exemplary industrial case studies, which +we have collected into the AVANTSSAR Library of validated problem cases. In doing so, we …",https://link.springer.com/chapter/10.1007/978-3-642-28756-5_19,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=1f76aba2e1752625b34b6a9d7d9b11a61903bf31#page=284,,,,, +257,The cyber security modeling language: A tool for assessing the vulnerability of enterprise system architectures,"T Sommestad, M Ekstedt, H Holm - IEEE Systems Journal, 2012 - ieeexplore.ieee.org","… the security of an enterprise’s system architecture, a large … tool called the cyber security +modeling language (CySeMoL). This … model (PRM) [2] to support system-security managers in …",https://ieeexplore.ieee.org/abstract/document/6378394/,https://www.diva-portal.org/smash/get/diva2:561243/FULLTEXT02,,,,, +258,Communication-efficient learning of deep networks from decentralized data,"B McMahan, E Moore, D Ramage… - Artificial intelligence …, 2017 - proceedings.mlr.press","… can significantly reduce privacy and security risks by limiting the … image classification and +language modeling tasks where … over clients, we evaluate on a large language modeling task. …",https://proceedings.mlr.press/v54/mcmahan17a?ref=https://githubhelp.com,http://proceedings.mlr.press/v54/mcmahan17a/mcmahan17a.pdf,,,,, +259,Speak their language: Designing effective messages to improve employees' information security decision making,"AC Johnston, M Warkentin, AR Dennis… - Decision …, 2019 - Wiley Online Library","… within a large US … security-focused fear appeal message to use language that is more +personally relevant to employees without altering the core tenets of the appeal? Within a large …",https://onlinelibrary.wiley.com/doi/abs/10.1111/deci.12328,https://jyx.jyu.fi/bitstream/handle/123456789/63605/1/johnstonwarkentinetalspeaktheirlanguage.pdf,,,,, +260,OFMC: A symbolic model checker for security protocols,"D Basin, S Mödersheim, L Vigano - … Journal of Information Security, 2005 - Springer","… , an on-the-fly model checker for security protocol analysis. We have carried out a large +number of … Moreover, we have successfully applied OFMC to a number of large-scale protocols …",https://link.springer.com/article/10.1007/s10207-004-0055-7,https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/52641/10207_2004_Article_55.pdf?sequence=2,,,,, +261,Analyzing information leakage of updates to natural language models,"S Zanella-Béguelin, L Wutschitz, S Tople… - … security, 2020 - dl.acm.org","… We show that a differential analysis of language model snapshots before and after an update +… We combined this large dataset with this (relatively low-capacity) model to test if our results …",https://dl.acm.org/doi/abs/10.1145/3372297.3417880,https://arxiv.org/pdf/1912.07942,,,,, +262,MOPS: an infrastructure for examining security properties of software,"H Chen, D Wagner - … on Computer and Communications Security, 2002 - dl.acm.org","… range of security vulnerabilities in large programs efficiently. … in the caller, and the language +generated with a stack is … we model the set T of feasible traces as a context free language. It …",https://dl.acm.org/doi/abs/10.1145/586110.586142,https://digitalassets.lib.berkeley.edu/techreports/ucb/text/CSD-02-1197.pdf,,,,, +263,A high level protocol specification language for industrial security-sensitive protocols,"Y Chevalier, L Compagna, J Cuellar… - … of Security …, 2004 - inria.hal.science","… with a temporal logic semantics to formalise security properties gives us great generality … +specification languages like HLPSL — for the analysis of large-scale Internet security-sensitive …",https://inria.hal.science/inria-00099882/document,https://inria.hal.science/inria-00099882/document,,,,, +264,Keyboard acoustic emanations revisited,"L Zhuang, F Zhou, JD Tygar - … on Information and System Security  …, 2009 - dl.acm.org","… For English text, the previously described spelling and grammar language model is used to +further correct the result. To distinguish between two types … We find the path with the largest: …",https://dl.acm.org/doi/abs/10.1145/1609956.1609959,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=5a65641f9af29c4f63f026734c3b26290bf4c817,,,,, +265,Netspa: A network security planning architecture,ML Artz - 2002 - dspace.mit.edu,"… in the model checker's state transition language. This large setup cost is prohibitive for normal +security … In a similar vein, Sheyner et al. also use a modified model checker to create attack …",https://dspace.mit.edu/handle/1721.1/29899,https://dspace.mit.edu/bitstream/handle/1721.1/29899/51072296-MIT.pdf?sequence=2&isAllowed=y,,,,, +266,Model-based risk assessment to improve enterprise security,"JO Aagedal, F Den Braber, T Dimitrakos… - Proceedings. Sixth …, 2002 - ieeexplore.ieee.org","… However, UML is a large language and its use in different phases of system evolution is +not standardised. In this paper we show how UML can be used to document both the target of …",https://ieeexplore.ieee.org/abstract/document/1137696/,https://www.researchgate.net/profile/Ketil-Stolen/publication/3985784_Model-based_risk_assessment_to_improve_enterprise_security/links/614de1d2154b3227a8a8a329/Model-based-risk-assessment-to-improve-enterprise-security.pdf,,,,, +267,"Binder, a logic-based security language","J DeTreville - Proceedings 2002 IEEE Symposium on Security …, 2002 - ieeexplore.ieee.org","… a security language, used to express security statements in a distributed system. Most existing +security languages encode security … logic-based security language that encodes security …",https://ieeexplore.ieee.org/abstract/document/1004365/,https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2002-21.doc,,,,, +268,The safe-tcl security model,"JK Ousterhout, JY Levy, BB Welch - 1998 - Springer","… This makes it difficult to provide a single security policy with a large variety of features; instead, +it encourages a large number of smaller, specialized security policies. The second lesson …",https://link.springer.com/chapter/10.1007/3-540-68671-1_12,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=3c60ba5e7b5e75c85e43b1bf01a4b4101be67b8d,,Policy and Regulation,,, +269,The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions,"U Topkara, M Topkara, MJ Atallah - … on Multimedia and security, 2006 - dl.acm.org","… Previous schemes for information hiding in natural language … based on conformity to a +language model (and not in reference … A large number of these techniques are based on WordNet, …",https://dl.acm.org/doi/abs/10.1145/1161366.1161397,http://umut.topkara.org/papers/ToToAt_MMSEC06.pdf,,,,, +270,Recognizing functions in binaries with neural networks,"ECR Shin, D Song, R Moazzezi - … security symposium (USENIX Security …, 2015 - usenix.org","… Since we need to compute the gradient a very large number of times during optimization, … +[3] first used neural networks to make a language model. Language models give a probability …",https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/shin,https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-shin.pdf,,,,, +271,Security-typed languages for implementation of cryptographic protocols: A case study,"A Askarov, A Sabelfeld - … Symposium on Research in Computer Security …, 2005 - Springer","… deploying security-typed languages. Motivated … security-typed language Jif. To the best of +our knowledge, this implementation is the largest program written in a security-typed language …",https://link.springer.com/chapter/10.1007/11555827_12,https://www.researchgate.net/profile/Paul-Syverson/publication/239577558_Computer_Security_-_ESORICS_2005_10th_European_Symposium_on_Research_in_Computer_Security_Milan_Italy_September_12-14_2005_Proceedings/links/0046352de80edc5626000000/Computer-Security-ESORICS-2005-10th-European-Symposium-on-Research-in-Computer-Security-Milan-Italy-September-12-14-2005-Proceedings.pdf#page=207,,,,, +272,Towards security defect prediction with AI,"CD Sestili, WS Snavely, NM VanHoudnos - arXiv preprint arXiv …, 2018 - arxiv.org","… questions [10], [11]; and a language modeling technique that represents words from natural-… +to model security defects. The most significant barrier to this work is the lack of large enough …",https://arxiv.org/abs/1808.09897,https://arxiv.org/pdf/1808.09897,,,,,Current Issues +273,"Multi-level fine-tuning, data augmentation, and few-shot learning for specialized cyber threat intelligence","M Bayer, T Frey, C Reuter - Computers & Security, 2023 - Elsevier","… (T) New insights on data augmentation with large pre-trained language models. In our study… +As in the former, we used the large language model GPT-3 with a prompting strategy and …",https://www.sciencedirect.com/science/article/pii/S0167404823003401,https://arxiv.org/pdf/2207.11076,,,,, +274,English shellcode,"J Mason, S Small, F Monrose… - … communications security, 2009 - dl.acm.org","… Generate and train a natural language model with a large and diverse corpus of English text… +To do so, we traverse the language model using the Viterbi algorithm [24]. Viterbi is used to …",https://dl.acm.org/doi/abs/10.1145/1653662.1653725,https://www.cs.umd.edu/class/fall2019/cmsc818O/papers/english-shellcode.pdf,,,,, +275,"Short text, large effect: Measuring the impact of user reviews on android app security & privacy","DC Nguyen, E Derr, M Backes… - … Security and Privacy (SP …, 2019 - ieeexplore.ieee.org","… and security- & privacy-related changes in apps. Using natural language processing on +4.5M user reviews for the top 2,583 apps in Google Play, we identified 5,527 security and …",https://ieeexplore.ieee.org/abstract/document/8835383/,https://publications.cispa.saarland/2815/1/main_sp.pdf,,,,, +276,A decade of model-driven security,"D Basin, M Clavel, M Egea - Proceedings of the 16th ACM symposium …, 2011 - dl.acm.org","… model transformations. For example, in multi-tier systems, we used model transformations to +transform a security … This large expansion was due to the high level of abstraction provided …",https://dl.acm.org/doi/abs/10.1145/1998441.1998443,https://infsec.ethz.ch/content/dam/ethz/special-interest/infk/inst-infsec/information-security-group-dam/research/publications/pub2011/sacmat11.pdf,,,,, +277,The Safe-Tcl Security Model.,"JY Levy, L Demailly, JK Ousterhout… - USENIX Annual Technical …, 1998 - usenix.org","… If a security policy included a large number of features, it … encourages a large number of +smaller, specialized security policies. … a scripting language to implement a security model is both …",https://www.usenix.org/publications/library/proceedings/usenix98/full_papers/levy/levy.pdf,https://www.usenix.org/publications/library/proceedings/usenix98/full_papers/levy/levy.pdf,,Policy and Regulation,,, +278,Effect of grammar on security of long passwords,"A Rao, B Jha, G Kini - … ACM conference on Data and application security …, 2013 - dl.acm.org","… We use fixed entropy estimate of 1.75 bits per character for printed English from[18], which +derived the estimate using a 3-word gram language model trained on large amounts of …",https://dl.acm.org/doi/abs/10.1145/2435349.2435395,http://reports-archive.adm.cs.cmu.edu/anon/anon/home/ftp/isr2012/CMU-ISR-12-113.pdf,,,,, +279,What security questions do developers ask? a large-scale study of stack overflow posts,"XL Yang, D Lo, X Xia, ZY Wan, JL Sun - Journal of Computer Science and …, 2016 - Springer","… for security researchers, security educators and security … , we conduct a large-scale study +on security-related questions … and security-related questions occupy a large proportion and …",https://link.springer.com/article/10.1007/s11390-016-1672-0,https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=4578&context=sis_research,,,,, +280,Language identification of encrypted voip traffic: Alejandra y roberto or alice and bob?,"CV Wright, L Ballard, F Monrose… - USENIX Security …, 2007 - usenix.org","… model (GMM) classification and techniques based on single-language phone recognition and +n-gram language modeling… , we evaluate our techniques on a large corpus of traffic from dif…",https://www.usenix.org/event/sec07/tech/full_papers/wright/wright_html/,https://www.usenix.org/event/sec07/tech/full_papers/wright/wright_html/,,,,, +281,Deeplog: Anomaly detection and diagnosis from system logs through deep learning,"M Du, F Li, G Zheng, V Srikumar - … and communications security, 2017 - dl.acm.org","… problem of language modeling, widely studied by the natural language … e typical language +modeling approach for assigning … We rst compare its e ectiveness on large system logs with …",https://dl.acm.org/doi/abs/10.1145/3133956.3134015,https://dl.acm.org/doi/pdf/10.1145/3133956.3134015,,,,, +282,Automated crowdturfing attacks and defenses in online review systems,"Y Yao, B Viswanath, J Cryan, H Zheng… - … communications security, 2017 - dl.acm.org","… the generative language model. Popular sites like Yelp have already released large review +… This dataset contains the reviews generated by our RNN language model. We use the attack …",https://dl.acm.org/doi/abs/10.1145/3133956.3133990,https://dl.acm.org/doi/pdf/10.1145/3133956.3133990,,,,, +283,A markov random field model for term dependencies,"D Metzler, WB Croft - Proceedings of the 28th annual international ACM …, 2005 - dl.acm.org","… Much like the language modeling framework, our model does not … security measures, or +security measures appear in a doc… As further evidence of the power of these models on large …",https://dl.acm.org/doi/abs/10.1145/1076034.1076115,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=3eae360c6ee52950f27f577aedd5f9934a04e137,,,,, +284,Acoustic {Side-Channel} attacks on printers,"M Backes, M Dürmuth, S Gerling, M Pinkal… - 19th USENIX Security …, 2010 - usenix.org","… large range of domains and thus make our model robust in the face of arbitrary input texts, +we train the language model … taken into account by the language model. Higher values for n …",https://www.usenix.org/event/sec10/tech/full_papers/Backes.pdf,https://www.usenix.org/event/sec10/tech/full_papers/Backes.pdf,,,,, +285,"Database security-concepts, approaches, and challenges","E Bertino, R Sandhu - IEEE Transactions on Dependable and …, 2005 - ieeexplore.ieee.org","… security and summarize the most well-known techniques. We focus on access control +systems, on which a large … , and the role-based access control (RBAC) model. We also discuss …",https://ieeexplore.ieee.org/abstract/document/1416861/,https://www.cerias.purdue.edu/assets/pdf/bibtex_archive/2005-99.ps,,,,, +286,Clockwork finance: Automated analysis of economic security in smart contracts,"K Babel, P Daian, M Kelkar… - … Security and Privacy (SP), 2023 - ieeexplore.ieee.org","… orders in the formal model directly results in a large amount of repeated computation as … +our language model, which we now describe. The first component of our language model …",https://ieeexplore.ieee.org/abstract/document/10179346/,https://arxiv.org/pdf/2109.04347,,,,, +287,"Möbius 2.3: An extensible tool for dependability, security, and performance evaluation of large and complex system models","T Courtney, S Gaonkar, K Keefe… - 2009 IEEE/IFIP …, 2009 - ieeexplore.ieee.org","… Möbius 2.3 is an extensible dependability, security, and performance modeling environment +for large-scale discrete-event systems. It provides multiple model formalisms and solution …",https://ieeexplore.ieee.org/abstract/document/5270318/,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=df6514dd527647cdf4eab5d67d53629cbc48d9bd,,,Solutions and Mitigations,, +288,Automated identification of security issues from commit messages and bug reports,"Y Zhou, A Sharma - Proceedings of the 2017 11th joint meeting on …, 2017 - dl.acm.org","… From GitHub, JIRA, and Bugzilla, we collected a wide range of security related commits and +bug … ming languages at low cost and large scale. Our proposed K-fold stacking model for …",https://dl.acm.org/doi/abs/10.1145/3106237.3117771,https://www.researchgate.net/profile/Asankhaya-Sharma/publication/318872391_Automated_identification_of_security_issues_from_commit_messages_and_bug_reports/links/5a7947c2a6fdcc4ffe90c684/Automated-identification-of-security-issues-from-commit-messages-and-bug-reports.pdf,,,,, +289,Security scheme for managing a large quantity of individual information in RFID environment,"N Park - International Conference on Information Computing …, 2010 - Springer","… Ensuring the security of mobile RFID's large-capacity … propose a security service solution +for managing a large quantity … It aims to become aware of distribution of large quantity of …",https://link.springer.com/chapter/10.1007/978-3-642-16339-5_10,https://www.academia.edu/download/36619990/1060072.pdf,,,Solutions and Mitigations,, +290,Finding application errors and security flaws using PQL: a program query language,"M Martin, B Livshits, MS Lam - Acm Sigplan Notices, 2005 - dl.acm.org","… large programs, we applied the technique to 6 large real-life applications with nearly 60,000 +classes combined and found 206 errors. We found several security … that the language covers …",https://dl.acm.org/doi/abs/10.1145/1103845.1094840,https://www.academia.edu/download/36333484/oopsla05pql.pdf,,,,, +291,Model-checking of safety and security aspects in web service flows,"S Nakajima - Web Engineering: 4th International Conference, ICWE …, 2004 - Springer","… Last, up to the time of writing this paper, we cover a core part of BPEL language only. +BPEL is a large language that has many interesting features such as compensation, fault, and …",https://link.springer.com/chapter/10.1007/978-3-540-27834-4_60,,,,,, +292,The importance of resources and security in the socio-economic integration of refugees. A study on the impact of length of stay in asylum accommodation and …,"L Bakker, J Dagevos, G Engbersen - Journal of International Migration and …, 2014 - Springer","… In this study, we use a large-scale dataset containing detailed information on about 4,000 +refugees to show that also post-migration stressors affect mental health and hinder the socio-…",https://link.springer.com/article/10.1007/s12134-013-0296-2,https://repub.eur.nl/pub/73332/art-3A10.1007-2Fs12134-013-0296-2.pdf,,,,, +293,On the opportunities and risks of foundation models,"R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org","… of writing are language models, the term language model is … For security, we view foundation +models as akin to operating … of a foundation model without a large amount of training. …",https://arxiv.org/abs/2108.07258,https://arxiv.org/pdf/2108.07258.pdf?utm_source=morning_brew,,,,, +294,Stay on-topic: Generating context-specific fake restaurant reviews,"M Juuti, B Sun, T Mori, N Asokan - … on Research in Computer Security, 2018 - Springer","… We also assume that it is easy for the agent to create a large number of accounts on the +review … , our language model modifications require some knowledge of that target language’s …",https://link.springer.com/chapter/10.1007/978-3-319-99073-6_7,https://arxiv.org/pdf/1805.02400,,,,, +295,Controlled-channel attacks: Deterministic side channels for untrusted operating systems,"Y Xu, W Cui, M Peinado - 2015 IEEE Symposium on Security …, 2015 - ieeexplore.ieee.org","… Abstract—The presence of large numbers of security vulner… untrusted operating system to +extract large amounts of sensitive … To mitigate the ambiguity, we leverage a language model to …",https://ieeexplore.ieee.org/abstract/document/7163052/,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=5ac7a4dca5509c9dee49d96b4c3c62cc1d0bb9dd,,,,, +296,Towards a formal model for security policies specification and validation in the selinux system,"G Zanin, LV Mancini - Proceedings of the ninth ACM symposium on …, 2004 - dl.acm.org","… This paper presents a formal model, called SELAC, for analyzing an arbitrary security policy +configuration for the SELinux system. A security policy for SELinux is complex and large: it is …",https://dl.acm.org/doi/abs/10.1145/990036.990059,https://www.researchgate.net/profile/Luigi-Mancini/publication/221367005_Towards_a_formal_model_for_security_policies_specification_and_validation_in_the_SELinux_system/links/02bfe511cbb93de777000000/Towards-a-formal-model-for-security-policies-specification-and-validation-in-the-SELinux-system.pdf,,Policy and Regulation,,,Current Issues +297,Polisis: Automated analysis and presentation of privacy policies using deep learning,"H Harkous, K Fawaz, R Lebret, F Schaub… - 27th USENIX Security …, 2018 - usenix.org","… a custom, privacy-specific language model that we generated using … model outperforms +the SemVec model. This result is not entirely surprising since we seeded Retrieval with a large …",https://www.usenix.org/conference/usenixsecurity18/presentation/harkous,https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-harkous.pdf,,,,, +298,An on-the-fly model-checker for security protocol analysis,"D Basin, S Mödersheim, L Vigano - … on Research in Computer Security …, 2003 - Springer","… security protocol model-checker. Our starting point is the approach of [4] of using lazy data-types +to model … A bisimulation proof shows that, for large classes of properties, the model with …",https://link.springer.com/chapter/10.1007/978-3-540-39650-5_15,https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/69805/1/eth-4466-01.pdf,,,,, +299,The secret sharer: Evaluating and testing unintended memorization in neural networks,"N Carlini, C Liu, Ú Erlingsson, J Kos… - 28th USENIX Security …, 2019 - usenix.org","… of training a large language model on … a language model on this dataset using a two-layer +LSTM with 200 hidden units (with approximately 600,000 parameters). The language model re…",https://www.usenix.org/conference/usenixsecurity19/presentation/carlini,https://www.usenix.org/system/files/sec19-carlini.pdf,,,,, +300,Extracting training data from diffusion models,"N Carlini, J Hayes, M Nasr, M Jagielski… - 32nd USENIX Security …, 2023 - usenix.org","… Overall, running this iterative generation process (which we will denote by Gen) with large-scale … +language modeling work has begun to explore approximate memorization as well [39]). …",https://www.usenix.org/conference/usenixsecurity23/presentation/carlini,https://www.usenix.org/system/files/usenixsecurity23-carlini.pdf,,,,, +301,Cam: A large language model-based creative analogy mining framework,"B Bhavya, J Xiong, C Zhai - Proceedings of the ACM Web Conference …, 2023 - dl.acm.org","… In contrast, we use a unifed framework based on prompting a pre-trained large-scale +language model, which enables our framework to both retrieve/extract explicitly mentioned …",https://dl.acm.org/doi/abs/10.1145/3543507.3587431,https://bhaavya.github.io/files/www23.pdf,,,,, +302,A language modeling framework for resource selection and results merging,"L Si, R Jin, J Callan, P Ogilvie - Proceedings of the eleventh international …, 2002 - dl.acm.org","… as one single giant ‘document’ and perform the similar computation for the document-query +similarity. More formally, we need to find the collections that have largest probabilities of P(Q|…",https://dl.acm.org/doi/abs/10.1145/584792.584856,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=1cc7013247056e45264de9817171d72690181692,,,,, +303,"Document language models, query models, and risk minimization for information retrieval","J Lafferty, C Zhai - Proceedings of the 24th annual international ACM …, 2001 - dl.acm.org","… motivate the language modeling approach from a general probabilistic retrieval framework +based on risk … As the models are highly lexical, it is unlikely that a sufficiently large collection …",https://dl.acm.org/doi/abs/10.1145/383952.383970,https://www.academia.edu/download/30739205/10.1.1.69.116.pdf,,,,, +304,InfoXLM: An information-theoretic framework for cross-lingual language model pre-training,"Z Chi, L Dong, F Wei, N Yang, S Singhal… - arXiv preprint arXiv …, 2020 - arxiv.org","… an informationtheoretic framework that formulates crosslingual language model pre-training … +More importantly, inspired by the framework, we propose a new pretraining task based on …",https://arxiv.org/abs/2007.07834,https://arxiv.org/pdf/2007.07834,,,,, +305,A Large Language Model-Based Generative Natural Language Processing Framework Finetuned on Clinical Notes Accurately Extracts Headache Frequency …,"CC Chiang, M Luo, G Dumkrieger, S Trivedi, YC Chen… - medRxiv, 2023 - ncbi.nlm.nih.gov","… robust model based on a state-of-the-art large language model (LLM)- a GPT-2 generative +model … We also showed that GPT2-based frameworks outperformed ClinicalBERT in terms of …",https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10593021/,https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10593021/,,,,, +306,Palm: Scaling language modeling with pathways,"A Chowdhery, S Narang, J Devlin, M Bosma… - arXiv preprint arXiv …, 2022 - arxiv.org","… Finally, we discuss the ethical considerations related to large language models and discuss +… checkpoint 15,000, then the training framework is guaranteed to produce identical results in …",https://arxiv.org/abs/2204.02311,https://arxiv.org/pdf/2204.02311,,,,, +307,Evaluating large language models for use in healthcare: A framework for translational value assessment,"S Reddy - Informatics in Medicine Unlocked, 2023 - Elsevier","… language model, then input text is down into individual tokens (words, subwords, or characters), +which the language model … through training the language model on large amounts of text…",https://www.sciencedirect.com/science/article/pii/S2352914823001508,https://www.sciencedirect.com/science/article/pii/S2352914823001508,,,,, +308,Model-based feedback in the language modeling approach to information retrieval,"C Zhai, J Lafferty - Proceedings of the tenth international conference on …, 2001 - dl.acm.org","… query language model based on feedback documents, one based on a generative probabilistic +model … risk minimization retrieval framework. Interestingly, it is similar to the vector space …",https://dl.acm.org/doi/abs/10.1145/502585.502654,https://www.academia.edu/download/30739222/p403-zhai.pdf,,,,, +309,Language models are few-shot learners,"T Brown, B Mann, N Ryder… - Advances in neural …, 2020 - proceedings.neurips.cc","… GPT-3, an autoregressive language model with 175 billion … When the test set is private, our +model is often too large to fit … of traditional security risk assessment frameworks, which outline …",https://proceedings.neurips.cc/paper_files/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html?utm_medium=email&utm_source=transaction,https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf,,Risk Assessment,,, +310,Fedmed: A federated learning framework for language modeling,"X Wu, Z Liang, J Wang - Sensors, 2020 - mdpi.com","Federated learning (FL) is a privacy-preserving technique for training a vast amount of +decentralized data and making inferences on mobile devices. As a typical language modeling …",https://www.mdpi.com/1424-8220/20/14/4048,https://www.mdpi.com/1424-8220/20/14/4048/pdf,,,,, +311,Unipelt: A unified framework for parameter-efficient language model tuning,"Y Mao, L Mathias, R Hou, A Almahairi, H Ma… - arXiv preprint arXiv …, 2021 - arxiv.org","… Prefix-tuning is originally evaluated on natural language generation and we adapt it to … very +large model sizes (billions of total parameters), and is thus not considered in our study. Note …",https://arxiv.org/abs/2110.07577,https://arxiv.org/pdf/2110.07577,,,,, +312,On the opportunities and risks of foundation models,"R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org","… language models, the term language model is simply too narrow for our purpose: as we +describe, the scope of foundation models goes well beyond language… model without a large …",https://arxiv.org/abs/2108.07258,https://arxiv.org/pdf/2108.07258.pdf?utm_source=morning_brew,,,,, +313,Universal language model fine-tuning for text classification,"J Howard, S Ruder - arXiv preprint arXiv:1801.06146, 2018 - arxiv.org","… Finetuning (ULMFiT), which pretrains a language model (LM) on a large general-domain +corpus and fine-tunes it on the target task using novel techniques. The method is universal in …",https://arxiv.org/abs/1801.06146,https://arxiv.org/pdf/1801.06146.pdf%C3%AF%C2%BC%E2%80%B0%C3%A3%E2%82%AC%E2%80%9A,,,,, +314,Unirex: A unified learning framework for language model rationale extraction,"A Chan, M Sanjabi, L Mathias, L Tan… - International …, 2022 - proceedings.mlr.press","… In light of this, we propose UNIREX, a flexible learning framework which generalizes rationale +extractor optimization as follows: (1) specify architecture for a learned rationale extractor; (…",https://proceedings.mlr.press/v162/chan22a.html,https://proceedings.mlr.press/v162/chan22a/chan22a.pdf,,,,, +315,A study of smoothing methods for language models applied to information retrieval,"C Zhai, J Lafferty - ACM Transactions on Information Systems (TOIS), 2004 - dl.acm.org","… In this article, we study the problem of language model … Yet this new framework is very +promising because of its … We find that the noninterpolated average precision on the large database …",https://dl.acm.org/doi/abs/10.1145/984321.984322,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=df4e8955cd509dacde825791d73b9b46101e1b53,,,,, +316,Simmim: A simple framework for masked image modeling,"Z Xie, Z Zhang, Y Cao, Y Lin, J Bao… - Proceedings of the …, 2022 - openaccess.thecvf.com","… language modeling tasks have largely repainted the field [2,12,30], ie, learning very large-scale +language models by using huge … the study of masked language modeling as a pretext …",http://openaccess.thecvf.com/content/CVPR2022/html/Xie_SimMIM_A_Simple_Framework_for_Masked_Image_Modeling_CVPR_2022_paper.html,http://openaccess.thecvf.com/content/CVPR2022/papers/Xie_SimMIM_A_Simple_Framework_for_Masked_Image_Modeling_CVPR_2022_paper.pdf,,,,, +317,Openflamingo: An open-source framework for training large autoregressive vision-language models,"A Awadalla, I Gao, J Gardner, J Hessel… - arXiv preprint arXiv …, 2023 - arxiv.org","… of a frozen, autoregressive language model. To embed images, we … performance, changing +language model backbones did. … We hypothesize that the language model has similarly large …",https://arxiv.org/abs/2308.01390,https://arxiv.org/pdf/2308.01390.pdf?trk=public_post_comment-text,,,,, +318,ChatGPT for good? On opportunities and challenges of large language models for education,"E Kasneci, K Seßler, S Küchemann, M Bannert… - Learning and individual …, 2023 - Elsevier","… In the following, we provide an overview of research works employing large language +models in education that were published since the release of the first large language model in …",https://www.sciencedirect.com/science/article/pii/S1041608023000195,https://osf.io/preprints/edarxiv/5er8f/download,,,,, +319,A risk minimization framework for information retrieval,"CX Zhai, J Lafferty - Information Processing & Management, 2006 - Elsevier","… as observations from a probabilistic model, called a statistical language model, and encode +… However, a large amount of data and empirical experimentation may be needed in order to …",https://www.sciencedirect.com/science/article/pii/S0306457304001530,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=fbfd8121dee2cd04d3fd6f38eef02df195cf4be3,Experiments,,,, +320,Kart: Privacy leakage framework of language models pre-trained with clinical records,"Y Nakamura, S Hanaoka, Y Nomura… - arXiv preprint arXiv …, 2020 - researchgate.net","… eventual privacy risk, we made Large and Small subsets. Their details are listed in Table 1. +… original full patient name with masked language model for each targeted full name mention …",https://www.researchgate.net/profile/Yuta-Nakamura/publication/348213139_KART_Privacy_Leakage_Framework_of_Language_Models_Pre-trained_with_Clinical_Records/links/60024586299bf140889b71d4/KART-Privacy-Leakage-Framework-of-Language-Models-Pre-trained-with-Clinical-Records.pdf,https://www.researchgate.net/profile/Yuta-Nakamura/publication/348213139_KART_Privacy_Leakage_Framework_of_Language_Models_Pre-trained_with_Clinical_Records/links/60024586299bf140889b71d4/KART-Privacy-Leakage-Framework-of-Language-Models-Pre-trained-with-Clinical-Records.pdf,,,,, +321,"cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models","C Dyer, A Lopez, J Ganitkevitch, J Weese… - Proceedings of the …, 2010 - research.ed.ac.uk","… (§3), we use the logic for language model rescoring described by Chiang (2007), although +… , the resulting rescored forest may still be too large to represent completely. cdec therefore …",https://www.research.ed.ac.uk/en/publications/cdec-a-decoder-alignment-and-learning-framework-for-finite-state-,https://www.research.ed.ac.uk/files/21524505/P10_4002.pdf,,,,,Current Issues +322,Esg2risk: A deep learning framework from esg news to stock volatility prediction,"T Guo, N Jamet, V Betrix, LA Piquet… - arXiv preprint arXiv …, 2020 - arxiv.org","… A language model is typically pre-trained on a large corpus of text, … framework, ESG2Risk, +to predict future volatility of stock prices. We show that a transformer-based language model …",https://arxiv.org/abs/2005.02527,https://arxiv.org/pdf/2005.02527,,,,,Future Predictions +323,Scalable framework for cyber threat situational awareness based on domain name systems data analysis,"R Vinayakumar, P Poornachandran… - Big data in engineering …, 2018 - Springer","… have achieved a significant performance in various tasks such as language modeling, text +classification and many others in the area of natural language processing (NLP) [18]. They …",https://link.springer.com/chapter/10.1007/978-981-10-8476-8_6,https://www.researchgate.net/profile/Vinayakumar-Ravi/publication/324906611_Scalable_Framework_for_Cyber_Threat_Situational_Awareness_Based_on_Domain_Name_Systems_Data_Analysis/links/5e80ca46458515efa0b87897/Scalable-Framework-for-Cyber-Threat-Situational-Awareness-Based-on-Domain-Name-Systems-Data-Analysis.pdf,,,,, +324,A big data analytics framework for detecting user-level depression from social networks,"X Yang, R McEwen, LR Ong, M Zihayat - International Journal of …, 2020 - Elsevier","… novel frameworks to identify those at risk of depression. Moreover, such frameworks can … +In this paper, we propose a big data analytics framework to detect depression for users of …",https://www.sciencedirect.com/science/article/pii/S0268401219313325,,,,,, +325,REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models,"R Zhang, SS Hussain, P Neekhara… - arXiv preprint arXiv …, 2023 - arxiv.org","… We present REMARK-LLM, a novel efficient, and robust watermarking framework designed +for texts generated by large language models (LLMs). Synthesizing human-like content using …",https://arxiv.org/abs/2310.12362,https://arxiv.org/pdf/2310.12362,,,,,Current Issues +326,REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes,"C Wang, Z Li, Y Peng, S Gao, S Chen, S Wang… - arXiv preprint arXiv …, 2023 - arxiv.org","… ➋ Large Language Model (LLM)-based explanation with … [31], we leverage the large +language model to automatically … • We propose REEF, a framework to mine up-to-date, real…",https://arxiv.org/abs/2309.08115,https://arxiv.org/pdf/2309.08115,,,,, +327,Learning simpler language models with the differential state framework,"AG Ororbia II, T Mikolov, D Reitter - Neural computation, 2017 - ieeexplore.ieee.org","… RNN, will be evaluated as a language model, and we will not … However, we opt to use this +large corpus (training consists of … -RNN as a (subword) language model. The IMDB data set …",https://ieeexplore.ieee.org/abstract/document/8124214/,https://arxiv.org/pdf/1703.08864,,,,, +328,A proposed chatbot framework for COVID-19,"E Amer, A Hazem, O Farouk, A Louca… - 2021 International …, 2021 - ieeexplore.ieee.org","… To manage the large number of user requests during … , our model used the pre-trained Google +BERT language model. On top … As a result, training a model on a large dataset can take a …",https://ieeexplore.ieee.org/abstract/document/9447652/,https://www.researchgate.net/profile/Eslam-Amer-3/publication/352274956_A_Proposed_Chatbot_Framework_for_COVID-19/links/60d3784f92851c8f7995a93e/A-Proposed-Chatbot-Framework-for-COVID-19.pdf,,,,, +329,"Leveraging cyber threat intelligence for a dynamic risk framework: Automation by using a semantic reasoner and a new combination of standards (STIX™, SWRL and …","R Riesco, VA Villagrá - International Journal of Information Security, 2019 - Springer","… examples of real risks for any organization. Risk management frameworks are not integrated +and automated with near real-time (NRT) risk-related cybersecurity threat intelligence (CTI) …",https://link.springer.com/article/10.1007/s10207-019-00433-2,http://oa.upm.es/63893/1/INVE_MEM_2019_321228.pdf,,,,, +330,Anticipating safety issues in e2e conversational ai: Framework and tooling,"E Dinan, G Abercrombie, AS Bergman, S Spruit… - arXiv preprint arXiv …, 2021 - arxiv.org","… , different risk estimates would be expected if there are large … at different stages as we suggest +in the framework in §4. … -trained with a masked language model objective on pushshift.io …",https://arxiv.org/abs/2107.03451,https://arxiv.org/pdf/2107.03451,,,,, +331,A small samples training framework for deep Learning-based automatic information extraction: Case study of construction accident news reports analysis,"D Feng, H Chen - Advanced Engineering Informatics, 2021 - Elsevier","… on domain-specific natural language processing, and since language forms are generally … +pre-trained model with large scale corpus training. For example, the leading language model, …",https://www.sciencedirect.com/science/article/pii/S1474034621000112,,,,,, +332,"Low-latency, high-throughput access to static global resources within the Hadoop framework","J Lin, A Bahety, S Konda, S Mahindrakar - University of Maryland, Tech …, 2009 - Citeseer","… of scoring a large collection of natural language sentences given a unigram language model. +… Table 1: Results of scoring natural language sentences with a unigram language model in …",https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=eb142151c77e9fc91933b223d43bb1d3f772310c,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=eb142151c77e9fc91933b223d43bb1d3f772310c,,,,, +333,GameGPT: Multi-agent Collaborative Framework for Game Development,"D Chen, H Wang, Y Huo, Y Li, H Zhang - arXiv preprint arXiv:2310.08067, 2023 - arxiv.org","… The large language model (LLM) based agents have … propose a multi-agent collaborative +framework, dubbed GameGPT, to … Our framework presents a series of methods to mitigate both …",https://arxiv.org/abs/2310.08067,https://arxiv.org/pdf/2310.08067,,,,,Current Issues +334,Zero and R2D2: A large-scale Chinese cross-modal benchmark and A vision-language framework,"C Xie, J Li, H Cai, F Kong, X Wu, J Song… - arXiv preprint arXiv …, 2022 - arxiv.org","… Masked Language Modeling with Enhanced Training. We apply a masked language +modeling loss to the text-image cross encoder to improve the ability to model the relationship …",https://arxiv.org/abs/2205.03860,https://arxiv.org/pdf/2205.03860,,,,, +335,Risk based life cycle assessment conceptual framework for energy supply systems in large buildings,"N Ayoub, F Musharavati, S Pokharel… - Journal of Cleaner …, 2015 - Elsevier","… supply side of a large building and develop a framework for risk based life cycle analysis (… +of a large building based on potential risks. By combining environmental assessment and risk …",https://www.sciencedirect.com/science/article/pii/S0959652615004503,https://www.academia.edu/download/96608616/j.jclepro.2015.04.07520221231-1-1ovyvod.pdf,,,,, +336,"Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension","M Lewis, Y Liu, N Goyal, M Ghazvininejad… - arXiv preprint arXiv …, 2019 - arxiv.org","… schemes within the BART framework, to better measure … coder, and for our large model we +use 12 layers in each. The … The Masked Language Model and the Permuted Language Model …",https://arxiv.org/abs/1910.13461,https://arxiv.org/pdf/1910.13461,,,,, +337,A risk-aware modeling framework for speech summarization,"B Chen, SH Lin - … transactions on audio, speech, and language …, 2011 - ieeexplore.ieee.org","… train -gram language models for speech recognition with the SRI Language Modeling Toolkit +[… as a query and posing it to an IR system to obtain a ranked list of documents from a large …",https://ieeexplore.ieee.org/abstract/document/5876303/,http://140.122.185.120/Berlin_Research/Manuscripts/2011-/2012-IEEE-TASL-A%20risk-aware%20modeling%20framework%20for%20speech%20summarization.pdf,,,,, +338,RAH! RecSys-Assistant-Human: A Human-Central Recommendation Framework with Large Language Models,"Y Shu, H Gu, P Zhang, H Zhang, T Lu, D Li… - arXiv preprint arXiv …, 2023 - arxiv.org","… Finally, we discuss further strategies in the RAH framework to address human-central … 29] +suggests that using the large language model (LLM) as the recommender system is promising …",https://arxiv.org/abs/2308.09904,https://arxiv.org/pdf/2308.09904,,,,, +339,A medical ethics framework for conversational artificial intelligence,"E Fournier-Tombs, J McHardy - Journal of Medical Internet Research, 2023 - jmir.org","… risks in the use of conversational chatbots in medicine to the principles of medical ethics. In +doing so, we propose a framework … features specific risks related to large language models […",https://www.jmir.org/2023/1/e43068/,https://www.jmir.org/2023/1/e43068/,,,,, +340,"Grasping project complexity in large engineering projects: The TOE (Technical, Organizational and Environmental) framework","M Bosch-Rekveldt, Y Jongkind, H Mooi… - International journal of …, 2011 - Elsevier","… a framework for characterising project complexity in large … Recently, a large number of +project complexity related papers … However, no generally accepted framework is available to …",https://www.sciencedirect.com/science/article/pii/S0263786310001122,https://www.researchgate.net/profile/Hans-Bakker-2/publication/229307177_Grasping_project_complexity_in_large_engineering_projects_The_TOE_Technical_Organizational_and_Environmental_framework/links/5788860308ae95560407c47d/Grasping-project-complexity-in-large-engineering-projects-The-TOE-Technical-Organizational-and-Environmental-framework.pdf,,,,, +341,From 'multicultural health'to 'knowledge translation'. Rethinking strategies to promote language access within a risk management framework,"S Bowen, M Gibbens, J Roy, J Edwards - Jostrans, 2010 - jostrans.org","… This paper will present a case study of a multifaceted knowledge translation (KT) strategy +to promote evidence-informed action to address language barriers within a large Canadian …",https://jostrans.org/issue14/art_bowen.pdf,https://jostrans.org/issue14/art_bowen.pdf,Case Studies,,,,Current Issues +342,On the dangers of stochastic parrots: Can language models be too big?🦜,"EM Bender, T Gebru, A McMillan-Major… - Proceedings of the 2021 …, 2021 - dl.acm.org","… Similar to [14], we understand the term language model (LM) to refer to systems which are +… In this section, we provide a brief overview of the general trend of language modeling in …",https://dl.acm.org/doi/abs/10.1145/3442188.3445922,https://dl.acm.org/doi/pdf/10.1145/3442188.3445922?uuid=f2qngt2LcFCbgtaZ2024,,,,, +343,Cloud computing adoption framework: A security framework for business clouds,"V Chang, YH Kuo, M Ramachandran - Future Generation Computer …, 2016 - Elsevier","… Second, it ensures that large amount of data and large data … of large-scale penetration testing +to validate the framework. … core technologies and results from large-scale experiments for …",https://www.sciencedirect.com/science/article/pii/S0167739X15003118,https://eprints.leedsbeckett.ac.uk/id/eprint/1857/1/CCAF_VC_openstack_accepted.pdf,Experiments,,,, +344,Predictive modeling in e-mental health: a common language framework,"D Becker, W van Breda, B Funk, M Hoogendoorn… - Internet …, 2018 - Elsevier","… framework that identifies four model types that can be used to classify existing and future +research and applications. To illustrate this, we use the framework … problems have huge impacts …",https://www.sciencedirect.com/science/article/pii/S2214782917301124,https://www.sciencedirect.com/science/article/pii/S2214782917301124,,,,,Future Predictions +345,A real-time ATC safety monitoring framework using a deep learning approach,"Y Lin, L Deng, Z Chen, X Wu, J Zhang… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org","… language model is also trained in this subsystem to improve the overall performance of the +framework… All the advanced technologies, such as big data and the deep learning-based ASR …",https://ieeexplore.ieee.org/abstract/document/8846596/,,,,,, +346,A static compliance-checking framework for business process models,"Y Liu, S Muller, K Xu - IBM Systems Journal, 2007 - ieeexplore.ieee.org","… these compliance rules by means of model-checking technology. The benefit of … large set +of business process models, our approach increases deployment efficiency and lowers the risk …",https://ieeexplore.ieee.org/abstract/document/5386614/,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=89cb83ee8dae69af10d6ab25ff5483056e6e1bc1,,,,, +347,Chatmof: An autonomous ai system for predicting and generating metal-organic frameworks,"Y Kang, J Kim - arXiv preprint arXiv:2308.01423, 2023 - arxiv.org","… of metal-organic frameworks (MOFs). By leveraging a large-scale language model (gpt-3.5-… +The study further explores the merits and constraints of using large language models (LLMs…",https://arxiv.org/abs/2308.01423,https://arxiv.org/pdf/2308.01423,,,,, +348,Quantitative multi-risk analysis for natural hazards: a framework for multi-risk modelling,"J Schmidt, I Matcham, S Reese, A King, R Bell… - Natural hazards, 2011 - Springer","… This paper introduces a generic framework for multi-risk modelling developed in the project +‘… modelling risks from different natural hazards and for various elements at risk. The technical …",https://link.springer.com/article/10.1007/s11069-011-9721-z,https://www.academia.edu/download/48326906/s11069-011-9721-z20160826-1221-qvfkpo.pdf,,,,, +349,Natural language processing for mental health interventions: a systematic review and research framework,"M Malgaroli, TD Hull, JM Zech, T Althoff - Translational Psychiatry, 2023 - nature.com","… Results indicate a rapid growth of NLP MHI studies since 2019, characterized by increased +sample sizes and use of large language models. Digital health platforms were the largest …",https://www.nature.com/articles/s41398-023-02592-2,https://www.nature.com/articles/s41398-023-02592-2,,,,, +350,When can models learn from explanations? a formal framework for understanding the roles of explanation data,"P Hase, M Bansal - arXiv preprint arXiv:2102.02201, 2021 - arxiv.org","… We suggest that, in the paradigm of large language model pretraining, this interpretation +function will be meta-learned during pretraining. This behavior is clearly exemplified in GPT-3, …",https://arxiv.org/abs/2102.02201,https://arxiv.org/pdf/2102.02201,,,,, +351,An automated assessment framework for atypical prosody and stereotyped idiosyncratic phrases related to autism spectrum disorder,"M Li, D Tang, J Zeng, T Zhou, H Zhu, B Chen… - … Speech & Language, 2019 - Elsevier","… from speech transcripts, we adopt language model, dependency treebank and Term +Frequency–… to the n-gram language model. In this work, we use a large-scale Chinese …",https://www.sciencedirect.com/science/article/pii/S0885230817303601,https://sites.duke.edu/dkusmiip/files/2022/11/J_csl.pdf,,,,, +352,"A framework for estimating information security risk assessment method completeness: Core Unified Risk Framework, CURF","G Wangen, C Hallstensen, E Snekkenes - International Journal of …, 2018 - Springer","… This paper proposes the Core Unified Risk Framework (CURF) as an all-inclusive approach +to compare different methods, all-inclusive since we grew CURF organically by adding new …",https://link.springer.com/article/10.1007/s10207-017-0382-0,https://link.springer.com/article/10.1007/s10207-017-0382-0,,,,, +353,MulDA: A multilingual data augmentation framework for low-resource cross-lingual NER,"L Liu, B Ding, L Bing, S Joty, L Si… - … on Natural Language …, 2021 - aclanthology.org","… language model based NER models to generalize better with both the language-specific +features from the target-language … (2020), we reproduce their results with XLM-R large for a fair …",https://aclanthology.org/2021.acl-long.453/,https://aclanthology.org/2021.acl-long.453.pdf,,,,, +354,Continually-Adaptive Representation Learning Framework for Time-Sensitive Healthcare Applications,"A Choudhuri, H Jang, AM Segre, PM Polgreen… - Proceedings of the …, 2023 - dl.acm.org","… and the natural language processing model to process and … learning framework to train +the model on large batches of … , we only minimize the Masked Language Model Loss. This is …",https://dl.acm.org/doi/abs/10.1145/3583780.3615464,https://dl.acm.org/doi/pdf/10.1145/3583780.3615464,,,,, +355,Developing a framework to re-design writing assignment assessment for the era of Large Language Models,"YP Hsiao, N Klijn, MS Chiu - Learning: Research and Practice, 2023 - Taylor & Francis","… Consequently, we have developed a framework to identify design dimensions for … This +report aims to describe the prototype of our framework and share the lessons learned from its …",https://www.tandfonline.com/doi/full/10.1080/23735082.2023.2257234,https://www.tandfonline.com/doi/full/10.1080/23735082.2023.2257234,,,,, +356,SenticNet 7: A commonsense-based neurosymbolic AI framework for explainable sentiment analysis,"E Cambria, Q Liu, S Decherchi, F Xing… - … Thirteenth Language …, 2022 - aclanthology.org","… introduces permutation language modeling, where … big shift in NLP research has been the +upgrade from the bag-of-words (BOW) model to the continuous-bag-of-words (CBOW) model, …",https://aclanthology.org/2022.lrec-1.408/,https://aclanthology.org/2022.lrec-1.408.pdf,,,,, +357,FATE-LLM: A Industrial Grade Federated Learning Framework for Large Language Models,"T Fan, Y Kang, G Ma, W Chen, W Wei, L Fan… - arXiv preprint arXiv …, 2023 - arxiv.org","… Glm: General language model pretraining with autoregressive blank infilling. In Proceedings +of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: …",https://arxiv.org/abs/2310.10049,https://arxiv.org/pdf/2310.10049,,,,, +358,Mongoose: A learnable lsh framework for efficient neural network training,"B Chen, Z Liu, B Peng, Z Xu, JL Li, T Dao… - International …, 2020 - openreview.net","… on large-scale deep learning models for recommendation systems and language modeling. +… effectiveness of our framework on both recommendation and language modeling tasks. The …",https://openreview.net/forum?id=wWK7yXkULyh,https://openreview.net/pdf?id=wWK7yXkULyh,,,,, +359,BPRIM: An integrated framework for business process management and risk management,"E Lamine, R Thabet, A Sienou, D Bork, F Fontanili… - Computers in …, 2020 - Elsevier","… These observations led to the emergence of the risk manager job profile in large organizations. +Having its roots in areas such as project management, finance, and industrial safety, …",https://www.sciencedirect.com/science/article/pii/S0166361520300890,https://www.sciencedirect.com/science/article/am/pii/S0166361520300890,,,,, +360,Automating threat modeling using an ontology framework,"M Välja, F Heiding, U Franke, R Lagerström - Cybersecurity, 2020 - Springer","… in modeling automation can be addressed with ontologies. In this paper, we introduce an +ontology framework to improve automatic threat modeling… categories capture a large part of the …",https://link.springer.com/article/10.1186/s42400-020-00060-8?wt_mc=Internal.Event.1.SEM.ArticleAuthorIncrementalIssue&utm_source=ArticleAuthorIncrementalIssue&utm_medium=email&utm_content=AA_en_06082018&ArticleAuthorIncrementalIssue_20201003,https://link.springer.com/article/10.1186/s42400-020-00060-8?wt_mc=Internal.Event.1.SEM.ArticleAuthorIncrementalIssue&utm_source=ArticleAuthorIncrementalIssue&utm_medium=email&utm_content=AA_en_06082018&ArticleAuthorIncrementalIssue_20201003,,,,, +361,Greenpeace v. Shell: Media exploitation and the social amplification of risk framework (SARF),"V Bakir - Journal of Risk Research, 2005 - Taylor & Francis","… is its large expected role in the social amplification of risk, … framework? To answer this +question, using key SARF concepts, this paper tracks Greenpeace's and Shell's framing of their risk …",https://www.tandfonline.com/doi/abs/10.1080/13669870500166898,,,,,, +362,Epidemiology as a framework for large-scale mobile application accessibility assessment,"AS Ross, X Zhang, J Fogarty… - Proceedings of the 19th …, 2017 - dl.acm.org","… Accordingly, our framework puts forth notions like risk and … Our epidemiology-inspired +conceptual framework is the main … In a preliminary exercising of our framework, we perform an …",https://dl.acm.org/doi/abs/10.1145/3132525.3132547,https://dl.acm.org/doi/pdf/10.1145/3132525.3132547,,,,, +363,A framework for Big Data driven product lifecycle management,"Y Zhang, S Ren, Y Liu, T Sakao, D Huisingh - Journal of Cleaner …, 2017 - Elsevier","… Big Data processing and analysis, and to build up a referenced BDD-… framework for +manufacturing enterprises. Based upon the infrastructure of Big Data analysis, an overall framework …",https://www.sciencedirect.com/science/article/pii/S0959652617309150,https://www.sciencedirect.com/science/article/am/pii/S0959652617309150,,,,, +364,A general optimization framework for smoothing language models on graph structures,"Q Mei, D Zhang, CX Zhai - Proceedings of the 31st annual international …, 2008 - dl.acm.org","… largest similarity have the smallest distance on the hyperplane [1]. By estimating a document +language model … original language model ensures that the smoothed language model is …",https://dl.acm.org/doi/abs/10.1145/1390334.1390438,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=97a1617a789cd8432a8ade86a79b25f90fe2c3b1,,,,, +365,AI and big data in healthcare: towards a more comprehensive research framework for multimorbidity,"LT Majnarić, F Babič, S O'Sullivan… - Journal of Clinical …, 2021 - mdpi.com","… Techniques used in the models, either statistical method, are the cox proportional hazard +regression model and the risk survival analysis, or ML methods, like SVM, ANNs, DTs, and RF. …",https://www.mdpi.com/2077-0383/10/4/766,https://www.mdpi.com/2077-0383/10/4/766/pdf,,,,, +366,An information fusion framework with multi-channel feature concatenation and multi-perspective system combination for the deep-learning-based robust recognition of …,"YH Tu, J Du, Q Wang, X Bao, LR Dai, CH Lee - … Speech & Language, 2017 - Elsevier","… It is based on a deep learning framework with a large deep … network (RNN)-based language +model for rescoring, a WER of … improved MVDR approach and language model re-scoring, …",https://www.sciencedirect.com/science/article/pii/S0885230816300766,,,,,, +367,Big data analytics framework for peer-to-peer botnet detection using random forests,"K Singh, SC Guntuku, A Thakur, C Hota - Information Sciences, 2014 - Elsevier","… In this section the components of the proposed scalable network threat detection +framework are described in detail. The framework consists of the following components: …",https://www.sciencedirect.com/science/article/pii/S0020025514003570,,,,,, +368,A natural language processing framework for assessing hospital readmissions for patients with COPD,"A Agarwal, C Baechle, R Behara… - IEEE journal of …, 2017 - ieeexplore.ieee.org","… statistically predict those most in danger of readmission, a few … We have proposed a framework, +which uses natural language … predictive modeling for many diseases have seen a large …",https://ieeexplore.ieee.org/abstract/document/7880658/,https://ieeexplore.ieee.org/ielaam/6221020/8306527/7880658-aam.pdf,,,,, +369,A risk management framework for distributed agile projects,"SV Shrivastava, U Rathod - Information and software technology, 2017 - Elsevier","… The framework … , language barrier and large project scope with the risks that have severe +impact on DAD project. Further, this study aimed at validating the risk management framework …",https://www.sciencedirect.com/science/article/pii/S0950584916304815,https://www.sciencedirect.com/science/article/am/pii/S0950584916304815,,,,, +370,A framework of issues in large process modelling projects,"C Raduescu, H Tan, M Jayaganesh… - Proceedings of the …, 2006 - eprints.qut.edu.au","… This paper makes a first contribution to a potential research agenda in this field by defining +the characteristics of large-scale process modeling projects and proposing a framework of …",https://eprints.qut.edu.au/25619/,https://eprints.qut.edu.au/25619/3/25619.pdf,,,,, +371,A value-at-risk framework for longevity trend risk,"SJ Richards, ID Currie, GP Ritchie - British Actuarial Journal, 2014 - cambridge.org","… framework described in this paper works with a wide variety of models, enabling practitioners +to explore the impact of model risk on … For large portfolios the idiosyncratic risk will often be …",https://www.cambridge.org/core/journals/british-actuarial-journal/article/valueatrisk-framework-for-longevity-trend-risk/CF4D3DDCF63EE9B1D7FA73CA43798255,https://www.longevitas.co.uk/sites/default/files/VaR_2012.pdf,,,,, +372,A reinforcement learning framework for relevance feedback,"A Montazeralghaem, H Zamani, J Allan - Proceedings of the 43rd …, 2020 - dl.acm.org","… , we focus on the language modeling framework [36] and use … as part of our reinforcement +learning framework using the back-… Instead of designing a complex neural network with huge …",https://dl.acm.org/doi/abs/10.1145/3397271.3401099,https://par.nsf.gov/servlets/purl/10175982,,,,, +373,Towards a framework for project risk knowledge management in the construction supply chain,"JHM Tah, V Carr - Advances in Engineering Software, 2001 - Elsevier","… risk management is made. A common language for describing risks based on a hierarchical-risk +… A prototype system being developed to support the risk management framework is …",https://www.sciencedirect.com/science/article/pii/S0965997801000357,https://www.academia.edu/download/68258257/s0965-9978_2801_2900035-720210722-15143-6cit9e.pdf,,,,, +374,"Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing","P Liu, W Yuan, J Fu, Z Jiang, H Hayashi… - ACM Computing …, 2023 - dl.acm.org","… framework is powerful and attractive for a number of reasons: It allows the language model +… be trained on large datasets, in the process learning robust general-purpose features of the …",https://dl.acm.org/doi/abs/10.1145/3560815,https://dl.acm.org/doi/pdf/10.1145/3560815?trk=public_post_comment-text,,,,, +375,A framework for big data analytics in commercial social networks: A case study on sentiment analysis and fake review detection for marketing decision-making,"E Kauffmann, J Peral, D Gil, A Ferrández… - Industrial Marketing …, 2020 - Elsevier","… source of big data that can be transformed into valuable information. A huge number of … +We propose a framework to automatically analyse these reviews, transforming negative and …",https://www.sciencedirect.com/science/article/pii/S0019850118307612,,,,,, +376,Efficient-FedRec: Efficient federated learning framework for privacy-preserving news recommendation,"J Yi, F Wu, C Wu, R Liu, G Sun, X Xie - arXiv preprint arXiv:2109.05446, 2021 - arxiv.org","… framework, we decompose the news recommendation model into a large news model and a +lightweight user model… This is because we use pre-trained language model in news model, …",https://arxiv.org/abs/2109.05446,https://arxiv.org/pdf/2109.05446,,,,, +377,"Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks","A Fernández, S del Río, V López… - … : Data Mining and …, 2014 - Wiley Online Library","… Big Data solutions on such platforms. Afterwards, we focus on the MapReduce programming +framework as the most prominent solution for Big … several limitations of this model. We then …",https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/widm.1134,http://150.214.190.154/sites/default/files/ficherosPublicaciones/1810_2014-WIRES-Fernandez_etAl-Big_Data_w_Cloud_Computing.pdf,,,Solutions and Mitigations,, +378,Big data and machine learning based secure healthcare framework,"P Kaur, M Sharma, M Mittal - Procedia computer science, 2018 - Elsevier","… framework was used to manage and analyze huge amount of data to benefit the patients as +well as healthcare professionals. HIS framework … and secure big data healthcare framework. …",https://www.sciencedirect.com/science/article/pii/S187705091830752X,https://www.sciencedirect.com/science/article/pii/S187705091830752X/pdf?md5=c9398593062bd8d8d17dd0caf795fea7&pid=1-s2.0-S187705091830752X-main.pdf,,,,, +379,Nl-augmenter: A framework for task-sensitive natural language augmentation,"KD Dhole, V Gangal, S Gehrmann, A Gupta, Z Li… - arXiv preprint arXiv …, 2021 - arxiv.org","… Dilution of Contributions While this is not our intent, there is a risk in large scale collections +of work like this that individual contributions are being less appreciated than releasing them …",https://arxiv.org/abs/2112.02721,https://arxiv.org/pdf/2112.02721,,,,, +380,HyLECA: A Framework for Developing Hybrid Long-term Engaging Controlled Conversational Agents,"E Basar, D Balaji, L He, I Hendrickx… - Proceedings of the 5th …, 2023 - dl.acm.org","… We present HyLECA, an open-source framework designed for the … natural language generation +capabilities of open-domain large … state-of-the-art large language models in simulating a …",https://dl.acm.org/doi/abs/10.1145/3571884.3604404,https://www.researchgate.net/profile/Erkan-Basar-2/publication/372455729_HyLECA_A_Framework_for_Developing_Hybrid_Long-term_Engaging_Controlled_Conversational_Agents/links/64be3c3eb9ed6874a5412d4b/HyLECA-A-Framework-for-Developing-Hybrid-Long-term-Engaging-Controlled-Conversational-Agents.pdf,,,,,Current Issues +381,A risk management framework for distributed scrum using PRINCE2 methodology,"M Esteki, TJ Gandomani, HK Farsani - Bulletin of Electrical Engineering …, 2020 - beei.org","… a risk management framework in Scrum using the PRINCE2 methodology, which includes the +perceived risks in … was introduced on a large scale with two basic frameworks [17]. The first …",https://beei.org/index.php/EEI/article/view/1905,https://beei.org/index.php/EEI/article/download/1905/1486,,,,, +382,A new model for the selection of web development frameworks: application to PHP frameworks,"K Benmoussa, M Laaziri, S Khoulji… - … Journal of Electrical …, 2019 - researchgate.net","… The use of a framework is often essential for medium and large scale developments, but +is … and complete model to compare and evaluate the main PHP frameworks. This model is …",https://www.researchgate.net/profile/Abir-Yamami/publication/330656300_A_new_model_for_the_selection_of_web_development_frameworks_application_to_PHP_frameworks/links/5c4c8fee299bf12be3e5786c/A-new-model-for-the-selection-of-web-development-frameworks-application-to-PHP-frameworks.pdf,https://www.researchgate.net/profile/Abir-Yamami/publication/330656300_A_new_model_for_the_selection_of_web_development_frameworks_application_to_PHP_frameworks/links/5c4c8fee299bf12be3e5786c/A-new-model-for-the-selection-of-web-development-frameworks-application-to-PHP-frameworks.pdf,,,,, +383,Hazard analysis: A deep learning and text mining framework for accident prevention,"B Zhong, X Pan, PED Love, J Sun, C Tao - Advanced Engineering …, 2020 - Elsevier","… learning and text mining framework to analyse hazards automatically. The framework provides +managers with the capability to quickly analyse a large number of hazard records and put …",https://www.sciencedirect.com/science/article/pii/S1474034620301233,,,,,, +384,Framework of spatial decision support system for large-scale public building evacuation,"Z Zhichong, W Yaowu - 2009 WRI Global Congress on …, 2009 - ieeexplore.ieee.org","… ) for large-scale … risk appraisal is the core algorithmic model for environment analysis +module and the basis of evacuation routing optimization. Framework structure of SDSS for large-…",https://ieeexplore.ieee.org/abstract/document/5208961/,,,,,, +385,Integrating and evaluating neural word embeddings in information retrieval,"G Zuccon, B Koopman, P Bruza… - Proceedings of the 20th …, 2015 - dl.acm.org","… translation language model for information retrieval. This … In the language modelling +framework, documents are ranked … of neural translation language models and the large number of …",https://dl.acm.org/doi/abs/10.1145/2838931.2838936,https://eprints.qut.edu.au/91418/1/adcs2015_neural_translation_lm.pdf,,,,, +386,Cyber forensics framework for big data analytics in IoT environment using machine learning,"GS Chhabra, VP Singh, M Singh - Multimedia Tools and Applications, 2020 - Springer","… By big, we mean the huge volume, velocity or/and variety (3 V’s) of data. Hadoop framework +… , working in coordination, store and process the big data. That is, the control structure of the …",https://link.springer.com/article/10.1007/s11042-018-6338-1,,,,,, +387,Multicriteria framework for selecting a process modelling language,"AC Scanavachi Moreira Campos… - Enterprise Information …, 2016 - Taylor & Francis","… a large number modelling languages and also due to the lack of guidelines on evaluating, +and comparing languages … paper proposes a framework for selecting a modelling language in …",https://www.tandfonline.com/doi/abs/10.1080/17517575.2014.906047,,,,,, +388,A framework for understanding and predicting insider attacks,"EE Schultz - Computers & security, 2002 - Elsevier","… The framework presented here is promising in that it synthesizes and … framework is also +unproven. A logical next step is to perform validation testing of the model by collecting a large …",https://www.sciencedirect.com/science/article/pii/S016740480201009X,,,,,,Current Issues +389,Towards a big data framework for analyzing social media content,"JL Jimenez-Marquez, I Gonzalez-Carrasco… - International Journal of …, 2019 - Elsevier","… To the best of our knowledge, a framework for integrating natural language processing, +machine learning and big data techniques for social media content in a methodological way, and …",https://www.sciencedirect.com/science/article/pii/S0268401218305073,https://e-tarjome.com/storage/panel/fileuploads/2019-04-18/1555575045_E10938-e-tarjome.pdf,,,,, +390,Beyond the ML Model: Applying Safety Engineering Frameworks to Text-to-Image Development,"S Rismani, R Shelby, A Smart, R Delos Santos… - Proceedings of the …, 2023 - dl.acm.org","… -to-image (T2I) model user interface by professional visual artists in their creative practice. +The rapid public release and adoption of large generative models in various application areas …",https://dl.acm.org/doi/abs/10.1145/3600211.3604685,https://arxiv.org/pdf/2307.10312,,,,, +391,A neighborhood framework for resource-lean content flagging,"SM Sarwar, D Zlatkova, M Hardalov… - Transactions of the …, 2022 - direct.mit.edu","… model for a target language with limited annotated data by transferring knowledge from another +dataset in a different language, for which a large … -trained multilingual language model ℳ …",https://direct.mit.edu/tacl/article-abstract/doi/10.1162/tacl_a_00472/110995,https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00472/110995,,,,, +392,A compound event framework for understanding extreme impacts,"M Leonard, S Westra, A Phatak… - Wiley …, 2014 - Wiley Online Library","… framework to better define, map, analyze, model, and communicate the risk of compound +events. … For example, a famine that is the consequence of a drought (hazard), large population (…",https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wcc.252,https://www.researchgate.net/profile/Michael-Leonard/publication/280301850_A_compound_event_framework_for_understanding_extreme_impacts/links/5ae266390f7e9b28594a270c/A-compound-event-framework-for-understanding-extreme-impacts.pdf,,,,, +393,EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models,"P Wang, N Zhang, X Xie, Y Yao, B Tian… - arXiv preprint arXiv …, 2023 - arxiv.org","… Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, … +, there is no standard implementation framework available for the community, which hinders …",https://arxiv.org/abs/2308.07269,https://arxiv.org/pdf/2308.07269,,,,, +394,Modelling global risk factors affecting construction cost performance,"D Baloi, ADF Price - International journal of project management, 2003 - Elsevier","… decision framework for a systematic modelling, analysis and management of global risk +factors … with extremely low probability of occurrence, but can have huge negative impacts on …",https://www.sciencedirect.com/science/article/pii/S0263786302000170,,,,,Global, +395,A global supply chain risk management framework: An application of text-mining to identify region-specific supply chain risks,"CY Chu, K Park, GE Kremer - Advanced Engineering Informatics, 2020 - Elsevier","… categorization supported by a relatively large dataset covering extant literature from year … +a holistic supply chain risk framework for companies to develop strategies to manage risk. …",https://www.sciencedirect.com/science/article/pii/S1474034620300227,https://par.nsf.gov/servlets/purl/10181981,,,,, +396,Machine learning framework for Hazard Extraction and Analysis of Trends (HEAT) in wildfire response,"SR Andrade, HS Walsh - Safety science, 2023 - Elsevier","… BERT models are pre-trained on large text corpora (eg, wikipedia) with millions of documents +and use a masked language model to learn both left-to-right and right-to-left word context. …",https://www.sciencedirect.com/science/article/pii/S0925753523001947,,,,,, +397,Creating strategic business value from big data analytics: A research framework,"V Grover, RHL Chiang, TP Liang… - Journal of management …, 2018 - Taylor & Francis","… the value proposition of big data analytics, and discuss strategic IT business value and our +BDA value creation and realization framework. The proposed framework focuses on building …",https://www.tandfonline.com/doi/abs/10.1080/07421222.2018.1451951,https://files.stample.com/stample-1564010096810-2868103_1_bda-2018.pdf,,,,, +398,Multi-perspective enterprise modeling (memo) conceptual framework and modeling languages,"U Frank - Proceedings of the 35th Annual Hawaii International …, 2002 - ieeexplore.ieee.org","… modelling languages, they are accompanied by a big challenge at the same time: The more +specialised the concepts of a language, … To give an impression of the language definitions …",https://ieeexplore.ieee.org/abstract/document/993989/,https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=aa4539c700707d619b43e1796768e66b7048c859,,,,, +399,Standing on the shoulders of AI: Toward a policy framework for AI use in scholarly publishing,Z Lin - 2023 - osf.io,"… into a chatbot, language model, or similar tool is a violation of our confidentiality agreement. +… Use of large language models and other generative AI tools is not allowed. The reviewer is …",https://osf.io/preprints/psyarxiv/jgck4/,https://osf.io/preprints/psyarxiv/jgck4/download,,,,, +400,A quantitative bow-tie cyber risk classification and assessment framework,"B Sheehan, F Murphy, AN Kia, R Kiely - Journal of Risk Research, 2021 - Taylor & Francis","… The resultant framework is applied to a large city hospital in Europe. The results highlighted +… It also provides a practical framework that allows insurers to assess risks, visualise areas of …",https://www.tandfonline.com/doi/abs/10.1080/13669877.2021.1900337,https://www.tandfonline.com/doi/pdf/10.1080/13669877.2021.1900337,,,,Regional, diff --git a/data_gathering/literature/formatted_papers.md b/data_gathering/literature/formatted_papers.md new file mode 100644 index 00000000..e69de29b diff --git a/data_gathering/literature/scripts.md b/data_gathering/literature/scripts.md new file mode 100644 index 00000000..dd171918 --- /dev/null +++ b/data_gathering/literature/scripts.md @@ -0,0 +1,111 @@ +# Data Processing Scripts for OWASP Top 10 LLM Analysis + +This repository contains a series of Python scripts designed for processing and categorizing data relevant to the study of OWASP Top 10 vulnerabilities in Large Language Models (LLMs). The scripts facilitate tasks such as combining CSV files, categorizing data based on various criteria, and applying these categorizations to a dataset. + +## Script 1: Combining CSV Files + +This script combines multiple CSV files into a single DataFrame. Ensure the correct paths are specified for your CSV files. + +```python +import pandas as pd + +# List of your CSV files with the correct path +csv_files = [ + 'C:\\PATH1.csv', + 'C:\\PATH2.csv', + 'C:\\PATH3.csv', + 'C:\\PATH4.csv' +] + +try: + # Combine all CSV files into one DataFrame + combined_df = pd.concat([pd.read_csv(file) for file in csv_files], ignore_index=True) + print("CSV files successfully combined.") +except Exception as e: + print("An error occurred:", e) +``` + +## Script 2: Categorization Functions + +These functions categorize text data based on predefined criteria such as research methods, focus areas, topics/themes, geographical and temporal focus. + +```python +def categorize_research_methods(text): + categories = [] + if 'case study' in text.lower(): + categories.append('Case Studies') + if 'interview' in text.lower(): + categories.append('Interviews') + if 'ethnography' in text.lower(): + categories.append('Ethnography') + if 'content analysis' in text.lower(): + categories.append('Content Analysis') + if 'survey' in text.lower() or 'questionnaire' in text.lower(): + categories.append('Surveys and Questionnaires') + if 'experiment' in text.lower(): + categories.append('Experiments') + if 'statistical analysis' in text.lower(): + categories.append('Statistical Analysis') + if any(method in categories for method in ['Case Studies', 'Interviews', 'Ethnography', 'Content Analysis']) and any(method in categories for method in ['Surveys and Questionnaires', 'Experiments', 'Statistical Analysis']): + categories.append('Mixed Methods') + return ', '.join(categories) + +def categorize_focus_areas(text): + categories = [] + if 'risk assessment' in text.lower(): + categories.append('Risk Assessment') + if 'expert opinion' in text.lower(): + categories.append('Expert Opinions') + if 'technology assessment' in text.lower(): + categories.append('Technological Assessments') + if 'policy' in text.lower() or 'regulation' in text.lower(): + categories.append('Policy and Regulation') + return ', '.join(categories) + +def categorize_topics_themes(text): + categories = [] + if 'llm security' in text.lower(): + categories.append('LLM Security') + if 'industry application' in text.lower(): + categories.append('Industry Applications') + if 'emerging threat' in text.lower(): + categories.append('Emerging Threats') + if 'solution' in text.lower() or 'mitigation' in text.lower(): + categories.append('Solutions and Mitigations') + return ', '.join(categories) + +def categorize_geographical_focus(text): + categories = [] + if 'global' in text.lower(): + categories.append('Global') + if 'regional' in text.lower() or any(region in text.lower() for region in ['asia', 'europe', 'america', 'africa']): + categories.append('Regional') + return ', '.join(categories) + +def categorize_temporal_focus(text): + categories = [] + if 'historical' in text.lower(): + categories.append('Historical Analyses') + if 'current' in text.lower() or 'present' in text.lower(): + categories.append('Current Issues') + if 'future' in text.lower() or 'prediction' in text.lower(): + categories.append('Future Predictions') + return ', '.join(categories) +``` + +## Script 3: Applying Categorizations + +This script applies the above categorization functions to the DataFrame created by combining CSV files. + +```python +combined_df['Focus Areas'] = combined_df[' Description '].apply(categorize_focus_areas) +combined_df['Topics or Themes'] = combined_df[' Description '].apply(categorize_topics_themes) +combined_df['Geographical Focus'] = combined_df[' Description '].apply(categorize_geographical_focus) +combined_df['Temporal Focus'] = combined_df[' Description '].apply(categorize_temporal_focus) +``` + +--- + +## Contribution + +We welcome contributions and suggestions to improve these scripts. Please feel free to fork the repository, make your changes, and submit a pull request. For any queries or suggestions, kindly open an issue in the repository. diff --git a/data_gathering/mappings/ASVS.md b/data_gathering/mappings/ASVS.md new file mode 100644 index 00000000..71776f09 --- /dev/null +++ b/data_gathering/mappings/ASVS.md @@ -0,0 +1,59 @@ +# OWASP Top 10 for LLMs Mapped to ASVS + +This document outlines how the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) can be addressed through the [Application Security Verification Standard (ASVS)](https://owasp.org/www-project-application-security-verification-standard/) by OWASP. While ASVS is tailored towards web applications, its principles can guide the security of web services and applications utilizing LLMs. + +## ASVS Requirements and LLM Vulnerabilities + +Each LLM vulnerability is mapped to relevant ASVS requirements that can help mitigate associated risks: + +### LLM01: Prompt Injection + +- **V5: Validation, Sanitization, and Encoding** + - Apply strict input validation and sanitization to prevent malicious or unexpected input from affecting LLM outputs. + +### LLM02: Insecure Output Handling + +- **V5: Validation, Sanitization, and Encoding** + - Ensure all outputs are encoded and handled securely to prevent injection attacks or information disclosure. + +### LLM03: Training Data Poisoning + +- **V7: Cryptography** + - Secure the integrity of training data through encryption and integrity checks to mitigate risks of data poisoning. + +### LLM04: Model Denial of Service + +- **V11: Business Logic Verification** + - Implement controls to prevent abuse of LLM features that could lead to denial of service, such as rate limiting and resource management. + +### LLM05: Supply-Chain Vulnerabilities + +- **V12: File and Resources Verification** + - Verify the security of third-party libraries and dependencies to address supply-chain vulnerabilities. + +### LLM06: Sensitive Information Disclosure + +- **V9: Data Protection** + - Protect sensitive data processed by LLMs through encryption, access controls, and data leakage prevention techniques. + +### LLM07: Insecure Plugin Design + +- **V14: Configuration Verification** + - Ensure plugins or extensions for LLMs are securely designed and do not introduce vulnerabilities. + +### LLM08: Excessive Agency + +- **V13: API and Web Service Verification** + - Design APIs interacting with LLMs to limit excessive agency and ensure secure communication. + +### LLM09: Overreliance + +- **V1: Architecture, Design, and Threat Modeling** + - Conduct threat modeling to identify and mitigate risks associated with overreliance on LLM technologies. + +### LLM10: Model Theft + +- **V9: Data Protection** + - Implement measures to protect LLM models as sensitive intellectual property, including access control and encryption. + +**Note:** The ASVS mapping provides a framework for addressing LLM vulnerabilities within web applications and services. It is important to tailor the implementation of these ASVS requirements to the specific context and architecture of applications utilizing LLM technologies. diff --git a/data_gathering/mappings/BSIMM.md b/data_gathering/mappings/BSIMM.md new file mode 100644 index 00000000..8a3a4f72 --- /dev/null +++ b/data_gathering/mappings/BSIMM.md @@ -0,0 +1,59 @@ +# OWASP Top 10 for LLMs Mapped to BSIMM Activities + +This document outlines how the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) can be addressed through the [Building Security In Maturity Model (BSIMM)](https://www.bsimm.com/) framework. BSIMM provides a set of software security practices organized into twelve domains, which can help mitigate these vulnerabilities through proactive and systematic security efforts. + +## General Approach with BSIMM + +To mitigate LLM vulnerabilities, organizations can adopt relevant BSIMM activities across its twelve domains. Here's how specific BSIMM practices can be applied: + +## LLM01: Prompt Injection + +- **Strategy & Metrics (SM)**: Define and measure security goals specific to LLM development. +- **Security Testing (ST)**: Implement automated and manual testing to detect prompt injection vulnerabilities. + +## LLM02: Insecure Output Handling + +- **Secure Coding (SC)**: Train developers on secure coding practices to prevent insecure output handling. +- **Security Testing (ST)**: Use dynamic analysis tools to identify and mitigate output handling issues. + +## LLM03: Training Data Poisoning + +- **Policy & Compliance (PC)**: Establish policies for secure handling and validation of training data. +- **Software Environment (SE)**: Secure the software environment against unauthorized access to training data. + +## LLM04: Model Denial of Service + +- **Architecture Analysis (AA)**: Conduct architecture analysis to identify potential DoS vulnerabilities in LLMs. +- **Performance Testing (PT)**: Simulate high-load scenarios to assess the model's resilience to DoS attacks. + +## LLM05: Supply-Chain Vulnerabilities + +- **Vendor Security Management (VSM)**: Assess and manage security risks associated with third-party components and services. +- **Security Standards and Requirements (SSR)**: Define security requirements for all suppliers and partners. + +## LLM06: Sensitive Information Disclosure + +- **Data Protection (DP)**: Implement data classification and encryption to protect sensitive information. +- **Security Testing (ST)**: Regularly test for vulnerabilities that could lead to information disclosure. + +## LLM07: Insecure Plugin Design + +- **Design Review (DR)**: Perform security design reviews for plugins and extensions. +- **Secure Coding (SC)**: Educate developers on secure plugin design and development practices. + +## LLM08: Excessive Agency + +- **Architecture Analysis (AA)**: Evaluate the decision-making processes of LLMs for security risks. +- **Design Review (DR)**: Review and assess the security implications of LLM agency and control mechanisms. + +## LLM09: Overreliance + +- **Education & Guidance (EG)**: Provide training on the appropriate use of LLM technologies and the risks of overreliance. +- **Strategy & Metrics (SM)**: Develop metrics to measure and manage the reliance on LLMs in critical decision-making processes. + +## LLM10: Model Theft + +- **Software Environment (SE)**: Secure access to model data and runtime environments. +- **Data Protection (DP)**: Use encryption and access controls to protect models from unauthorized access or theft. + +Note: While BSIMM provides a framework for building secure software, it's important to tailor these activities to the specific context of LLM development and deployment. Organizations should assess their security practices against BSIMM to identify areas for improvement and implement the most relevant activities to mitigate the risks associated with LLM vulnerabilities. diff --git a/data_gathering/mappings/CIS_Controls.md b/data_gathering/mappings/CIS_Controls.md new file mode 100644 index 00000000..68c0bc1d --- /dev/null +++ b/data_gathering/mappings/CIS_Controls.md @@ -0,0 +1,55 @@ +# OWASP Top 10 for LLMs Mapped to CIS (Center for Internet Security) Controls + +This document maps the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) to applicable [CIS Controls](https://www.cisecurity.org/controls/), providing guidance on mitigating these vulnerabilities through established cybersecurity best practices. + +## LLM01: Prompt Injection + +- **CIS Control 5**: [Secure Configuration for Hardware and Software on Mobile Devices, Laptops, Workstations, and Servers](https://www.cisecurity.org/controls/secure-configuration-for-hardware-and-software-on-mobile-devices-laptops-workstations-and-servers/) + - Ensure systems processing LLM inputs are configured to resist prompt injection attacks through proper security settings and input validation. + +## LLM02: Insecure Output Handling + +- **CIS Control 6**: [Maintenance, Monitoring, and Analysis of Audit Logs](https://www.cisecurity.org/controls/maintenance-monitoring-and-analysis-of-audit-logs/) + - Monitor and analyze logs to detect and respond to incidents involving insecure output handling. + +## LLM03: Training Data Poisoning + +- **CIS Control 13**: [Data Protection](https://www.cisecurity.org/controls/data-protection/) + - Protect data integrity through regular backups and encryption, ensuring poisoned data can be restored to a secure state. + +## LLM04: Model Denial of Service + +- **CIS Control 9**: [Limitation and Control of Network Ports, Protocols, and Services](https://www.cisecurity.org/controls/limitation-and-control-of-network-ports-protocols-and-services/) + - Manage network configurations to minimize the risk of DoS attacks on LLM systems. + +## LLM05: Supply-Chain Vulnerabilities + +- **CIS Control 15**: [Supply Chain Risk Management](https://www.cisecurity.org/controls/supply-chain-risk-management/) + - Assess and manage the security risks of software and hardware related to LLMs throughout the supply chain. + +## LLM06: Sensitive Information Disclosure + +- **CIS Control 13**: [Data Protection](https://www.cisecurity.org/controls/data-protection/) + - Implement controls to protect sensitive information from unauthorized access and disclosure. + +## LLM07: Insecure Plugin Design + +- **CIS Control 18**: [Application Software Security](https://www.cisecurity.org/controls/application-software-security/) + - Ensure secure development, deployment, and maintenance of plugins used by LLMs. + +## LLM08: Excessive Agency + +- **CIS Control 4**: [Secure Configuration for Hardware and Software on Mobile Devices, Laptops, Workstations, and Servers](https://www.cisecurity.org/controls/secure-configuration-for-hardware-and-software-on-mobile-devices-laptops-workstations-and-servers/) + - Configure LLMs and related systems to limit excessive operational control and agency. + +## LLM09: Overreliance + +- **CIS Control 7**: [Email and Web Browser Protections](https://www.cisecurity.org/controls/email-and-web-browser-protections/) + - While not directly applicable, promoting awareness and safe practices around the use of LLMs can mitigate risks associated with overreliance. + +## LLM10: Model Theft + +- **CIS Control 13**: [Data Protection](https://www.cisecurity.org/controls/data-protection/) + - Secure models and associated data against theft with encryption, access controls, and monitoring. + +Note: While the CIS Controls provide a comprehensive set of best practices for cybersecurity, the mapping to specific LLM vulnerabilities is intended to highlight applicable areas of focus. Organizations should consider a holistic security strategy that encompasses these controls to effectively mitigate risks associated with the use of LLMs. diff --git a/data_gathering/mappings/CODE_OF_CONDUCT.md b/data_gathering/mappings/CODE_OF_CONDUCT.md new file mode 100644 index 00000000..63559a8d --- /dev/null +++ b/data_gathering/mappings/CODE_OF_CONDUCT.md @@ -0,0 +1,45 @@ +# Code of Conduct for the LLM Top 10 Mapping Project + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to make participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. + +## Our Standards + +Examples of behavior that contributes to creating a positive environment include: + +- Using welcoming and inclusive language +- Being respectful of differing viewpoints and experiences +- Gracefully accepting constructive criticism +- Focusing on what is best for the community +- Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +- The use of sexualized language or imagery and unwelcome sexual attention or advances +- Trolling, insulting/derogatory comments, and personal or political attacks +- Public or private harassment +- Publishing others' private information, such as a physical or electronic address, without explicit permission +- Other conduct which could reasonably be considered inappropriate in a professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. + +Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned with this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at our Slack channel [#team-llm-datagathering-methodology]. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4, available at [https://www.contributor-covenant.org/version/1/4/code-of-conduct.html](https://www.contributor-covenant.org/version/1/4/code-of-conduct.html). + +For answers to common questions about this code of conduct, see [https://www.contributor-covenant.org/faq](https://www.contributor-covenant.org/faq). diff --git a/data_gathering/mappings/CONTRIBUTING.md b/data_gathering/mappings/CONTRIBUTING.md new file mode 100644 index 00000000..823279db --- /dev/null +++ b/data_gathering/mappings/CONTRIBUTING.md @@ -0,0 +1,51 @@ +# Contributing to OWASP Top 10 for LLM Mapping + +We welcome contributions from everyone interested in improving and expanding the mappings of the OWASP Top 10 vulnerabilities for Large Language Models (LLMs) to various cybersecurity frameworks and standards. Here's how you can contribute. + +## Types of Contributions + +### Reporting Bugs or Issues + +If you encounter a problem with any mapping, dataset, or scripts, or if you have suggestions for improving them, please open an issue on this GitHub repository with a clear title and a detailed description. Tag the issue with either `bug`, `enhancement`, or `question` to help maintainers triage it appropriately. + +### Suggesting Enhancements + +This project is open to suggestions for enhancements. This can include new mappings, changes to existing mappings, or improvements in the documentation. Open an issue to suggest enhancements, providing as much context and detail as possible. + +### Pull Requests + +Here is a quick guide on how to submit a pull request (PR): + +1. Fork the repository to your own GitHub account. +2. Clone the forked repository to your machine. +3. Create a new branch for your changes. +4. Make your changes on your branch. +5. Push your branch to your GitHub repository. +6. Submit a pull request to the main repository from your fork and branch. +7. Wait for a maintainer to review your PR, and be open to any further discussions or requests for changes. + +**Note**: Before submitting a pull request, please ensure that your changes do not break any existing functionality and that all code conforms to the project's coding standards. + +### Data Contributions + +If you have access to relevant literature, data, or cybersecurity framework information that is not currently included in the mapping, we encourage you to contribute. Please ensure that the data is reliable and appropriately sourced. + +### Documentation + +Improvements to documentation, whether it's a typo fix or an entirely new section, are greatly appreciated. Your documentation changes are more likely to be accepted quickly if they are clear, concise, and targeted. + +## Discussion and Collaboration + +The main channel for discussion and collaboration is on our Slack channel: `#team-llm-datagathering-methodology` + +We use this channel for regular discussions on the project's methodology, future enhancements, and any issues we're currently facing. It's a great place to ask questions, propose ideas, and collaborate with others who are working on similar problems. + +## Code of Conduct + +We have a [Code of Conduct](CODE_OF_CONDUCT.md) that all contributors are expected to adhere to. This outlines our expectations for participant behavior as well as the consequences for unacceptable behavior. + +## Questions? + +If you have any questions, please feel free to ask on the GitHub issues or directly on the Slack channel (`#team-llm-datagathering-methodology`). + +Thank you for contributing to the OWASP Top 10 for LLM Mapping! diff --git a/data_gathering/mappings/CVE_CWE.md b/data_gathering/mappings/CVE_CWE.md new file mode 100644 index 00000000..91229e08 --- /dev/null +++ b/data_gathering/mappings/CVE_CWE.md @@ -0,0 +1,53 @@ +# Mapping OWASP Top 10 for LLMs to CVEs and CWEs + +This document maps the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) to related Common Vulnerabilities and Exposures ([CVEs](https://cve.mitre.org/)) and Common Weakness Enumeration ([CWEs](https://cwe.mitre.org/)). Given the novel nature of LLMs, direct CVE matches were not found at this stage, but relevant CWEs can provide insights into the types of weaknesses these vulnerabilities may exploit. + +## LLM01: Prompt Injection + +- **[CWE-77](https://cwe.mitre.org/data/definitions/77.html)**: Improper Neutralization of Special Elements used in a Command ('Command Injection') +- **[CWE-94](https://cwe.mitre.org/data/definitions/94.html)**: Improper Control of Generation of Code ('Code Injection') + +## LLM02: Insecure Output Handling + +- **[CWE-79](https://cwe.mitre.org/data/definitions/79.html)**: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting') +- **[CWE-116](https://cwe.mitre.org/data/definitions/116.html)**: Improper Encoding or Escaping of Output + +## LLM03: Training Data Poisoning + +- **[CWE-506](https://cwe.mitre.org/data/definitions/506.html)**: Embedded Malicious Code +- **[CWE-915](https://cwe.mitre.org/data/definitions/915.html)**: Improperly Controlled Modification of Dynamically-Determined Object Attributes + +## LLM04: Model Denial of Service + +- **[CWE-400](https://cwe.mitre.org/data/definitions/400.html)**: Uncontrolled Resource Consumption + +## LLM05: Supply-Chain Vulnerabilities + +- **[CWE-829](https://cwe.mitre.org/data/definitions/829.html)**: Inclusion of Functionality from Untrusted Control Sphere +- **[CWE-937](https://cwe.mitre.org/data/definitions/937.html)**: Using Components with Known Vulnerabilities + +## LLM06: Sensitive Information Disclosure + +- **[CWE-200](https://cwe.mitre.org/data/definitions/200.html)**: Exposure of Sensitive Information to an Unauthorized Actor + +## LLM07: Insecure Plugin Design + +- **[CWE-749](https://cwe.mitre.org/data/definitions/749.html)**: Exposed Dangerous Method or Function +- **[CWE-1203](https://cwe.mitre.org/data/definitions/1203.html)**: Insecure Direct Object References + +## LLM08: Excessive Agency + +- **[CWE-807](https://cwe.mitre.org/data/definitions/807.html)**: Reliance on Untrusted Inputs in a Security Decision +- No direct CVE mapping available. + +## LLM09: Overreliance + +- **[CWE-1048](https://cwe.mitre.org/data/definitions/1048.html)**: Software Reliance on Single Factor Authentication in a Security Decision +- No direct CVE mapping available. + +## LLM10: Model Theft + +- **[CWE-494](https://cwe.mitre.org/data/definitions/494.html)**: Download of Code Without Integrity Check +- **[CWE-1241](https://cwe.mitre.org/data/definitions/1241.html)**: Improper Protection of Sensitive Information During Manufacturing or Distribution + +Note: Identifying specific CVE entries for LLM vulnerabilities is challenging due to the specificity of CVEs to software products or systems. However, the listed CWE entries provide a framework for understanding the types of weaknesses these vulnerabilities might exploit. diff --git a/data_gathering/mappings/CycloneDX_Software-Bill-of-Materials(SBOM).md b/data_gathering/mappings/CycloneDX_Software-Bill-of-Materials(SBOM).md new file mode 100644 index 00000000..76505dcf --- /dev/null +++ b/data_gathering/mappings/CycloneDX_Software-Bill-of-Materials(SBOM).md @@ -0,0 +1,169 @@ +# OWASP Top 10 for LLM Applications - CycloneDX SBOM Mapping + +This document provides a detailed mapping of the OWASP Top 10 vulnerabilities specific to Large Language Model (LLM) applications to the [CycloneDX Machine Learning Software Bill of Materials (SBOM)](https://cyclonedx.org/) structure, with expanded descriptions and mitigation strategies. + +## Overview + +The OWASP Top 10 for LLM Applications identifies the most critical security risks to LLM applications. By mapping these vulnerabilities to the CycloneDX SBOM, organizations can document, understand, and address these risks more effectively. More information about the OWASP Top 10 for LLM Applications can be found [here](https://owasp.org/www-project-top-10-for-large-language-model-applications/). + +## Vulnerabilities and SBOM Mapping + +### LLM01: Prompt Injection + +- **Description**: Attackers manipulate the model's output by crafting malicious inputs, exploiting the model's response generation process. +- **SBOM Component**: `ExternalDependencies` +- **Mitigation**: Implement input validation and sanitization libraries. Document these libraries in the SBOM under `ExternalDependencies`, including their version and security features. More details on input validation can be found [here](https://owasp.org/www-project-web-security-testing-guide/latest/4-Web_Application_Security_Testing/07-Input_Validation_Testing/README). +- **Considerations**: Evaluate the risk based on the model's exposure to user-generated content and document any additional controls like rate limiting or monitoring for suspicious activity. + +### LLM02: Insecure Output Handling + +- **Description**: The model generates outputs that, if not properly sanitized, could lead to cross-site scripting (XSS), command injection, or other injection attacks when displayed or executed. +- **SBOM Component**: `Services`, `Data` +- **Mitigation**: Use output encoding libraries and data sanitization tools. Document these and their configurations in the SBOM, focusing on how they prevent specific attack vectors. OWASP's guide to output encoding can be found [here](https://owasp.org/www-project-proactive-controls/v3/en/c4-encode-escape-data). +- **Considerations**: Regularly update the documentation as new output handling mechanisms are implemented or as existing ones are updated. + +### LLM03: Training Data Poisoning + +- **Description**: Malicious actors introduce biased or malicious data into the training dataset, aiming to manipulate the model's behaviour. +- **SBOM Component**: `Data` +- **Mitigation**: Document the sources of training data, including any mechanisms for vetting or validating this data. Outline procedures for data integrity checks and periodic reviews of data sources for potential contamination. +- **Considerations**: Include information on anomaly detection systems or data sanitization processes used to cleanse training data. + +### LLM04: Model Denial of Service + +- **Description**: Overloading the model with a high volume of requests or complex queries, causing performance degradation or service unavailability. +- **SBOM Component**: `Services` +- **Mitigation**: Implement rate limiting and load balancing. Document the architecture and tools used to mitigate DoS attacks in the SBOM, including any third-party services or libraries. Information on DoS mitigation strategies can be found [here](https://www.cloudflare.com/en-ca/resource-hub/five-best-practices-for-mitigating-ddos-attacks/). +- **Considerations**: Regular testing for vulnerabilities to DoS attacks and updates to mitigation strategies should be documented in version updates within the SBOM. + +### LLM05: Supply-Chain Vulnerabilities + +- **Description**: Vulnerabilities introduced through compromised or malicious third-party components, such as libraries, packages, or data sources. +- **SBOM Component**: `ExternalDependencies` +- **Mitigation**: Conduct security audits of all third-party components. Document each component's origin, version, and the results of security assessments in the SBOM. Guidelines for securing the software supply chain can be found [here](https://www.cisa.gov/sites/default/files/publications/defending_against_software_supply_chain_attacks_508_1.pdf). +- **Considerations**: Establish a regular review process for third-party dependencies to ensure they remain secure over time. + +### LLM06: Sensitive Information Disclosure + +- **Description**: The model inadvertently exposes sensitive information in its responses, which could lead to privacy breaches. +- **SBOM Component**: `Data`, `Services` +- **Mitigation**: Implement data masking and anonymization techniques. Document the mechanisms in place to protect sensitive information, including any privacy-enhancing technologies (PETs) used. +- **Considerations**: Policies for data retention, anonymization, and response filtering should be clearly documented and regularly reviewed. + +### LLM07: Insecure Plugin Design + +- **Description**: Plugins or extensions that add functionality to the LLM application also introduce new vulnerabilities. +- **SBOM Component**: `Extensions` +- **Mitigation**: Establish security guidelines for plugin development. Document all plugins, their security features, and any known vulnerabilities in the SBOM. +- **Considerations**: Regular security assessments of plugins and updates to their documentation in the SBOM are crucial for maintaining security. + +### LLM08: Excessive Agency + +- **Description**: Allowing the model too much autonomy in decision-making processes, potentially leading to unintended or harmful actions. +- **SBOM Component**: `Properties` +- **Mitigation**: Define strict operational boundaries for the model. Document these boundaries, including any control mechanisms or oversight procedures, in the SBOM. +- **Considerations**: Regular reviews and updates to the model's operational guidelines are necessary to adapt to new insights or changes in the model's capabilities. + +### LLM09: Overreliance + +- **Description**: Excessive dependence on the LLM for critical decision-making without adequate human oversight, increasing the risk of impactful errors. +- **SBOM Component**: `Policies` +- **Mitigation**: Implement policies for human review and oversight of critical decisions. Document these policies in the SBOM, detailing the roles and responsibilities of human supervisors. +- **Considerations**: The effectiveness of oversight mechanisms should be periodically evaluated and documented in the SBOM updates. + +### LLM10: Model Theft + +- **Description**: Unauthorized copying or extraction of the model, leading to intellectual property theft or unauthorized use. +- **SBOM Component**: `Licenses` +- **Mitigation**: Use digital rights management (DRM) and encryption to protect the model. Document the licensing terms, protection measures, and any breaches or attempted thefts in the SBOM. +- **Considerations**: Regularly update the SBOM with information on new security measures or incidents related to model theft. + +## Conclusion + +By mapping the OWASP Top 10 vulnerabilities for LLM Applications to the CycloneDX SBOM, organizations can create a comprehensive documentation and mitigation strategy for securing their machine learning systems. This detailed approach ensures that all aspects of LLM application security are considered, documented, and actively managed. + + +# OWASP CycloneDX SBOM for Machine Learning Projects + +This document outlines the structure of a Software Bill of Materials (SBOM) for machine learning projects, following the OWASP CycloneDX standard. It aims to capture the essential components, libraries, and dependencies that are typically involved in such projects, with a focus on security as per the OWASP Top 10 for LLMs. + +## Project Overview + +- **Name**: Machine Learning Project (Example) +- **Description**: A project leveraging Large Language Models to perform text analysis, sentiment analysis, and predictive modelling. +- **Version**: 1.0.0 +- **Authors**: [Project Team Members] +- **License**: [Applicable License] + +## Components + +### Machine Learning Frameworks + +- **TensorFlow** + - **Version**: 2.6.0 + - **Supplier**: TensorFlow Authors + - **Licenses**: Apache License 2.0 + - **Website**: [https://www.tensorflow.org/](https://www.tensorflow.org/) + +- **PyTorch** + - **Version**: 1.9.0 + - **Supplier**: PyTorch Authors + - **Licenses**: BSD License + - **Website**: [https://pytorch.org/](https://pytorch.org/) + +### Libraries for Data Processing + +- **Pandas** + - **Version**: 1.3.1 + - **Supplier**: PyData Development Team + - **Licenses**: BSD License + - **Website**: [https://pandas.pydata.org/](https://pandas.pydata.org/) + +- **NumPy** + - **Version**: 1.21.1 + - **Supplier**: NumPy Developers + - **Licenses**: BSD License + - **Website**: [https://numpy.org/](https://numpy.org/) + +### Security Tools and Libraries + +- **Bandit** + - **Version**: 1.7.0 + - **Supplier**: PyCQA + - **Licenses**: Apache License 2.0 + - **Website**: [https://bandit.readthedocs.io/en/latest/](https://bandit.readthedocs.io/en/latest/) + +- **OWASP Dependency-Check** + - **Version**: 6.2.2 + - **Supplier**: OWASP + - **Licenses**: Apache License 2.0 + - **Website**: [https://owasp.org/www-project-dependency-check/](https://owasp.org/www-project-dependency-check/) + +### Datasets + +- **Sentiment140** + - **Version**: [Dataset Version] + - **Supplier**: Stanford University + - **Licenses**: [Dataset License] + - **Website**: [http://help.sentiment140.com/for-students](http://help.sentiment140.com/for-students) + +## Dependencies + +[IN PROGRESS - List of any external dependencies not included above, with versions, suppliers, and licenses.] + +## Known Vulnerabilities + +[IN PROGRESS - List of known vulnerabilities associated with the components and dependencies listed, if any.] + +## Acknowledgments + +- **OWASP CycloneDX**: For providing the SBOM standard. +- **OWASP Top 10 for LLMs Team**: For insights into security considerations for machine learning projects. + +## How to Contribute + +For instructions on how to contribute to this SBOM or the associated machine-learning project, please see [CONTRIBUTING.md](./CONTRIBUTING.md). + +## License + +This SBOM is shared under the [MIT License](./LICENSE), unless otherwise noted for specific components or datasets. diff --git a/data_gathering/mappings/ENISA.md b/data_gathering/mappings/ENISA.md new file mode 100644 index 00000000..98d7ed30 --- /dev/null +++ b/data_gathering/mappings/ENISA.md @@ -0,0 +1,59 @@ +# OWASP Top 10 for LLMs Mapped to ENISA Recommendations + +This document outlines how the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) can be addressed using the [European Union Agency for Cybersecurity (ENISA) recommendations](https://www.enisa.europa.eu/). ENISA provides comprehensive guidelines for improving cybersecurity, which can be leveraged to mitigate risks associated with LLM vulnerabilities. + +## ENISA Cybersecurity Recommendations + +To enhance the security posture of LLMs and mitigate vulnerabilities, organizations can adopt the following ENISA recommendations: + +### LLM01: Prompt Injection + +- **Risk Management**: Implement a thorough risk assessment process to identify and mitigate risks associated with prompt injection vulnerabilities. +- **Security Measures**: Apply input validation and sanitization techniques to prevent malicious inputs from affecting LLM outputs. + +### LLM02: Insecure Output Handling + +- **Data Protection**: Ensure that data handling and output generation processes incorporate security measures to prevent data leaks and tampering. +- **Incident Response**: Develop and implement an incident response plan that includes procedures for handling incidents involving insecure output handling. + +### LLM03: Training Data Poisoning + +- **Supply Chain Security**: Secure the supply chain for training data to prevent poisoning attacks, including vetting sources and implementing integrity checks. +- **Awareness and Training**: Increase awareness among stakeholders involved in data collection and model training about the risks of data poisoning. + +### LLM04: Model Denial of Service + +- **Business Continuity**: Prepare business continuity and disaster recovery plans that include scenarios for dealing with model denial of service attacks. +- **Technical Measures**: Implement rate limiting, resource allocation, and other technical measures to protect against DoS attacks. + +### LLM05: Supply-Chain Vulnerabilities + +- **Supply Chain Security**: Conduct security assessments of third-party vendors and integrate security considerations into the procurement process. +- **Third-Party Risk Management**: Establish a comprehensive third-party risk management framework to continuously monitor and manage the security of third-party components. + +### LLM06: Sensitive Information Disclosure + +- **Data Protection and Privacy**: Apply strict data protection measures, including encryption and access control, to safeguard sensitive information processed by LLMs. +- **Compliance**: Ensure compliance with relevant data protection regulations, such as the [General Data Protection Regulation (GDPR)](https://gdpr-info.eu/), to prevent unauthorized disclosure of personal data. + +### LLM07: Insecure Plugin Design + +- **Secure Development**: Follow secure development practices for the design and implementation of plugins, including security testing and code reviews. +- **Vendor Security Assessment**: Assess the security of third-party plugins before integration into the LLM ecosystem. + +### LLM08: Excessive Agency + +- **Ethical Considerations**: Address ethical considerations in the design and deployment of LLMs to ensure that they do not exceed their intended agency. +- **Security by Design**: Incorporate security and ethical guidelines into the development lifecycle of LLMs to control their decision-making capabilities. + +### LLM09: Overreliance + +- **Awareness and Training**: Conduct training sessions to educate users about the limitations of LLMs and the risks associated with overreliance. +- **Human Oversight**: Implement mechanisms for human oversight and intervention in critical decision-making processes involving LLMs. + +### LLM10: Model Theft + +- **Intellectual Property Protection**: Protect intellectual property related to LLMs through legal and technical measures, including access controls and encryption. +- **Incident Response**: Include model theft scenarios in the incident response plan, outlining steps to detect, respond to, and recover from such incidents. + +Note: The application of ENISA's recommendations requires a strategic approach tailored to the specific context and risks associated with LLM technologies. Organizations should engage in continuous improvement processes to address emerging vulnerabilities and threats in the evolving landscape of large language models. diff --git a/data_gathering/mappings/FAIR.md b/data_gathering/mappings/FAIR.md new file mode 100644 index 00000000..530f306f --- /dev/null +++ b/data_gathering/mappings/FAIR.md @@ -0,0 +1,95 @@ +# OWASP Top 10 for LLMs Mapped to FAIR Risk Assessment Framework + +This document outlines an approach to evaluate the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) using the [Factor Analysis of Information Risk (FAIR)](https://www.fairinstitute.org/what-is-fair) framework. The goal is to quantify the risk associated with each vulnerability in terms of its impact on the confidentiality, integrity, and availability of information assets. + +## General Approach to Applying FAIR + +For each LLM vulnerability, the following FAIR components can be analyzed: + +- **Threat Event Frequency (TEF)**: Estimate how often a threat event exploiting the vulnerability might occur. +- **Vulnerability (VULN)**: Assess the likelihood that the threat event will successfully exploit the vulnerability. +- **Contact Frequency (COF)**: Determine the frequency with which threat agents come into contact with the vulnerability. +- **Probability of Action (POA)**: Estimate the likelihood that the threat agent will act on the vulnerability. +- **Loss Magnitude (LM)**: Estimate the potential impact or loss magnitude resulting from a successful exploit. + +## LLM01: Prompt Injection + +- **TEF**: High in environments where user inputs are frequently processed. +- **VULN**: Moderate to high, depending on input validation measures. +- **COF**: High in interactive systems accessible by numerous users. +- **POA**: High, given the low cost and potential impact of exploitation. +- **LM**: Can vary from low to high, based on the sensitivity of the manipulated outputs. + +## LLM02: Insecure Output Handling + +- **TEF**: Moderate, depending on the application's output handling mechanisms. +- **VULN**: High if outputs are not properly sanitized or encoded. +- **COF**: Moderate in systems where outputs are dynamically generated based on user inputs. +- **POA**: Moderate, as exploiting these vulnerabilities requires specific knowledge. +- **LM**: Varies, potentially high if leading to unauthorized actions or data exposure. + +## LLM03: Training Data Poisoning + +- **TEF**: Low to moderate, as it requires access to the training data pipeline. +- **VULN**: High, since poisoning can significantly impact model behavior. +- **COF**: Low, given the controlled environments of training data collection and processing. +- **POA**: Low to moderate, dependent on the attacker's motivation and resources. +- **LM**: High, due to the potential for widespread impact on model decisions. + +## LLM04: Model Denial of Service + +- **TEF**: Moderate, especially in publicly accessible models. +- **VULN**: Moderate to high, based on resource management and input validation. +- **COF**: Moderate to high for internet-facing models. +- **POA**: High, given the potential disruption and relatively low effort required. +- **LM**: High, due to operational disruption and potential recovery costs. + +## LLM05: Supply-Chain Vulnerabilities + +- **TEF**: Low to moderate, depending on the security of the supply chain. +- **VULN**: High, as compromise in the supply chain can have widespread effects. +- **COF**: Low, requiring specific targeting or insider access. +- **POA**: Moderate, influenced by the attractiveness of the target and the attacker's capabilities. +- **LM**: High, given the potential for systemic weaknesses and widespread impact. + +## LLM06: Sensitive Information Disclosure + +- **TEF**: Moderate, especially in systems handling sensitive data. +- **VULN**: High, if data protection measures are inadequate. +- **COF**: Moderate to high, depending on system accessibility. +- **POA**: High, due to the value of sensitive information. +- **LM**: High, considering the potential for financial, reputational, and legal impacts. + +## LLM07: Insecure Plugin Design + +- **TEF**: Low to moderate, based on the plugin ecosystem's security practices. +- **VULN**: High for systems heavily reliant on plugins. +- **COF**: Low, requiring specific knowledge and access to exploit. +- **POA**: Moderate, contingent upon the perceived value of exploitation. +- **LM**: Varies, can be high if leading to system compromise or data breaches. + +## LLM08: Excessive Agency + +- **TEF**: Low, as it requires specific conditions and knowledge to exploit. +- **VULN**: Moderate, dependent on the implementation of decision-making constraints. +- **COF**: Low, limited to scenarios where the model has significant control. +- **POA**: Low to moderate, based on the complexity of exploiting such vulnerabilities. +- **LM**: Moderate to high, due to potential unintended actions or decisions. + +## LLM09: Overreliance + +- **TEF**: High, as overreliance is a common issue in technology adoption. +- **VULN**: Not applicable, as overreliance is more about user behavior than a technical vulnerability. +- **COF**: High, inherent in the deployment of LLM technologies. +- **POA**: Not applicable. +- **LM**: Moderate to high, due to potential operational and strategic risks. + +## LLM10: Model Theft + +- **TEF**: Low to moderate, depending on access controls and the value of the model. +- **VULN**: High for valuable models without adequate protection. +- **COF**: Low, requiring targeted efforts to access and exfiltrate the model. +- **POA**: Moderate, driven by the potential gains from stealing the model. +- **LM**: High, due to intellectual property loss and competitive disadvantage. + +Note: The FAIR analysis for each vulnerability is a high-level estimation intended to guide risk assessment efforts. Organizations should perform detailed FAIR analyses based on their specific contexts, assets, and threat landscapes. diff --git a/data_gathering/mappings/ISO20547-4:2020.md b/data_gathering/mappings/ISO20547-4:2020.md new file mode 100644 index 00000000..88dd30d4 --- /dev/null +++ b/data_gathering/mappings/ISO20547-4:2020.md @@ -0,0 +1,49 @@ +# OWASP Top 10 for LLMs Mapped to ISO/IEC 20547-4:2020 + +This document outlines the application of [ISO/IEC 20547-4:2020](https://www.iso.org/standard/72089.html), focusing on security and privacy in big data systems, to address the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#). The standard provides a comprehensive framework for managing security and privacy risks in big data architectures, which can be adapted to the specific challenges posed by LLMs. + +## Security and Privacy Considerations for LLM Vulnerabilities + +Mapping each LLM vulnerability to relevant aspects of ISO/IEC 20547-4:2020: + +## LLM01: Prompt Injection + +- **Data Processing Security**: Implement secure processing controls to validate and sanitize inputs, preventing prompt injection vulnerabilities. + +## LLM02: Insecure Output Handling + +- **Data Management and Privacy**: Apply principles of data minimization and secure data handling to manage outputs securely, ensuring they are properly encoded and sanitized. + +## LLM03: Training Data Poisoning + +- **Data Source Security**: Ensure the integrity and security of data sources for training LLMs, including mechanisms for validating and vetting training data. + +## LLM04: Model Denial of Service + +- **System and Infrastructure Security**: Design robust infrastructure capable of handling high loads and resisting denial of service attacks through rate limiting and resource allocation. + +## LLM05: Supply-Chain Vulnerabilities + +- **Data Storage and Infrastructure Security**: Assess and secure the supply chain for software and data components, including third-party services and libraries used by LLMs. + +## LLM06: Sensitive Information Disclosure + +- **Privacy Protection**: Implement strong data protection measures, including encryption and access controls, to prevent unauthorized disclosure of sensitive information. + +## LLM07: Insecure Plugin Design + +- **System and Application Security**: Ensure plugins or extensions for LLMs are developed with security in mind, including regular security assessments and updates. + +## LLM08: Excessive Agency + +- **Data Processing and Management**: Clearly define and enforce boundaries for LLM decision-making capabilities, ensuring they align with security and privacy requirements. + +## LLM09: Overreliance + +- **Human Factor and Training**: Address the risks associated with overreliance on LLM technologies through user education and awareness programs. + +## LLM10: Model Theft + +- **Intellectual Property Protection**: Secure LLM models as valuable intellectual property, applying appropriate security measures to prevent unauthorized access and theft. + +Note: Implementing the guidelines of ISO/IEC 20547-4:2020 for LLM technologies requires a comprehensive approach, taking into account the unique characteristics and risks associated with large-scale data processing systems. Organizations should conduct regular security and privacy assessments to identify and mitigate potential vulnerabilities in their LLM implementations. diff --git a/data_gathering/mappings/ISO27001.md b/data_gathering/mappings/ISO27001.md new file mode 100644 index 00000000..25f8346e --- /dev/null +++ b/data_gathering/mappings/ISO27001.md @@ -0,0 +1,59 @@ +# OWASP Top 10 for LLMs Mapped to ISO/IEC 27001 Controls + +This document outlines how the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) can be addressed through the implementation of [ISO/IEC 27001](https://www.iso.org/isoiec-27001-information-security.html) controls. ISO 27001 provides a comprehensive set of information security standards that can be leveraged to mitigate risks associated with these vulnerabilities. + +## ISO/IEC 27001 Controls and Principles + +For each LLM vulnerability, relevant ISO 27001 controls and principles are suggested to enhance security and manage risks: + +## LLM01: Prompt Injection + +- **A.14 System acquisition, development and maintenance**: Ensure security is integrated into the development lifecycle, including input validation mechanisms to prevent prompt injection. [(A.14 Reference)](https://www.iso.org/standard/54534.html) +- **A.18 Compliance**: Ensure compliance with legal and technical requirements to protect against security breaches stemming from prompt injection. [(A.18 Reference)](https://www.iso.org/standard/54535.html) + +## LLM02: Insecure Output Handling + +- **A.14 System acquisition, development and maintenance**: Implement secure coding practices and output encoding to prevent insecure output handling. [(A.14 Reference)](https://www.iso.org/standard/54534.html) +- **A.10 Cryptography**: Utilize encryption and key management practices to secure data in transit and at rest, mitigating risks of data exposure through insecure outputs. [(A.10 Reference)](https://www.iso.org/standard/54533.html) + +## LLM03: Training Data Poisoning + +- **A.12 Operations security**: Apply secure procedures for managing data processing and storage to prevent training data poisoning. [(A.12 Reference)](https://www.iso.org/standard/54532.html) +- **A.16 Information security incident management**: Establish incident management processes to respond to incidents of data poisoning effectively. [(A.16 Reference)](https://www.iso.org/standard/54536.html) + +## LLM04: Model Denial of Service + +- **A.17 Information security aspects of business continuity management**: Develop and implement business continuity plans that address the risk of model denial of service. [(A.17 Reference)](https://www.iso.org/standard/54537.html) +- **A.13 Communications security**: Secure network services against DoS attacks through traffic filtering, monitoring, and management. [(A.13 Reference)](https://www.iso.org/standard/54531.html) + +## LLM05: Supply-Chain Vulnerabilities + +- **A.15 Supplier relationships**: Manage risks related to suppliers and service providers to address supply-chain vulnerabilities. [(A.15 Reference)](https://www.iso.org/standard/54538.html) +- **A.14 System acquisition, development and maintenance**: Ensure that information security is a key criterion in the development and acquisition of software products and services. [(A.14 Reference)](https://www.iso.org/standard/54534.html) + +## LLM06: Sensitive Information Disclosure + +- **A.8 Asset management**: Classify information assets and apply appropriate controls to prevent unauthorized access and disclosure. [(A.8 Reference)](https://www.iso.org/standard/54529.html) +- **A.13 Communications security**: Implement network security controls and encryption to protect sensitive data during transmission. [(A.13 Reference)](https://www.iso.org/standard/54531.html) + +## LLM07: Insecure Plugin Design + +- **A.14 System acquisition, development and maintenance**: Integrate security considerations into the development and maintenance of plugins. [(A.14 Reference)](https://www.iso.org/standard/54534.html) +- **A.12 Operations security**: Apply strict access controls and monitoring to manage the risks associated with third-party plugins. [(A.12 Reference)](https://www.iso.org/standard/54532.html) + +## LLM08: Excessive Agency + +- **A.6 Organization of information security**: Define roles and responsibilities clearly to manage the risks of excessive agency in LLMs. [(A.6 Reference)](https://www.iso.org/standard/54527.html) +- **A.14 System acquisition, development and maintenance**: Incorporate functionality that ensures LLMs operate within their intended boundaries. [(A.14 Reference)](https://www.iso.org/standard/54534.html) + +## LLM09: Overreliance + +- **A.7 Human resource security**: Provide awareness training to staff to mitigate the risks associated with overreliance on LLM technologies. [(A.7 Reference)](https://www.iso.org/standard/54528.html) +- **A.5 Information security policies**: Develop policies that address the appropriate use of LLM technologies and define acceptable levels of reliance. [(A.5 Reference)](https://www.iso.org/standard/54526.html) + +## LLM10: Model Theft + +- **A.8 Asset management**: Identify and classify LLM models as critical information assets, implementing strong access control and protection measures. [(A.8 Reference)](https://www.iso.org/standard/54529.html) +- **A.13 Communications security**: Secure the transmission of model data to prevent unauthorized interception and theft. [(A.13 Reference)](https://www.iso.org/standard/54531.html) + +Note: Implementing ISO 27001 controls requires a comprehensive approach to information security management. Organizations should conduct regular risk assessments to identify and address specific vulnerabilities associated with LLM technologies, applying the relevant ISO 27001 controls to mitigate these risks effectively. diff --git a/data_gathering/mappings/LICENSE b/data_gathering/mappings/LICENSE new file mode 100644 index 00000000..37a0902c --- /dev/null +++ b/data_gathering/mappings/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2024 emmanuelgjr + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/data_gathering/mappings/MITREATLAS.md b/data_gathering/mappings/MITREATLAS.md new file mode 100644 index 00000000..ffc79f11 --- /dev/null +++ b/data_gathering/mappings/MITREATLAS.md @@ -0,0 +1,94 @@ +# OWASP Top 10 for LLMs Mapped to MITRE ATLAS with Mitigations + +This document provides a comprehensive mapping of the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) to [MITRE's Adversarial Tactics, Techniques, and Common Knowledge (ATLAS)](https://atlas.mitre.org/) framework. It includes a wide range of tactics and techniques for each vulnerability, along with suggested mitigations based on ATLAS and general cybersecurity best practices. + +## LLM01: Prompt Injection + +- **ATLAS Techniques**: + - [T1193 - Spearphishing Attachment](https://atlas.mitre.org/techniques/T1193) + - [T1059 - Command and Scripting Interpreter](https://atlas.mitre.org/techniques/T1059) + - [T1140 - Deobfuscate/Decode Files or Information](https://atlas.mitre.org/techniques/T1140) + - [T1204 - User Execution](https://atlas.mitre.org/techniques/T1204) +- **Mitigations**: Implement robust input validation. Educate users on secure coding practices. Use context-aware filtering. Employ behavior monitoring and anomaly detection. + +## LLM02: Insecure Output Handling + +- **ATLAS Techniques**: + - [T1203 - Exploitation for Client Execution](https://atlas.mitre.org/techniques/T1203) + - [T1021 - Remote Services](https://atlas.mitre.org/techniques/T1021) + - [T1064 - Scripting](https://atlas.mitre.org/techniques/T1064) + - [T1559 - Inter-Process Communication](https://atlas.mitre.org/techniques/T1559) +- **Mitigations**: Sanitize outputs. Use output encoding and secure rendering techniques. Monitor for unusual output patterns. Implement output control mechanisms. + +## LLM03: Training Data Poisoning + +- **ATLAS Techniques**: + - [T1588 - Obtain Capabilities](https://atlas.mitre.org/techniques/T1588) + - [T1496 - Resource Hijacking](https://atlas.mitre.org/techniques/T1496) + - [T1565 - Data Manipulation](https://atlas.mitre.org/techniques/T1565) + - [T1199 - Trusted Relationship](https://atlas.mitre.org/techniques/T1199) +- **Mitigations**: Validate and sanitize training data. Use anomaly detection. Implement robust access controls. Regularly update and audit data sources. + +## LLM04: Model Denial of Service + +- **ATLAS Techniques**: + - [T1499 - Endpoint Denial of Service](https://atlas.mitre.org/techniques/T1499) + - [T1485 - Data Destruction](https://atlas.mitre.org/techniques/T1485) + - [T1498 - Network Denial of Service](https://atlas.mitre.org/techniques/T1498) + - [T1490 - Inhibit System Recovery](https://atlas.mitre.org/techniques/T1490) +- **Mitigations**: Implement rate limiting and computational resource management. Validate inputs to prevent DoS attacks. Employ redundancy and resilient design. Monitor system performance and set up alerts for unusual activity. + +## LLM05: Supply-Chain Vulnerabilities + +- **ATLAS Techniques**: + - [T1195 - Supply Chain Compromise](https://atlas.mitre.org/techniques/T1195) + - [T1190 - Exploit Public-Facing Application](https://atlas.mitre.org/techniques/T1190) + - [T1185 - Man in the Middle](https://atlas.mitre.org/techniques/T1185) + - [T1601 - Gather Victim Identity Information](https://atlas.mitre.org/techniques/T1601) +- **Mitigations**: Conduct security audits of third-party components. Use trusted sources and monitor for vulnerabilities. Implement strict access controls and secure communication protocols. Regularly update and patch software components. + +## LLM06: Sensitive Information Disclosure + +- **ATLAS Techniques**: + - [T1530 - Data from Information Repositories](https://atlas.mitre.org/techniques/T1530) + - [T1482 - Domain Trust Discovery](https://atlas.mitre.org/techniques/T1482) + - [T1497 - Virtualization/Sandbox Evasion](https://atlas.mitre.org/techniques/T1497) + - [T1564 - Hide Artifacts](https://atlas.mitre.org/techniques/T1564) +- **Mitigations**: Implement access controls and encrypt data. Monitor data access patterns for unusual activities. Employ data loss prevention techniques. Regularly audit data storage and transmission security. + +## LLM07: Insecure Plugin Design + +- **ATLAS Techniques**: + - [T1211 - Exploitation for Defense Evasion](https://atlas.mitre.org/techniques/T1211) + - [T1553 - Subvert Trust Controls](https://atlas.mitre.org/techniques/T1553) + - [T1555 - Credentials from Password Stores](https://atlas.mitre.org/techniques/T1555) + - [T1574 - Hijack Execution Flow](https://atlas.mitre.org/techniques/T1574) +- **Mitigations**: Follow secure design principles for plugins. Audit and review plugin code regularly. Use code signing and integrity verification. Educate developers about secure plugin development practices. + +## LLM08: Excessive Agency + +- **ATLAS Techniques**: + - [T1562 - Impair Defenses](https://atlas.mitre.org/techniques/T1562) + - [T1548 - Abuse Elevation Control Mechanism](https://atlas.mitre.org/techniques/T1548) + - [T1550 - Use Alternate Authentication Material](https://atlas.mitre.org/techniques/T1550) + - [T1556 - Modify Authentication Process](https://atlas.mitre.org/techniques/T1556) +- **Mitigations**: Limit LLM decision-making capabilities. Implement oversight mechanisms for actions suggested by LLMs. Regularly review and update access control policies. Conduct regular security audits and risk assessments. + +## LLM09: Overreliance + +- **ATLAS Techniques**: + - [T1608 - Stage Capabilities](https://atlas.mitre.org/techniques/T1608) + - [T1558 - Steal or Forge Kerberos Tickets](https://atlas.mitre.org/techniques/T1558) + - [T1525 - Implant Internal Image](https://atlas.mitre.org/techniques/T1525) +- **Mitigations**: Educate stakeholders on LLM capabilities and limitations. Develop policies for responsible use and implement oversight mechanisms. Conduct regular training and awareness programs. Implement fail-safes and manual oversight where necessary. + +## LLM10: Model Theft + +- **ATLAS Techniques**: + - [T1602 - Data Encrypted for Impact](https://atlas.mitre.org/techniques/T1602) + - [T1531 - Account Access Removal](https://atlas.mitre.org/techniques/T1531) + - [T1583 - Acquire Infrastructure](https://atlas.mitre.org/techniques/T1583) + - [T1586 - Compromise Accounts](https://atlas.mitre.org/techniques/T1586) +- **Mitigations**: Encrypt model data and use access controls. Implement secure storage and transmission practices. Regularly audit and monitor access to model data. Educate employees on phishing and social engineering tactics. + +Note: The mappings and mitigations are approximations and should be adapted to specific contexts and evolving cybersecurity landscapes. For a comprehensive understanding of tactics, techniques, and mitigations, consult the [MITRE ATLAS documentation](https://atlas.mitre.org/). diff --git a/data_gathering/mappings/MITREATT&CK.md b/data_gathering/mappings/MITREATT&CK.md new file mode 100644 index 00000000..c75b3142 --- /dev/null +++ b/data_gathering/mappings/MITREATT&CK.md @@ -0,0 +1,67 @@ +# OWASP Top 10 for LLMs Mapped to MITRE ATT&CK with Mitigations + +This document outlines the potential exploitation of the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) within the context of the [MITRE ATT&CK framework](https://attack.mitre.org/). It identifies relevant tactics and techniques adversaries might use and suggests mitigations to protect against these threats. + +## MITRE ATT&CK Tactics, Techniques, and Mitigations for LLM Vulnerabilities + +### LLM01: Prompt Injection + +- **Tactic**: Execution [(T1059)](https://attack.mitre.org/tactics/TA0002/) +- **Technique**: Command and Scripting Interpreter [(T1059.001)](https://attack.mitre.org/techniques/T1059/001/) +- **Mitigation**: Implement input validation and sanitization. Conduct regular security reviews and training for developers on secure coding practices. + +### LLM02: Insecure Output Handling + +- **Tactic**: Initial Access [(T1190)](https://attack.mitre.org/tactics/TA0001/) +- **Technique**: Exploit Public-Facing Application [(T1190)](https://attack.mitre.org/techniques/T1190/) +- **Mitigation**: Use content security policies and output encoding to prevent execution of malicious scripts. Regularly update and patch software components. + +### LLM03: Training Data Poisoning + +- **Tactic**: Persistence [(T1136)](https://attack.mitre.org/tactics/TA0003/) +- **Technique**: Create Account [(T1136)](https://attack.mitre.org/techniques/T1136/) +- **Mitigation**: Secure access to training data. Implement robust data validation and anomaly detection systems to identify and mitigate poisoned data. + +### LLM04: Model Denial of Service + +- **Tactic**: Impact [(T1485)](https://attack.mitre.org/tactics/TA0040/) +- **Technique**: Data Destruction [(T1485)](https://attack.mitre.org/techniques/T1485/) +- **Mitigation**: Design systems with scalability and fault tolerance in mind. Use rate limiting and monitor workloads to detect and mitigate DoS attacks. + +### LLM05: Supply-Chain Vulnerabilities + +- **Tactic**: Initial Access [(T1195)](https://attack.mitre.org/tactics/TA0001/) +- **Technique**: Supply Chain Compromise [(T1195)](https://attack.mitre.org/techniques/T1195/) +- **Mitigation**: Conduct security assessments of third-party vendors. Monitor for vulnerabilities in third-party components and apply patches promptly. + +### LLM06: Sensitive Information Disclosure + +- **Tactic**: Collection [(T1119)](https://attack.mitre.org/tactics/TA0009/) +- **Technique**: Automated Collection [(T1119)](https://attack.mitre.org/techniques/T1119/) +- **Mitigation**: Encrypt sensitive data at rest and in transit. Implement access controls and monitor access logs for unauthorized data access attempts. + +### LLM07: Insecure Plugin Design + +- **Tactic**: Persistence [(T1176)](https://attack.mitre.org/tactics/TA0003/) +- **Technique**: Browser Extensions [(T1176)](https://attack.mitre.org/techniques/T1176/) +- **Mitigation**: Follow secure development practices for plugins. Conduct security reviews and vulnerability assessments regularly. + +### LLM08: Excessive Agency + +- **Tactic**: Privilege Escalation [(T1068)](https://attack.mitre.org/tactics/TA0004/) +- **Technique**: Exploitation for Privilege Escalation [(T1068)](https://attack.mitre.org/techniques/T1068/) +- **Mitigation**: Limit LLM decision-making capabilities to those absolutely necessary. Implement oversight and review mechanisms for critical actions. + +### LLM09: Overreliance + +- **Tactic**: Human-operated (Custom Tactic) +- **Technique**: Misuse of Enterprise Tools (Custom Technique) +- **Mitigation**: Educate users on the limitations and proper use of LLMs. Implement checks and balances to ensure human oversight in decision-making processes. + +### LLM10: Model Theft + +- **Tactic**: Exfiltration [(T1041)](https://attack.mitre.org/tactics/TA0010/) +- **Technique**: Exfiltration Over C2 Channel [(T1041)](https://attack.mitre.org/techniques/T1041/) +- **Mitigation**: Secure LLM models with strong access controls and encryption. Monitor for unusual access patterns or data exfiltration attempts. + +Note: The suggested mitigations are based on general cybersecurity best practices and the specific context of MITRE ATT&CK. Organizations should tailor these mitigations to their specific operational environment and the unique challenges posed by the deployment of LLM technologies. diff --git a/data_gathering/mappings/NIST.md b/data_gathering/mappings/NIST.md new file mode 100644 index 00000000..47f5e283 --- /dev/null +++ b/data_gathering/mappings/NIST.md @@ -0,0 +1,80 @@ +# OWASP Top 10 for LLMs Mapped to the NIST Cybersecurity Framework + +This document aligns the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) with the [NIST Cybersecurity Framework's five core functions](https://www.nist.gov/cyberframework): Identify, Protect, Detect, Respond, and Recover. It suggests actions and considerations for addressing these vulnerabilities within the framework's structure. + +## LLM01: Prompt Injection + +- **Identify**: Recognize potential sources and impacts of prompt injection attacks. +- **Protect**: Implement [input validation and sanitization](https://owasp.org/www-community/controls/Input_Validation) to prevent malicious prompt injections. +- **Detect**: Monitor system logs and outputs for anomalies indicative of prompt injection. +- **Respond**: Establish procedures to isolate and mitigate the impact of a detected prompt injection attack. +- **Recover**: Implement measures to restore any corrupted data or systems affected by prompt injection. + +## LLM02: Insecure Output Handling + +- **Identify**: Assess output handling processes for vulnerabilities. +- **Protect**: Utilize [output encoding](https://cheatsheetseries.owasp.org/cheatsheets/Output_Encoding_Cheat_Sheet.html) and implement content security policies. +- **Detect**: Use tools to detect instances of insecure output handling or its consequences. +- **Respond**: Follow a response plan to address and mitigate any damages caused by insecure output handling. +- **Recover**: Restore systems and data integrity following an incident involving insecure output handling. + +## LLM03: Training Data Poisoning + +- **Identify**: Catalog and assess the sources of training data for integrity and security. +- **Protect**: Secure the data supply chain and implement [data validation and filtering](https://owasp.org/www-community/controls/Data_Validation). +- **Detect**: Employ anomaly detection to identify unusual data patterns or inputs. +- **Respond**: Take corrective action to remove poisoned data and adjust model training processes. +- **Recover**: Re-train models with clean, validated data sets. + +## LLM04: Model Denial of Service + +- **Identify**: Evaluate the model and infrastructure for vulnerabilities that could lead to denial of service. +- **Protect**: Implement [rate limiting](https://owasp.org/www-community/controls/Rate_limiting) and resource management controls. +- **Detect**: Monitor for unusual traffic patterns or resource utilization spikes. +- **Respond**: Activate a response plan to mitigate and isolate the denial of service attacks. +- **Recover**: Restore normal operations and service levels. + +## LLM05: Supply-Chain Vulnerabilities + +- **Identify**: Map out and review the security of the supply chain components. +- **Protect**: Ensure secure software development practices and [third-party component validation](https://owasp.org/www-community/controls/Software_Composition_Analysis). +- **Detect**: Monitor for vulnerabilities or incidents related to supply chain components. +- **Respond**: React to supply chain threats or breaches with a coordinated strategy. +- **Recover**: Address and remediate any supply chain related vulnerabilities or compromises. + +## LLM06: Sensitive Information Disclosure + +- **Identify**: Understand the types of sensitive information the model may access or generate. +- **Protect**: Apply [data encryption](https://owasp.org/www-community/controls/Encryption_at_Rest), access controls, and [privacy-enhancing technologies](https://www.nist.gov/privacy-framework). +- **Detect**: Use [data loss prevention (DLP) tools](https://www.nist.gov/cyberframework/cybersecurity-framework-functions/detect) to monitor for unauthorized information disclosure. +- **Respond**: Execute response plans for unauthorized disclosure incidents. +- **Recover**: Take steps to mitigate the impact of the disclosure and prevent recurrence. + +## LLM07: Insecure Plugin Design + +- **Identify**: Assess plugins for security risks and design flaws. +- **Protect**: Adopt [secure plugin development and review practices](https://owasp.org/www-community/controls/Secure_Coding). +- **Detect**: Implement monitoring for exploits targeting plugin vulnerabilities. +- **Respond**: Prepare to quickly address and patch discovered plugin vulnerabilities. +- **Recover**: Recover from plugin-related security incidents by restoring affected systems and data. + +## LLM08: Excessive Agency + +- **Identify**: Define appropriate levels of autonomy and agency for LLMs. +- **Protect**: Limit LLM capabilities to those necessary for their intended functions. +- **Detect**: Monitor for actions indicating excessive agency or unauthorized activities. +- **Respond**: Respond to incidents where LLMs exceed their defined agency limits. +- **Recover**: Adjust LLM configurations and capabilities to prevent future occurrences. + +## LLM09: Overreliance + +- **Identify**: Recognize dependencies on LLMs and potential risks of overreliance. +- **Protect**: Develop policies to ensure balanced use of LLMs within operational processes. +- **Detect**: Observe for signs of overreliance impacting decision-making or performance. +- **Respond**: Address instances of overreliance through training and process adjustment. +- **Recover**: Implement strategies to reduce reliance on LLMs where inappropriate. + +## LLM10: Model Theft + +- **Identify**: Understand the value and sensitivity of the model and its data. +- **Protect**: Secure models with [encryption and access controls](https://owasp.org/www-community/controls/Access_Control_Cheat_Sheet). diff --git a/data_gathering/mappings/OPEN_CRE.md b/data_gathering/mappings/OPEN_CRE.md new file mode 100644 index 00000000..e394d8a9 --- /dev/null +++ b/data_gathering/mappings/OPEN_CRE.md @@ -0,0 +1,49 @@ +# OWASP Top 10 for LLMs Mapped to OPENCRE + +This document outlines a mapping of the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) to cybersecurity practices and controls that are aligned with the [Open Control Requirement Enumeration (OPENCRE)](https://www.opencre.org/). OPENCRE serves as a bridge between various cybersecurity frameworks, enabling the application of a harmonized set of controls to address specific vulnerabilities. + +## OPENCRE-Aligned Cybersecurity Practices + +For each LLM vulnerability, relevant cybersecurity practices that are commonly recognized across multiple frameworks aggregated by OPENCRE are suggested: + +### LLM01: Prompt Injection + +- **Control Implementation**: Implement input validation and encoding controls to prevent malicious input from affecting LLM outputs. This practice is universally recognized across cybersecurity frameworks for mitigating injection vulnerabilities. + +### LLM02: Insecure Output Handling + +- **Data Protection**: Apply secure coding practices to sanitize and properly handle all outputs, preventing data leaks or exposure. Encryption and proper error handling are key components. + +### LLM03: Training Data Poisoning + +- **Data Integrity and Supply Chain Security**: Ensure the integrity of training data through validation, checksums, and secure supply chain practices. Regular audits and supplier assessments can mitigate risks of poisoning. + +### LLM04: Model Denial of Service + +- **Availability and Performance Management**: Implement rate limiting, resource allocation, and performance monitoring controls to protect against denial of service attacks, ensuring availability. + +### LLM05: Supply-Chain Vulnerabilities + +- **Third-Party Risk Management**: Conduct thorough security assessments of third-party vendors and integrate continuous monitoring of supply chain security to address vulnerabilities. + +### LLM06: Sensitive Information Disclosure + +- **Privacy and Data Protection**: Enhance data protection measures, including encryption and access controls, to safeguard sensitive information against unauthorized disclosure. + +### LLM07: Insecure Plugin Design + +- **Secure Development Lifecycle**: Integrate security into the plugin development lifecycle, including threat modeling, code review, and security testing, to ensure plugins are securely designed and implemented. + +### LLM08: Excessive Agency + +- **Ethical AI and Decision Management**: Establish guidelines and controls for ethical AI use, ensuring LLMs do not exceed their intended decision-making capabilities. Regular reviews and audits can ensure compliance. + +### LLM09: Overreliance + +- **Awareness and Training**: Develop and deliver training programs to educate users on the limitations of LLMs, promoting balanced reliance on technology with human oversight. + +### LLM10: Model Theft + +- **Intellectual Property Protection and Access Control**: Secure LLM models as intellectual property, implementing strong access controls and encryption to prevent unauthorized access and theft. + +Note: The application of OPENCRE-aligned cybersecurity practices requires a comprehensive understanding of the organization's cybersecurity posture and the specific risks associated with LLM technologies. Organizations are encouraged to leverage OPENCRE's mapping capabilities to identify and implement the most relevant controls from various cybersecurity frameworks to address these vulnerabilities effectively. diff --git a/data_gathering/mappings/README.md b/data_gathering/mappings/README.md new file mode 100644 index 00000000..f77343eb --- /dev/null +++ b/data_gathering/mappings/README.md @@ -0,0 +1,79 @@ +# OWASP Top 10 for LLMs Mapped to Cybersecurity Frameworks and Standards + +Welcome to the "owaspllmtop10mapping" repository. This repository provides comprehensive mappings of the OWASP Top 10 vulnerabilities for Large Language Models (LLMs) to a range of established cybersecurity frameworks and standards. We aim to offer a resource that helps organizations align their LLM security practices with globally recognized cybersecurity guidelines. +Our baseline and guide is [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) and it's [charter](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/wiki/Charter). + +## Mappings Overview + +This repository includes mappings to the following frameworks and standards: + +1. **[NIST Cybersecurity Framework](https://www.nist.gov/cyberframework)** + - Provides comprehensive guidelines for managing cybersecurity risk. + - A foundational framework for cybersecurity recognized worldwide. + +2. **ISO/IEC Standards** + - [ISO/IEC 27001](https://www.iso.org/standard/54534.html) (Information Security Management) + - [ISO/IEC 20547-4:2020](https://www.iso.org/standard/74438.html) (Big Data Reference Architecture Security and Privacy) + - Crucial for global business compliance and establishing security controls. + +3. **[MITRE ATT&CK](https://attack.mitre.org/)** + - A detailed knowledge base for understanding and defending against cyber attacks. + - Practical for threat modelling and security analysis. + +4. **[CIS Controls](https://www.cisecurity.org/controls/)** + - Developed by the Centre for Internet Security, offering actionable controls. + - Well-regarded for practicality in strengthening cybersecurity defences. + +5. **CVEs and CWEs** + - [Common Vulnerabilities and Exposures (CVEs)](https://cve.mitre.org/) + - [Common Weakness Enumeration (CWEs)](https://cwe.mitre.org/) + - Essential for identifying and cataloging vulnerabilities. + +6. **[FAIR](https://www.fairinstitute.org/what-is-fair)** + - Factor Analysis of Information Risk focuses on risk quantification and management. + - Helps organizations quantify cybersecurity risk in financial terms. + +7. **[STRIDE](https://en.wikipedia.org/wiki/STRIDE_(security))** + - A threat modelling methodology for identifying security threats. + - Often used in the early stages of software development. + +8. **[ENISA](https://www.enisa.europa.eu/)** + - The European Union Agency for Network and Information Security provides broad cybersecurity advice. + - Relevant especially for compliance and best practices in European contexts. + +9. **[ASVS](https://owasp.org/www-project-application-security-verification-standard/)** + - The Application Security Verification Standard, important for web application security. + - Provides a basis for testing and assessing web application security controls. + +10. **[SAMM](https://owaspsamm.org/model/)** + - Software Assurance Maturity Model, useful for integrating security into software development. + - Helps in benchmarking and improving software security practices. + +11. **[MITRE ATLAS](https://atlas.mitre.org/)** + - Focused on adversarial behaviours and may not cover all aspects of cybersecurity management. + - Specific and detailed for threat modelling and analysis. + +12. **[BSIMM](https://www.bsimm.com/)** + - Building Security In Maturity Model, a tool for measuring and improving software security initiatives. + - Best suited for software security practices within organizations. + +13. **[OPENCRE](https://www.opencre.org/)** + - A facilitator for understanding and implementing cybersecurity controls across different standards. + - Acts as a bridge between various frameworks rather than a standalone guide. + +14. **[CycloneDX Machine Learning Software Bill of Materials (SBOM)](https://cyclonedx.org/)** + - Standard that provides advanced supply chain capabilities for cyber risk reduction. + - Standard capable of representing software, hardware, services, and other types of inventory. + +## Contributing + +We welcome contributions from the community. Whether you're adding new mappings, refining existing ones, or providing translations, your input is valuable. Please see our [CONTRIBUTING.md](/CONTRIBUTING.md) file for guidelines on how to contribute. + +## License + +This project is licensed under the [MIT License](/LICENSE). See the LICENSE file for more details. + +## Acknowledgments + +- OWASP Foundation for their continuous efforts in improving web security. +- All contributors who have dedicated their time and effort to enhance this repository. diff --git a/data_gathering/mappings/SAMM.md b/data_gathering/mappings/SAMM.md new file mode 100644 index 00000000..5a2d2321 --- /dev/null +++ b/data_gathering/mappings/SAMM.md @@ -0,0 +1,59 @@ +# OWASP Top 10 for LLMs Mapped to SAMM + +This document outlines the application of the [Software Assurance Maturity Model (SAMM)](https://owaspsamm.org/model/) to address the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#). SAMM provides a comprehensive framework for ensuring software security that can be adapted to mitigate risks associated with LLMs. + +## SAMM Security Practices and LLM Vulnerabilities + +For each LLM vulnerability, relevant SAMM activities and practices are suggested to help mitigate associated risks: + +### LLM01: Prompt Injection + +- **Design**: Incorporate threat modeling to identify and mitigate potential injection points. +- **Implementation**: Enforce input validation and sanitization to prevent prompt injection attacks. + +### LLM02: Insecure Output Handling + +- **Implementation**: Use secure coding practices to encode and safely handle all outputs. +- **Verification**: Perform regular security testing to identify and rectify insecure output handling. + +### LLM03: Training Data Poisoning + +- **Design**: Establish secure design principles that include validation and verification of training data sources. +- **Implementation**: Implement controls to ensure the integrity and security of training data. + +### LLM04: Model Denial of Service + +- **Operations**: Monitor and manage operational loads to prevent denial of service. Implement rate limiting and resource management strategies. +- **Verification**: Conduct performance and security testing to ensure resilience against DoS attacks. + +### LLM05: Supply-Chain Vulnerabilities + +- **Governance**: Develop and enforce policies for secure third-party component use. +- **Design**: Assess the security posture of third-party components and services integrated with LLMs. + +### LLM06: Sensitive Information Disclosure + +- **Design**: Classify data and define controls for handling sensitive information. +- **Implementation**: Apply encryption and access controls to protect sensitive data processed by LLMs. + +### LLM07: Insecure Plugin Design + +- **Design**: Ensure security considerations are integrated into the design of plugins and extensions. +- **Implementation**: Securely develop and maintain plugins, including regular security assessments. + +### LLM08: Excessive Agency + +- **Design**: Define clear boundaries for LLM decision-making capabilities within the system design. +- **Governance**: Establish oversight mechanisms for ethical and secure use of LLM technologies. + +### LLM09: Overreliance + +- **Education & Guidance**: Provide training on the capabilities and limitations of LLMs to prevent overreliance. +- **Governance**: Monitor and evaluate the use of LLM technologies to ensure balanced and secure application. + +### LLM10: Model Theft + +- **Implementation**: Protect LLM intellectual property through robust access controls and encryption. +- **Verification**: Regularly audit and test the security measures in place to protect LLM models from theft. + +Note: The application of SAMM practices to LLM vulnerabilities requires a tailored approach that considers the specific context and use cases of LLM technologies within an organization. Regular assessments and improvements to the software security assurance program are essential to address evolving risks and ensure the secure deployment of LLMs. diff --git a/data_gathering/mappings/STRIDE.md b/data_gathering/mappings/STRIDE.md new file mode 100644 index 00000000..cd871730 --- /dev/null +++ b/data_gathering/mappings/STRIDE.md @@ -0,0 +1,55 @@ +# OWASP Top 10 for LLMs Mapped to STRIDE + +This document maps the [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/#) to the [STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service (DoS), Elevation of Privilege)](https://learn.microsoft.com/en-us/previous-versions/commerce-server/ee823878(v=cs.20)?redirectedfrom=MSDN) threat model categories. STRIDE helps in identifying and categorizing common security threats into six types: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privileges. + +## LLM01: Prompt Injection + +- **STRIDE Category**: Spoofing +- **Description**: Prompt injection can allow an attacker to masquerade as a legitimate user, manipulating the LLM to generate desired outputs. + +## LLM02: Insecure Output Handling + +- **STRIDE Category**: Tampering/Information Disclosure +- **Description**: Insecure handling of outputs can lead to tampering or unintended information disclosure through manipulated outputs. + +## LLM03: Training Data Poisoning + +- **STRIDE Category**: Tampering +- **Description**: Poisoning the training data can tamper with the model's learning process, affecting its outputs and decision-making capabilities. + +## LLM04: Model Denial of Service + +- **STRIDE Category**: Denial of Service +- **Description**: Overloading the model with complex inputs or exploiting vulnerabilities to consume resources can deny service to legitimate users. + +## LLM05: Supply-Chain Vulnerabilities + +- **STRIDE Category**: Tampering +- **Description**: Exploiting vulnerabilities in the supply chain can lead to tampering with the model or its environment. + +## LLM06: Sensitive Information Disclosure + +- **STRIDE Category**: Information Disclosure +- **Description**: This vulnerability can lead to the unintended release of sensitive or confidential information. + +## LLM07: Insecure Plugin Design + +- **STRIDE Category**: Tampering/Elevation of Privilege +- **Description**: Insecure plugins can be exploited to tamper with the model or elevate privileges within the system. + +## LLM08: Excessive Agency + +- **STRIDE Category**: Elevation of Privilege +- **Description**: Giving the model excessive decision-making capabilities could inadvertently elevate its privileges beyond intended limits. + +## LLM09: Overreliance + +- **STRIDE Category**: Repudiation +- **Description**: Overreliance on LLMs without proper oversight or auditing mechanisms can lead to situations where actions cannot be adequately attributed or denied. + +## LLM10: Model Theft + +- **STRIDE Category**: Information Disclosure/Tampering +- **Description**: Stealing a model can lead to information disclosure about its training data or algorithms and could allow for tampering with the model itself. + +Note: The mappings provided are interpretations based on the nature of each vulnerability and how they might be exploited according to the STRIDE model. This exercise highlights the importance of considering various types of threats when assessing the security of LLMs.