Skip to content

TalEliyahu/Awesome-AI-Security

Repository files navigation

Awesome AI Security Awesome

Curated resources, research, and tools for securing AI systems.


Table of Contents


Best Practices, Frameworks & Controls

Governance & Management Frameworks

Controls & Verification Standards

Top 10s

Scoring & Rating Systems

Testing, Evaluation & Red Teaming

Implementation Guides & Patterns

Agentic Systems (Standards, Governance & Patterns)

Threat Modeling

  • OWASP - Multi-Agentic System Threat Modeling Guide - Applies OWASP’s agentic threat taxonomy to multi-agent systems and demonstrates modeling using the MAESTRO framework with worked examples.
  • AWS - Threat modeling your generative AI workload to evaluate security risk - Practical, four-question approach (what are we working on; what can go wrong; what are we going to do about it; did we do a good enough job) with concrete deliverables: DFDs and assumptions, threat statements using AWS’s threat grammar, mapped mitigations, and validation; includes worked examples and AWS Threat Composer templates.
  • Microsoft - Threat Modeling AI/ML Systems and Dependencies - Practical guidance for threat modeling AI/ML: “Key New Considerations” questions plus a threats→mitigations catalog (adversarial perturbation, data poisoning, model inversion, membership inference, model stealing) based on “Failure Modes in Machine Learning”; meant for security design reviews of products that use or depend on AI/ML.

Critical Infrastructure


Tools

Inclusion criteria (open-source tools): must have 220+ GitHub stars, active maintenance in the last 12 months, and ≥3 contributors.

Prompt-Injection Detection & Mitigation

Detect and stop prompt-injection (direct/indirect) across inputs, context, and outputs; filter hostile content before it reaches tools or models.

  • (none from your current list yet)

Jailbreak & Policy Enforcement (Guardrails)

Enforce safety policies and block jailbreaks at runtime via rules/validators/DSLs, with optional human-in-the-loop for sensitive actions.

Model Artifact Scanners

Analyze serialized model files for unsafe deserialization and embedded code; verify integrity/metadata and block or quarantine on fail.

Agent Tooling and MCP Security

Scan/audit MCP servers & client configs; detect tool poisoning, unsafe flows; constrain tool access with least-privilege and audit trails.

Honeypots & Deception (MCP/LLM)

  • Beelzebub GitHub Repo stars - Beelzebub is a honeypot framework designed to provide a secure environment for detecting and analyzing cyber attacks. It offers a low code approach for easy implementation and uses AI to mimic the behavior of a high-interaction honeypot.

Tool manifest/metadata validators

Servers & Dev tooling

  • PortSwigger - MCP Server GitHub Repo stars
  • ToolHive GitHub Repo stars - MCP server orchestrator for desktop, CLI, and Kubernetes Operator: discover and deploy servers in isolated containers with restricted permissions, manage secrets, use an optional egress proxy, auto-configure popular MCP clients (e.g., GitHub Copilot, Cursor), and manage at scale via CRDs/registry.

Execution Sandboxing for Agent Code

Run untrusted or LLM-triggered code in isolated sandboxes (FS/network/process limits) to contain RCE and reduce blast radius.

  • E2B GitHub Repo stars - SDK + self-hostable infra to run untrusted, LLM-generated code in isolated cloud sandboxes (Firecracker microVMs).

  • microsandbox GitHub Repo stars - self-hosted microVM (libkrun) sandbox for untrusted AI/user code.

Gateways & Policy Proxies

Centralize auth, quotas/rate limits, cost caps, egress/DLP filters, and guardrail orchestration across all model/providers.

  • (none from your current list yet)

Code Review

  • Claude Code Security Reviewer GitHub Repo stars - An AI-powered security review GitHub Action using Claude to analyze code changes for security vulnerabilities.
  • Vulnhuntr GitHub Repo stars - Vulnhuntr leverages the power of LLMs to automatically create and analyze entire code call chains starting from remote user input and ending at server output for detection of complex, multi-step, security-bypassing vulnerabilities that go far beyond what traditional static code analysis tools are capable of performing.

Red-Teaming Harnesses & Automated Security Testing

Automate attack suites (prompt-injection, leakage, jailbreak, goal-based tasks) in CI; score results and produce regression evidence.

Prompt-injection test suites

Data-leakage/secret-exfil test suites

Jailbreak catalogs & adversarial prompts

Adversarial-robustness (evasion) toolkits

Goal-directed agent attack tasks

CI pipelines & regression gates

  • promptfoo GitHub Repo stars
  • Agentic Radar GitHub Repo stars
  • DeepTeam GitHub Repo stars
  • Buttercup GitHub Repo stars - Trail of Bits’ AIxCC Cyber Reasoning System: runs OSS-Fuzz-style campaigns to find vulns, then uses a multi-agent LLM patcher to generate & validate fixes for C/Java repos; ships SigNoz observability; requires at least one LLM API key.
  • Giskard GitHub Repo stars - Pre-deployment/CI evaluation harness for LLM/RAG: runs scan checks (prompt injection, harmful output, sensitive-information disclosure, robustness), auto-generates RAG evaluation datasets and component scores (retriever, generator, rewriter, router), exports shareable reports, and integrates with CI for regression gates.

Scoring/leaderboards & evidence reports

  • (none from your current list yet)

Supply Chain: AI/ML BOM and Attestation

Generate and verify AI/ML BOMs, signatures, and provenance for models/datasets/dependencies; enforce allow/deny policies.

  • (none from your current list yet)

Vector/Memory Store Security

Harden RAG memory: isolate namespaces, sanitize queries/content, detect poisoning/outliers, and prevent secret/PII retention.

  • (none from your current list yet)

Data/Model Poisoning Defenses

Detect and mitigate dataset/model poisoning and backdoors; validate training/fine-tuning integrity and prune suspicious behaviors.

Sensitive Data Leak Prevention (DLP for AI)

Prevent secret/PII exfiltration in prompts/outputs via detection, redaction, and policy checks at I/O boundaries.

  • Presidio GitHub Repo stars - PII/PHI detection & redaction for text, images, and structured data; use as a pre/post-LLM DLP filter and for dataset sanitization.

Monitoring, Logging & Anomaly Detection

Collect AI-specific security logs/signals; detect abuse patterns (PI/jailbreak/leakage), enrich alerts, and support forensics.

  • LangKit GitHub Repo stars - LLM observability metrics toolkit (whylogs-compatible): prompt-injection/jailbreak similarity, PII patterns, hallucination/consistency, relevance, sentiment/toxicity, readability.

  • Alibi Detect GitHub Repo stars - Production drift/outlier/adversarial detection for tabular, text, images, and time series; online/offline detectors with TF/PyTorch backends; returns scores, thresholds, and flags for alerting.


Attack & Defense Matrices

Matrix-style resources covering adversarial TTPs and curated defensive techniques for AI systems.

Attack

Defense


Checklists


Supply Chain Security

Guidance and standards for securing the AI/ML software supply chain (models, datasets, code, pipelines). Primarily specs and frameworks; includes vetted TPRM templates.

Standards & Specs

Normative formats and specifications for transparency and traceability across AI components and dependencies.

  • OWASP - AI Bill of Materials (AIBOM) GitHub Repo stars - Bill of materials format for AI components, datasets, and model dependencies.

Third-Party Assessment

Questionnaires and templates to assess external vendors, model providers, and integrators for security, privacy, and compliance.

  • FS-ISAC - Generative AI Vendor Evaluation & Qualitative Risk Assessment - Assessment Tool XLSXGuide PDF - Vendor due-diligence toolkit for GenAI: risk tiering by use case, integration and data sensitivity; questionnaires across privacy, security, model development and validation, integration, legal and compliance; auto-generated reporting.

Videos & Playlists

Monthly curated playlists of AI-security talks, demos, incidents, and tooling.


Newsletter

  • Adversarial AI Digest - A digest of AI security research, threats, governance challenges, and best practices for securing AI systems.

Datasets

Dataset indexes & portals

  • Kaggle - Community-contributed datasets (IDS, phishing, malware URLs, incidents).
  • Hugging Face - Search HF datasets tagged/related to cybersecurity and threat intel.
  • SafetyPrompts - living index of LLM safety datasets & evals (jailbreak, prompt injection, toxicity, privacy), with filters and a maintained sheet.
  • Awesome Cybersecurity Datasets GitHub Repo stars

Cybersecurity Skills

Interactive CTFs and self-contained labs for hands-on security skills (web, pwn, crypto, forensics, reversing). Used to assess practical reasoning, tool use, and end-to-end task execution.

CTF Challenges

  • InterCode-CTF GitHub Repo stars - 100 picoCTF challenges (high-school level); categories: cryptography, web, binary exploitation (pwn), reverse engineering, forensics, miscellaneous. [Dataset+Benchmark] arXiv
  • NYU CTF Bench GitHub Repo stars - 200 CSAW challenges (2017-2023); difficulty very easy → hard; categories: cryptography, web, binary exploitation (pwn), reverse engineering, forensics, miscellaneous. [Dataset+Benchmark] arXiv
  • CyBench GitHub Repo stars - 40 tasks from HackTheBox, Sekai CTF, Glacier, HKCert (2022-2024); categories: cryptography, web, binary exploitation (pwn), reverse engineering, forensics, miscellaneous; difficulty grounded by first-solve time (FST). [Dataset+Benchmark] arXiv
  • pwn.college CTF Archive GitHub Repo stars - large collection of runnable CTF challenges; commonly used as a source corpus for research. [Dataset]

Secure Code

Detection (classify vulnerable code)
  • Devign / CodeXGLUE-Vul GitHub Repo stars - function-level C vuln detection. [Dataset+Benchmark]

  • DiverseVul GitHub Repo stars - multi-CWE function-level detection (C/C++). [Dataset]

  • Big-Vul GitHub Repo stars - real-world C/C++ detection (often with localization). [Dataset]

  • Py150k HF Downloads - ≈150k Python snippets (GitHub). Static analysis with Bandit, Semgrep, Snyk identified 42,753 vulnerabilities across 26,147 snippets; common CWEs: XSS (18%), SQLi (15%), Improper Input Validation (12%), OS Command Injection (10%), Information Exposure (8%). Collected from GitHub with dedup/fork removal, only parsable code (AST checks, ≤30k nodes), and permissive licenses. Used for: training and fine-tuning (e.g., CodeGen, CodeGen2/2.5, CodeLlama, CrystalCoder, CodeT5+).

Repair & Patch Mining
  • CVEfixes GitHub Repo stars - CVE-linked fix commits for security repair. [Dataset]
  • Also used for repair: Big-Vul (generate minimal diffs, then build + scan).
Runnable / Scanner Evaluation
  • OWASP Benchmark (Java) GitHub Repo stars - runnable Java app with seeded vulns; supports SAST/DAST/IAST evaluation and scoring. [Dataset+Benchmark]
  • Juliet (NIST SARD) (C/C++ mirror GitHub Repo starsJava mirror GitHub Repo stars) - runnable CWE cases for detect → fix → re-test. [Dataset+Benchmark]

Phishing

Phishing dataset gap: there isn’t a public corpus that, per page, stores the URL plus full HTML/CSS/JS, images, favicon, and a screenshot. Most sources are just URL feeds; pages vanish quickly; older benchmarks drift, so models don’t generalize well. Collect a per-URL archive of all page resources, with caveats that screenshots are viewport-only and some assets may be blocked by browser safety.

  • PhishTank - Continuously updated dataset (API/feed); community-verified phishing URLs; labels zero-day phishing; offers webpage screenshots.
  • OpenPhish - Regularly updated phishing URLs with fields such as webpage info, hostname, supported language, IP presence, country code, and SSL certificate; includes brand-target stats.
  • PhreshPhish - 372k HTML–URL samples (119k phishing / 253k benign) with full-page HTML, URLs, timestamps, and brand targets (~185 brands) across 50+ languages; suitable for training and evaluating URL/page-based phishing detection.
  • Phishing.Database - Continuously updated lists of phishing domains/links/IPs (ACTIVE/INACTIVE/INVALID and NEW last hour/today); repo resets daily-download lists; status validated via PyFunceble.
  • UCI – Phishing Websites - 11,055 URLs (phishing and legitimate) with 30 engineered features across URL, content, and third-party signals.
  • Mendeley – Phishing Websites Dataset - Labeled phishing/legitimate samples; provides webpage content (HTML) for each URL.; useful for training/eval.
  • UCI – PhiUSIIL Phishing URL - 235,795 URLs (134,850 legitimate; 100,945 phishing) with 54 URL/content features; labels: Class 1 = legitimate, Class 0 = phishing.
  • MillerSmiles - Large archive of phishing email scams with the URLs used; long-running email corpus (not a live feed).

Cybersecurity Knowledge

Structured Q&A datasets assessing security knowledge and terminology. Used to evaluate factual recall and conceptual understanding.

Secure Coding & Vulnerability Detection

Code snippet datasets labeled as vulnerable or secure, often tied to CWEs (Common Weakness Enumeration). Used to evaluate the model’s ability to recognize insecure code patterns and suggest secure fixes.

  • SecCodePLT HF Downloads

  • Py150k HF Downloads - ≈150k Python files from GitHub (deduped/fork-removed); Static analysis with Bandit, Semgrep, Snyk identified 42,753 vulnerabilities across 26,147 snippets; common CWEs: XSS (18%), SQLi (15%), Improper Input Validation (12%), OS Command Injection (10%), Information Exposure (8%). Collected from GitHub with dedup/fork removal, only parsable code (AST checks, ≤30k nodes), and permissive licenses. Used for: training and fine-tuning (e.g., CodeGen, CodeGen2/2.5, CodeLlama, CrystalCoder, CodeT5+).

Malware Behavior & Dynamic Analysis

  • Avast–CTU Public CAPEv2 Dataset GitHub Repo stars - 48,976 sandbox JSON reports (CAPEv2) across 10 families (Adload, Emotet, HarHar, Lokibot, njRAT, Qakbot, Swisyn, Trickbot, Ursnif, Zeus); per-sample metadata: sha256, family, type (banker, trojan, pws, coinminer, rat, keylogger), detection date. Two versions: Full (~13 GB) and Reduced (~566 MB) keeping behavior.summary + static.pe (avoids label leakage). Used for: behavior-based malware classification & concept-drift studies. - arXiv

Deepfake

Audio (Speech) Deepfakes

  • ASVspoof 5 - train / dev / eval - Train: 8 TTS attacks; Dev: 8 unseen (validation/fusion); Eval: 16 unseen incl. adversarial/codec. Labels: bona-fide / spoofed. arXiv
  • In-the-Wild (ITW) - 58 politicians/celebrities with per-speaker pairing; ≈20.7 h bona-fide + 17.2 h spoofed, scraped from social/video platforms. Labels: bona-fide / spoofed. arXiv
  • MLAAD (+M-AILABS) - Multilingual synthetic TTS corpus (hundreds of hours; many models/languages). Labels: bona-fide (M-AILABS) / spoof (MLAAD). arXiv
  • LlamaPartialSpoof - LLM-driven attacker styles; includes full and partial (spliced) spoofs. Labels: bona-fide / fully-spoofed / partially-spoofed. arXiv
  • Fake-or-Real (FoR) - >195k utterances; four variants: for-original, for-norm, for-2sec, for-rerec. Labels: real / synthetic.
  • CodecFake - codec-based deepfake audio dataset (Interspeech 2024); Labels: real / codec-generated fake. arXiv

Video Deepfakes

Jailbreak

Adversarial prompt datasets-both text-only and multimodal-designed to bypass safety mechanisms or test refusal logic. Used to test how effectively a model resists jailbreaks and enforces policy-based refusal.

  • CySecBench GitHub Repo stars cybersecurity-domain jailbreak dataset with 12,662 close-ended prompts across multiple attack categories; paper introduces an obfuscation-based jailbreaking method and LLM evals.
  • JailBreakV-28K GitHub Repo stars multimodal jailbreak benchmark with ~28k test cases (20k text-based transfer attacks + 8k image-based) to assess MLLM robustness; HF page includes a mini-leaderboard and image types.
  • Do-Not-Answer GitHub Repo stars refusal-evaluation set of 939 “should-refuse” prompts plus an automatic evaluator; answering instead of refusing can be used as a jailbreak-success signal.

Prompt Injection

Public prompt-injection datasets have recurring limitations: partial staleness as models and defenses evolve, CTF skew toward basic instruction following, and label mixing across toxicity, jailbreak roleplay, and true injections that inflates measured true positive rates and distorts evaluation.

  • prompt-injection-attack-datasetHF downloads 3.7k rows pairing benign task prompts with attack variants (naive / escape / ignore / fake-completion / combined). Columns for both target and injected tasks; train split only.
  • prompt-injections-benchmark HF downloads 5,000 prompts labeled jailbreak / benign for robustness evals.
  • prompt_injections HF downloads ~1k short injection prompts; multilingual (EN, FR, DE, ES, IT, PT, RO); single train split; CSV/Parquet.
  • prompt-injection HF downloads Large-scale injection/benign corpus (~327k rows, train/test) for training baselines and detectors.
  • prompt-injection-safety HF downloads 60k rows (train 50k / test 10k); 3-way labels: benign 0, injection 1, harmful request 2; Parquet.

System Prompts

Collections of leaked, official, and synthetic system prompts and paired responses used to study guardrails and spot system prompt exposure. Used to build leakage detectors, craft targeted guardrail tests (consent gates, tool use rules, safety policies), and reproduce vendor behaviors for evaluation.

  • Official_LLM_System_Prompts HF downloads - leaked and date-stamped prompts from proprietary assistants (OpenAI, Anthropic, MS Copilot, GitHub Copilot, Grok, Perplexity); 29 rows.
  • system-prompt-leakage HF downloads - synthetic prompts + responses for leakage detection; train 283,353 / test 71,351 (binary leakage labels).
  • system-prompts-and-models-of-ai-tools GitHub stars - community collection of prompts and internal tool configs for code/IDE agents and apps (Cursor, VSCode Copilot Agent, Windsurf, Devin, v0, etc.); includes a security notice.
  • system_prompts_leaks GitHub stars - collection of extracted system prompts from popular chatbots like ChatGPT, Claude & Gemini
  • leaked-system-prompts GitHub stars - leaked prompts across many services; requires verifiable sources or reproducible prompts for PRs.
  • chatgpt_system_prompt GitHub stars - community collection of GPT system prompts, prompt-injection/leak techniques, and protection prompts.
  • CL4R1T4S GitHub stars - extracted/leaked prompts, guidelines, and tooling references spanning major assistants and agents (OpenAI, Google, Anthropic, xAI, Perplexity, Cursor, Devin, etc.).
  • grok-prompts GitHub stars - official xAI repository publishing Grok’s system prompts for chat/X features (DeepSearch, Ask Grok, Explain, etc.).
  • Prompt-Leakage Finetune GitHub stars - adversarial attack prompts (~1,300) used to instruction-tune refusal to system-prompt extraction (synthetic + Gandalf subset).

Courses & Certifications

Career Pathways

Courses (includes labs)

Professional Certifications (exam-based)


Training

Provider Training Portals

Guided Tracks

CTFs & Challenges

Bespoke


Models

Cybersecurity-Tuned Text Generation

  • segolilylabs/Lily-Cybersecurity-7B-v0.2-GGUF HF downloads - quantized GGUF build of a 7B cybersecurity-tuned chat model.
  • DeepHat/DeepHat-V1-7B HF downloads - 7B cybersecurity-oriented text-generation model.
  • clouditera/secgpt HF downloads - cybersecurity-tuned instruction model (CN/EN) with released weights (variants incl. 1.5B/7B/14B); built on Qwen2.5-Instruct/DeepSeek-R1, Apache-2.0, supports vLLM deployment. GitHub Repo stars
  • ZySec-AI/SecurityLLM HF downloads - cybersecurity-focused chat model (“ZySec-7B”); weights available. Community GGUF quantization exists for llama.cpp.

Domain-Adapted Text LMs (Security / CTI)

Safety / Policy Classifiers (Guardrails & Moderation)

Prompt-Injection & Jailbreak Detection (Classifiers)

Code Security (Code understanding & vuln detection)

Deepfake / Anti-Spoofing (Speech)


Research Working Groups

📌 (More working groups to be added.)


Communities & Social Groups


Benchmarks

Code Security (Generated Code)

Purpose: Evaluates the security of model-generated code using CWE-tagged prompts and static analysis.

  • LLMSecEval GitHub Repo stars - Prompt-based, CWE-mapped security benchmark for code-generation models; generate from each prompt and score with static analysis (e.g., CodeQL / Semgrep / Bandit) to label outputs secure vs. vulnerable and compute per-CWE metrics. Used for: benchmarking generated-code security. arXiv

Adversarial Resilience

Purpose: Evaluates how AI systems withstand adversarial attacks, including evasion, poisoning, and model extraction. Ensures AI remains functional under manipulation.
NIST AI RMF Alignment: Measure, Manage

  • Measure: Identify risks related to adversarial attacks.
  • Manage: Implement mitigation strategies to ensure resilience.

Autonomous Pentesting & Exploit Generation

AutoPenBench GitHub Repo stars - 33 tasks: 22 in-vitro fundamentals (incl. 4 crypto) + 11 real-world CVEs for autonomous pentesting evaluation. arXivBest for: controlled, task-based coverage across fundamentals and known CVEs (repeatable, fine-grained scoring).

AI-Pentest-Benchmark GitHub Repo stars - 13 full vulnerable VMs (from VulnHub), 152 subtasks across Recon (72), Exploit (44), PrivEsc (22), and General (14), for end-to-end recon → exploit → privesc benchmarking. arXivBest for: realistic, end-to-end machine takeovers stressing planning, tool use, and multi-step reasoning.

CVE-Bench GitHub Repo stars - 40 real-world web CVEs in dockerized apps; evaluates agent-driven exploit generation/execution. arXivBest for: focused testing of exploitability against real CVEs (web).

NYU CTF Bench GitHub Repo stars - 200 dockerized CSAW challenges (web, pwn, rev, forensics, crypto, misc.) for skill-granular agent evaluation. arXivBest for: CTF-style, per-skill assessment and tool-use drills.

Agent Misuse & Harm InductioN

AgentHarm HF downloads human-authored harmful agent tasks for tool-using agents with benign counterparts, synthetic proxy tools, and a reproducible scoring harness; 110 base tasks (440 with augmentation), 11 categories, 104 tools. arXivBest for: measuring refusal vs completion on multi-step tool use and the impact of jailbreaks.

Purple Llama – CyberSecEval GitHub Repo stars - evaluates models’ propensity to assist cyber-offense (exploit/malware) and to generate insecure code; graded-risk tasks with a reproducible harness. Best for: dangerous-capability / misuse-risk scoring (text/IDE, non-agent).

Prompt Injection & Jailbreak Detection

Purpose: Evaluates resistance to prompt-injection and jailbreak attempts in chat/RAG/agent contexts.
NIST AI RMF Alignment: Measure, Manage

  • Lakera PINT Benchmark GitHub Repo stars Prompt-injection benchmark with a curated multilingual test suite, explicit categories (injections, jailbreaks, hard negatives, benign chats/docs), and a reproducible scoring harness (PINT score + notebooks) for fair detector comparison and regression tracking.

  • JailbreakBench GitHub Repo stars standardized jailbreak prompts + scoring harness; measures refusal/compliance and jailbreak success across models and settings.

Model & Data Integrity

Purpose: Assesses AI models for unauthorized modifications, including backdoors and dataset poisoning. Supports trustworthiness and security of model outputs.
NIST AI RMF Alignment: Map, Measure

  • Map: Understand and identify risks to model/data integrity.

  • Measure: Evaluate and mitigate risks through validation techniques.

  • CVE-Bench - @uiuc-kang-lab GitHub Repo stars - How well AI agents can exploit real-world software vulnerabilities that are listed in the CVE database.

Governance & Compliance

Purpose: Ensures AI security aligns with governance frameworks, industry regulations, and security policies. Supports auditability and risk management.
NIST AI RMF Alignment: Govern

  • Govern: Establish policies, accountability structures, and compliance controls.

Privacy & Data Protection

Purpose: Evaluates AI for risks like data leakage, membership inference, and model inversion. Helps ensure privacy preservation and compliance.
NIST AI RMF Alignment: Measure, Manage

  • Measure: Identify and assess AI-related privacy risks.
  • Manage: Implement security controls to mitigate privacy threats.

Explainability & Trustworthiness

Purpose: Assesses AI for transparency, fairness, and bias mitigation. Ensures AI operates in an interpretable and ethical manner.
NIST AI RMF Alignment: Govern, Map, Measure

  • Govern: Establish policies for fairness, bias mitigation, and transparency.
  • Map: Identify potential explainability risks in AI decision-making.
  • Measure: Evaluate AI outputs for fairness, bias, and interpretability.

Incident Response

Incident Repositories, Trackers & Monitors

Guides & Playbooks

Regulatory Incident Reporting


Reports and Research

Vendor Reports

Research Papers

Research Feed

Industry Alliance & Nonprofit Reports

📌 (More to be added - A collection of AI security reports, white papers, and academic studies.)


Foundations: Glossary, SoK/Surveys & Taxonomies

(Core references and syntheses for orientation and shared language.)

Glossary

(Authoritative definitions for AI/ML security, governance, and risk-use to align terminology across docs and reviews.)

SoK & Surveys

(Systematizations of Knowledge (SoK), surveys, systematic reviews, and mapping studies.)

Taxonomy

(Reusable classification schemes-clear dimensions, categories, and labeling rules for attacks, defenses, datasets, and risks.)


Podcasts

  • The MLSecOps Podcast - Insightful conversations with industry leaders and AI experts, exploring the fascinating world of machine learning security operations.

Market Landscape

Curated market maps of tools and vendors for securing LLM and agentic AI applications across the lifecycle.


Startups Blogs

A curated list of startups securing agentic AI applications, organized by the OWASP Agentic AI lifecycle (Scope & Plan → Govern). Each company appears once in its best-fit stage based on public positioning, and links point to blog/insights for deeper context. Some startups span multiple stages; placements reflect primary focus.

Inclusion criteria

  1. Startup has not been acquired
  2. Has an active blog
  3. Has an active GitHub organization/repository

Scope & Plan

Design-time security: non-human identities, agent threat modeling, privilege boundaries/authn, and memory scoping/isolation.

no startups here with active blog and active GitHub account

Develop & Experiment

Secure agent loops and tool use; validate I/O contracts; embed policy hooks; test resilience during co-engineering.

no startups here with active blog and active GitHub account

Augment & Fine-Tune Data

Sanitize/trace data and reasoning; validate alignment; protect sensitive memory with privacy controls before deployment.

Test & Evaluate

Adversarial testing for goal drift, prompt injection, and tool misuse; red-team sims; sandboxed calls; decision validation.

Release

Sign models/plugins/memory; verify SBOMs; enforce cryptographically validated policies; register agents/capabilities.

no startups here with active blog and active GitHub account

Deploy

Zero-trust activation: rotate ephemeral creds, apply allowlists/LLM firewalls, and fine-grained least-privilege authorization.

Operate

Monitor memory mutations for drift/poisoning, detect abnormal loops/misuse, enforce HITL overrides, and scan plugins-continuous, real-time vigilance for resilient operations as systems scale and self-orchestrate.

Monitor

Correlate agent steps/tools/comms; detect anomalies (e.g., goal reversal); keep immutable logs for auditability.

Govern

Enforce role/task policies, version/retire agents, prevent privilege creep, and align evidence with AI regulations.


Related Awesome Lists


Common Acronyms

Acronym Full Form
AI Artificial Intelligence
AGI Artificial General Intelligence
ALBERT A Lite BERT
AOC Area Over Curve
ASR Attack Success Rate
BERT Bidirectional Encoder Representations from Transformers
BGMAttack Black-box Generative Model-based Attack
CBA Composite Backdoor Attack
CCPA California Consumer Privacy Act
CNN Convolutional Neural Network
CoT Chain-of-Thought
DAN Do Anything Now
DFS Depth-First Search
DNN Deep Neural Network
DPO Direct Preference Optimization
DP Differential Privacy
FL Federated Learning
GA Genetic Algorithm
GDPR General Data Protection Regulation
GPT Generative Pre-trained Transformer
GRPO Group Relative Policy Optimization
HIPAA Health Insurance Portability and Accountability Act
ICL In-Context Learning
KL Kullback-Leibler Divergence
LAS Leakage-Adjusted Simulatability
LM Language Model
LLM Large Language Model
Llama Large Language Model Meta AI
LoRA Low-Rank Adapter
LRM Large Reasoning Model
MCTS Monte-Carlo Tree Search
MIA Membership Inference Attack
MDP Masking-Differential Prompting
MLM Masked Language Model
MLLM Multimodal Large Language Model
MLRM Multimodal Large Reasoning Model
MoE Mixture-of-Experts
NLP Natural Language Processing
OOD Out Of Distribution
ORM Outcome Reward Model
PI Prompt Injection
PII Personally Identifiable Information
PAIR Prompt Automatic Iterative Refinement
PLM pre-trained Language Model
PRM Process Reward Model
QA Question-Answering
RAG Retrieval-Augmented Generation
RL Reinforcement Learning
RLHF Reinforcement Learning from Human Feedback
RLVR Reinforcement Learning with Verifiable Reward
RoBERTa Robustly optimized BERT approach
SCM Structural Causal Model
SGD Stochastic Gradient Descent
SOTA State of the Art
TAG Gradient Attack on Transformer-based Language Models
VR Verifiable Reward
XLNet Transformer-XL with autoregressive and autoencoding pre-training

Contributing

Contributions are welcome! If you have new resources, tools, or insights to add, feel free to submit a pull request.

This repository follows the Awesome Manifesto guidelines.


License

License: MIT

© 2025 Tal Eliyahu. Licensed under the MIT License. See LICENSE.

About

Curated resources, research, and tools for securing AI systems

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •