📊 LLM4RE-Datasets

A comprehensive replication package for browsing and exploring datasets used in Large Language Models for Requirements Engineering research.

📋 Overview

This project provides a systematic collection and web-based interface for browsing datasets employed in Large Language Models for Requirements Engineering (LLM4RE) research. The replication package includes the complete methodology for dataset collection and characterization, enabling researchers to reproduce and extend the findings.

🏗️ Project Structure

📁 Root Directory

index.html - Main entry point for the web application
dashboard.html - Interactive analytics dashboard with advanced filtering capabilities
script.js - JavaScript functionality for dataset browsing and filtering
styles.css - Cascading Style Sheets for web application styling
LICENSE - Project licensing information

📚 Literature Review (`literature-review/`)

Contains the systematic literature review artifacts:

publications.xlsx - Comprehensive database of reviewed publications
datasets.csv - Extracted dataset information from literature review
ORKG comparative.csv - Comparative analysis with Open Research Knowledge Graph

💾 Data (`data/`)

Processed and curated dataset information:

datasets - datasets.csv - Main dataset catalog with comprehensive metadata including:
- Dataset identification codes and names
- Descriptions and references
- Temporal information (publication years)
- Licensing and artifact type classifications
- Requirements Engineering stage and task categorizations
- Domain-specific classifications
- Size metrics and language specifications
- Label information and publication mappings

📈 Figures (`figures/`)

Generated visualizations and analytical charts:

dataset_distribution_stacked.png - Stacked distribution of datasets across categories
dataset_distribution_stacked_high_res.png - High-resolution version of distribution chart
re_stage_task_bubble.png - Bubble chart mapping RE stages to tasks
year_dataset_line.png - Temporal analysis of dataset publication trends

🔧 Scripts (`script/`)

Analysis and visualization generation tools:

plot.py - Python script for generating statistical visualizations and analytical charts

✨ Features

🔍 Interactive Dataset Browser - Comprehensive search and filtering capabilities
📊 Analytics Dashboard - Dynamic visualizations and statistical analysis
🏷️ Advanced Filtering - Multi-dimensional filtering by license, artifact type, granularity, RE stage, task, domain, language, and year
📈 Data Visualization - Statistical charts and trend analysis
🔗 Reference Integration - Direct links to original publications and dataset sources

🌐 Public Access

A publicly accessible version of this dataset collection is available at: https://nlp4se.github.io/LLM4RE-Datasets/

🛠️ Development

AI-Assisted Development: This replication package was developed with the assistance of AI tools, specifically utilizing Claude (Anthropic) for code generation and implementation support. The methodology and data collection processes remain fully transparent and reproducible.

📄 License

This project is licensed under the terms specified in the LICENSE file. Please refer to the license file for detailed usage permissions and restrictions.

📞 Contact

Primary Contact: Quim Motger
Email: joaquim.motger (at) upc (dot) edu
Affiliation: GESSI Research Group, Universitat Politècnica de Catalunya (UPC)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

📊 LLM4RE-Datasets

📋 Overview

🏗️ Project Structure

📁 Root Directory

📚 Literature Review (`literature-review/`)

💾 Data (`data/`)

📈 Figures (`figures/`)

🔧 Scripts (`script/`)

✨ Features

🌐 Public Access

🛠️ Development

📄 License

📞 Contact

About

Uh oh!

Releases 1

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
data		data
figures		figures
literature-review		literature-review
script		script
.gitignore		.gitignore
404.html		404.html
LICENSE		LICENSE
README.md		README.md
dashboard.html		dashboard.html
index.html		index.html
script.js		script.js
styles.css		styles.css

License

nlp4se/LLM4RE-Datasets

Folders and files

Latest commit

History

Repository files navigation

📊 LLM4RE-Datasets

📋 Overview

🏗️ Project Structure

📁 Root Directory

📚 Literature Review (literature-review/)

💾 Data (data/)

📈 Figures (figures/)

🔧 Scripts (script/)

✨ Features

🌐 Public Access

🛠️ Development

📄 License

📞 Contact

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

📚 Literature Review (`literature-review/`)

💾 Data (`data/`)

📈 Figures (`figures/`)

🔧 Scripts (`script/`)

Packages