Commit 9727ce7

JoeKarow, DeanEby, soul-codes, KristijanArmeni, and VEDA95 authored
release: v0.8.0 (#178)
Co-authored-by: DeanEby <[email protected]>
Co-authored-by: soul-codes <[email protected]>
Co-authored-by: Kristijan Armeni <[email protected]>
Co-authored-by: VEDA95 <[email protected]>
1 parent 2c0a451 commit 9727ce7

99 files changed, +5015 -361 lines changed

.ai-context/README.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
1+
# Mango Tango CLI - AI Context Documentation
2+
3+
## Repository Overview
4+
5+
**Mango Tango CLI** is a Python terminal-based tool for social media data
analysis and visualization. It provides a modular, extensible architecture
that separates core application logic from analysis modules, ensuring
consistent UX while allowing easy contribution of new analyzers.
9+
10+
### Purpose & Domain
11+
12+
- **Social Media Analytics**: Hashtag analysis, n-gram analysis, temporal
  patterns, user coordination
- **Modular Architecture**: Clear separation between data import/export,
  analysis, and presentation
- **Interactive Workflows**: Terminal-based UI with web dashboard capabilities
- **Extensible Design**: Plugin-like analyzer system for easy expansion
18+
19+
### Tech Stack
20+
21+
- **Core**: Python 3.12, Inquirer (CLI), TinyDB (metadata)
22+
- **Data**: Polars/Pandas, PyArrow, Parquet files
23+
- **Web**: Dash, Shiny for Python, Plotly
24+
- **Dev Tools**: Black, isort, pytest, PyInstaller
25+
26+
## Semantic Code Structure
27+
28+
### Entry Points
29+
30+
- `mangotango.py` - Main application bootstrap
31+
- `python -m mangotango` - Standard execution command
32+
33+
### Core Architecture (MVC-like)
34+
35+
- **Application Layer** (`app/`): Workspace logic, analysis orchestration
36+
- **View Layer** (`components/`): Terminal UI components using inquirer
37+
- **Model Layer** (`storage/`): Data persistence, project/analysis models
38+
39+
### Domain Separation
40+
41+
1. **Core Domain**: Application, Terminal Components, Storage IO
42+
2. **Edge Domain**: Data import/export (`importing/`), preprocessing
43+
3. **Content Domain**: Analyzers (`analyzers/`), web presenters
44+
45+
### Key Data Flow
46+
47+
1. Import (CSV/Excel) → Parquet → Semantic preprocessing
48+
2. Primary Analysis → Secondary Analysis → Web Presentation
49+
3. Export → User-selected formats (XLSX, CSV, etc.)
50+
51+
## Key Concepts
52+
53+
### Analyzer System
54+
55+
- **Primary Analyzers**: Core data processing (hashtags, ngrams, temporal)
56+
- **Secondary Analyzers**: User-friendly output transformation
57+
- **Web Presenters**: Interactive dashboards using Dash/Shiny
58+
- **Interface Pattern**: Declarative input/output schema definitions
59+
60+
### Context Pattern
61+
62+
Dependency injection through context objects:
63+
64+
- `AppContext`: Application-wide dependencies
65+
- `ViewContext`: UI state and terminal context
66+
- `AnalysisContext`: Analysis execution environment
67+
- Analyzer contexts: File paths, preprocessing, app hooks
68+
69+
### Data Semantics
70+
71+
- Column semantic types guide the user in selecting an analysis
72+
- Preprocessing maps user data to expected analyzer inputs
73+
- Type-safe data models using Pydantic
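
The mapping step described in this list can be pictured with a small sketch. It uses plain dicts rather than Polars or Pydantic so that it runs on the standard library alone. The function `apply_column_mapping` and the sample column names are hypothetical and are not taken from the project.

```python
def apply_column_mapping(
    rows: list[dict[str, object]],
    mapping: dict[str, str],
) -> list[dict[str, object]]:
    """Rename user columns to the names an analyzer expects.

    `mapping` maps analyzer input name -> user column name (assumed shape).
    Columns that are not mapped are dropped from the result.
    """
    if rows:
        missing = sorted(col for col in mapping.values() if col not in rows[0])
        if missing:
            raise KeyError(f"user data lacks columns: {missing}")
    return [
        {analyzer_col: row[user_col] for analyzer_col, user_col in mapping.items()}
        for row in rows
    ]


# Hypothetical user data with free-form column names.
user_rows = [
    {"Screen Name": "alice", "Tweet": "hello #world", "Likes": 3},
    {"Screen Name": "bob", "Tweet": "#world again", "Likes": 5},
]
mapped = apply_column_mapping(
    user_rows, {"author_id": "Screen Name", "text": "Tweet"}
)
```

After the call, every row carries only the analyzer-facing names (`author_id`, `text`), which is the property an analyzer would rely on regardless of how the source file labeled its columns.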
74+
75+
## Development Patterns
76+
77+
### Code Organization
78+
79+
- Domain-driven module structure
80+
- Interface-first analyzer design
81+
- Context-based dependency injection
82+
- Test co-location with implementation
83+
84+
### Key Conventions
85+
86+
- Black + isort formatting (enforced by pre-commit)
87+
- Type hints throughout (modern Python syntax)
88+
- Parquet for data persistence
89+
- Pydantic models for validation
90+
91+
## Getting Started
92+
93+
### For Development
94+
95+
1. **Setup**: See @.ai-context/setup-guide.md
96+
2. **Architecture**: See @.ai-context/architecture-overview.md
97+
3. **Symbol Reference**: See @.ai-context/symbol-reference.md
98+
4. **Development Guide**: See @docs/dev-guide.md
99+
100+
### For AI Assistants
101+
102+
- **Claude Code users**: See @CLAUDE.md (includes Serena integration)
103+
- **Cursor users**: See @.cursorrules
104+
- **Deep semantic analysis**: Explore @.serena/memories/
105+
106+
### Quick References
107+
108+
- **Commands**: @.serena/memories/suggested_commands.md
109+
- **Style Guide**: @.serena/memories/code_style_conventions.md
110+
- **Task Checklist**: @.serena/memories/task_completion_checklist.md
111+
112+
## External Dependencies
113+
114+
### Data Processing
115+
116+
- `polars` - Primary data processing library
117+
- `pandas` - Secondary support for Plotly integration
118+
- `pyarrow` - Parquet file format support
119+
120+
### Web Framework
121+
122+
- `dash` - Interactive web dashboards
123+
- `shiny` - Python Shiny for modern web UIs
124+
- `plotly` - Visualization library
125+
126+
### CLI & Storage
127+
128+
- `inquirer` - Interactive terminal prompts
129+
- `tinydb` - Lightweight JSON database
130+
- `platformdirs` - Cross-platform data directories
131+
132+
### Development
133+
134+
- `black` - Code formatter
135+
- `isort` - Import organizer
136+
- `pytest` - Testing framework
137+
- `pyinstaller` - Executable building
138+
139+
## Project Status
140+
141+
- **License**: PolyForm Noncommercial License 1.0.0
142+
- **Author**: CIB Mango Tree / Civic Tech DC
143+
- **Branch Strategy**: feature branches → develop → main
144+
- **CI/CD**: GitHub Actions for testing, formatting, builds
Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
1+
# Architecture Overview
2+
3+
## High-Level Component Diagram
4+
5+
```mermaid
flowchart TD
    User[User] --> Terminal[Terminal Interface]
    Terminal --> App[Application Layer]
    App --> Storage[Storage Layer]

    App --> Importers[Data Importers]
    App --> Preprocessing[Semantic Preprocessor]
    App --> Analyzers[Analyzer System]

    Importers --> Parquet[(Parquet Files)]
    Preprocessing --> Parquet
    Analyzers --> Parquet

    Analyzers --> Primary[Primary Analyzers]
    Analyzers --> Secondary[Secondary Analyzers]
    Analyzers --> WebPresenters[Web Presenters]

    WebPresenters --> Dash[Dash Apps]
    WebPresenters --> Shiny[Shiny Apps]

    Storage --> TinyDB[(TinyDB)]
    Storage --> FileSystem[(File System)]
```
29+
30+
## Core Abstractions
31+
32+
### Application Layer (`app/`)
33+
34+
Central orchestration and workspace management
35+
36+
Key Classes:
37+
38+
- `App` - Main application controller, orchestrates all operations
39+
- `AppContext` - Dependency injection container for application-wide services
40+
- `ProjectContext` - Project-specific operations and column mapping
41+
- `AnalysisContext` - Analysis execution environment and progress tracking
42+
- `AnalysisOutputContext` - Handles analysis result management
43+
- `AnalysisWebServerContext` - Web server lifecycle management
44+
- `SettingsContext` - Configuration and user preferences
45+
46+
### View Layer (`components/`)
47+
48+
Terminal UI components using inquirer
49+
50+
Key Components:
51+
52+
- `ViewContext` - UI state management and terminal context
53+
- `main_menu()` - Application entry point menu
54+
- `splash()` - Application branding and welcome
55+
- Menu flows: project selection, analysis creation, parameter customization
56+
- Server management: web server lifecycle, export workflows
57+
58+
### Model Layer (`storage/`)
59+
60+
Data persistence and state management
61+
62+
Key Classes:
63+
64+
- `Storage` - Main storage controller, manages projects and analyses
65+
- `ProjectModel` - Project metadata and configuration
66+
- `AnalysisModel` - Analysis metadata, parameters, and state
67+
- `SettingsModel` - User preferences and application settings
68+
- `FileSelectionState` - File picker state management
69+
- `TableStats` - Data statistics and preview information
70+
71+
## Data Flow Architecture
72+
73+
### Import → Analysis → Export Pipeline
74+
75+
```mermaid
sequenceDiagram
    participant User
    participant Terminal
    participant App
    participant Importer
    participant Preprocessor
    participant Analyzer
    participant WebServer

    User->>Terminal: Select data file
    Terminal->>App: Create project
    App->>Importer: Import CSV/Excel
    Importer->>App: Parquet file path
    App->>Preprocessor: Apply column semantics
    Preprocessor->>App: Processed data path
    User->>Terminal: Configure analysis
    Terminal->>App: Run analysis
    App->>Analyzer: Execute with context
    Analyzer->>App: Analysis results
    App->>WebServer: Start dashboard
    WebServer->>User: Interactive visualization
```
98+
99+
### Context-Based Dependency Injection
100+
101+
Each layer receives context objects containing exactly what it needs:
102+
103+
```python
from pathlib import Path
from typing import Callable

import dash


# Analyzer Context Pattern
class AnalysisContext:
    input_path: Path  # Input parquet file
    output_path: Path  # Where to write results
    preprocessing: Callable  # Column mapping function
    progress_callback: Callable  # Progress reporting
    parameters: dict  # User-configured parameters


class AnalysisWebServerContext:
    primary_output_path: Path
    secondary_output_paths: list[Path]
    dash_app: dash.Dash  # For dashboard creation
    server_config: dict
```
118+
119+
## Core Domain Patterns
120+
121+
### Analyzer Interface System
122+
123+
Declarative analysis definition
124+
125+
```python
# interface.py
interface = AnalyzerInterface(
    input=AnalyzerInput(
        columns=[
            AnalyzerInputColumn(
                name="author_id",
                semantic_type=ColumnSemantic.USER_ID,
                required=True
            )
        ]
    ),
    outputs=[
        AnalyzerOutput(
            name="hashtag_analysis",
            columns=[...],
            internal=False  # User-consumable
        )
    ],
    params=[
        AnalyzerParam(
            name="time_window",
            param_type=ParamType.TIME_BINNING,
            default="1D"
        )
    ]
)
```
153+
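One benefit of declaring parameters with defaults is that the application can resolve user input against the declaration before an analyzer runs. The following is a minimal sketch of that idea only. `ParamSpec` and `resolve_params` are hypothetical stand-ins, not the project's `AnalyzerParam` API, and the `"1H"` value is an illustrative string.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ParamSpec:
    """Hypothetical stand-in for a declared analyzer parameter."""

    name: str
    default: Any


def resolve_params(
    specs: list[ParamSpec], user_values: dict[str, Any]
) -> dict[str, Any]:
    """Fill in declared defaults and reject names the interface lacks."""
    declared = {spec.name for spec in specs}
    unknown = sorted(set(user_values) - declared)
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    return {spec.name: user_values.get(spec.name, spec.default) for spec in specs}


specs = [ParamSpec(name="time_window", default="1D")]
defaults_only = resolve_params(specs, {})
overridden = resolve_params(specs, {"time_window": "1H"})
```

With this shape, an analyzer always receives a complete parameter dict, and a misspelled parameter name fails early instead of being silently ignored.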
154+
### Three-Stage Analysis Pipeline
155+
156+
1. **Primary Analyzers** - Raw data processing
   - Input: Preprocessed parquet files
   - Output: Normalized analysis results
   - Examples: hashtag extraction, n-gram generation, temporal aggregation

2. **Secondary Analyzers** - Result transformation
   - Input: Primary analyzer outputs
   - Output: User-friendly reports and summaries
   - Examples: statistics calculation, trend analysis

3. **Web Presenters** - Interactive visualization
   - Input: Primary + secondary outputs
   - Output: Dash/Shiny web applications
   - Examples: interactive charts, data exploration interfaces
170+
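The first two stages of the pipeline above can be sketched with in-memory data. This is an illustration under simplifying assumptions: the real analyzers read and write Parquet files through context objects, whereas the hypothetical functions below pass Python lists directly, and the third stage (a Dash or Shiny presenter) is left out.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def primary_extract_hashtags(posts: list[dict[str, str]]) -> list[dict[str, str]]:
    """Stage 1: one normalized row per (author, hashtag) occurrence."""
    return [
        {"author_id": post["author_id"], "hashtag": tag.lower()}
        for post in posts
        for tag in HASHTAG.findall(post["text"])
    ]


def secondary_summarize(rows: list[dict[str, str]]) -> list[tuple[str, int]]:
    """Stage 2: hashtag counts, most frequent first."""
    return Counter(row["hashtag"] for row in rows).most_common()


# Hypothetical preprocessed input.
posts = [
    {"author_id": "a1", "text": "Launch day #Mango #tango"},
    {"author_id": "a2", "text": "Loving #mango"},
]
primary = primary_extract_hashtags(posts)
summary = secondary_summarize(primary)
```

The secondary stage consumes only the primary output, never the raw posts, which mirrors the input/output boundaries listed for each stage.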
171+
## Integration Points
172+
173+
### External Data Sources
174+
175+
- **CSV Importer**: Handles delimiter detection, encoding issues
176+
- **Excel Importer**: Multi-sheet support, data type inference
177+
- **File System**: Project directory structure, workspace management
178+
179+
### Web Framework Integration
180+
181+
- **Dash Integration**: Plotly-based interactive dashboards
182+
- **Shiny Integration**: Modern Python web UI framework
183+
- **Server Management**: Background process handling, port management
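
As one illustration of what port management can involve, the sketch below asks the operating system for an unused TCP port before a dashboard server starts. It is an assumption about a common technique, not a description of how this project picks ports, and `find_free_port` is a hypothetical helper.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (host, port) for AF_INET sockets.
        return sock.getsockname()[1]


port = find_free_port()
```

The socket is closed before the function returns, so another process could claim the port in the meantime. A caller that needs certainty would have to handle a bind failure when the real server starts.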
184+
185+
### Export Capabilities
186+
187+
- **XLSX Export**: Formatted Excel files with multiple sheets
188+
- **CSV Export**: Standard comma-separated values
189+
- **Parquet Export**: Native format for data interchange
190+
191+
## Key Architectural Decisions
192+
193+
### Parquet-Centric Data Flow
194+
195+
- All analysis data stored as Parquet files
196+
- Enables efficient columnar operations with Polars
197+
- Provides schema validation and compression
198+
- Facilitates data sharing between analysis stages
199+
200+
### Context Pattern for Decoupling
201+
202+
- Eliminates direct dependencies between layers
203+
- Enables testing with mock contexts
204+
- Allows analyzer development without application knowledge
205+
- Supports different execution environments (CLI, web, testing)
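
The testing benefit can be shown with a toy example. Because an analyzer reads only from its context, a test can hand it a small fake object in place of the full application. `FakeAnalysisContext`, its `rows` field, and `count_long_posts` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeAnalysisContext:
    """Hypothetical test double exposing only what the analyzer reads."""

    parameters: dict[str, int]
    progress_callback: Callable[[float], None]
    rows: list[str] = field(default_factory=list)


def count_long_posts(context: FakeAnalysisContext) -> int:
    """Toy analyzer: counts posts at least `min_length` characters long."""
    min_length = context.parameters["min_length"]
    total = len(context.rows)
    matches = 0
    for index, text in enumerate(context.rows, start=1):
        if len(text) >= min_length:
            matches += 1
        # Report the fraction of rows processed so far.
        context.progress_callback(index / total)
    return matches


progress_log: list[float] = []
context = FakeAnalysisContext(
    parameters={"min_length": 10},
    progress_callback=progress_log.append,
    rows=["short", "a much longer post", "exactly 10"],
)
result = count_long_posts(context)
```

The test can then assert on both the return value and the recorded progress calls without starting a terminal UI or touching storage.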
206+
207+
### Domain-Driven Module Organization
208+
209+
- Clear boundaries between core, edge, and content domains
210+
- Enables independent development of analyzers
211+
- Supports plugin-like extensibility
212+
- Facilitates maintenance and testing
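
Plugin-like extensibility is often built on a registry that maps an identifier to an analyzer entry point. The sketch below shows that general pattern only. The project may discover analyzers in a different way, and `register`, `_REGISTRY`, and `word_count` are hypothetical names.

```python
from typing import Callable

Analyzer = Callable[[list[str]], dict[str, int]]
_REGISTRY: dict[str, Analyzer] = {}


def register(analyzer_id: str) -> Callable[[Analyzer], Analyzer]:
    """Decorator that records an analyzer under a unique id."""

    def decorator(func: Analyzer) -> Analyzer:
        if analyzer_id in _REGISTRY:
            raise ValueError(f"duplicate analyzer id: {analyzer_id}")
        _REGISTRY[analyzer_id] = func
        return func

    return decorator


@register("word_count")
def word_count(texts: list[str]) -> dict[str, int]:
    """Toy analyzer: total number of whitespace-separated words."""
    return {"words": sum(len(text.split()) for text in texts)}


available = sorted(_REGISTRY)
result = _REGISTRY["word_count"](["hello world", "one two three"])
```

Adding an analyzer then means adding one decorated function, and the core application only needs to look up entries by id.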
213+
214+
### Semantic Type System
215+
216+
- Guides users in column selection for analyses
217+
- Enables automatic data validation and preprocessing
218+
- Supports analyzer input requirements
219+
- Provides consistent UX across different data sources
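
A small sketch of the first point: given semantic types assigned to the user's columns, the UI can offer only the columns that fit an analyzer's requirement. The `Semantic` enum and `candidate_columns` function are hypothetical, and the project's own `ColumnSemantic` type may define different members.

```python
from enum import Enum


class Semantic(Enum):
    """Hypothetical semantic types; the project's own enum may differ."""

    USER_ID = "user_id"
    TEXT = "text"
    TIMESTAMP = "timestamp"


def candidate_columns(
    column_semantics: dict[str, Semantic], required: Semantic
) -> list[str]:
    """Return the user columns whose semantic type matches the requirement."""
    return [
        name
        for name, semantic in column_semantics.items()
        if semantic is required
    ]


# Hypothetical column names from an imported file.
columns = {
    "Screen Name": Semantic.USER_ID,
    "Tweet": Semantic.TEXT,
    "Reply Text": Semantic.TEXT,
    "Posted At": Semantic.TIMESTAMP,
}
text_choices = candidate_columns(columns, Semantic.TEXT)
```

Here a text-based analyzer would be offered `Tweet` and `Reply Text` but not the timestamp or user columns, whatever the source file called them.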
