Flowbots

Flowbots is a text processing and analysis system. It combines ruby-nano-bots, workflow orchestration, and natural language processing to support document analysis and topic modeling.

Features

Text processing workflows for individual files and batch processing
Advanced NLP methods including tokenization, part-of-speech tagging, and named entity recognition
Topic modeling with dynamic model training and inference
Flexible workflow system using Jongleur for task orchestration
Redis-based data persistence using Ohm models
Custom nano-bot cartridges for specialized AI-powered tasks
Error handling and logging in every task and in the workflow orchestrator

System Architecture

Class Diagram

classDiagram
    class CLI {
        +version()
        +workflows()
        +train_topic_model(folder)
        +process_text(file)
    }

    class Workflows {
        -prompt: TTY::Prompt
        +list_and_select()
        +run(workflow_name)
        -get_workflows()
        -display_workflows(workflows)
        -select_workflow(workflows)
        -extract_workflow_description(file)
    }

    class WorkflowOrchestrator {
        -agents: Map
        +add_agent(role, cartridge_file)
        +define_workflow(workflow_definition)
        +run_workflow()
    }

    class WorkflowAgent {
        -role: String
        -state: Map
        -bot: NanoBot
        +process(input)
        +save_state()
        +load_state()
    }

    class Task {
        <<abstract>>
        +execute()
    }

    class TextProcessingWorkflow {
        -input_file_path: String
        -orchestrator: WorkflowOrchestrator
        +run()
    }

    class TopicModelTrainerWorkflow {
        -input_folder_path: String
        -orchestrator: WorkflowOrchestrator
        +run()
    }

    class TextProcessor {
        <<abstract>>
        +process(text)
    }

    class NLPProcessor {
        -nlp_model: Object
        +process(segment, options)
    }

    class TopicModelProcessor {
        -model_path: String
        -model: Object
        -model_params: Map
        +load_or_create_model()
        +train_model(documents, iterations)
        +infer_topics(document)
    }

    class FileLoader {
        -file_data: Textfile
        +initialize(file_path)
    }

    class Textfile {
        +path: String
        +name: String
        +content: String
        +preprocessed_content: String
        +metadata: Map
        +topics: Set~Topic~
        +segments: List~Segment~
        +lemmas: List~Lemma~
    }

    class Segment {
        +text: String
        +tokens: List
        +tagged: Map
        +words: List~Word~
    }

    class Word {
        +word: String
        +pos: String
        +tag: String
        +dep: String
        +ner: String
    }

    class Topic {
        +name: String
        +description: String
        +vector: List
    }

    CLI --> Workflows : uses
    Workflows --> TextProcessingWorkflow : runs
    Workflows --> TopicModelTrainerWorkflow : runs
    TextProcessingWorkflow --> WorkflowOrchestrator : uses
    TopicModelTrainerWorkflow --> WorkflowOrchestrator : uses
    WorkflowOrchestrator --> WorkflowAgent : manages
    WorkflowOrchestrator --> Task : executes
    Task <|-- FileLoaderTask
    Task <|-- PreprocessTextFileTask
    Task <|-- TextSegmentTask
    Task <|-- TokenizeSegmentsTask
    Task <|-- NlpAnalysisTask
    Task <|-- TopicModelingTask
    Task <|-- LlmAnalysisTask
    Task <|-- DisplayResultsTask
    TextProcessor <|-- NLPProcessor
    TextProcessor <|-- TopicModelProcessor
    NlpAnalysisTask --> NLPProcessor : uses
    TopicModelingTask --> TopicModelProcessor : uses
    FileLoaderTask --> FileLoader : uses
    Textfile "1" *-- "many" Segment
    Segment "1" *-- "many" Word
    Textfile "1" *-- "many" Topic
    Textfile "1" *-- "many" Lemma

Project Structure

The Flowbots project is organized into several key directories:

/lib: Main application code
  - /components: Core system components
  - /processors: Text and NLP processors
  - /tasks: Individual workflow tasks
  - /workflows: Workflow definitions
  - /ohm: Ohm model definitions
  - /utils: Utility functions and classes
/nano-bots/cartridges: Nano-bot cartridge definitions
/test: Test files and test helpers
/log: Log files

Key Components

CLI: The main entry point for user interaction, allowing users to select and run workflows.
WorkflowOrchestrator: Manages the execution of workflows and their constituent tasks.
Task Processors: Specialized classes for text processing, NLP analysis, and topic modeling.
Ohm Models: Data persistence layer for storing document information and workflow states.
NanoBot Integration: Utilizes nano-bot cartridges for specialized AI-powered tasks.
Logging System: Comprehensive logging for debugging and monitoring.

Detailed Operation

1. Workflow Initialization

When a user selects a workflow through the CLI, the system initializes the chosen workflow (e.g., TextProcessingWorkflow or TopicModelTrainerWorkflow). The WorkflowOrchestrator sets up the task graph based on the workflow definition.
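
As a rough illustration of what "setting up the task graph" can mean, the sketch below orders tasks from a dependency hash using Ruby's standard TSort module. The task names come from the class diagram, but the edges between them and the `execution_order` helper are assumptions made for this example. It does not use the Jongleur API.

```ruby
require "tsort"

# Hypothetical task graph: each key is a task name and its value lists the
# tasks that may start only after it finishes. The edges are an assumption.
TASK_GRAPH = {
  FileLoaderTask: [:PreprocessTextFileTask],
  PreprocessTextFileTask: [:TextSegmentTask],
  TextSegmentTask: [:TokenizeSegmentsTask],
  TokenizeSegmentsTask: [:NlpAnalysisTask],
  NlpAnalysisTask: [:TopicModelingTask],
  TopicModelingTask: [:LlmAnalysisTask],
  LlmAnalysisTask: [:DisplayResultsTask],
  DisplayResultsTask: []
}.freeze

# Returns the task names in an order that respects every edge in the graph.
def execution_order(graph)
  each_node = ->(&block) { graph.each_key(&block) }
  each_child = ->(node, &block) { graph.fetch(node, []).each(&block) }
  # TSort lists successors before their predecessors, so reverse the result
  # to get the order in which the tasks should run.
  TSort.tsort(each_node, each_child).reverse
end

puts execution_order(TASK_GRAPH).join(" -> ")
```

A linear chain like this one has a single valid order. A graph with branches would let independent tasks run in either order.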

2. Task Execution

The WorkflowOrchestrator executes tasks in the defined order. Each task follows a similar pattern:

Retrieve necessary data from Redis or Ohm models.
Process the data using specialized processors (e.g., NLPProcessor, TopicModelProcessor).
Store the results back in Redis (for temporary storage) or Ohm models (for persistence).
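
The retrieve, process, store pattern above can be sketched in a few lines. This is a minimal illustration, not the project's code: `STORE` is an in-memory hash standing in for Redis, and `SketchTask`, `SketchSegmentTask`, and the key names are hypothetical.

```ruby
# In-memory stand-in for Redis so the sketch runs without a server.
# A real task would read from and write to Redis or an Ohm model instead.
STORE = {}

# Hypothetical base class mirroring the abstract Task in the class diagram.
class SketchTask
  def execute
    raise NotImplementedError, "#{self.class} must implement #execute"
  end
end

# Hypothetical task that splits text into sentence-like segments.
class SketchSegmentTask < SketchTask
  def execute
    text = STORE.fetch("current_text")       # 1. retrieve the input
    segments = text.split(/(?<=[.!?])\s+/)   # 2. process it
    STORE["current_segments"] = segments     # 3. store the result
  end
end

STORE["current_text"] = "Flowbots reads a file. It splits the text. Then it analyzes each part."
SketchSegmentTask.new.execute
puts STORE["current_segments"].inspect
```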

3. Data Flow

Redis is used for storing temporary data and passing information between tasks. This includes file IDs, current batch information, and intermediate processing results.
Ohm models, backed by Redis, are used for persistent storage of document information, segments, tokens, and analysis results.
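
The split between temporary keys and persistent records might look like the sketch below. It assumes that one task persists a document and passes only its ID to the next task. `TextfileRecord`, `RECORDS`, `TRANSIENT`, and the key name `current_file_id` are illustrative stand-ins, not the project's Ohm models or Redis keys.

```ruby
# Hypothetical persistent record, loosely following Textfile in the class
# diagram. The real project uses Ohm models backed by Redis.
TextfileRecord = Struct.new(:id, :name, :content, :segments, keyword_init: true)

# Two in-memory stand-ins: one for persistent records, one for transient keys.
RECORDS = {}
TRANSIENT = {}

# First task: persists the document and leaves only its id for the next task.
def load_file(name, content)
  id = RECORDS.size + 1
  RECORDS[id] = TextfileRecord.new(id: id, name: name, content: content, segments: [])
  TRANSIENT["current_file_id"] = id
end

# Second task: looks the record up by the passed id and updates it in place.
def segment_current_file
  record = RECORDS.fetch(TRANSIENT.fetch("current_file_id"))
  record.segments = record.content.split("\n").reject(&:empty?)
  record
end

load_file("notes.txt", "first line\n\nsecond line")
puts segment_current_file.segments.inspect
```

Passing an ID rather than the document itself keeps the transient data small and leaves a single authoritative copy of each record.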

4. NLP and Topic Modeling

The NlpAnalysisTask uses the ruby-spacy gem to perform tasks like tokenization, part-of-speech tagging, and named entity recognition.
The TopicModelingTask uses the tomoto gem to train topic models and infer topics for documents.

5. LLM Integration

The LlmAnalysisTask integrates with external language models through the NanoBot system. This allows higher-level analysis and insight generation based on the processed text data.

6. Error Handling and Logging

Each task and the WorkflowOrchestrator include error handling mechanisms. Errors are caught and logged, and in some cases they trigger the ExceptionAgent for detailed error analysis.
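
A minimal sketch of the catch, log, and hand-off flow, using only Ruby's standard Logger. The `run_with_error_handling` helper and the `on_error` callback are hypothetical. The callback stands in for the ExceptionAgent, whose real interface is not described here.

```ruby
require "logger"
require "stringio"

# Log to an in-memory buffer so the sketch writes nothing to disk.
LOG_BUFFER = StringIO.new
LOGGER = Logger.new(LOG_BUFFER)
REPORTS = []

# Runs a task block, logs any failure, and optionally passes the error to a
# handler. The handler's interface is an assumption made for this example.
def run_with_error_handling(task_name, on_error: nil)
  yield
rescue StandardError => e
  LOGGER.error("#{task_name} failed: #{e.class}: #{e.message}")
  on_error&.call(task_name, e)
  nil
end

handler = ->(name, error) { REPORTS << "#{name}: #{error.message}" }

run_with_error_handling("NlpAnalysisTask", on_error: handler) do
  raise ArgumentError, "empty segment"
end

puts LOG_BUFFER.string
puts REPORTS.inspect
```

Returning `nil` on failure lets the caller decide whether to continue with the next task or stop the workflow.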

7. Batch Processing

For the TopicModelTrainerWorkflow, files are processed in batches. The WorkflowOrchestrator manages the batch state, ensuring all files in a batch are processed before moving to the next batch.
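
The batch-by-batch behavior can be illustrated with `each_slice`. This is a sketch under stated assumptions: `process_in_batches`, the batch size, and the per-file block are hypothetical and do not reflect how the orchestrator actually tracks batch state.

```ruby
# Hypothetical batch runner: finishes every file in a batch before the next
# batch starts. The block represents whatever work is done per file.
def process_in_batches(file_paths, batch_size:)
  completed = []
  file_paths.each_slice(batch_size).with_index(1) do |batch, batch_number|
    results = batch.map { |path| yield(path) }
    completed << { batch: batch_number, files: batch, results: results }
  end
  completed
end

paths = %w[a.md b.md c.md d.md e.md]
summary = process_in_batches(paths, batch_size: 2) { |path| path.upcase }
summary.each { |entry| puts "batch #{entry[:batch]}: #{entry[:files].join(', ')}" }
```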

8. Result Presentation

The DisplayResultsTask formats the analysis results and presents them to the user through the CLI. This may include summaries, topic distributions, and insights generated by the LLM.
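
As one possible shape for such output, the sketch below formats a topic distribution as aligned text lines, highest weight first. The `format_topics` helper and the sample topics and weights are made up for illustration. The project itself lists the tty-* gems for terminal output.

```ruby
# Formats a topic distribution as aligned text lines, highest weight first.
def format_topics(distribution)
  return [] if distribution.empty?

  width = distribution.keys.map(&:length).max
  distribution
    .sort_by { |_name, weight| -weight }
    .map { |name, weight| format("%-#{width}s  %5.1f%%", name, weight * 100) }
end

# Made-up sample data, not output from a real model.
sample = { "workflow" => 0.25, "nlp" => 0.5, "storage" => 0.125 }
puts format_topics(sample)
```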

Key Interactions

CLI <-> WorkflowOrchestrator: The CLI initiates workflow execution and receives final results.
WorkflowOrchestrator <-> Tasks: The orchestrator manages task execution order and handles task results.
Tasks <-> Redis: Tasks use Redis for short-term storage and inter-task communication.
Tasks <-> Ohm Models: Tasks interact with Ohm models for persistent storage of document data and analysis results.
NLP and Topic Modeling Tasks <-> External Libraries: These tasks utilize external Ruby gems for specialized processing.
LlmAnalysisTask <-> NanoBot: This task interacts with the NanoBot system to leverage external language models.

This architecture allows Flowbots to process text data through a series of specialized tasks, each building upon the results of previous tasks, to provide comprehensive text analysis and insights.

Ruby Gems Used in Flowbots

Workflow and Task Management

jongleur: Defines and executes task workflows; it provides the workflow orchestration and task management layer.

Data Persistence

ohm: Object-hash mapping for Redis, used as the data persistence layer for storing document information and workflow states.

Parallel Processing

parallel: Enables parallel execution of tasks. The current implementation makes little use of it.

Development and Debugging

pry and pry-stack_explorer: Enhanced REPL and call-stack navigation for debugging during development.

Natural Language Processing

ruby-spacy: Ruby bindings for the spaCy NLP library, used for tokenization, part-of-speech tagging, and named entity recognition.
lingua: Provides additional natural language detection and processing capabilities.
pragmatic_segmenter: Used for text segmentation, splitting text into meaningful segments.
pragmatic_tokenizer: Handles text tokenization, breaking text into individual tokens.

Command-Line Interface

thor: Builds the Flowbots command-line interface.

Parsing and Data Handling

treetop: A parsing expression grammar (PEG) parser generator, used for custom grammar parsing, particularly for Markdown with YAML front matter.
yaml: Handles YAML parsing and generation, particularly for configuration files and document front matter.

Terminal Output Formatting

tty-box, tty-cursor, tty-prompt, tty-screen, tty-spinner, tty-table: Various terminal output formatting and interaction tools for creating rich command-line interfaces and displaying formatted output.

Topic Modeling

tomoto: Provides the topic modeling algorithms used for model training and topic inference.
