gskaggs/pdf-dictate

PDF Dictate

A PDF management and editing application with AI assistance. Upload, edit, and fill PDF forms using real-time voice transcription and AI-generated field suggestions.

PDF Dictate in Action

Features

📄 PDF Management

  • Upload PDFs: Drag and drop or browse to upload PDF documents
  • File Management: View all your PDFs with file sizes and modification dates
  • URL-safe Naming: Automatic renaming for web compatibility
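
The project's actual renaming rules are not documented here, so the following is only a minimal sketch of what URL-safe renaming could look like. The function name `toUrlSafeName` and the `untitled` fallback are hypothetical, chosen for illustration.

```typescript
// Hypothetical helper: illustrates one way to make an uploaded filename
// safe to use as a URL path segment. Not taken from the project's code.
function toUrlSafeName(filename: string): string {
  const dot = filename.lastIndexOf(".");
  const hasExt = dot > 0;
  const base = hasExt ? filename.slice(0, dot) : filename;
  // Lowercase the extension and keep only letters, digits, and the dot.
  const ext = hasExt
    ? filename.slice(dot).toLowerCase().replace(/[^a-z0-9.]/g, "")
    : "";

  const safeBase = base
    .normalize("NFKD")                // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // collapse every other run into one hyphen
    .replace(/^-+|-+$/g, "");         // trim leading and trailing hyphens

  // Assumed fallback when nothing usable is left in the base name.
  return (safeBase || "untitled") + ext;
}
```

Under these assumptions, `toUrlSafeName("My Tax Form (2024).PDF")` yields `my-tax-form-2024.pdf`.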

✏️ Advanced PDF Editing

  • Interactive PDF Editor: Powered by NutrientViewer (PSPDFKit) for professional PDF editing
  • Form Field Support: Edit and fill PDF forms seamlessly
  • Real-time Saving: Save your changes directly to the server

🤖 AI-Powered Assistance

  • Voice Transcription: Real-time speech-to-text transcription while editing
  • AI Suggestions: Intelligent form field suggestions based on context and voice input
  • Screen Context: AI analyzes your screen content for better suggestions
  • Smart Fill: Apply AI suggestions with keyboard shortcuts (Cmd/Ctrl + Enter)

🎯 User Experience

  • Responsive Design: Works seamlessly across different screen sizes
  • Real-time Status: Live indicators for recording, AI mode, and connection status
  • Smooth Transitions: Animated panel transitions for optimal workflow
  • Visual Feedback: Color-coded form field updates and status indicators

Getting Started

Prerequisites

  • Node.js 18+
  • npm, yarn, pnpm, or bun

Installation

  1. Clone the repository:
git clone <your-repo-url>
cd pdf-dictate
  2. Install dependencies:
npm install
# or
yarn install
# or
pnpm install
  3. Set up your environment variables (if any API keys are required for AI features)
  4. Run the development server:
npm run dev
# or
yarn dev
# or
pnpm dev
  5. Open http://localhost:3000 in your browser

Usage

Basic PDF Management

  1. Upload a PDF: Click "Upload PDF" on the homepage and select your file
  2. View PDFs: Browse your uploaded PDFs in the table view
  3. Edit a PDF: Click the "Edit" button next to any PDF

PDF Editing with AI

  1. Open Editor: Click "Edit" on any PDF to open the interactive editor
  2. Enable AI Mode: Click the "AI Mode" button to activate voice transcription and AI assistance
  3. Grant Permissions: Allow screen sharing and microphone access when prompted
  4. Start Editing: Click on any form field in the PDF
  5. Voice Input: Speak naturally - your voice will be transcribed in real-time
  6. AI Suggestions: The AI will provide intelligent suggestions for form fields
  7. Apply Suggestions: Press Cmd + Enter (Mac) or Ctrl + Enter (Windows/Linux) to apply AI suggestions
  8. Save Changes: Click "Save" to persist your changes

Keyboard Shortcuts

  • Cmd/Ctrl + Enter: Apply AI suggestion to current form field
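
As a rough sketch, the shortcut can be detected with a small predicate over the key event. The name `isApplySuggestionShortcut` and the `ShortcutEvent` type are hypothetical, and the editor's real handler may be wired differently.

```typescript
// Minimal structural type so the predicate accepts both a DOM KeyboardEvent
// and a React.KeyboardEvent without depending on either.
interface ShortcutEvent {
  key: string;
  metaKey: boolean; // Cmd on macOS
  ctrlKey: boolean; // Ctrl on Windows/Linux
}

// True for Cmd+Enter or Ctrl+Enter.
function isApplySuggestionShortcut(e: ShortcutEvent): boolean {
  return e.key === "Enter" && (e.metaKey || e.ctrlKey);
}

// Possible wiring in the browser (applySuggestion is a hypothetical callback):
//
//   window.addEventListener("keydown", (e) => {
//     if (isApplySuggestionShortcut(e)) {
//       e.preventDefault();
//       applySuggestion();
//     }
//   });
```

Checking both modifiers keeps a single code path for all platforms, at the cost of also accepting Ctrl + Enter on macOS.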

Technology Stack

  • Frontend: Next.js 14 with React
  • PDF Rendering: NutrientViewer (PSPDFKit Web)
  • UI Components: Custom components with Tailwind CSS
  • Icons: Lucide React
  • Real-time Transcription: WebSocket-based transcription service
  • AI Integration: Custom AI suggestion API
  • File Management: Server-side PDF storage and processing

Project Structure

src/
├── app/
│   ├── page.tsx              # Homepage with PDF management
│   ├── edit/[name]/page.tsx  # PDF editor with AI features
│   ├── layout.tsx            # Root layout with NutrientViewer scripts
│   └── globals.css           # Global styles
├── components/
│   └── ui/                   # Reusable UI components
└── hooks/
    └── useRealtimeTranscription.ts  # Transcription hook

API Endpoints

  • GET /api/pdfs - List all uploaded PDFs
  • GET /api/pdfs/[name] - Serve specific PDF file
  • POST /api/upload - Upload new PDF
  • POST /api/pdfs/[name]/save - Save edited PDF
  • POST /api/suggestions - Get AI suggestions based on context
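
The request and response shapes for these endpoints are not documented here, so the sketch below covers only what the list states: the paths. The `api` object, `FetchLike`, and `listPdfs` are hypothetical names, and the response is left as `unknown` rather than guessing a schema.

```typescript
// Path builders for the endpoints listed above. The PDF name is URL-encoded
// because [name] is a dynamic path segment.
const api = {
  list: () => "/api/pdfs",
  file: (name: string) => `/api/pdfs/${encodeURIComponent(name)}`,
  upload: () => "/api/upload",
  save: (name: string) => `/api/pdfs/${encodeURIComponent(name)}/save`,
  suggestions: () => "/api/suggestions",
};

// The subset of fetch used here, so a stub can be injected in tests.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Lists uploaded PDFs. The response shape is an open question, so the
// caller receives unknown and must validate it.
async function listPdfs(fetchImpl: FetchLike): Promise<unknown> {
  const res = await fetchImpl(api.list());
  if (!res.ok) {
    throw new Error(`GET ${api.list()} failed with status ${res.status}`);
  }
  return res.json();
}
```

In the browser, `listPdfs(fetch)` would issue the request against the running dev server.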

Features in Detail

AI Mode

When AI Mode is activated:

  • Screen Capture: Captures your screen for visual context
  • Voice Recording: Continuous voice transcription
  • Context Analysis: AI analyzes both visual and audio context
  • Smart Suggestions: Provides relevant suggestions for form fields
  • Visual Feedback: Form fields flash green when filled with AI suggestions
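
The capture step could be sketched as follows, assuming AI Mode relies on the standard browser calls `getDisplayMedia` (screen) and `getUserMedia` (microphone). The `startAiCapture` function and the `MediaDevicesLike` interface are hypothetical, and the project's own code may request the streams differently.

```typescript
// The two MediaDevices methods this sketch needs. Declared locally so a
// stub can stand in for navigator.mediaDevices outside the browser.
interface MediaDevicesLike<S> {
  getDisplayMedia(constraints: { video: boolean }): Promise<S>;
  getUserMedia(constraints: { audio: boolean }): Promise<S>;
}

// Requests the screen first, then the microphone. Browsers require
// getDisplayMedia to be triggered by a user gesture, so this should run
// from the "AI Mode" click handler.
async function startAiCapture<S>(
  devices: MediaDevicesLike<S>
): Promise<{ screen: S; microphone: S }> {
  const screen = await devices.getDisplayMedia({ video: true });
  const microphone = await devices.getUserMedia({ audio: true });
  return { screen, microphone };
}
```

In the browser this would be called as `startAiCapture(navigator.mediaDevices)`. A fuller version would also stop the screen stream's tracks if the microphone request is denied.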

PDF Editor Features

  • Form Field Focus: Automatic detection when form fields are selected
  • Real-time Updates: Immediate visual feedback for changes
  • Professional Tools: Full PDF editing capabilities via NutrientViewer
  • Save Management: Robust saving with error handling and status indicators

Browser Requirements

  • Modern browser with WebRTC support for screen sharing
  • Microphone access for voice transcription
  • Screen sharing permission for AI context

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

[Add your license information here]

Support

For issues and questions, please open an issue on GitHub.

About

OpenAI Realtime assists you as you fill out a PDF
