@gbsierra gbsierra commented Nov 2, 2025

PDF Tool Implementation - Issue #172

Summary

Adds PDF extraction capabilities to the BrowserOS agent, enabling processing of PDF documents.

New PDF Capabilities

Raw Extraction

  • format={metadata: true} → Document info (title, author, page count)
  • format={text: true} → Raw text from pages
  • format={find: {query: "search"}} → Text search with page locations
  • format={outline: true} → Table of contents
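The find mode above returns matches together with their page locations. A minimal sketch of that idea follows, assuming page text has already been extracted; the `PageText` and `SearchHit` shapes and the `findInPages` helper are hypothetical and may not match the types in src/lib/types/pdf.ts.

```typescript
// Hypothetical shapes for illustration only; the PR's real types may differ.
interface PageText {
  pageNumber: number; // 1-based page number
  text: string;
}

interface SearchHit {
  pageNumber: number;
  index: number; // character offset of the match within the page text
  snippet: string; // the match plus a little surrounding context
}

// Case-insensitive substring search across already-extracted pages.
// Assumes lowercasing does not change string length (true for typical Latin text).
function findInPages(pages: PageText[], query: string, context = 20): SearchHit[] {
  const hits: SearchHit[] = [];
  const needle = query.toLowerCase();
  if (needle.length === 0) return hits;

  for (const page of pages) {
    const haystack = page.text.toLowerCase();
    let from = 0;
    while (true) {
      const index = haystack.indexOf(needle, from);
      if (index === -1) break;
      const start = Math.max(0, index - context);
      const end = Math.min(page.text.length, index + needle.length + context);
      hits.push({ pageNumber: page.pageNumber, index, snippet: page.text.slice(start, end) });
      from = index + needle.length; // continue after this match
    }
  }
  return hits;
}
```

Because this runs on plain strings, it needs no LLM call, which is consistent with the raw modes being the cheaper path.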

AI-Powered Extraction

  • format={companies: [], ceos: []} → Custom structured data
  • Uses task parameter describing what to extract

Page Targeting

  • pages: [1, 3, 5] → Specific pages
  • pages: "all" → Entire document
  • pages: {start: 1, end: 10} → Page ranges
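One way the three page-targeting forms could be normalized into a concrete page list is sketched below. `PagesParam` and `resolvePages` are hypothetical names, and truncating to the first 50 valid pages is an assumption about how the limit is applied; the PR's src/lib/utils/PdfPageUtils.ts may behave differently.

```typescript
// Hypothetical union covering the three forms listed above.
type PagesParam = number[] | "all" | { start: number; end: number };

const MAX_PAGES = 50; // limit quoted in the PR description

// Turns a pages parameter into a sorted, de-duplicated list of 1-based page numbers.
function resolvePages(pages: PagesParam, pageCount: number, maxPages = MAX_PAGES): number[] {
  let requested: number[];
  if (pages === "all") {
    requested = Array.from({ length: pageCount }, (_, i) => i + 1);
  } else if (Array.isArray(pages)) {
    requested = pages;
  } else {
    // Clamp the range to the document before expanding it.
    const start = Math.max(1, pages.start);
    const end = Math.min(pageCount, pages.end);
    requested = [];
    for (let p = start; p <= end; p++) requested.push(p);
  }

  // Drop invalid pages, de-duplicate, sort, then cap (assumed policy: keep the first maxPages).
  const valid = Array.from(new Set(requested))
    .filter((p) => Number.isInteger(p) && p >= 1 && p <= pageCount)
    .sort((a, b) => a - b);
  return valid.slice(0, maxPages);
}
```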

Agent Usage Examples

// Metadata extraction
{ format: { metadata: true } }

// Text search
{ format: { find: { query: "machine learning" } } }

// AI-powered extraction
{
  format: { companies: [], ceos: [] },
  task: "Extract company names and CEOs from this report"
}
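The examples above suggest the tool can tell raw extraction from AI-powered extraction by looking at the keys of `format`. A minimal sketch of that routing decision follows; `detectMode` is a hypothetical helper, and treating any mix of raw and custom keys as AI mode (and an empty format as an error) are assumptions, not behavior confirmed by the PR.

```typescript
// Keys the PR description lists under "Raw Extraction".
const RAW_FORMAT_KEYS = ["metadata", "text", "find", "outline"] as const;

type ExtractionMode = "raw" | "ai";

// Returns "raw" only when every requested key is a known raw operation.
function detectMode(format: Record<string, unknown>): ExtractionMode {
  const keys = Object.keys(format);
  if (keys.length === 0) {
    throw new Error("format must request at least one field");
  }
  const rawKeys: readonly string[] = RAW_FORMAT_KEYS;
  return keys.every((key) => rawKeys.indexOf(key) !== -1) ? "raw" : "ai";
}
```

With this rule, `{ metadata: true }` stays on the raw path, while `{ companies: [], ceos: [] }` is routed to the LLM.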

Architecture Overview

PDF processing occurs in the sidepanel due to Chrome extension restrictions.

Agent → PdfExtractTool → Sidepanel → PDF Services → Response

Key Components:

  • PdfExtractTool: Agent-facing interface that validates inputs, resolves PDF URLs, sends processing requests to the sidepanel, and returns results—ensures seamless integration with the agent's tool system.
  • PdfRequestHandler: Sidepanel message handler that receives requests, manages execution-scoped caching (max 3 PDFs to avoid reinitialization), and delegates to processing services—optimizes performance by preventing redundant PDF loads.
  • PdfProcessingService: Central orchestrator that loads PDFs via PDF.js, parses page parameters, routes to raw or AI extraction modes, and assembles responses—handles the core workflow logic to keep operations modular and error-resistant.
  • PdfService: Low-level PDF loader using PDF.js to fetch and parse documents from URLs—isolates PDF.js dependencies for security and reusability across extractions.
  • PdfExtractionService: Specialized handler for raw operations (text, search, outline) using PDF.js page proxies—performs efficient, non-AI extractions with built-in limits (e.g., 50 pages) to prevent resource overuse.
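The execution-scoped cache described for PdfRequestHandler could look roughly like the sketch below. `ExecutionScopedCache` is a hypothetical class, and evicting the oldest entry once the limit of 3 is reached is an assumption; the PR does not state what happens when a fourth PDF is loaded.

```typescript
// Generic over the cached value so the sketch runs without PDF.js.
class ExecutionScopedCache<T> {
  // executionId -> (url -> cached document)
  private readonly byExecution = new Map<string, Map<string, T>>();

  constructor(private readonly maxPerExecution = 3) {}

  get(executionId: string, url: string): T | undefined {
    return this.byExecution.get(executionId)?.get(url);
  }

  set(executionId: string, url: string, doc: T): void {
    let docs = this.byExecution.get(executionId);
    if (!docs) {
      docs = new Map<string, T>();
      this.byExecution.set(executionId, docs);
    }
    if (!docs.has(url) && docs.size >= this.maxPerExecution) {
      // Map preserves insertion order, so the first key is the oldest entry.
      const oldest = docs.keys().next().value;
      if (oldest !== undefined) docs.delete(oldest);
    }
    docs.set(url, doc);
  }

  // Called when an execution ends, so its documents can be garbage-collected.
  clear(executionId: string): void {
    this.byExecution.delete(executionId);
  }
}
```

Keying by execution first makes cleanup a single `delete`, which matches the cache-clearing step at the end of an execution. A real implementation would likely also release resources held by evicted documents.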

File Structure and Flow

PDF Tool Architecture:
├── PdfExtractTool (src/lib/tools/PdfExtract.ts)
│   └── Sends PDF_PARSE_REQUEST via chrome.runtime.sendMessage
│
├── Sidepanel Message Handler (src/sidepanel/hooks/useMessageHandler.ts)
│   └── Receives PDF_PARSE_REQUEST, delegates to PdfRequestHandler
│
├── PdfRequestHandler (src/lib/services/PdfRequestHandler.ts)
│   ├── Manages execution-scoped PDF caching (max 3 PDFs per execution)
│   └── Coordinates processing with PdfProcessingService
│
├── PdfProcessingService (src/lib/services/PdfProcessingService.ts)
│   ├── Orchestrates PDF operations
│   ├── Applies 50-page performance limits
│   └── Routes to specialized services based on format
│
├── PdfService (src/lib/services/PdfService.ts)
│   ├── Loads PDF documents using PDF.js v5.4.296
│   ├── Configures worker for extension environment
│   └── Extracts metadata
│
└── PdfExtractionService (src/lib/services/PdfExtractionService.ts)
    ├── Text extraction from pages
    ├── Text search functionality
    └── Outline/bookmarks extraction
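The tool and the sidepanel communicate through messages, so each response has to be paired with the request that caused it. The sketch below illustrates requestId matching against a small `Transport` interface that stands in for the Chrome runtime messaging surface, so it can run outside an extension. The message fields and the `requestPdfParse` helper are hypothetical, and a production version would also want a timeout.

```typescript
// Hypothetical message shapes; the PR defines its own in src/lib/types/messaging.ts.
interface PdfParseRequest {
  type: "PDF_PARSE_REQUEST";
  requestId: string;
  url: string;
}

interface PdfParseResponse {
  type: "PDF_PARSE_RESPONSE";
  requestId: string;
  ok: boolean;
  data?: unknown;
  error?: string;
}

// Stand-in for the extension messaging API (an assumption made for testability).
interface Transport {
  send(message: PdfParseRequest): void;
  // Registers a listener and returns a function that removes it.
  onMessage(listener: (message: PdfParseResponse) => void): () => void;
}

// Sends a request and resolves with the first response carrying the same requestId.
function requestPdfParse(
  transport: Transport,
  url: string,
  requestId: string
): Promise<PdfParseResponse> {
  return new Promise((resolve) => {
    // Listen before sending so a fast response cannot be missed.
    const unsubscribe = transport.onMessage((message) => {
      if (message.type !== "PDF_PARSE_RESPONSE" || message.requestId !== requestId) {
        return; // response for a different request; ignore it
      }
      unsubscribe();
      resolve(message);
    });
    transport.send({ type: "PDF_PARSE_REQUEST", requestId, url });
  });
}
```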

Files Changed

New Files

  • src/lib/tools/PdfExtract.ts - Main PDF extraction tool
  • src/lib/services/PdfRequestHandler.ts - Request handling & caching
  • src/lib/services/PdfProcessingService.ts - Processing orchestration
  • src/lib/services/PdfService.ts - PDF.js core operations
  • src/lib/services/PdfExtractionService.ts - Text/search/outline extraction
  • src/lib/types/pdf.ts - PDF type definitions
  • src/lib/utils/PdfPageUtils.ts - Page parameter utilities

Modified Files

  • src/sidepanel/hooks/useMessageHandler.ts - Added PDF request routing
  • src/lib/tools/index.ts - Added PdfExtract export
  • src/lib/types/messaging.ts - Added PDF message types
  • package.json - Added pdfjs-dist dependency
  • src/lib/agent/BrowserAgent.prompt.ts - Added PDF tool descriptions to prompts
  • src/lib/agent/BrowserAgent.ts - Integrated PDF tool into agent logic
  • src/lib/execution/Execution.ts - Added PDF execution handling
  • src/lib/runtime/ExecutionContext.ts - Updated context for PDF operations
  • webpack.config.js - Configured PDF.js worker and build settings

Dependencies

  • pdfjs-dist: ^5.4.296 - PDF processing library (added to package.json)

Demo Videos

Video 1: Demo (2x speed, 720p)

pdftool_2x.mp4

Things to consider:

  • The PDF tool currently requires the side panel to be open (it opens automatically if it is not). Consider ways around Chrome extension limitations so the tool can run in the background.
  • Consider parallelizing PDF tool actions.
  • Current limits: 50 pages, 3 PDFs cached per execution.
  • Consider security and privacy implications, such as prompt injection.

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

Adds comprehensive PDF extraction capabilities to the BrowserOS agent using PDF.js v5.4.296. The implementation provides both cost-effective raw extraction (metadata, text, search, outline) and AI-powered structured data extraction.

Architecture

The PDF tool operates in the sidepanel due to Chrome extension limitations. The flow is: PdfExtractTool → Chrome messaging → PdfRequestHandler → PdfProcessingService → PDF.js services. Key features include:

  • Execution-scoped caching: Caches up to 3 PDFs per execution to avoid reloading
  • 50-page performance limit: Prevents resource exhaustion on large documents
  • Dual extraction modes: Raw (no LLM cost) and AI-powered (structured data)
  • Chrome PDF viewer URL resolution: Handles chrome-extension:// URLs correctly

Key Issues Found

  1. Logic Error (src/lib/services/PdfProcessingService.ts:69-71): disableWorker: true option passed to retry logic but PdfService.loadDocument() doesn't accept this parameter. The corrupted PDF fallback won't work as designed.

  2. Style Violations: Excessive comments throughout violate CLAUDE.md style guide (lines 31-43). Files like PdfExtract.ts contain multi-paragraph block comments explaining obvious operations. Should be condensed to brief section separators.

  3. Debug Logging: Multiple console.log statements left after debugging across 5 files. Should be removed per custom rule about excessive logging.

Strengths

  • Well-structured service layer with clear separation of concerns
  • Comprehensive error handling with graceful degradation
  • Proper type definitions using Zod schemas
  • Good security practices (disables eval, proper URL validation)
  • Cache cleanup on execution end prevents memory leaks
  • Excellent agent prompt documentation with usage examples

Confidence Score: 4/5

  • Safe to merge with minor fixes needed for logging cleanup and comment style
  • Score of 4 reflects solid implementation with one logic bug (disableWorker parameter) and style violations (excessive comments, debug logging). Core functionality is sound with proper error handling, caching, and security practices. No critical issues blocking merge.
  • src/lib/services/PdfProcessingService.ts requires fix for disableWorker parameter. src/lib/tools/PdfExtract.ts needs comment cleanup to match style guide.

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| src/lib/tools/PdfExtract.ts | 4/5 | PDF extraction tool with comprehensive error handling, proper URL resolution, and cross-process messaging. Code is well-documented but has excessive comments per style guide. |
| src/lib/services/PdfRequestHandler.ts | 4/5 | Message routing and caching logic with execution-scoped cache management. Solid implementation with good logging. |
| src/lib/services/PdfProcessingService.ts | 4/5 | Orchestrates PDF operations with 50-page limits for performance. Has retry logic for corrupted PDFs but disableWorker fallback parameter isn't used correctly. |
| src/lib/services/PdfService.ts | 5/5 | Core PDF.js wrapper with proper worker configuration and metadata extraction. Clean implementation. |
| src/sidepanel/hooks/useMessageHandler.ts | 4/5 | Integrates PDF handler into message routing system with singleton pattern. Proper cache clearing on execution end. |
| src/lib/agent/BrowserAgent.prompt.ts | 4/5 | Comprehensive PDF tool documentation in agent prompt with usage examples and cost awareness guidance. Well-structured instructions. |

Sequence Diagram

sequenceDiagram
    participant Agent as BrowserAgent
    participant Tool as PdfExtractTool
    participant Runtime as Chrome Runtime
    participant Sidepanel as Sidepanel Handler
    participant Handler as PdfRequestHandler
    participant Processing as PdfProcessingService
    participant PDFjs as PDF.js Services
    
    Agent->>Tool: Execute pdf_extract(format, pages)
    Tool->>Tool: Get current page URL
    Tool->>Tool: Resolve PDF viewer URL
    Tool->>Runtime: Open sidepanel if needed
    Note over Runtime: Wait 500ms for init
    
    Tool->>Runtime: sendMessage(PDF_PARSE_REQUEST)
    Note over Tool: Set up response listener
    
    Runtime->>Sidepanel: Route message
    Sidepanel->>Handler: handleRequest(message)
    
    alt Cache Hit
        Handler->>Handler: Get cached PDF from executionId map
        Handler->>Processing: processRequest(request, cachedDoc)
    else Cache Miss
        Handler->>Processing: processRequest(request)
        Processing->>PDFjs: PdfService.loadDocument(url)
        PDFjs-->>Processing: PDFDocumentProxy
        Handler->>Handler: Cache document (max 3 per execution)
    end
    
    Processing->>Processing: Parse page parameters
    Processing->>Processing: Apply 50-page limit if needed
    
    alt Raw Extraction
        Processing->>PDFjs: PdfExtractionService.extractText()
        PDFjs-->>Processing: ExtractedPage[]
    else AI Extraction
        Processing->>PDFjs: PdfExtractionService.extractText()
        PDFjs-->>Processing: Raw text pages
        Note over Processing: Text returned to Tool for LLM
    end
    
    Processing-->>Handler: PdfParseResponse
    Handler->>Runtime: sendMessage(PDF_PARSE_RESPONSE)
    
    Runtime->>Tool: Deliver response via listener
    Tool->>Tool: Match requestId
    
    alt Raw Mode
        Tool-->>Agent: Return formatted text/metadata
    else AI Mode
        Tool->>Agent: Invoke LLM with PDF content
        Agent->>Tool: LLM structured extraction
        Tool-->>Agent: Return structured data
    end
    
    Note over Agent: Execution completes
    Agent->>Runtime: sendMessage(PDF_CLEAR_CACHE)
    Runtime->>Sidepanel: Route clear message
    Sidepanel->>Handler: clearCache(executionId)
    Handler->>Handler: Remove cached PDFs for execution
16 files reviewed, 1 comment

Comment on lines +69 to +71
doc = await this.pdfService.loadDocument(request.url, {
disableWorker: true // Try without worker for corrupted PDFs
})

logic: disableWorker: true option passed but not used by PdfService.loadDocument. PdfService.ts:40-52 doesn't accept disableWorker in options - it only uses isEvalSupported. This retry logic won't work as intended for corrupted PDFs.

Suggested change
- doc = await this.pdfService.loadDocument(request.url, {
-   disableWorker: true // Try without worker for corrupted PDFs
- })
+ doc = await this.pdfService.loadDocument(request.url, {
+   isEvalSupported: false // Retry with explicit isEvalSupported false
+ })
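The underlying problem in this comment is that an option was passed to a loader that silently ignores it. A minimal sketch of a retry wrapper with typed options follows. `LoadOptions`, `Loader`, and `loadWithFallback` are hypothetical names, and the single `isEvalSupported` field reflects only what the review says PdfService.loadDocument reads, not the full PDF.js option set.

```typescript
// Hypothetical option type limited to what the review says the loader honors.
interface LoadOptions {
  isEvalSupported?: boolean;
}

// Any function that loads a document from a URL, optionally with options.
type Loader<T> = (url: string, options?: LoadOptions) => Promise<T>;

// Tries a default load first, then retries once with the fallback options.
// If the retry also fails, its error propagates to the caller.
async function loadWithFallback<T>(
  load: Loader<T>,
  url: string,
  fallback: LoadOptions
): Promise<T> {
  try {
    return await load(url);
  } catch {
    return await load(url, fallback);
  }
}
```

Because `LoadOptions` is a closed type, passing an object literal such as `{ disableWorker: true }` as `fallback` fails TypeScript's excess-property check at compile time instead of being ignored at runtime.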