implement new PDF tool for agent #173

gbsierra · 2025-11-02T18:12:37Z

PDF Tool Implementation - Issue #172

Summary

Adds PDF extraction capabilities to the BrowserOS agent, enabling processing of PDF documents.

New PDF Capabilities

Raw Extraction

format={metadata: true} → Document info (title, author, page count)
format={text: true} → Raw text from pages
format={find: {query: "search"}} → Text search with page locations
format={outline: true} → Table of contents
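
As an illustration of what the `find` format could return, here is a minimal sketch of a search over already-extracted page text. The `PageText` and `SearchHit` shapes and the `findInPages` helper are hypothetical names for this example, not the PR's actual types; the real search lives in PdfExtractionService and may behave differently.

```typescript
// Hypothetical shape for one page of extracted text (not the PR's actual type).
interface PageText {
  pageNumber: number
  text: string
}

// Hypothetical result shape: where a match was found and some surrounding text.
interface SearchHit {
  pageNumber: number
  index: number // character offset of the match within the page text
  snippet: string
}

// Case-insensitive substring search across pages.
// Assumes lowercasing does not change string length, which holds for typical text.
function findInPages(pages: PageText[], query: string, context = 20): SearchHit[] {
  const hits: SearchHit[] = []
  const needle = query.toLowerCase()
  if (needle.length === 0) return hits

  for (const page of pages) {
    const haystack = page.text.toLowerCase()
    let from = 0
    while (true) {
      const index = haystack.indexOf(needle, from)
      if (index === -1) break
      const start = Math.max(0, index - context)
      const end = Math.min(page.text.length, index + needle.length + context)
      hits.push({ pageNumber: page.pageNumber, index, snippet: page.text.slice(start, end) })
      from = index + needle.length
    }
  }
  return hits
}
```

Each hit carries its page number, which is what lets the agent report page locations alongside the matched text.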

AI-Powered Extraction

format={companies: [], ceos: []} → Custom structured data
Uses task parameter describing what to extract

Page Targeting

pages: [1, 3, 5] → Specific pages
pages: "all" → Entire document
pages: {start: 1, end: 10} → Page ranges
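
The three `pages` forms above could be normalized into a single list of page numbers along these lines. This is a sketch under stated assumptions: `resolvePages` and `PagesParam` are hypothetical names (the PR's own helpers are in PdfPageUtils.ts), pages are assumed to be 1-based, and the 50-page limit is assumed to truncate rather than reject.

```typescript
// Hypothetical union of the three page-targeting forms described above.
type PagesParam = number[] | "all" | { start: number; end: number }

// The 50-page limit comes from the PR; truncating (rather than erroring) is an assumption.
const MAX_PAGES = 50

function resolvePages(
  pages: PagesParam,
  pageCount: number,
  maxPages = MAX_PAGES
): { pages: number[]; truncated: boolean } {
  let candidates: number[] = []
  if (pages === "all") {
    candidates = Array.from({ length: pageCount }, (_, i) => i + 1)
  } else if (Array.isArray(pages)) {
    candidates = pages
  } else {
    // Clamp the range to the document bounds before expanding it.
    const start = Math.max(1, pages.start)
    const end = Math.min(pageCount, pages.end)
    for (let p = start; p <= end; p++) candidates.push(p)
  }

  // Keep unique, in-range integers in ascending order.
  const valid = Array.from(new Set(candidates))
    .filter(p => Number.isInteger(p) && p >= 1 && p <= pageCount)
    .sort((a, b) => a - b)

  return { pages: valid.slice(0, maxPages), truncated: valid.length > maxPages }
}
```

Returning a `truncated` flag is one way to let the caller tell the agent that only part of a large document was processed.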

Agent Usage Examples

// Metadata extraction
{ format: { metadata: true } }

// Text search
{ format: { find: { query: "machine learning" } } }

// AI-powered extraction
{
  format: { companies: [], ceos: [] },
  task: "Extract company names and CEOs from this report"
}
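
The examples above mix two modes: raw formats (`metadata`, `text`, `find`, `outline`) and custom keys that trigger AI-powered extraction. A minimal sketch of how a `format` object might be classified follows; `classifyFormat` and the rule "any non-raw key means AI mode" are assumptions for illustration, not the PR's confirmed routing logic.

```typescript
// Raw format keys listed in the PR description.
const RAW_FORMAT_KEYS: readonly string[] = ["metadata", "text", "find", "outline"]

type ExtractionMode = "raw" | "ai"

// Assumed rule: a format made up only of raw keys is handled without the LLM;
// any other key is treated as a custom field for AI-powered extraction.
function classifyFormat(format: Record<string, unknown>): ExtractionMode {
  const keys = Object.keys(format)
  if (keys.length === 0) {
    throw new Error("format must contain at least one key")
  }
  return keys.every(key => RAW_FORMAT_KEYS.indexOf(key) !== -1) ? "raw" : "ai"
}
```

Under this rule, `{ metadata: true }` and `{ find: { query: "machine learning" } }` stay on the raw path, while `{ companies: [], ceos: [] }` is sent to AI extraction.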

Architecture Overview

PDF processing occurs in the sidepanel due to Chrome extension restrictions.

Agent → PdfExtractTool → Sidepanel → PDF Services → Response

Key Components:

PdfExtractTool: Agent-facing interface that validates inputs, resolves PDF URLs, sends processing requests to the sidepanel, and returns results through the agent's tool system.
PdfRequestHandler: Sidepanel message handler that receives requests, manages execution-scoped caching (max 3 PDFs), and delegates to the processing services. Caching avoids reloading and reinitializing a PDF that was already loaded in the same execution.
PdfProcessingService: Central orchestrator that loads PDFs via PDF.js, parses page parameters, routes to raw or AI extraction, and assembles responses.
PdfService: Low-level PDF loader that uses PDF.js to fetch and parse documents from URLs, isolating the PDF.js dependency for security and reusability across extractions.
PdfExtractionService: Handler for raw operations (text, search, outline) using PDF.js page proxies. These extractions do not involve the LLM and have built-in limits (e.g., 50 pages) to prevent resource overuse.
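
To make the execution-scoped caching concrete, here is a minimal sketch of a cache keyed by execution ID and PDF URL with a cap of 3 entries per execution. The class name `ExecutionScopedCache` and the oldest-first eviction policy are assumptions; the PR states the cap but not how entries are evicted.

```typescript
// Generic over T so the sketch does not depend on PDF.js types.
class ExecutionScopedCache<T> {
  private readonly byExecution = new Map<string, Map<string, T>>()

  constructor(private readonly maxPerExecution = 3) {}

  get(executionId: string, url: string): T | undefined {
    return this.byExecution.get(executionId)?.get(url)
  }

  set(executionId: string, url: string, doc: T): void {
    let docs = this.byExecution.get(executionId)
    if (!docs) {
      docs = new Map<string, T>()
      this.byExecution.set(executionId, docs)
    }
    if (!docs.has(url) && docs.size >= this.maxPerExecution) {
      // Map preserves insertion order, so the first key is the oldest entry.
      // Assumed policy: evict the oldest entry when the cap is reached.
      const oldest = docs.keys().next().value
      if (oldest !== undefined) docs.delete(oldest)
    }
    docs.set(url, doc)
  }

  // Drop everything cached for one execution, e.g. when the execution ends.
  clear(executionId: string): void {
    this.byExecution.delete(executionId)
  }

  size(executionId: string): number {
    return this.byExecution.get(executionId)?.size ?? 0
  }
}
```

A real implementation would likely also release the resources held by an evicted or cleared document rather than only dropping the reference.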

File Structure and Flow

PDF Tool Architecture:
├── PdfExtractTool (src/lib/tools/PdfExtract.ts)
│   └── Sends PDF_PARSE_REQUEST via chrome.runtime.sendMessage
│
├── Sidepanel Message Handler (src/sidepanel/hooks/useMessageHandler.ts)
│   └── Receives PDF_PARSE_REQUEST, delegates to PdfRequestHandler
│
├── PdfRequestHandler (src/lib/services/PdfRequestHandler.ts)
│   ├── Manages execution-scoped PDF caching (max 3 PDFs per execution)
│   └── Coordinates processing with PdfProcessingService
│
├── PdfProcessingService (src/lib/services/PdfProcessingService.ts)
│   ├── Orchestrates PDF operations
│   ├── Applies 50-page performance limits
│   └── Routes to specialized services based on format
│
├── PdfService (src/lib/services/PdfService.ts)
│   ├── Loads PDF documents using PDF.js v5.4.296
│   ├── Configures worker for extension environment
│   └── Extracts metadata
│
└── PdfExtractionService (src/lib/services/PdfExtractionService.ts)
    ├── Text extraction from pages
    ├── Text search functionality
    └── Outline/bookmarks extraction

Files Changed

New Files

src/lib/tools/PdfExtract.ts - Main PDF extraction tool
src/lib/services/PdfRequestHandler.ts - Request handling & caching
src/lib/services/PdfProcessingService.ts - Processing orchestration
src/lib/services/PdfService.ts - PDF.js core operations
src/lib/services/PdfExtractionService.ts - Text/search/outline extraction
src/lib/types/pdf.ts - PDF type definitions
src/lib/utils/PdfPageUtils.ts - Page parameter utilities

Modified Files

src/sidepanel/hooks/useMessageHandler.ts - Added PDF request routing
src/lib/tools/index.ts - Added PdfExtract export
src/lib/types/messaging.ts - Added PDF message types
package.json - Added pdfjs-dist dependency
src/lib/agent/BrowserAgent.prompt.ts - Added PDF tool descriptions to prompts
src/lib/agent/BrowserAgent.ts - Integrated PDF tool into agent logic
src/lib/execution/Execution.ts - Added PDF execution handling
src/lib/runtime/ExecutionContext.ts - Updated context for PDF operations
webpack.config.js - Configured PDF.js worker and build settings

Dependencies

pdfjs-dist: ^5.4.296 - PDF processing library (added to package.json)

Demo Videos

Video 1: Demo (2x speed, 720p)

pdftool_2x.mp4

Things to consider:

The PDF tool currently requires the side panel to be open (it opens the panel automatically if it is closed). Consider ways around Chrome extension limitations so the tool can run in the background.
Consider parallelizing PDF tool actions.
Current limits: 50 pages, 3 PDFs cached.
Consider security and privacy implications, such as prompt injection.

greptile-apps

Greptile Overview

Greptile Summary

Adds comprehensive PDF extraction capabilities to the BrowserOS agent using PDF.js v5.4.296. The implementation provides both cost-effective raw extraction (metadata, text, search, outline) and AI-powered structured data extraction.

Architecture

The PDF tool operates in the sidepanel due to Chrome extension limitations. The flow is: PdfExtractTool → Chrome messaging → PdfRequestHandler → PdfProcessingService → PDF.js services. Key features include:

Execution-scoped caching: Caches up to 3 PDFs per execution to avoid reloading
50-page performance limit: Prevents resource exhaustion on large documents
Dual extraction modes: Raw (no LLM cost) and AI-powered (structured data)
Chrome PDF viewer URL resolution: Handles chrome-extension:// URLs correctly

Key Issues Found

Logic Error (src/lib/services/PdfProcessingService.ts:69-71): disableWorker: true option passed to retry logic but PdfService.loadDocument() doesn't accept this parameter. The corrupted PDF fallback won't work as designed.
Style Violations: Excessive comments throughout violate CLAUDE.md style guide (lines 31-43). Files like PdfExtract.ts contain multi-paragraph block comments explaining obvious operations. Should be condensed to brief section separators.
Debug Logging: Multiple console.log statements left after debugging across 5 files. Should be removed per custom rule about excessive logging.

Strengths

Well-structured service layer with clear separation of concerns
Comprehensive error handling with graceful degradation
Proper type definitions using Zod schemas
Good security practices (disables eval, proper URL validation)
Cache cleanup on execution end prevents memory leaks
Excellent agent prompt documentation with usage examples

Confidence Score: 4/5

Safe to merge with minor fixes needed for logging cleanup and comment style
Score of 4 reflects solid implementation with one logic bug (disableWorker parameter) and style violations (excessive comments, debug logging). Core functionality is sound with proper error handling, caching, and security practices. No critical issues blocking merge.
src/lib/services/PdfProcessingService.ts requires fix for disableWorker parameter. src/lib/tools/PdfExtract.ts needs comment cleanup to match style guide.

Important Files Changed

File Analysis

Filename	Score	Overview
src/lib/tools/PdfExtract.ts	4/5	PDF extraction tool with comprehensive error handling, proper URL resolution, and cross-process messaging. Code is well-documented but has excessive comments per style guide.
src/lib/services/PdfRequestHandler.ts	4/5	Message routing and caching logic with execution-scoped cache management. Solid implementation with good logging.
src/lib/services/PdfProcessingService.ts	4/5	Orchestrates PDF operations with 50-page limits for performance. Has retry logic for corrupted PDFs but disableWorker fallback parameter isn't used correctly.
src/lib/services/PdfService.ts	5/5	Core PDF.js wrapper with proper worker configuration and metadata extraction. Clean implementation.
src/sidepanel/hooks/useMessageHandler.ts	4/5	Integrates PDF handler into message routing system with singleton pattern. Proper cache clearing on execution end.
src/lib/agent/BrowserAgent.prompt.ts	4/5	Comprehensive PDF tool documentation in agent prompt with usage examples and cost awareness guidance. Well-structured instructions.

Sequence Diagram

sequenceDiagram
    participant Agent as BrowserAgent
    participant Tool as PdfExtractTool
    participant Runtime as Chrome Runtime
    participant Sidepanel as Sidepanel Handler
    participant Handler as PdfRequestHandler
    participant Processing as PdfProcessingService
    participant PDFjs as PDF.js Services
    
    Agent->>Tool: Execute pdf_extract(format, pages)
    Tool->>Tool: Get current page URL
    Tool->>Tool: Resolve PDF viewer URL
    Tool->>Runtime: Open sidepanel if needed
    Note over Runtime: Wait 500ms for init
    
    Tool->>Runtime: sendMessage(PDF_PARSE_REQUEST)
    Note over Tool: Set up response listener
    
    Runtime->>Sidepanel: Route message
    Sidepanel->>Handler: handleRequest(message)
    
    alt Cache Hit
        Handler->>Handler: Get cached PDF from executionId map
        Handler->>Processing: processRequest(request, cachedDoc)
    else Cache Miss
        Handler->>Processing: processRequest(request)
        Processing->>PDFjs: PdfService.loadDocument(url)
        PDFjs-->>Processing: PDFDocumentProxy
        Handler->>Handler: Cache document (max 3 per execution)
    end
    
    Processing->>Processing: Parse page parameters
    Processing->>Processing: Apply 50-page limit if needed
    
    alt Raw Extraction
        Processing->>PDFjs: PdfExtractionService.extractText()
        PDFjs-->>Processing: ExtractedPage[]
    else AI Extraction
        Processing->>PDFjs: PdfExtractionService.extractText()
        PDFjs-->>Processing: Raw text pages
        Note over Processing: Text returned to Tool for LLM
    end
    
    Processing-->>Handler: PdfParseResponse
    Handler->>Runtime: sendMessage(PDF_PARSE_RESPONSE)
    
    Runtime->>Tool: Deliver response via listener
    Tool->>Tool: Match requestId
    
    alt Raw Mode
        Tool-->>Agent: Return formatted text/metadata
    else AI Mode
        Tool->>Agent: Invoke LLM with PDF content
        Agent->>Tool: LLM structured extraction
        Tool-->>Agent: Return structured data
    end
    
    Note over Agent: Execution completes
    Agent->>Runtime: sendMessage(PDF_CLEAR_CACHE)
    Runtime->>Sidepanel: Route clear message
    Sidepanel->>Handler: clearCache(executionId)
    Handler->>Handler: Remove cached PDFs for execution
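
The "Set up response listener" and "Match requestId" steps in the diagram can be sketched as a small registry of pending requests. This is an illustrative sketch only: `PendingRequests`, its methods, and every field of `PdfParseResponse` other than `requestId` are assumptions, and the Chrome messaging calls are left out so the example stays self-contained.

```typescript
// Hypothetical response shape; only requestId is taken from the diagram.
interface PdfParseResponse {
  requestId: string
  success: boolean
  data?: unknown
  error?: string
}

class PendingRequests {
  private readonly pending = new Map<string, (response: PdfParseResponse) => void>()

  // Register a request and get a promise that settles on delivery or timeout.
  wait(requestId: string, timeoutMs: number): Promise<PdfParseResponse> {
    return new Promise<PdfParseResponse>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId)
        reject(new Error(`PDF request ${requestId} timed out after ${timeoutMs}ms`))
      }, timeoutMs)

      this.pending.set(requestId, response => {
        clearTimeout(timer)
        this.pending.delete(requestId)
        resolve(response)
      })
    })
  }

  // Call from the message listener. Returns false when no request is waiting
  // for this requestId, so unrelated or duplicate responses are ignored.
  deliver(response: PdfParseResponse): boolean {
    const settle = this.pending.get(response.requestId)
    if (!settle) return false
    settle(response)
    return true
  }

  get size(): number {
    return this.pending.size
  }
}
```

In the tool, `wait` would be called just before sending PDF_PARSE_REQUEST, and `deliver` from the listener that receives PDF_PARSE_RESPONSE. The timeout keeps a request from hanging if the sidepanel never answers.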

16 files reviewed, 1 comment


greptile-apps · 2025-11-02T18:20:05Z

src/lib/services/PdfProcessingService.ts

+              doc = await this.pdfService.loadDocument(request.url, {
+                disableWorker: true // Try without worker for corrupted PDFs
+              })


logic: disableWorker: true option passed but not used by PdfService.loadDocument. PdfService.ts:40-52 doesn't accept disableWorker in options - it only uses isEvalSupported. This retry logic won't work as intended for corrupted PDFs.

Suggested change

- doc = await this.pdfService.loadDocument(request.url, {
-   disableWorker: true // Try without worker for corrupted PDFs
- })
+ doc = await this.pdfService.loadDocument(request.url, {
+   isEvalSupported: false // Retry with explicit isEvalSupported false
+ })
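
The underlying problem in this comment is an option that is passed but silently dropped. One possible way to surface that class of bug is to split incoming options into those the loader honors and those it ignores, then log or reject the ignored ones. The sketch below is hypothetical: `splitLoadOptions` and the supported-option list are illustrative and are not part of PdfService or PDF.js.

```typescript
// Assumed list of options the loader actually forwards; per the review comment,
// PdfService.loadDocument only uses isEvalSupported.
const SUPPORTED_LOAD_OPTIONS: readonly string[] = ["isEvalSupported"]

// Separate honored options from ignored ones so that callers passing an
// unsupported option (such as disableWorker) can be warned instead of
// silently getting the default behavior.
function splitLoadOptions(
  options: Record<string, unknown>
): { honored: Record<string, unknown>; ignored: string[] } {
  const honored: Record<string, unknown> = {}
  const ignored: string[] = []
  for (const key of Object.keys(options)) {
    if (SUPPORTED_LOAD_OPTIONS.indexOf(key) !== -1) {
      honored[key] = options[key]
    } else {
      ignored.push(key)
    }
  }
  return { honored, ignored }
}
```

With this check in place, the retry call in the snippet above would report `disableWorker` as ignored rather than appearing to succeed.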


gbsierra and others added 3 commits September 8, 2025 18:53

pr for CLA agreement

f3ae396

Merge branch 'browseros-ai:main' into main

af97d93

implement new PDF tool for agent

6694b32

greptile-apps bot reviewed Nov 2, 2025
