-
Notifications
You must be signed in to change notification settings - Fork 43
implement new PDF tool for agent #173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
Adds comprehensive PDF extraction capabilities to the BrowserOS agent using PDF.js v5.4.296. The implementation provides both cost-effective raw extraction (metadata, text, search, outline) and AI-powered structured data extraction.
Architecture
The PDF tool operates in the sidepanel due to Chrome extension limitations. The flow is: PdfExtractTool → Chrome messaging → PdfRequestHandler → PdfProcessingService → PDF.js services. Key features include:
- Execution-scoped caching: Caches up to 3 PDFs per execution to avoid reloading
- 50-page performance limit: Prevents resource exhaustion on large documents
- Dual extraction modes: Raw (no LLM cost) and AI-powered (structured data)
- Chrome PDF viewer URL resolution: Handles
chrome-extension://URLs correctly
Key Issues Found
-
Logic Error (src/lib/services/PdfProcessingService.ts:69-71):
disableWorker: trueoption passed to retry logic butPdfService.loadDocument()doesn't accept this parameter. The corrupted PDF fallback won't work as designed. -
Style Violations: Excessive comments throughout violate CLAUDE.md style guide (lines 31-43). Files like
PdfExtract.tscontain multi-paragraph block comments explaining obvious operations. Should be condensed to brief section separators. -
Debug Logging: Multiple
console.logstatements left after debugging across 5 files. Should be removed per custom rule about excessive logging.
Strengths
- Well-structured service layer with clear separation of concerns
- Comprehensive error handling with graceful degradation
- Proper type definitions using Zod schemas
- Good security practices (disables eval, proper URL validation)
- Cache cleanup on execution end prevents memory leaks
- Excellent agent prompt documentation with usage examples
Confidence Score: 4/5
- Safe to merge with minor fixes needed for logging cleanup and comment style
- Score of 4 reflects solid implementation with one logic bug (disableWorker parameter) and style violations (excessive comments, debug logging). Core functionality is sound with proper error handling, caching, and security practices. No critical issues blocking merge.
- src/lib/services/PdfProcessingService.ts requires fix for disableWorker parameter. src/lib/tools/PdfExtract.ts needs comment cleanup to match style guide.
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| src/lib/tools/PdfExtract.ts | 4/5 | PDF extraction tool with comprehensive error handling, proper URL resolution, and cross-process messaging. Code is well-documented but has excessive comments per style guide. |
| src/lib/services/PdfRequestHandler.ts | 4/5 | Message routing and caching logic with execution-scoped cache management. Solid implementation with good logging. |
| src/lib/services/PdfProcessingService.ts | 4/5 | Orchestrates PDF operations with 50-page limits for performance. Has retry logic for corrupted PDFs but disableWorker fallback parameter isn't used correctly. |
| src/lib/services/PdfService.ts | 5/5 | Core PDF.js wrapper with proper worker configuration and metadata extraction. Clean implementation. |
| src/sidepanel/hooks/useMessageHandler.ts | 4/5 | Integrates PDF handler into message routing system with singleton pattern. Proper cache clearing on execution end. |
| src/lib/agent/BrowserAgent.prompt.ts | 4/5 | Comprehensive PDF tool documentation in agent prompt with usage examples and cost awareness guidance. Well-structured instructions. |
Sequence Diagram
sequenceDiagram
participant Agent as BrowserAgent
participant Tool as PdfExtractTool
participant Runtime as Chrome Runtime
participant Sidepanel as Sidepanel Handler
participant Handler as PdfRequestHandler
participant Processing as PdfProcessingService
participant PDFjs as PDF.js Services
Agent->>Tool: Execute pdf_extract(format, pages)
Tool->>Tool: Get current page URL
Tool->>Tool: Resolve PDF viewer URL
Tool->>Runtime: Open sidepanel if needed
Note over Runtime: Wait 500ms for init
Tool->>Runtime: sendMessage(PDF_PARSE_REQUEST)
Note over Tool: Set up response listener
Runtime->>Sidepanel: Route message
Sidepanel->>Handler: handleRequest(message)
alt Cache Hit
Handler->>Handler: Get cached PDF from executionId map
Handler->>Processing: processRequest(request, cachedDoc)
else Cache Miss
Handler->>Processing: processRequest(request)
Processing->>PDFjs: PdfService.loadDocument(url)
PDFjs-->>Processing: PDFDocumentProxy
Handler->>Handler: Cache document (max 3 per execution)
end
Processing->>Processing: Parse page parameters
Processing->>Processing: Apply 50-page limit if needed
alt Raw Extraction
Processing->>PDFjs: PdfExtractionService.extractText()
PDFjs-->>Processing: ExtractedPage[]
else AI Extraction
Processing->>PDFjs: PdfExtractionService.extractText()
PDFjs-->>Processing: Raw text pages
Note over Processing: Text returned to Tool for LLM
end
Processing-->>Handler: PdfParseResponse
Handler->>Runtime: sendMessage(PDF_PARSE_RESPONSE)
Runtime->>Tool: Deliver response via listener
Tool->>Tool: Match requestId
alt Raw Mode
Tool-->>Agent: Return formatted text/metadata
else AI Mode
Tool->>Agent: Invoke LLM with PDF content
Agent->>Tool: LLM structured extraction
Tool-->>Agent: Return structured data
end
Note over Agent: Execution completes
Agent->>Runtime: sendMessage(PDF_CLEAR_CACHE)
Runtime->>Sidepanel: Route clear message
Sidepanel->>Handler: clearCache(executionId)
Handler->>Handler: Remove cached PDFs for execution
16 files reviewed, 1 comment
| doc = await this.pdfService.loadDocument(request.url, { | ||
| disableWorker: true // Try without worker for corrupted PDFs | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: disableWorker: true option passed but not used by PdfService.loadDocument. PdfService.ts:40-52 doesn't accept disableWorker in options - it only uses isEvalSupported. This retry logic won't work as intended for corrupted PDFs.
| doc = await this.pdfService.loadDocument(request.url, { | |
| disableWorker: true // Try without worker for corrupted PDFs | |
| }) | |
| doc = await this.pdfService.loadDocument(request.url, { | |
| isEvalSupported: false // Retry with explicit isEvalSupported false | |
| }) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/services/PdfProcessingService.ts
Line: 69:71
Comment:
**logic:** `disableWorker: true` option passed but not used by PdfService.loadDocument. PdfService.ts:40-52 doesn't accept `disableWorker` in options - it only uses `isEvalSupported`. This retry logic won't work as intended for corrupted PDFs.
```suggestion
doc = await this.pdfService.loadDocument(request.url, {
isEvalSupported: false // Retry with explicit isEvalSupported false
})
```
How can I resolve this? If you propose a fix, please make it concise.
PDF Tool Implementation - Issue #172
Summary
Adds PDF extraction capabilities to the BrowserOS agent, enabling processing of PDF documents.
New PDF Capabilities
Raw Extraction
format={metadata: true}→ Document info (title, author, page count)format={text: true}→ Raw text from pagesformat={find: {query: "search"}}→ Text search with page locationsformat={outline: true}→ Table of contentsAI-Powered Extraction
format={companies: [], ceos: []}→ Custom structured datataskparameter describing what to extractPage Targeting
page: [1, 3, 5]→ Specific pagespages: "all"→ Entire documentpages: {start: 1, end: 10}→ Page rangesAgent Usage Examples
Architecture Overview
PDF processing occurs in the sidepanel due to Chrome extension restrictions.
Key Components:
File Structure and Flow
Files Changed
New Files
src/lib/tools/PdfExtract.ts- Main PDF extraction toolsrc/lib/services/PdfRequestHandler.ts- Request handling & cachingsrc/lib/services/PdfProcessingService.ts- Processing orchestrationsrc/lib/services/PdfService.ts- PDF.js core operationssrc/lib/services/PdfExtractionService.ts- Text/search/outline extractionsrc/lib/types/pdf.ts- PDF type definitionssrc/lib/utils/PdfPageUtils.ts- Page parameter utilitiesModified Files
src/sidepanel/hooks/useMessageHandler.ts- Added PDF request routingsrc/lib/tools/index.ts- Added PdfExtract exportsrc/lib/types/messaging.ts- Added PDF message typespackage.json- Added pdfjs-dist dependencysrc/lib/agent/BrowserAgent.prompt.ts- Added PDF tool descriptions to promptssrc/lib/agent/BrowserAgent.ts- Integrated PDF tool into agent logicsrc/lib/execution/Execution.ts- Added PDF execution handlingsrc/lib/runtime/ExecutionContext.ts- Updated context for PDF operationswebpack.config.js- Configured PDF.js worker and build settingsDependencies
pdfjs-dist: ^5.4.296- PDF processing library (added to package.json)Demo Videos
Video 1: Demo (2x speed, 720p)
pdftool_2x.mp4
Things to consider: