OneOffTech Parxy

Parxy is a document processing gateway providing a unified interface to interact with multiple document parsing services, exposing a unified flexible document model suitable for different levels of text extraction granularity.

Unified API to parse documents with different providers
Unified flexible hierarchical document model (page → block → line → span → character)
Supports both local libraries (e.g., PyMuPDF, Unstructured) and remote services (e.g., LlamaParse, LLMWhisperer, PdfAct)
Extensible: easily integrate new parsers in your own code
Trace the execution for debugging purposes
Pair with evaluation utilities to compare extraction results (coming soon)
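
To make the page → block → line → span → character hierarchy concrete, here is a minimal sketch built from plain dataclasses. The class and attribute names (`Page`, `Block`, `Line`, `Span`, `Character`, `text`) are illustrative assumptions and are not Parxy's actual model classes; the sketch only shows how each level can aggregate the text of the level below it.

```python
from dataclasses import dataclass, field


# Illustrative stand-ins only: these are NOT Parxy's actual model classes.
@dataclass
class Character:
    text: str


@dataclass
class Span:
    characters: list[Character] = field(default_factory=list)

    @property
    def text(self) -> str:
        # A span is a run of characters.
        return "".join(c.text for c in self.characters)


@dataclass
class Line:
    spans: list[Span] = field(default_factory=list)

    @property
    def text(self) -> str:
        # A line concatenates its spans without a separator.
        return "".join(s.text for s in self.spans)


@dataclass
class Block:
    lines: list[Line] = field(default_factory=list)

    @property
    def text(self) -> str:
        # A block joins its lines with single newlines.
        return "\n".join(line.text for line in self.lines)


@dataclass
class Page:
    blocks: list[Block] = field(default_factory=list)

    @property
    def text(self) -> str:
        # A page separates its blocks with a blank line.
        return "\n\n".join(b.text for b in self.blocks)


def span_from_string(value: str) -> Span:
    """Build a span whose characters are the individual characters of value."""
    return Span(characters=[Character(ch) for ch in value])


page = Page(blocks=[
    Block(lines=[
        Line(spans=[span_from_string("Hello, "), span_from_string("world")]),
        Line(spans=[span_from_string("second line")]),
    ])
])
```

A consumer that only needs page-level text can read `page.text`, while one that needs finer granularity can walk down to individual spans or characters.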

Requirements

Python 3.12 or above (Python 3.10 and 3.11 are supported on a best-effort basis).

Next steps

Getting started
- The Parxy CLI
- Install the library in your application
Supported document processing services
Personalize drivers

Getting started

Parxy is available as a standalone command-line tool and as a library. The quickest way to try Parxy is from the command line using uvx.

Use with minimal footprint (fewer drivers supported):

uvx parxy --help

Use all supported drivers:

uvx parxy[all] --help

See Supported services for the list of included drivers and the extras needed to install them.

Use on the command line

You can install Parxy globally using either pip or uv. If you prefer, you can run it without installing it by using uvx.

# Using pip
pip install parxy       # Basic installation
pip install parxy[all]  # All drivers included

# Using uv
uv pip install parxy       # Basic installation
uv pip install parxy[all]  # All drivers included

# Using uvx
uvx parxy       # Basic installation
uvx parxy[all]  # All drivers included

Once installed, you can use the parxy command to:

parxy parse: Extract text content from documents with customizable granularity levels (page, block, line, span, or character)
parxy markdown: Convert documents into Markdown format, with optional combining of multiple documents
parxy drivers: List available document processing drivers
parxy env: Create a configuration file with default settings
parxy docker: Generate a Docker Compose configuration for self-hosted services

Example usage:

# Extract text from a PDF using the default driver
parxy parse document.pdf

# Convert multiple PDFs to markdown and combine them
parxy markdown --combine -o output/ doc1.pdf doc2.pdf

# List available drivers
parxy drivers

# Create default configuration
parxy env

See Using the Parxy Command Line Interface or run parxy --help for more information about available commands and options.
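
If you want to drive the CLI from a script, the sketch below runs `parxy parse` once per PDF in a folder. It relies only on the `parxy parse <file>` form shown in the examples above; the helper names `build_parse_command` and `parse_folder` are hypothetical, and the sketch assumes the `parxy` command is on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_parse_command(pdf_path):
    """Return the argv list for `parxy parse <file>`, as in the example above."""
    return ["parxy", "parse", str(pdf_path)]


def parse_folder(folder):
    """Run `parxy parse` on each PDF in folder; return {filename: exit code}."""
    if shutil.which("parxy") is None:
        raise RuntimeError("The parxy command was not found on PATH")
    results = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # One invocation per file keeps a failure on one document from
        # hiding the results of the others.
        completed = subprocess.run(build_parse_command(pdf), check=False)
        results[pdf.name] = completed.returncode
    return results
```

Calling `parse_folder("docs")` would then return a mapping from each PDF file name to the exit code of its `parxy parse` run.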

Use as a library in your project

Install all drivers, or only the ones you need

# Install all supported drivers via Pip
pip install parxy[all]

# Add to your project when using uv
uv add parxy[all]

You can also install only the optional parser backends you need (e.g. PyMuPDF, Unstructured, LlamaParse). See Supported services for the extra, if any, that each one requires.

Add the env variables when needed

Some services require an API key. Parxy reads these keys from environment variables. You can create a .env file in your project root.

# LlamaParse 
PARXY_LLAMAPARSE_API_KEY=

# Unstract LLMWhisperer
PARXY_LLMWHISPERER_API_KEY=
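
Before calling a remote driver, it can help to check that the matching key is actually set. The sketch below is one way to do that; the function `missing_api_keys` and the service labels used as dictionary keys are hypothetical, while the variable names come from the example above. Note that it inspects only the process environment and does not read the .env file itself.

```python
import os

# Variable names taken from the .env example above; the dictionary keys
# ("llamaparse", "llmwhisperer") are labels chosen for this sketch.
REQUIRED_KEYS = {
    "llamaparse": "PARXY_LLAMAPARSE_API_KEY",
    "llmwhisperer": "PARXY_LLMWHISPERER_API_KEY",
}


def missing_api_keys(services, environ=None):
    """Return the env variable names that are unset or blank for the given services."""
    env = os.environ if environ is None else environ
    missing = []
    for service in services:
        variable = REQUIRED_KEYS[service]
        if not env.get(variable, "").strip():
            missing.append(variable)
    return missing
```

For example, `missing_api_keys(["llamaparse"])` returns an empty list when `PARXY_LLAMAPARSE_API_KEY` is set to a non-blank value in the current environment.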

Call the driver

from parxy_core.facade import Parxy

# Parse a document using the default driver
doc = Parxy.parse('path/to/document.pdf')

# Print basic information
print(f"Pages: {len(doc.pages)}")
print(f"Title: {doc.metadata.title}")

# Parse a document using a specific driver
Parxy.driver(Parxy.LLAMAPARSE).parse('path/to/document.pdf')

For more information take a look at our Getting Started with Parxy tutorial.
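
As a small follow-up to the snippet above, the sketch below wraps the two attributes it uses (`doc.pages` and `doc.metadata.title`) in a helper that tolerates a missing title. The function name `summarize` and the `"(untitled)"` fallback are assumptions made for illustration, and a stand-in object is used so the sketch runs without Parxy installed.

```python
from types import SimpleNamespace


def summarize(doc):
    """Return a small dict describing a parsed document.

    Relies only on the two attributes shown above:
    doc.pages and doc.metadata.title.
    """
    metadata = getattr(doc, "metadata", None)
    title = getattr(metadata, "title", None)
    return {"pages": len(doc.pages), "title": title or "(untitled)"}


# Stand-in object with the same two attributes as a parsed document.
fake_doc = SimpleNamespace(
    pages=[object(), object()],
    metadata=SimpleNamespace(title=None),
)
summary = summarize(fake_doc)
```

With Parxy installed, the same helper could be called as `summarize(Parxy.parse('path/to/document.pdf'))`.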

Supported services

Service or Library	Support status	Extra	Local file	Remote file
PyMuPDF	Live	-	✅	✅
PdfAct	Live	-	✅	✅
Unstructured library	Preview	`unstructured_local`	✅	✅
Landing AI Agentic Document Extraction	Preview	`landingai`	✅	✅
LlamaParse	Preview	`llama`	✅	✅
LLMWhisperer	Preview	`llmwhisperer`	✅	✅
Unstructured.io cloud service	Planned
Chunkr	Planned
Docling	Planned

...and more can be added via the live extension!

Live extension

The live extension lets you add new drivers, or create custom configurations of the existing drivers, directly in your application code.

Create a class that inherits from Driver

from parxy_core.drivers import Driver
from parxy_core.models import Document

class CustomDriverExample(Driver):
    """Example custom driver for testing."""

    def _handle(self, file, level="page") -> Document:
        return Document(pages=[])

Register it in Parxy using the extend method

from parxy_core.facade import Parxy

Parxy.extend(name='my_parser', callback=lambda: CustomDriverExample())

Use it

Parxy.driver('my_parser').parse('path/to/document.pdf')

More on the live extension in our How to Add a New Parser to Parxy guide.
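
To illustrate the idea behind `extend` and `driver`, here is a toy name-to-factory registry. It is not Parxy's implementation: the `DriverRegistry` and `EchoDriver` classes are hypothetical, and caching one instance per name is an assumption made for this sketch.

```python
class DriverRegistry:
    """Toy registry mapping names to zero-argument factories (not Parxy's code)."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def extend(self, name, callback):
        """Register a factory under name, replacing any previous registration."""
        if not callable(callback):
            raise TypeError("callback must be callable")
        self._factories[name] = callback
        self._instances.pop(name, None)  # drop any cached instance for this name

    def driver(self, name):
        """Return the driver for name, creating it on first use."""
        if name not in self._factories:
            raise KeyError(f"Unknown driver: {name}")
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]


class EchoDriver:
    """Stand-in driver that just reports which file it was asked to parse."""

    def parse(self, file):
        return f"parsed:{file}"


registry = DriverRegistry()
registry.extend("my_parser", lambda: EchoDriver())
result = registry.driver("my_parser").parse("path/to/document.pdf")
```

Passing a callback rather than an instance means the driver is only constructed when it is first requested, which keeps registration cheap even for drivers with heavy dependencies.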

Contributing

Thank you for considering contributing to Parxy! You can find how to get started in our contribution guide.

If you are interested in adding a new parser to the supported list, take a look at our How to Add a New Parser to Parxy guide.

Development

Parxy uses uv as its package and project manager.

Clone the repository
Sync all dependencies with uv sync --all-extras

All Parxy code is located in the src directory:

parxy_core contains the driver implementations, the models, and the facade and factory used to access Parxy features
parxy_cli contains the module providing the command line interface

Optional Dependencies vs Dependency Groups

Parxy uses optional dependencies to track user-oriented dependencies that enhance functionality. Dependency groups are reserved for development purposes. When adding support for a new driver, consider defining its dependencies as optional to reduce Parxy's footprint.

The question What’s the difference between optional-dependencies and dependency-groups in pyproject.toml? gives a good overview of the differences.
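
When a driver's dependencies are optional, the driver code usually has to cope with them being absent. The sketch below shows one common way to do that with the standard library; the helper names `is_available` and `require` are hypothetical, and the module name you pass in depends on the backend in question.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True when a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def require(module_name: str, extra: str) -> None:
    """Raise a helpful error when an optional backend is not installed."""
    if not is_available(module_name):
        raise ImportError(
            f"'{module_name}' is not installed. "
            f"Install it with: pip install parxy[{extra}]"
        )
```

A driver could call `require(...)` with its backend module and extra name before importing the backend, so that users get an actionable message instead of a bare import failure.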

Testing

Parxy is tested using Pytest. The tests, located in the tests folder, run on each commit and pull request.

To execute the test suite run:

uv run pytest

You can run linting via:

uv run ruff check

Security Vulnerabilities

Please review our security policy on how to report security vulnerabilities.

Supporters

The project is provided and supported by OneOff-Tech (UG) and Alessio Vertemati.

Licence and Copyright

Parxy is licensed under the GPL v3 licence.
