hocr

Here are 39 public repositories matching this topic...

UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdf csharp pdfbox netstandard pdf-files pdf-document pdf-generation hocr document-analysis pdf-extractor alto-xml page-xml layout-analysis pdf-document-processor

Updated Oct 28, 2025
C#

manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.

c-plus-plus gtk qt ocr scanner tesseract-ocr pdf-document hocr hocr-documents

Updated Sep 9, 2025
C++

mittagessen / kraken

OCR engine for all the languages

ocr neural-networks hocr optical-character-recognition htr handwritten-text-recognition alto-xml page-xml layout-analysis

Updated Oct 23, 2025
Python

scribeocr / scribeocr

Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.

ocr abbyy tesseract hocr proofreading

Updated Oct 28, 2025
JavaScript

BobLd / DocumentLayoutAnalysis

Document layout analysis resources for development with PdfPig.

pdf csharp hocr tei hocr-documents alto-xml table-extraction page-xml alto layout-analysis document-layout-analysis xycut docstrum pdfpig xy-cut recursive-xy-cut page-segmentation

Updated Oct 1, 2023
C#

UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

ocr validation transformation hocr finereader page-xml alto ocr-d

Updated May 21, 2025
JavaScript

cneud / ocr-conversion

Conversions between various OCR formats

ocr hocr tei-xml alto-xml page-xml abbyy-xml

Updated May 13, 2023

dbmdz / mirador-textoverlay

Text Overlay plugin for Mirador 3

ocr hocr optical-character-recognition iiif mirador-plugins alto-xml mirador alto mirador-3

Updated Oct 28, 2025
JavaScript

filak / hOCR-to-ALTO

Convert between Tesseract hOCR and ALTO XML using XSL stylesheets

hocr xsl alto xslt2 xsl-stylesheets

Updated Sep 25, 2025
XSLT

UB-Mannheim / ocr-gt-tools

Ergonomic line-by-line transcription of scanned text.

ocr web-interface hocr transcription ground-truth

Updated Dec 16, 2020
JavaScript

dmi3kno / hocr

Text-to-tibble

r ocr tesseract rstats tesseract-ocr hocr hocr-documents tibble

Updated Apr 25, 2020
R

fakabbir / OCR

Probabilistic key-value pair extraction from invoices (non-searchable PDFs) using word weights

ocr tesseract python3 hocr

Updated Jun 12, 2021
Python

macabeus / pyslibtesseract

✏️ Integration of Tesseract for Python using a shared library

ocr tesseract hocr

Updated Mar 25, 2016
Python

GeReV / hocr-editor-ts

A visual hOCR file editor

ocr tesseract-ocr hocr hocr-documents

Updated Apr 3, 2024
TypeScript

GeReV / HocrEditor

A visual editor for .hocr files.

ocr tesseract-ocr hocr hocr-documents

Updated Feb 5, 2025
C#

hadro / new-york-city-directories

Some basic data and text extraction from the New York City Directories

ocr brooklyn digital-humanities hocr pdfs manhattan nypl new-york-city-directories

Updated Jun 19, 2017

hadro / brewery-guides

The data for guides to breweries across the United States from 1896 to 1918

data open-data dataset digital-humanities hocr brewing nypl digital-collections brewers brewery-guides brewing-history

Updated Apr 12, 2017

jlieth / hocr-parser

Python parser for hOCR files using lxml

python ocr hocr parsing-library hocr-documents

Updated Aug 23, 2020
Python

mayurcybercz / AI-Exam-evaluation

CLI tool that recognises handwritten text from answer sheets using Tesseract OCR, then uses the extracted text to evaluate marks with NLP

python nlp cli json nltk tesseract-ocr hocr answer-sheets evaluate-marks

Updated Feb 14, 2019
Jupyter Notebook

milahu / hocr-editor-qt

Graphical hOCR editor that produces minimal diffs for proofreading Tesseract OCR output

tesseract tesseract-ocr hocr proofreading ocr-post-processing hocr-editor minimal-diff cst-editor ocr-proofreading ocr-postprocessing

Updated Oct 25, 2025
Python
