# olmOCR Runner
## Introduction
This project is a small local runner for converting PDFs to Markdown with
olmOCR. It uses the upstream `olmocr` Python package for document processing and
an already-installed `llama-server` binary from llama.cpp for model inference.
The runner works with either AMD (ROCm) or NVIDIA (CUDA) GPUs, provided the
`llama-server` on `PATH` was built for the target GPU backend. The Python
environment is the same for both targets; CUDA and ROCm selection happens in
llama.cpp, not through PyTorch or vLLM dependencies.
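One heuristic way to confirm which backend your `llama-server` binary was built for is to inspect the libraries it links (a sketch, assuming a dynamically linked Linux build; statically linked or CPU-only builds won't match either pattern):

```shell
# Heuristic backend check: CUDA builds link libcudart/libcublas,
# ROCm builds link hipblas/rocblas. Only meaningful for dynamic builds.
bin="$(command -v llama-server || true)"
if [ -z "$bin" ]; then
  echo "llama-server not on PATH"
elif ldd "$bin" | grep -qiE 'cudart|cublas'; then
  echo "CUDA build: $bin"
elif ldd "$bin" | grep -qiE 'hipblas|rocblas'; then
  echo "ROCm build: $bin"
else
  echo "backend unclear (CPU-only or statically linked?): $bin"
fi
```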
## Installation
Install `uv` and make sure `llama-server` is available on `PATH` before running
the installer. This repository does not download or install `llama-server`.
Run:
```bash
./install.sh
```
The installer will:
1. Verify that `llama-server` is available.
2. Run `uv sync`.
3. Download the GGUF olmOCR model and multimodal projection file into:
```text
~/models/olmOCR-2-7B-1025-Q4_K_M-GGUF/
```
By default, the runner expects these files:
```text
~/models/olmOCR-2-7B-1025-Q4_K_M-GGUF/olmocr-2-7b-1025-fp8-q4_k_m.gguf
~/models/olmOCR-2-7B-1025-Q4_K_M-GGUF/mmproj-f16.gguf
```
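An illustrative preflight check for these defaults (not part of the installer; it only reports which of the two expected files are present):

```shell
# Report any missing model file before attempting a run.
model_dir="$HOME/models/olmOCR-2-7B-1025-Q4_K_M-GGUF"
for f in "$model_dir/olmocr-2-7b-1025-fp8-q4_k_m.gguf" \
         "$model_dir/mmproj-f16.gguf"; do
  if [ -f "$f" ]; then
    echo "found:   $f"
  else
    echo "missing: $f"
  fi
done
```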
You can override paths with environment variables:
```bash
export LLAMA_SERVER=/path/to/llama-server
export OLMOCR_GGUF_MODEL=/path/to/olmocr.gguf
export OLMOCR_MMPROJ=/path/to/mmproj.gguf
```
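The overrides follow the usual shell default-resolution pattern: each variable keeps its exported value if set, otherwise falls back to the documented default. A sketch (the exact logic lives in the repository's scripts):

```shell
# Sketch of default resolution via ${VAR:-default} expansion.
model_dir="$HOME/models/olmOCR-2-7B-1025-Q4_K_M-GGUF"
LLAMA_SERVER="${LLAMA_SERVER:-llama-server}"
OLMOCR_GGUF_MODEL="${OLMOCR_GGUF_MODEL:-$model_dir/olmocr-2-7b-1025-fp8-q4_k_m.gguf}"
OLMOCR_MMPROJ="${OLMOCR_MMPROJ:-$model_dir/mmproj-f16.gguf}"
echo "server: $LLAMA_SERVER"
echo "model:  $OLMOCR_GGUF_MODEL"
echo "mmproj: $OLMOCR_MMPROJ"
```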
## Usage
Convert a PDF to Markdown with:
```bash
./ocr.sh path/to/input.pdf
```
The output is written next to the PDF with a `.md` extension. For example:
```bash
./ocr.sh docs/example.pdf
```
creates:
```text
docs/example.md
```
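The output path is the input path with its `.pdf` suffix swapped for `.md`, which in shell is a one-line suffix substitution:

```shell
# Derive the output path: same directory and basename, .pdf -> .md.
input="docs/example.pdf"
output="${input%.pdf}.md"
echo "$output"   # prints docs/example.md
```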
To run the integration test that generates a PDF with Pandoc and verifies that
olmOCR preserves a table and a formula:
```bash
uv run pytest -q tests/test_ocr_integration.py
```
That test requires `pandoc`, `xelatex`, `llama-server`, the model files, and a
working GPU backend. It skips cleanly if any of those are missing.
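If you want to know up front whether the test will skip, you can check its external tool prerequisites yourself (an illustrative preflight; the test performs equivalent checks internally and this does not cover the model files or GPU):

```shell
# Report whether the integration test's external tools are on PATH.
missing=""
for tool in pandoc xelatex llama-server; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "integration test will skip; missing:$missing"
else
  echo "all external tools found"
fi
```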