Clone (read-only): git clone http://git.guha-anderson.com/git/olmoocr_runner.git

# olmOCR Runner

## Introduction

This project is a small local runner for converting PDFs to Markdown with olmOCR. It uses the upstream `olmocr` Python package for document processing and an already-installed `llama-server` binary from llama.cpp for model inference.

The runner works with either ROCm or NVIDIA GPUs, as long as the `llama-server` on `PATH` was built for the target GPU backend. The Python environment is the same for both targets; CUDA and ROCm selection happens in llama.cpp, not through PyTorch or vLLM dependencies.

## Installation

Install `uv` and make sure `llama-server` is available on `PATH` before running the installer. This repository does not download or install `llama-server`.

Run:

```bash
./install.sh
```

The installer will:

1. Verify that `llama-server` is available.
2. Run `uv sync`.
3. Download the GGUF olmOCR model and multimodal projection file into:

```text
~/models/olmOCR-2-7B-1025-Q4_K_M-GGUF/
```

By default, the runner expects these files:

```text
~/models/olmOCR-2-7B-1025-Q4_K_M-GGUF/olmocr-2-7b-1025-fp8-q4_k_m.gguf
~/models/olmOCR-2-7B-1025-Q4_K_M-GGUF/mmproj-f16.gguf
```

You can override the paths with environment variables:

```bash
export LLAMA_SERVER=/path/to/llama-server
export OLMOCR_GGUF_MODEL=/path/to/olmocr.gguf
export OLMOCR_MMPROJ=/path/to/mmproj.gguf
```

## Usage

Convert a PDF to Markdown with:

```bash
./ocr.sh path/to/input.pdf
```

The output is written next to the PDF with a `.md` extension. For example:

```bash
./ocr.sh docs/example.pdf
```

creates:

```text
docs/example.md
```

To run the integration test, which generates a PDF with Pandoc and verifies that olmOCR preserves a table and a formula:

```bash
uv run pytest -q tests/test_ocr_integration.py
```

That test requires `pandoc`, `xelatex`, `llama-server`, the model files, and a working GPU backend. It skips cleanly if any of those are missing.
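The path conventions above can be sketched in a few lines of shell. This is a minimal illustration, not the actual contents of `ocr.sh`: it assumes the runner resolves the environment variables with the documented defaults and derives the output name by swapping the `.pdf` extension for `.md`.

```shell
#!/usr/bin/env sh
# Sketch of the runner's path conventions (illustrative; ocr.sh may differ).

# Resolve tool and model paths, falling back to the documented defaults.
MODEL_DIR="$HOME/models/olmOCR-2-7B-1025-Q4_K_M-GGUF"
SERVER="${LLAMA_SERVER:-llama-server}"
MODEL="${OLMOCR_GGUF_MODEL:-$MODEL_DIR/olmocr-2-7b-1025-fp8-q4_k_m.gguf}"
MMPROJ="${OLMOCR_MMPROJ:-$MODEL_DIR/mmproj-f16.gguf}"

# Derive the Markdown output path next to the input PDF (.pdf -> .md).
pdf="docs/example.pdf"
md="${pdf%.pdf}.md"
echo "$md"   # docs/example.md
```

The same parameter expansion makes a batch run straightforward, e.g. `for pdf in docs/*.pdf; do ./ocr.sh "$pdf"; done`.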