Installation Guide

This guide covers how to install docviz-python and its dependencies.

System Requirements

Python 3.10 or higher
Operating System: Windows, macOS, or Linux
Memory: At least 4GB RAM (8GB recommended)
Storage: At least 2GB free space
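
The requirements above can be checked from Python before installing. The following is a minimal sketch that uses only the standard library; the function name `check_requirements` is illustrative and not part of docviz-python. It checks free space on the drive holding the current directory, which is an assumption about where the package and its models will live, and it skips the RAM check because the standard library has no portable way to query total memory.

```python
import platform
import shutil
import sys
from pathlib import Path

MIN_PYTHON = (3, 10)          # "Python 3.10 or higher"
MIN_FREE_BYTES = 2 * 1024**3  # "At least 2GB free space"


def check_requirements(path: Path = Path(".")) -> dict:
    """Return a mapping of requirement name -> whether it is met."""
    free = shutil.disk_usage(path).free
    return {
        "python": sys.version_info >= MIN_PYTHON,
        # platform.system() reports macOS as "Darwin"
        "os": platform.system() in {"Windows", "Darwin", "Linux"},
        "storage": free >= MIN_FREE_BYTES,
    }


for name, ok in check_requirements().items():
    print(f"{name}: {'ok' if ok else 'NOT met'}")
```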

Package Manager Installation

Using uv (Recommended)

uv is a fast Python package installer and resolver. It’s the recommended way to install docviz-python:

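# Install uv if you haven't already (macOS/Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows (PowerShell):
# powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Install docviz-python into the active environment
uv pip install docviz-python

# Or add it as a dependency of an existing uv project
uv add docviz-python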

Using pip

Standard pip installation:

pip install docviz-python

# Upgrade to the latest version
pip install docviz-python --upgrade

From Source

Clone the repository and install in development mode:

git clone https://github.com/privateai-com/docviz.git
cd docviz
pip install -e .

Optional Dependencies

Install additional dependencies for specific features:

Development Dependencies

# Quote the requirement so shells such as zsh do not expand the brackets
# Using uv
uv add "docviz-python[dev]"

# Using pip
pip install "docviz-python[dev]"

Showcase Dependencies

For Jupyter notebooks and visualization examples:

# Using uv
uv add "docviz-python[showcase]"

# Using pip
pip install "docviz-python[showcase]"

CLI Dependencies

For command-line interface features:

# Using uv
uv add "docviz-python[cli]"

# Using pip
pip install "docviz-python[cli]"
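
To see which extras an installed copy of the package actually declares, the distribution metadata can be queried with the standard library. This is a sketch: `declared_extras` is an illustrative helper, not a docviz-python function, and it reports the extras the package declares rather than whether their dependencies are installed.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(dist_name: str) -> list[str]:
    """List the extras a distribution declares, or [] if it is not installed."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return []
    # "Provides-Extra" appears once per extra in the package metadata
    return sorted(meta.get_all("Provides-Extra") or [])


print(declared_extras("docviz-python"))
```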

External Dependencies

docviz-python uses several external tools that may need to be installed separately:

Tesseract OCR

Required for text extraction from images:

Ubuntu/Debian:

sudo apt-get install tesseract-ocr

macOS:

brew install tesseract

Windows:

DocViz can help install Tesseract for you. On first import, DocViz performs a dependency check and, if Tesseract is not found, downloads and launches the official Tesseract installer. You can also install it manually from https://github.com/UB-Mannheim/tesseract/wiki.

Verify installation:

tesseract --version
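
The same check can be run from Python, which is useful in setup scripts. This is a minimal sketch using only the standard library; `tesseract_version` is an illustrative helper name, not part of docviz-python or pytesseract.

```python
import shutil
import subprocess


def tesseract_version(executable: str = "tesseract"):
    """Return the first line of `<executable> --version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # not on PATH
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some builds print the version to stderr instead of stdout
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


print(tesseract_version() or "Tesseract not found on PATH")
```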

Automatic Dependency Management

On first import, DocViz verifies dependencies and will:

Check that Tesseract OCR is available and working
Download required detection models if missing

Models are stored under ~/.docviz/models by default. You can clear cached data with:

from docviz.environment import reset_docviz_cache
reset_docviz_cache()

Note

The first run may take longer due to model downloads. Subsequent runs reuse cached models.
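
Before clearing the cache, it can help to see how much space the downloaded models occupy. The sketch below assumes the default location given above (~/.docviz/models) and uses only the standard library; `cache_size_bytes` is an illustrative helper, not a docviz-python function.

```python
from pathlib import Path

# Default model location as stated in this guide
MODEL_DIR = Path("~/.docviz/models").expanduser()


def cache_size_bytes(directory: Path) -> int:
    """Sum the sizes of all regular files under directory (0 if it is missing)."""
    if not directory.is_dir():
        return 0
    return sum(p.stat().st_size for p in directory.rglob("*") if p.is_file())


print(f"{MODEL_DIR}: {cache_size_bytes(MODEL_DIR) / 1024**2:.1f} MiB")
```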

Verification

Test that the installation was successful:

import docviz

# Test basic functionality (replace test.pdf with the path to an existing PDF)
document = docviz.Document("test.pdf")
print(f"Document loaded: {document.name}")
print(f"Document has {document.page_count} pages")
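
If no PDF is at hand, or you want to confirm the install without triggering the first-import dependency check, the installed version can be read from the package metadata instead. This is a sketch using the standard library only; `installed_version` is an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "docviz-python"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"docviz-python {found}" if found else "docviz-python is not installed")
```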

Common Installation Issues

Permission Errors

If you encounter permission errors, try:

# Install for the current user only
pip install --user docviz-python

# Or use a virtual environment
python -m venv docviz_env
source docviz_env/bin/activate  # On Windows: docviz_env\Scripts\activate
pip install docviz-python
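
To confirm that the virtual environment is actually active before installing, compare the interpreter's prefixes. This is a small sketch; it detects environments created with `python -m venv` (and recent virtualenv releases), where `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment active" if in_virtualenv() else "using the base interpreter")
print(f"packages install under: {sys.prefix}")
```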

Missing Dependencies

If you get import errors for dependencies:

# Reinstall with all dependencies
pip install --force-reinstall docviz-python

Tesseract Not Found

If Tesseract is not found, ensure it’s in your system PATH or specify the path:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # Windows example
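
Rather than hard-coding one path, a script can try the PATH first and then fall back to a list of likely locations. The sketch below is one way to do that. The Linux and macOS entries in `FALLBACK_PATHS` are assumptions about common install locations, and `find_tesseract` is an illustrative helper, not part of pytesseract or docviz-python.

```python
import os
import shutil

# Hypothetical fallback locations; adjust to wherever Tesseract is installed
FALLBACK_PATHS = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",  # Windows example from above
    "/usr/bin/tesseract",           # assumed typical Linux location
    "/opt/homebrew/bin/tesseract",  # assumed Homebrew location on Apple Silicon
)


def find_tesseract(name="tesseract", fallbacks=FALLBACK_PATHS):
    """Prefer the PATH lookup, then the first fallback that exists on disk."""
    found = shutil.which(name)
    if found:
        return found
    for candidate in fallbacks:
        if os.path.isfile(candidate):
            return candidate
    return None


path = find_tesseract()
if path is None:
    print("Tesseract not found; install it or extend FALLBACK_PATHS")
else:
    print(f"Tesseract found at {path}")
    try:
        import pytesseract
    except ImportError:
        print("pytesseract is not installed, nothing to configure")
    else:
        # Point pytesseract at the executable that was located
        pytesseract.pytesseract.tesseract_cmd = path
```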

Next Steps

After successful installation, proceed to:

Quick Start Guide
Basic Usage (tutorial)
API Reference