Installation Guide

This guide covers how to install docviz-python and its dependencies.

System Requirements

  • Python 3.10 or higher

  • Operating System: Windows, macOS, or Linux

  • Memory: At least 4GB RAM (8GB recommended)

  • Storage: At least 2GB free space
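Before installing, you can check two of these requirements from Python with only the standard library. The sketch below checks the interpreter version and the free space in your home directory. That location is an assumption: the guide does not say which drive needs the 2GB, but models are cached under ~/.docviz. It does not check available RAM, because the standard library has no portable way to read it. The helper name check_requirements is hypothetical.

```python
from __future__ import annotations  # keeps the annotations harmless on older interpreters

import shutil
import sys
from pathlib import Path

# Minimums taken from the System Requirements list above.
MIN_PYTHON = (3, 10)
MIN_FREE_BYTES = 2 * 1024**3  # "At least 2GB free space"


def check_requirements(path: str | Path = Path.home()) -> list[str]:
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper: checks only the Python version and free disk space
    at `path` (home directory by default). RAM is not checked.
    """
    problems: list[str] = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {found} found; 3.10 or higher is required")
    free = shutil.disk_usage(path).free
    if free < MIN_FREE_BYTES:
        problems.append(f"only {free / 1024**3:.1f} GB free at {path}; at least 2 GB is required")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) or "Python version and free disk space look fine")
```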

Package Manager Installation

Using pip

Standard pip installation:

pip install docviz-python

# Upgrade to the latest version
pip install docviz-python --upgrade

From Source

Clone the repository and install in development mode:

git clone https://github.com/privateai-com/docviz.git
cd docviz
pip install -e .
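After an editable install, pip records the source location in the distribution's metadata, in the direct_url.json file defined by PEP 610. The sketch below reads that record to tell an editable install apart from a regular one. is_editable_install is a hypothetical helper. Older setup.py develop style installs may not write this file, in which case it reports False.

```python
import json
from importlib.metadata import PackageNotFoundError, distribution


def is_editable_install(dist_name: str = "docviz-python") -> bool:
    """Report whether the installer recorded an editable install (PEP 610 direct_url.json)."""
    try:
        raw = distribution(dist_name).read_text("direct_url.json")
    except PackageNotFoundError:
        return False  # not installed at all
    if raw is None:
        return False  # regular install from an index writes no direct_url.json
    # Local-directory installs carry {"dir_info": {"editable": true}} when editable.
    return bool(json.loads(raw).get("dir_info", {}).get("editable", False))


if __name__ == "__main__":
    print("editable install:", is_editable_install())
```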

Optional Dependencies

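Install additional dependencies for specific features. In shells such as zsh, quote the requirement (for example "docviz-python[dev]") so the square brackets are not treated as a glob pattern: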

Development Dependencies

# Using uv
uv add "docviz-python[dev]"

# Using pip
pip install "docviz-python[dev]"

Showcase Dependencies

For Jupyter notebooks and visualization examples:

# Using uv
uv add "docviz-python[showcase]"

# Using pip
pip install "docviz-python[showcase]"

CLI Dependencies

For command-line interface features:

# Using uv
uv add "docviz-python[cli]"

# Using pip
pip install "docviz-python[cli]"
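This guide does not list which packages each extra pulls in. If docviz-python is installed, you can read that from its metadata, where each optional requirement carries an extra == "..." marker. The sketch below groups requirement strings by extra using a simple regular expression rather than a full marker parser (the packaging library would be the more robust choice). Both function names are hypothetical.

```python
import re
from collections import defaultdict
from importlib.metadata import PackageNotFoundError, requires

# Matches the marker used for optional dependencies, e.g. extra == "dev" or extra == 'dev'.
_EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def group_by_extra(reqs: list[str]) -> dict[str, list[str]]:
    """Group requirement strings by extra name; "" collects the always-installed ones."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for req in reqs:
        spec, _, marker = req.partition(";")
        match = _EXTRA_RE.search(marker)
        grouped[match.group(1) if match else ""].append(spec.strip())
    return dict(grouped)


def requirements_by_extra(dist_name: str = "docviz-python") -> dict[str, list[str]]:
    """Read the installed distribution's requirements; {} if it is not installed."""
    try:
        return group_by_extra(requires(dist_name) or [])
    except PackageNotFoundError:
        return {}


if __name__ == "__main__":
    for extra, specs in sorted(requirements_by_extra().items()):
        print(extra or "(core)", "->", ", ".join(specs))
```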

External Dependencies

docviz-python uses several external tools that may need to be installed separately:

Tesseract OCR

Required for text extraction from images:

Ubuntu/Debian:

sudo apt-get install tesseract-ocr

macOS:

brew install tesseract

Windows:

DocViz can help install Tesseract for you. On first import, DocViz performs a dependency check and will download and launch the official Tesseract installer if it’s not found. You can also install manually from https://github.com/UB-Mannheim/tesseract/wiki.

Verify installation:

tesseract --version

Automatic Dependency Management

On first import, DocViz verifies dependencies and will:

  • Check that Tesseract OCR is available and working

  • Download required detection models if missing

Models are stored under ~/.docviz/models by default. You can clear cached data with:

from docviz.environment import reset_docviz_cache
reset_docviz_cache()
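To see what the automatic download has cached, for instance before calling reset_docviz_cache, a read-only listing of the model directory is enough. This sketch assumes the default ~/.docviz/models location stated above and makes no assumptions about model file names. cached_models is a hypothetical helper.

```python
from pathlib import Path

# Default location stated in this guide; pass another path if your setup differs.
DEFAULT_MODEL_DIR = Path.home() / ".docviz" / "models"


def cached_models(model_dir: Path = DEFAULT_MODEL_DIR) -> dict[str, int]:
    """Map each file under model_dir (as a relative path) to its size in bytes."""
    if not model_dir.is_dir():
        return {}  # nothing downloaded yet, or the cache was cleared
    return {
        str(p.relative_to(model_dir)): p.stat().st_size
        for p in sorted(model_dir.rglob("*"))
        if p.is_file()
    }


if __name__ == "__main__":
    models = cached_models()
    total_mb = sum(models.values()) / 1024**2
    print(f"{len(models)} cached file(s), {total_mb:.1f} MB in {DEFAULT_MODEL_DIR}")
```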

Note

The first run may take longer due to model downloads. Subsequent runs reuse cached models.

Verification

Test that the installation was successful:

import docviz

# Test basic functionality ("test.pdf" stands for any PDF file you have locally)
document = docviz.Document("test.pdf")
print(f"Document loaded: {document.name}")
print(f"Document has {document.page_count} pages")

Common Installation Issues

Permission Errors

If you encounter permission errors, try:

# Use user installation
pip install --user docviz-python

# Or use a virtual environment
python -m venv docviz_env
source docviz_env/bin/activate  # On Windows: docviz_env\Scripts\activate
pip install docviz-python
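If you are unsure whether a virtual environment is active, the interpreter can tell you. In an environment created by venv (or virtualenv 20 and later), sys.prefix points at the environment while sys.base_prefix points at the base installation. A minimal sketch follows. Note that conda environments are not detected this way, and in_virtualenv is a hypothetical helper.

```python
import sys


def in_virtualenv() -> bool:
    """True when this interpreter runs inside a venv/virtualenv environment."""
    # venv rewrites sys.prefix but leaves sys.base_prefix pointing at the base install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    where = "a virtual environment" if in_virtualenv() else "the base interpreter"
    print(f"This interpreter is using {where}: {sys.prefix}")
```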

Missing Dependencies

If importing docviz fails with a ModuleNotFoundError for one of its dependencies:

# Reinstall with all dependencies
pip install --force-reinstall docviz-python

Tesseract Not Found

If Tesseract is not found, make sure its installation directory is on your system PATH, or set the path to the executable explicitly:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # Windows example

Next Steps

After successful installation, proceed to: