Configuration¶
This guide covers all configuration options available in docviz-python, including extraction parameters, LLM settings, detection options, and output configurations.
Overview¶
docviz-python provides several configuration classes to customize the extraction process:
ExtractionConfig: Controls document processing parameters
DetectionConfig: Configures layout detection settings
LLMConfig: Manages Large Language Model integration
OCRConfig: Controls Optical Character Recognition settings
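These classes combine in a single extraction call. As a quick preview (each class is detailed below):
import os
import docviz
document = docviz.Document("document.pdf")
extractions = document.extract_content_sync(
    extraction_config=docviz.ExtractionConfig(page_limit=10),
    llm_config=docviz.LLMConfig(
        model="gpt-4o-mini",
        api_key=os.getenv("OPENAI_API_KEY"),
        base_url="https://api.openai.com/v1",
    ),
)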
ExtractionConfig¶
The ExtractionConfig class controls how documents are processed and extracted.
Basic Configuration¶
import docviz
# Default configuration
config = docviz.ExtractionConfig()
# Custom configuration
config = docviz.ExtractionConfig(
    page_limit=20,  # Process only first 20 pages
    zoom_x=2.0,     # 2x horizontal zoom for better quality
    zoom_y=2.0,     # 2x vertical zoom for better quality
)
Parameters Reference¶
Page Processing
page_limit (int, optional): Maximum number of pages to process. Default: None (all pages)
zoom_x (float): Horizontal zoom factor for PDF conversion. Default: 3.0
zoom_y (float): Vertical zoom factor for PDF conversion. Default: 3.0
Text Processing
pdf_text_threshold_chars (int): Minimum characters required for PDF text to be considered valid. Default: 1000
labels_to_exclude (list[str]): Content labels to exclude from extraction. Default: []
prefer_pdf_text (bool): Whether to prefer PDF-embedded text over OCR. Default: False
Example Configurations¶
High Quality Configuration
high_quality = docviz.ExtractionConfig(
    zoom_x=4.0,
    zoom_y=4.0,
    pdf_text_threshold_chars=500,
    prefer_pdf_text=True
)
Fast Processing Configuration
fast_processing = docviz.ExtractionConfig(
    zoom_x=2.0,
    zoom_y=2.0,
    page_limit=10,
    pdf_text_threshold_chars=2000
)
Academic Paper Configuration
academic_config = docviz.ExtractionConfig(
    zoom_x=3.5,
    zoom_y=3.5,
    pdf_text_threshold_chars=800,
    labels_to_exclude=["header", "footer"],
    prefer_pdf_text=False
)
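Any of these presets is applied by passing it to the extraction call:
document = docviz.Document("paper.pdf")
extractions = document.extract_content_sync(extraction_config=academic_config)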
DetectionConfig¶
Configure the underlying YOLO model for layout detection.
Basic Configuration¶
import docviz
from docviz.lib.detection.backends import DetectionBackendEnum
detection_config = docviz.DetectionConfig(
    imagesize=1024,
    confidence=0.5,
    device="cuda",
    layout_detection_backend=DetectionBackendEnum.DOCLAYOUT_YOLO,
    model_path="path/to/custom/model.pt"
)
Parameters Reference¶
Required Parameters
imagesize (int): Input image size for the model (e.g., 1024, 2048)
confidence (float): Minimum confidence threshold for detections (0.0 to 1.0)
device (str): Device for inference ("cpu", "cuda", "mps")
layout_detection_backend (DetectionBackendEnum): Detection backend to use
model_path (str): Path to the detection model file
Custom Model Usage¶
from docviz.lib.detection.backends import DetectionBackendEnum
# Using a custom trained model
custom_detection = docviz.DetectionConfig(
    imagesize=1280,
    confidence=0.6,
    device="cuda",
    layout_detection_backend=DetectionBackendEnum.DOCLAYOUT_YOLO,
    model_path="/path/to/custom_model.pt"
)
document = docviz.Document("document.pdf")
extractions = document.extract_content_sync(
    detection_config=custom_detection
)
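If the same code may run on machines without a GPU, the device can be chosen at runtime. A minimal sketch, assuming PyTorch is available (it underlies the YOLO-based backends):
import torch
import docviz
from docviz.lib.detection.backends import DetectionBackendEnum
# Prefer CUDA when present, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
detection_config = docviz.DetectionConfig(
    imagesize=1024,
    confidence=0.5,
    device=device,
    layout_detection_backend=DetectionBackendEnum.DOCLAYOUT_YOLO,
    model_path="/path/to/custom_model.pt",
)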
LLMConfig¶
Configure Large Language Model integration for content summarization and analysis.
Provider Configuration¶
OpenAI Configuration
import os
import docviz
openai_config = docviz.LLMConfig(
model="gpt-4o-mini",
api_key=os.getenv("OPENAI_API_KEY"),
base_url="https://api.openai.com/v1"
)
Azure OpenAI Configuration
azure_config = docviz.LLMConfig(
model="gpt-4o",
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
base_url=os.getenv("AZURE_OPENAI_ENDPOINT")
)
Local Model Configuration
local_config = docviz.LLMConfig(
model="llama2",
api_key="",
base_url="http://localhost:11434/v1"
)
Parameters Reference¶
Required Parameters
model (str): Model name (e.g., "gpt-4o-mini", "llama2")
api_key (str): API key for the provider (can be empty for local models)
base_url (str): Base URL for API endpoints
Usage Examples¶
Basic LLM Integration
document = docviz.Document("research_paper.pdf")
llm_config = docviz.LLMConfig(
model="gpt-4o-mini",
api_key=os.getenv("OPENAI_API_KEY"),
base_url="https://api.openai.com/v1"
)
extractions = document.extract_content_sync(
    llm_config=llm_config,
    includes=[docviz.ExtractionType.FIGURE, docviz.ExtractionType.TABLE]
)
Alternative Local Model
# Point summarization at a locally served model
# (Ollama's OpenAI-compatible endpoint on its default port)
local_config = docviz.LLMConfig(
    model="gemma3",
    api_key="",
    base_url="http://localhost:11434/v1"
)
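A local configuration is passed to the extractor exactly like a hosted one:
document = docviz.Document("report.pdf")
extractions = document.extract_content_sync(
    llm_config=local_config,
    includes=[docviz.ExtractionType.FIGURE, docviz.ExtractionType.TABLE]
)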
OCRConfig¶
Configure Optical Character Recognition for text extraction from images.
Basic Configuration¶
import docviz
ocr_config = docviz.OCRConfig(
lang="eng",
chart_labels=["picture", "table", "formula"],
labels_to_exclude=["header", "footer"]
)
Parameters Reference¶
Required Parameters
lang (str): Language code for OCR (e.g., "eng", "fra", "eng+fra")
chart_labels (list[str]): List of content labels to apply OCR to
labels_to_exclude (list[str]): List of content labels to exclude from OCR
Advanced Configuration¶
# Multi-language OCR configuration
multilang_ocr = docviz.OCRConfig(
lang="eng+fra+deu", # Multiple languages
chart_labels=["picture", "table", "formula", "equation"],
labels_to_exclude=["header", "footer", "caption"]
)
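Wiring OCR settings into an extraction presumably follows the same keyword pattern as the other configuration classes; the ocr_config= parameter name below is an assumption by analogy, not confirmed API:
# NOTE: `ocr_config=` is assumed by analogy with extraction_config,
# detection_config, and llm_config; check the API reference for the exact name.
document = docviz.Document("scanned.pdf")
extractions = document.extract_content_sync(ocr_config=multilang_ocr)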
Save Configuration¶
Configure output formats and file handling.
Output Formats¶
import docviz
# Single format
extractions.save("output", save_format=docviz.SaveFormat.JSON)
# Multiple formats
extractions.save("output", save_format=[
    docviz.SaveFormat.JSON,
    docviz.SaveFormat.CSV,
    docviz.SaveFormat.EXCEL
])
Format-Specific Options¶
JSON Output
extractions.save("output", save_format=docviz.SaveFormat.JSON)
CSV Output
extractions.save("output", save_format=docviz.SaveFormat.CSV)
Excel Output
extractions.save("output", save_format=docviz.SaveFormat.EXCEL)
Configuration Files¶
Use configuration files for complex setups:
YAML Configuration
# docviz_config.yaml
extraction:
  page_limit: 50
  zoom_x: 3.0
  zoom_y: 3.0
  pdf_text_threshold_chars: 1000
  labels_to_exclude: ["header", "footer"]
  prefer_pdf_text: false
llm:
  model: "gpt-4o-mini"
  base_url: "https://api.openai.com/v1"
output:
  formats: ["json", "csv"]
  base_path: "output"
Loading Configuration
import os
import yaml
import docviz
# Load configuration from file
with open("docviz_config.yaml", "r") as f:
    config_data = yaml.safe_load(f)
# Create configuration objects
extraction_config = docviz.ExtractionConfig(
    page_limit=config_data["extraction"]["page_limit"],
    zoom_x=config_data["extraction"]["zoom_x"],
    zoom_y=config_data["extraction"]["zoom_y"]
)
llm_config = docviz.LLMConfig(
    model=config_data["llm"]["model"],
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=config_data["llm"]["base_url"]
)
# Use in document processing
document = docviz.Document("document.pdf")
extractions = document.extract_content_sync(
    extraction_config=extraction_config,
    llm_config=llm_config
)
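The output section of the YAML can be applied the same way. A sketch, assuming the SaveFormat members are the upper-cased format strings (JSON, CSV, EXCEL):
# Map YAML format strings onto SaveFormat members by enum name
formats = [docviz.SaveFormat[name.upper()] for name in config_data["output"]["formats"]]
extractions.save(config_data["output"]["base_path"], save_format=formats)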
Best Practices¶
Performance Optimization¶
Set page limits: For testing, limit pages to speed up processing
Choose optimal zoom factors: Balance quality and performance
Use GPU when available: Set device="cuda" for faster inference
Adjust PDF text threshold: Lower values accept embedded PDF text more readily; higher values fall back to OCR more often (see the sketch below)
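A minimal sketch combining these tips for a quick test pass (the model path is a placeholder):
import docviz
from docviz.lib.detection.backends import DetectionBackendEnum
test_extraction = docviz.ExtractionConfig(
    page_limit=10,  # limit pages while testing
    zoom_x=2.0,     # modest zoom keeps rendering fast
    zoom_y=2.0,
)
test_detection = docviz.DetectionConfig(
    imagesize=1024,
    confidence=0.5,
    device="cuda",  # GPU inference when available
    layout_detection_backend=DetectionBackendEnum.DOCLAYOUT_YOLO,
    model_path="path/to/model.pt",  # placeholder
)
document = docviz.Document("document.pdf")
extractions = document.extract_content_sync(
    extraction_config=test_extraction,
    detection_config=test_detection,
)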
Quality vs Speed Trade-offs¶
For High Quality:
- Zoom: 3.0-4.0
- Detection confidence: 0.7+
- Lower PDF text threshold: 500-800
- Prefer PDF text: True
For Speed:
- Zoom: 2.0-2.5
- Detection confidence: 0.3-0.5
- Higher PDF text threshold: 1500+
- Page limit: 10-20
Resource Management¶
Monitor memory usage with large documents
Use streaming for very large files
Set appropriate timeouts for API calls
Cache model weights for repeated processing
Next Steps¶
Basic Usage - Basic usage guide
Advanced Usage - Advanced features
Output Formats - Output format details
API Reference - Complete API reference