# Ollama-OCR
**Repository Path**: MrJson_yangkai/Ollama-OCR
## Basic Information
- **Project Name**: Ollama-OCR
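- **Description**: Ollama-OCR supports multiple output formats to suit different use cases. Markdown: preserves structured formatting such as headings, lists, and bullet points. Plain text: extracts clean, unformatted text. JSON: machine-readable structured output that is easy to integrate. Structured format: extracts tables and content and organizes them hierarchically. Key-value pairs: suited to extracting data from forms, receipts, or labeled fields.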
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-12-16
- **Last Updated**: 2024-12-16
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Ollama OCR 🔍
A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images. Available both as a Python package and a Streamlit web application.
## 🌟 Features
- **Multiple Vision Models Support**
  - LLaVA 7B: Efficient vision-language model for real-time processing (it may occasionally produce incorrect output)
  - Llama 3.2 Vision: Advanced model with high accuracy for complex documents
- **Multiple Output Formats**
  - Markdown: Preserves text formatting with headers and lists
  - Plain Text: Clean, simple text extraction
  - JSON: Structured data format
  - Structured: Tables and organized data
  - Key-Value Pairs: Extracts labeled information
- **Batch Processing**
  - Process multiple images in parallel
  - Progress tracking for each image
  - Image preprocessing (resize, normalize, etc.)
## 📦 Package Installation
```bash
pip install ollama-ocr
```
## 🚀 Quick Start
### Prerequisites
1. Install Ollama
2. Pull the required model:
```bash
ollama pull llama3.2-vision:11b
```
## Using the Package
### Single Image Processing
```python
from ollama_ocr import OCRProcessor
# Initialize OCR processor
ocr = OCRProcessor(model_name='llama3.2-vision:11b') # You can use any vision model available on Ollama
# Process an image
result = ocr.process_image(
image_path="path/to/your/image.png",
format_type="markdown" # Options: markdown, text, json, structured, key_value
)
print(result)
```
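The package sends your image to a vision model served by a local Ollama instance. As a rough illustration of what such a request can look like, the sketch below targets Ollama's `/api/generate` REST endpoint, which accepts base64-encoded images in an `images` field for vision models. This is not the package's own code: the helper names `build_generate_request` and `ocr_via_ollama` and the prompt text are hypothetical, and the URL assumes Ollama is running on its default port (11434).

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(image_bytes: bytes,
                           model: str = "llama3.2-vision:11b",
                           prompt: str = "Extract all text from this image as markdown.") -> dict:
    """Build a JSON payload for Ollama's /api/generate endpoint (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,  # hypothetical prompt; the package's real prompts may differ
        # Vision models receive images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON response instead of a token stream.
        "stream": False,
    }


def ocr_via_ollama(image_path: str, **kwargs) -> str:
    """Send one image to Ollama and return the model's text (hypothetical helper)."""
    with open(image_path, "rb") as f:
        payload = build_generate_request(f.read(), **kwargs)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False, the generated text is in the "response" field.
        return json.loads(resp.read())["response"]
```

Calling `ocr_via_ollama("path/to/your/image.png")` requires a running Ollama server with the model already pulled.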
### Batch Processing (New! 🆕)
```python
from ollama_ocr import OCRProcessor
# Initialize OCR processor
ocr = OCRProcessor(model_name='llama3.2-vision:11b', max_workers=4)  # max_workers: number of parallel workers
# Process multiple images with progress tracking
batch_results = ocr.process_batch(
input_path="path/to/images/folder", # Directory or list of image paths
format_type="markdown",
recursive=True, # Search subdirectories
preprocess=True # Enable image preprocessing
)
# Access results
for file_path, text in batch_results['results'].items():
print(f"\nFile: {file_path}")
print(f"Extracted Text: {text}")
# View statistics
print("\nProcessing Statistics:")
print(f"Total images: {batch_results['statistics']['total']}")
print(f"Successfully processed: {batch_results['statistics']['successful']}")
print(f"Failed: {batch_results['statistics']['failed']}")
```
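The package's internals are not shown in this README. As a rough model of what `max_workers`, `recursive`, and the `results`/`statistics` dictionary correspond to, the sketch below expands a folder into image paths and runs an OCR function over them with `concurrent.futures.ThreadPoolExecutor`. `collect_images`, `run_batch`, the extension list, and the extra `errors` key are hypothetical, and `ocr_fn` stands in for any single-image OCR call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Union

# Hypothetical extension filter; the package's supported formats may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".webp"}


def collect_images(input_path: Union[str, Path, Iterable[str]],
                   recursive: bool = True) -> List[Path]:
    """Expand a directory into image paths, or pass a list of paths through."""
    if isinstance(input_path, (str, Path)):
        pattern = "**/*" if recursive else "*"  # "**" also searches subdirectories
        return sorted(p for p in Path(input_path).glob(pattern)
                      if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS)
    return [Path(p) for p in input_path]


def run_batch(paths: List[Path], ocr_fn: Callable[[Path], str],
              max_workers: int = 4) -> Dict[str, Any]:
    """Run ocr_fn on each path in a thread pool and tally successes and failures."""
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ocr_fn, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = str(futures[future])
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = str(exc)
            print(f"[{done}/{len(paths)}] {path}")  # simple per-image progress
    return {
        "results": results,
        "errors": errors,
        "statistics": {"total": len(paths),
                       "successful": len(results),
                       "failed": len(errors)},
    }
```

Threads are a reasonable fit here because each OCR call spends most of its time waiting on the model server rather than using the CPU. For example, `run_batch(collect_images("path/to/images/folder"), my_ocr_fn)["statistics"]` returns the same three counters printed above, where `my_ocr_fn` is any function you supply.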
## 📋 Output Format Details
1. **Markdown Format**: A Markdown string that preserves headings, lists, and bullet points from the image.
2. **Text Format**: A plain-text string containing the extracted text without formatting.
3. **JSON Format**: A JSON object containing the extracted content, suitable for machine processing and integration.
4. **Structured Format**: A structured object in which tables and other content are extracted and organized hierarchically.
5. **Key-Value Format**: A dictionary of labeled fields and their values, useful for forms, receipts, and other labeled data.
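Because these formats are generated by a language model, the text it returns is not guaranteed to be perfectly clean. If the result you receive is raw model text rather than an already-parsed object, a defensive post-processing step can help. The sketch below assumes JSON output may be wrapped in a Markdown code fence and key-value output is laid out as `Key: Value` lines; `parse_json_output` and `parse_key_value_output` are hypothetical helpers, not part of `ollama_ocr`.

```python
import json
import re
from typing import Any, Dict

# Matches a ```json ... ``` (or plain ```) fence wrapped around the payload.
_FENCE_RE = re.compile(r"^```(?:json)?\s*\n(.*?)\n?```\s*$", re.DOTALL)


def parse_json_output(text: str) -> Any:
    """Parse model output as JSON, tolerating a surrounding Markdown code fence."""
    cleaned = text.strip()
    match = _FENCE_RE.match(cleaned)
    if match:
        cleaned = match.group(1)
    return json.loads(cleaned)  # raises json.JSONDecodeError if still invalid


def parse_key_value_output(text: str) -> Dict[str, str]:
    """Turn 'Key: Value' lines into a dict; lines without a colon are skipped."""
    pairs: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip():
            pairs[key.strip()] = value.strip()
    return pairs
```

Splitting on the first colon only keeps values such as `10:30` intact.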
-----
## 🌐 Streamlit Web Application (supports batch processing)
- **User-Friendly Interface**
  - Drag-and-drop image upload
  - Real-time processing
  - Download extracted text
  - Image preview with details
  - Responsive design
1. Clone the repository:
```bash
git clone https://github.com/imanoop7/Ollama-OCR.git
cd Ollama-OCR
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Change to the `src` directory, where `app.py` is located:
```bash
cd src
```
4. Run the Streamlit app:
```bash
streamlit run app.py
```
## Example Output
### Input Image

### Sample Output


## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- Built with Ollama
- Powered by LLaMA Vision Models
## Star History
[Star History Chart](https://star-history.com/#imanoop7/Ollama-OCR&Date)