tesseract-ocr Text Recognition Development Guide

news 2025/7/15 4:08:28

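Introduction

Tesseract is an open-source optical character recognition (OCR) engine, originally developed at Hewlett-Packard and later sponsored and maintained by Google. ChatGPT reportedly also uses Tesseract under the hood, although this is unconfirmed. In my project I use it together with Baidu's PaddleOCR for text recognition, where Tesseract's role is to detect the skew and rotation angle of the text.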

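References:

Introduction to installing, using, and training models with Tesseract

Tesseract installation and environment variable configuration

Setting up OpenCV on Linux to run java-cv code

Using tesseract-ocr

Tesseract java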

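Calling from Python:

The references above cover environment setup and basic usage. What follows is how to call Tesseract from Python, using detection of an image's rotation angle as the example: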

import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
print(sys.path)
import uvicorn
import cv2
from pytesseract import Output
import pytesseract
from fastapi import FastAPI,  Request, Form, UploadFile, File
from paddleocr import PaddleOCR, PPStructure
import numpy as np
from starlette.responses import FileResponse, StreamingResponse
from fastapi.responses import  JSONResponse
import uuid

ocr_sever = FastAPI()

# Scale factors passed to rotate_bound: 0.95 for OCR, 1 for layout analysis
scaling_ocr = 0.95
scaling_structure = 1


def rotate_bound(image, angle, scaling):
    (h, w) = image.shape[:2]
    (cX, cY) = (w / 2, h / 2)
    # Build the rotation matrix (negating the angle rotates clockwise), then take
    # the absolute cosine and sine, i.e. the rotation components of the matrix
    M = cv2.getRotationMatrix2D((cX, cY), -angle, scaling)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    # Compute the new bounding dimensions of the image
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    # Adjust the rotation matrix to take the translation into account
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY
    # Perform the actual rotation and return the image, padding with white
    return cv2.warpAffine(image, M, (nW, nH), borderValue=(255, 255, 255))


# The whole pipeline combined into a single endpoint
@ocr_sever.post("/imgInfos/")
async def img_infos(fileName: str = Form(...)):
    print("Input file name: {}".format(fileName))
    # Build the image's relative path
    file_path = "./doc/imgs/" + fileName
    image = cv2.imread(file_path)
    # Convert OpenCV's BGR channel order to RGB before handing the image to
    # Tesseract (note that no binarization is performed here)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    results = pytesseract.image_to_osd(rgb, output_type=Output.DICT)
    # Rotate the image by the detected angle
    rotated_ocr = rotate_bound(image, angle=results["rotate"], scaling=scaling_ocr)
    rotated_structure = rotate_bound(image, angle=results["rotate"], scaling=scaling_structure)
    # Run PaddleOCR recognition
    ocr = PaddleOCR(use_angle_cls=True, lang="ch")
    result = ocr.ocr(rotated_ocr, cls=True)
    # result2 = ocr.ocr(image, cls=True)
    print(result)
    # print(result2)
    # Run PPStructure layout analysis
    table_engine = PPStructure(show_log=True, type='structure', image_orientation=True)
    structResult = table_engine(rotated_structure)
    struct = []
    for line in structResult:
        # Drop the img element
        line.pop('img')
        print(line)
        struct.append(line)
    # Pack both results into JSON and return them
    data = {"ocr": result, "structure": struct}
    return JSONResponse(data)


# Get the image's rotation angle
@ocr_sever.post("/imgAngle/")
async def img_angle(file: UploadFile = File(...)):
    print("imgAngle input file name: {}".format(file.filename))
    file_path = "./doc/imgs/" + file.filename
    with open(file_path, 'wb') as f:
        f.write(await file.read())
    image = cv2.imread(file_path)
    # Convert OpenCV's BGR channel order to RGB before handing the image to
    # Tesseract (note that no binarization is performed here)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    results = pytesseract.image_to_osd(rgb, output_type=Output.DICT)
    return results


# Get the rotated image
@ocr_sever.post("/imgRotate/")
async def img_rotate(file: UploadFile = File(...)):
    print("Input file name: {}".format(file.filename))
    file_path = "./doc/imgs/" + file.filename
    with open(file_path, 'wb') as f:
        f.write(await file.read())
    image = cv2.imread(file_path)
    # Convert OpenCV's BGR channel order to RGB before handing the image to
    # Tesseract (note that no binarization is performed here)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    results = pytesseract.image_to_osd(rgb, output_type=Output.DICT)
    # Rotate the image by the detected angle
    rotated = rotate_bound(image, angle=results["rotate"], scaling=scaling_ocr)
    newFilePath = "./doc/imgs/" + str(uuid.uuid1()) + ".jpg"
    cv2.imwrite(newFilePath, rotated)
    response = StreamingResponse(get_file_byte(newFilePath), media_type="image/jpeg")
    return response


def get_file_byte(filename):  # filename may be a regular file or an archive
    with open(filename, "rb") as f:
        while True:
            content = f.read(1024)
            if content:
                yield content
            else:
                break


if __name__ == "__main__":
    print('Loading OCR service')
    host = '0.0.0.0'
    port = 9999
    workers = 1
    # Be sure to change "test04" to the name of the file that contains this code
    uvicorn.run(app='test04:ocr_sever', host=host, port=port, workers=workers)
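
The canvas-size arithmetic inside rotate_bound can be checked on its own, without OpenCV. The sketch below is only an illustration: the helper name rotated_bounds is hypothetical and not part of the original code, and it assumes, as in OpenCV's getRotationMatrix2D, that the matrix entries M[0, 0] and M[0, 1] are the cosine and sine of the angle multiplied by the scale factor.

```python
import math


def rotated_bounds(w, h, angle_deg, scaling=1.0):
    """Return (new_width, new_height) of the canvas needed for a rotated w x h image.

    Hypothetical helper that mirrors the nW / nH computation in rotate_bound.
    """
    theta = math.radians(angle_deg)
    # Equivalent to |M[0, 0]| and |M[0, 1]|; the scale factor is already folded in
    cos = abs(scaling * math.cos(theta))
    sin = abs(scaling * math.sin(theta))
    new_w = int(h * sin + w * cos)
    new_h = int(h * cos + w * sin)
    return new_w, new_h


print(rotated_bounds(400, 200, 0))
print(rotated_bounds(400, 200, 90))
```

With no rotation the size is unchanged, and a 90 degree rotation swaps width and height, which is why rotate_bound has to enlarge the canvas and shift the translation terms instead of rotating in place.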

The core of all this is really a single call; everything else is built around it:

pytesseract.image_to_osd(rgb, output_type=Output.DICT)
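
To make the shape of that result more concrete, here is a rough sketch of how Tesseract's plain-text orientation and script detection (OSD) report can be turned into a dictionary. It is not pytesseract's own implementation: the function name parse_osd is hypothetical, the sample text is illustrative and not taken from a real run, and the keys are derived directly from the report labels, so apart from "rotate" (the key the code above relies on) they may differ from the keys pytesseract returns.

```python
def parse_osd(text):
    """Parse a plain-text OSD report of 'Label: value' lines into a dict."""
    result = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key = key.strip().lower().replace(" ", "_")
        value = value.strip()
        # Convert numeric values; keep names such as the script as strings
        try:
            result[key] = int(value)
        except ValueError:
            try:
                result[key] = float(value)
            except ValueError:
                result[key] = value
    return result


# Illustrative sample only, not output captured from Tesseract
SAMPLE_OSD = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 4.12
Script: Latin
Script confidence: 1.11"""

osd = parse_osd(SAMPLE_OSD)
print(osd["rotate"])
```

The "rotate" value is the angle that rotate_bound receives in the endpoints above; the confidence fields can be used to skip rotation when the detection is weak, although the original code does not do so.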

