Training PaddleOCR on Your Own Private Dataset (Annotation, Dataset Preparation, Training, and Deployment)
Table of Contents
I. Preparing the Dataset
1. Go to the PaddleOCR-release-2.7 directory
2. Launch PPOCRLabel: activate the environment in a terminal
3. Click Auto Annotate in the lower left
4. After confirming, use the File menu in the upper left
5. Create gen_ocr_train_val_test.py
II. Training the Text Detection Model
1. Download the pretrained model
2. Configure the PP-OCR detection model file
3. Start training the text detection model
4. Test the trained model
III. Training the Text Recognition Model
1. Modify the recognition model config file
2. Train the model
3. Test the model
IV. Exporting an Inference Model
1. Export the trained models in the Anaconda terminal
2. Verify with predict_system.py
V. Deploying the Inference Model on the RK3588
This post documents my own workflow.
Prerequisites:
The machine already has the basic setup in place, including:
CUDA, cuDNN, PyTorch, conda, PPOCRLabel, and so on.
If these are not installed yet, see "Installing PPOCRLabel on Ubuntu 22.04" (CSDN blog)
and my earlier article "PaddleOCR Environment Setup, Model Training, Inference, and Deployment End to End (Ubuntu), Notes 1" (CSDN blog).
This post mainly improves and streamlines that earlier write-up.
I. Preparing the Dataset
1. Go to the PaddleOCR-release-2.7 directory
Create a data folder, and inside it a folder named images that holds the pictures to be annotated.
Also create data/test_images under data to hold the test pictures. The folder layout can be set up as in the sketch below.
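A minimal sketch of creating that layout from Python (equivalent to making the folders by hand; the paths are the ones used throughout this guide):

# Create the folder layout used in this guide: data/images for annotation, data/test_images for testing.
import os

for d in ("data/images", "data/test_images"):
    os.makedirs(d, exist_ok=True)
    print("ready:", os.path.abspath(d))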
2. Launch PPOCRLabel: activate the environment in a terminal
conda activate label4
PPOCRLabel --lang ch --kie True
Then open the File menu in the upper left and enable all three of the following options:
automatically export the label results, automatically re-recognize, and automatically save unsubmitted changes.
3. Click Auto Annotate in the lower left
After auto annotation finishes, go through each image, check the detected content, and correct it; when an image is done, click Confirm in the lower right.
4. After confirming, use the File menu in the upper left
and click Export Label Results and Export Recognition Results.
When that is done, several new files appear under /home/sxj/ppocr-1/data/images; the sketch below is a quick way to check them.
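A minimal sketch for checking those outputs, assuming the default PPOCRLabel output names that the split script in the next step also expects (Label.txt, rec_gt.txt, crop_img) plus its fileState.txt status file:

# Peek at the files PPOCRLabel exports into data/images.
# Each line of Label.txt is "<image path>\t<JSON list of boxes and transcriptions>";
# each line of rec_gt.txt is "<cropped image path>\t<text label>".
import os

root = "data/images"
for name in ("Label.txt", "rec_gt.txt", "fileState.txt", "crop_img"):
    print(name, "->", "found" if os.path.exists(os.path.join(root, name)) else "missing")

with open(os.path.join(root, "Label.txt"), encoding="utf-8") as f:
    print(f.readline().rstrip())  # show one sample annotation line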
5. Create gen_ocr_train_val_test.py
Purpose: split the detection and recognition data into training, validation, and test sets.
# coding:utf8
import os
import shutil
import random
import argparse


# Delete any previously split train/val/test folder and recreate it empty
def isCreateOrDeleteFolder(path, flag):
    flagPath = os.path.join(path, flag)
    if os.path.exists(flagPath):
        shutil.rmtree(flagPath)
    os.makedirs(flagPath)
    flagAbsPath = os.path.abspath(flagPath)
    return flagAbsPath


def splitTrainVal(root, absTrainRootPath, absValRootPath, absTestRootPath, trainTxt, valTxt, testTxt, flag):
    # Split into training, validation, and test sets according to the given ratio
    dataAbsPath = os.path.abspath(root)
    if flag == "det":
        labelFilePath = os.path.join(dataAbsPath, args.detLabelFileName)
    elif flag == "rec":
        labelFilePath = os.path.join(dataAbsPath, args.recLabelFileName)
    labelFileRead = open(labelFilePath, "r", encoding="UTF-8")
    labelFileContent = labelFileRead.readlines()
    random.shuffle(labelFileContent)
    labelRecordLen = len(labelFileContent)

    for index, labelRecordInfo in enumerate(labelFileContent):
        imageRelativePath = labelRecordInfo.split('\t')[0]
        imageLabel = labelRecordInfo.split('\t')[1]
        imageName = os.path.basename(imageRelativePath)

        if flag == "det":
            imagePath = os.path.join(dataAbsPath, imageName)
        elif flag == "rec":
            imagePath = os.path.join(dataAbsPath, "{}/{}".format(args.recImageDirName, imageName))

        # Copy each image into train/val/test according to the preset ratio
        trainValTestRatio = args.trainValTestRatio.split(":")
        trainRatio = eval(trainValTestRatio[0]) / 10
        valRatio = trainRatio + eval(trainValTestRatio[1]) / 10
        curRatio = index / labelRecordLen

        if curRatio < trainRatio:
            imageCopyPath = os.path.join(absTrainRootPath, imageName)
            shutil.copy(imagePath, imageCopyPath)
            trainTxt.write("{}\t{}".format(imageCopyPath, imageLabel))
        elif curRatio >= trainRatio and curRatio < valRatio:
            imageCopyPath = os.path.join(absValRootPath, imageName)
            shutil.copy(imagePath, imageCopyPath)
            valTxt.write("{}\t{}".format(imageCopyPath, imageLabel))
        else:
            imageCopyPath = os.path.join(absTestRootPath, imageName)
            shutil.copy(imagePath, imageCopyPath)
            testTxt.write("{}\t{}".format(imageCopyPath, imageLabel))


# Remove the file if it already exists
def removeFile(path):
    if os.path.exists(path):
        os.remove(path)


def genDetRecTrainVal(args):
    detAbsTrainRootPath = isCreateOrDeleteFolder(args.detRootPath, "train")
    detAbsValRootPath = isCreateOrDeleteFolder(args.detRootPath, "val")
    detAbsTestRootPath = isCreateOrDeleteFolder(args.detRootPath, "test")
    recAbsTrainRootPath = isCreateOrDeleteFolder(args.recRootPath, "train")
    recAbsValRootPath = isCreateOrDeleteFolder(args.recRootPath, "val")
    recAbsTestRootPath = isCreateOrDeleteFolder(args.recRootPath, "test")

    removeFile(os.path.join(args.detRootPath, "train.txt"))
    removeFile(os.path.join(args.detRootPath, "val.txt"))
    removeFile(os.path.join(args.detRootPath, "test.txt"))
    removeFile(os.path.join(args.recRootPath, "train.txt"))
    removeFile(os.path.join(args.recRootPath, "val.txt"))
    removeFile(os.path.join(args.recRootPath, "test.txt"))

    detTrainTxt = open(os.path.join(args.detRootPath, "train.txt"), "a", encoding="UTF-8")
    detValTxt = open(os.path.join(args.detRootPath, "val.txt"), "a", encoding="UTF-8")
    detTestTxt = open(os.path.join(args.detRootPath, "test.txt"), "a", encoding="UTF-8")
    recTrainTxt = open(os.path.join(args.recRootPath, "train.txt"), "a", encoding="UTF-8")
    recValTxt = open(os.path.join(args.recRootPath, "val.txt"), "a", encoding="UTF-8")
    recTestTxt = open(os.path.join(args.recRootPath, "test.txt"), "a", encoding="UTF-8")

    splitTrainVal(args.datasetRootPath, detAbsTrainRootPath, detAbsValRootPath, detAbsTestRootPath,
                  detTrainTxt, detValTxt, detTestTxt, "det")

    for root, dirs, files in os.walk(args.datasetRootPath):
        for dir in dirs:
            if dir == 'crop_img':
                splitTrainVal(root, recAbsTrainRootPath, recAbsValRootPath, recAbsTestRootPath,
                              recTrainTxt, recValTxt, recTestTxt, "rec")
            else:
                continue
        break


if __name__ == "__main__":
    # Purpose: split the detection and recognition data into training, validation, and test sets.
    # Note: adjust the arguments to your own paths and needs. Image data is often annotated in batches
    # by several people, with each batch placed in its own folder and labelled with PPOCRLabel, so
    # there is a need to merge several annotated folders and split them into train/val/test sets.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--trainValTestRatio",
        type=str,
        default="6:2:2",
        help="ratio of trainset:valset:testset")
    parser.add_argument(
        "--datasetRootPath",
        type=str,
        default="./data/",
        help="path to the dataset marked by ppocrlabel, E.g, dataset folder named 1,2,3...")
    parser.add_argument(
        "--detRootPath",
        type=str,
        default="./data/det",
        help="the path where the divided detection dataset is placed")
    parser.add_argument(
        "--recRootPath",
        type=str,
        default="./data/rec",
        help="the path where the divided recognition dataset is placed")
    parser.add_argument(
        "--detLabelFileName",
        type=str,
        default="Label.txt",
        help="the name of the detection annotation file")
    parser.add_argument(
        "--recLabelFileName",
        type=str,
        default="rec_gt.txt",
        help="the name of the recognition annotation file")
    parser.add_argument(
        "--recImageDirName",
        type=str,
        default="crop_img",
        help="the name of the folder where the cropped recognition dataset is located")
    args = parser.parse_args()
    genDetRecTrainVal(args)
Run it:
python gen_ocr_train_val_test.py --trainValTestRatio 6:2:2 --datasetRootPath ./data/images
When it finishes, two new folders appear under data: det and rec. A quick check of the split is sketched below.
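A minimal sketch, assuming the default output layout of the script above, that counts the entries in each split so you can confirm the 6:2:2 ratio took effect:

# Count the label entries written for each task and split.
import os

for task in ("det", "rec"):
    counts = {}
    for split in ("train", "val", "test"):
        with open(os.path.join("data", task, split + ".txt"), encoding="utf-8") as f:
            counts[split] = sum(1 for _ in f)
    print(task, counts)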
II. Training the Text Detection Model
For the models you can use, refer to the PaddleOCR model list; here a PP-OCR model is taken as the pretrained model.
(I keep the ones I use most often noted here.)
1. Download the pretrained model
After downloading, create a pretrain_models folder in the PaddleOCR-release-2.7 root directory and extract the pretrained model into it, for example as in the sketch below.
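A minimal sketch of the download-and-extract step; the URL is a placeholder and should be replaced with the link for your chosen model from the model list:

# Download a pretrained model archive and unpack it into pretrain_models/.
import os
import tarfile
import urllib.request

url = "https://example.com/ch_PP-OCRv4_det_train.tar"  # placeholder: use the real model-list link
os.makedirs("pretrain_models", exist_ok=True)
tar_path = os.path.join("pretrain_models", os.path.basename(url))
urllib.request.urlretrieve(url, tar_path)
with tarfile.open(tar_path) as tar:
    tar.extractall("pretrain_models")  # e.g. pretrain_models/ch_PP-OCRv4_det_train/best_accuracy.pdparams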
2. Configure the PP-OCR detection model file
Find the ch_det_res18_db_v2.0.yml config file under configs/det/ch_ppocr_v2.0/.
My ch_det_res18_db_v2.0.yml:
Global:
  use_gpu: true # whether to use the GPU; set to false if there is none
  epoch_num: 50 # number of training epochs
  log_smooth_window: 20
  print_batch_step: 2 # print a training log every this many iterations
  save_model_dir: ./output/ch_db_res18/ # where the output model files are saved
  save_epoch_step: 50 # save a checkpoint every this many epochs
  # evaluation is run every 5000 iterations after the 4000th iteration
  eval_batch_step: [3000, 2000]
  cal_metric_during_train: False
  pretrained_model: ./pretrain_models/ch_PP-OCRv4_det_train/best_accuracy.pdparams # path to the pretrained model downloaded above
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./output/det_db/predicts_db.txt

Architecture:
  model_type: det
  algorithm: DB
  Transform:
  Backbone:
    name: ResNet_vd
    layers: 18
    disable_se: True
  Neck:
    name: DBFPN
    out_channels: 256
  Head:
    name: DBHead
    k: 50

Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: 'L2'
    factor: 0

PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DetMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./data/ # training data root
    label_file_list:
      - ./data/det/train.txt # training label file produced by the split script
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - IaaAugment:
          augmenter_args:
            - { 'type': Fliplr, 'args': { 'p': 0.5 } }
            - { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
            - { 'type': Resize, 'args': { 'size': [0.5, 3] } }
      - EastRandomCropData:
          size: [960, 960]
          max_tries: 50
          keep_ratio: true
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 2
    num_workers: 2

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./data/ # evaluation data root
    label_file_list:
      - ./data/det/val.txt # evaluation label file
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - DetResizeForTest:
#         image_shape: [736, 1280]
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 2
3. Start training the text detection model
Open an Anaconda terminal, activate the environment, and change to the PaddleOCR-release-2.7 root directory.
Run the following commands to start training:
conda activate label4
python tools/train.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml
4. Test the trained model
The trained models are saved under ./output/ch_db_res18/.
Use best_accuracy.pdparams for testing; if it is missing, not enough training iterations have run yet, so test with latest.pdparams instead.
Run one of the following commands in the Anaconda terminal, where Global.pretrained_model is the trained model to test and Global.infer_img is the path of the image to detect (a sketch for visualizing the saved detection results follows the commands):
python tools/infer_det.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml -o Global.pretrained_model=output/ch_db_res18/latest.pdparams Global.infer_img="./data/test_images/1.jpg"
python tools/infer_det.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml -o Global.pretrained_model=output/ch_db_res18/best_accuracy.pdparams Global.infer_img="./data/test_images/1.jpeg"
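A minimal sketch for drawing the predicted boxes onto the test image, assuming each line of the result file (the save_res_path set in the config above) is the image path followed by a tab and a JSON list of {"points": ...} entries, which is what tools/infer_det.py writes:

# Visualize detection results saved by tools/infer_det.py.
import json
import os

import cv2
import numpy as np

with open("output/det_db/predicts_db.txt", encoding="utf-8") as f:
    for line in f:
        img_path, boxes_json = line.rstrip("\n").split("\t", 1)
        img = cv2.imread(img_path)
        if img is None:
            continue  # skip entries whose image cannot be found
        for det in json.loads(boxes_json):
            pts = np.array(det["points"], dtype=np.int32).reshape(-1, 1, 2)
            cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
        out_path = "det_vis_" + os.path.basename(img_path)
        cv2.imwrite(out_path, img)
        print("saved", out_path)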
That completes the detection model; next comes training and testing the recognition model!
III. Training the Text Recognition Model
1. Modify the recognition model config file
Text recognition uses the ch_PP-OCRv3_rec.yml config file.
Find ch_PP-OCRv3_rec.yml under configs/rec/PP-OCRv3/.
The changes are similar to those made for text detection:
My ch_PP-OCRv3_rec.yml:
Global:
  debug: false
  use_gpu: true
  epoch_num: 50
  log_smooth_window: 20
  print_batch_step: 1
  save_model_dir: ./output/rec_ppocr_v3
  save_epoch_step: 15
  eval_batch_step: [3000, 2000]
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/ch_PP-OCRv4_rec_train/student.pdparams # path to the recognition pretrained model
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv4.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
    last_pool_kernel_size: [2, 2]
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    ext_op_transform_idx: 1
    label_file_list:
      - ./data/rec/train.txt # recognition training label file
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecConAug:
          prob: 0.5
          ext_data_num: 2
          image_shape: [48, 320, 3]
          max_text_length: *max_text_length
      - RecAug:
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 16
    drop_last: true
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./data/
    label_file_list:
      - ./data/rec/val.txt # recognition evaluation label file
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 16
    num_workers: 8
2. Train the model
Open an Anaconda terminal, activate the environment, change to the PaddleOCR-release-2.7 root directory, and run the following command to start training.
python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml
Training complete.
3. Test the model
Run the following command in the Anaconda terminal, where Global.pretrained_model is the trained model to test and Global.infer_img is the path of the image to recognize.
python tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=output/rec_ppocr_v3/latest.pdparams Global.infer_img="./data/test_images/1.jpeg"
That completes the recognition model.
***************************************************************************
IV. Exporting an Inference Model
1. Export the trained models in the Anaconda terminal
Here Global.pretrained_model is the trained model to export and Global.save_inference_dir is where the inference model will be saved. An inference model can be called directly for detection and recognition. Export the trained text detection model and the text recognition model separately.
python tools/export_model.py -c "./configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml" -o Global.pretrained_model="./output/ch_db_res18/latest.pdparams" Global.save_inference_dir="./inference_model/det/"
The log reports: inference model is saved to ./inference_model/det/inference
python tools/export_model.py -c "./configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml" -o Global.pretrained_model="./output/rec_ppocr_v3/latest.pdparams" Global.save_inference_dir="./inference_model/rec/"
The log reports: inference model is saved to ./inference_model/rec/inference
The det and rec folders are the exported inference models; a quick check of their contents is sketched below.
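A minimal sketch that lists each export directory, assuming tools/export_model.py has written the usual inference.pdmodel / inference.pdiparams files there:

# Confirm both exported inference models are complete.
import os

for d in ("./inference_model/det", "./inference_model/rec"):
    files = sorted(os.listdir(d))
    print(d, files)
    assert "inference.pdmodel" in files and "inference.pdiparams" in files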
2. Verify with predict_system.py
Open an Anaconda terminal and run:
python tools/infer/predict_system.py --det_model_dir="./inference_model/det/" --rec_model_dir="./inference_model/rec" --image_dir="./data/test_images/3.jpeg"
The results are saved in ./inference_results/. As an alternative to the script, the exported models can also be called directly from Python, as sketched below.
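A minimal sketch of calling the two exported models through the paddleocr Python package instead of predict_system.py, assuming the paddleocr 2.x pip package is installed in the same environment:

# Run detection + recognition with the exported inference models.
from paddleocr import PaddleOCR

ocr = PaddleOCR(
    det_model_dir="./inference_model/det/",
    rec_model_dir="./inference_model/rec/",
    use_angle_cls=False,  # no direction classifier was trained in this guide
)
result = ocr.ocr("./data/test_images/3.jpeg", cls=False)
for box, (text, score) in result[0]:
    print(text, round(score, 3), box)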
V. Deploying the Inference Model on the RK3588
To be continued in a future update...