Can Tesseract be told how many characters to recognize?

Answers: 1  Bounty: 70

Resolved: 2021-01-27 18:46

Asked by: 骨子里的高雅
2021-01-26 18:07

Best answer

Five-star expert user: 你可爱的野爹
2021-01-26 19:31

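Tesseract can be configured for OCR, and both OpenCV and Emgu (the C# wrapper for OpenCV) integrate it.
In practice, however, it often misrecognizes characters, for example reading "s" as "5", or "1" as "l" or "i". You can set the corresponding parameter so that recognition is restricted to a specified set of characters.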
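As a rough illustration of restricting the character set outside of Emgu, here is a minimal Python sketch that builds a Tesseract command line using the `tessedit_char_whitelist` variable. The helper names, the file name `plate.png`, and the choice of `--psm 7` (treat the image as a single text line) are assumptions made for this example. The length check is a post-processing step added here, not a Tesseract setting; it assumes the engine offers no direct option for a character count.

```python
from typing import List, Optional


def build_tesseract_cmd(image_path: str, whitelist: str, psm: int = 7) -> List[str]:
    """Build a Tesseract CLI command that only allows characters in `whitelist`.

    psm=7 asks Tesseract to treat the image as a single text line. This is an
    assumption about the input; adjust it for other layouts.
    """
    return [
        "tesseract", image_path, "stdout",
        "--psm", str(psm),
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def check_length(text: str, expected_len: int) -> Optional[str]:
    """Post-processing (not a Tesseract feature): accept the OCR result only
    if it contains exactly `expected_len` non-whitespace characters."""
    cleaned = "".join(text.split())
    return cleaned if len(cleaned) == expected_len else None


if __name__ == "__main__":
    # "plate.png" is a hypothetical input image.
    cmd = build_tesseract_cmd("plate.png", "0123456789")
    print(" ".join(cmd))
    # The command could be run with subprocess.run(cmd, capture_output=True, text=True)
    # and its stdout passed to check_length.
    print(check_length("12 345\n", 5))
```

Results with the wrong length come back as `None`, so the caller can retry with different preprocessing or reject the image.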
Below is the Emgu API documentation for the relevant constructor:
Emgu.CV.OCR.Tesseract.Tesseract(string, string, Emgu.CV.OCR.Tesseract.OcrEngineMode, string)
public Tesseract(string dataPath, string language, Emgu.CV.OCR.Tesseract.OcrEngineMode mode, string whiteList)
Member of Emgu.CV.OCR.Tesseract
Summary:
Create a Tesseract OCR engine.
Parameters:
dataPath: The datapath must be the name of the parent directory of tessdata and must end in / . Any name after the last / will be stripped.
language: The language is (usually) an ISO 639-3 string or NULL will default to eng. It is entirely safe (and eventually will be efficient too) to call Init multiple times on the same instance to change language, or just to reset the classifier. The language may be a string of the form [~]<lang>[+[~]<lang>]* indicating that multiple languages are to be loaded. E.g. hin+eng will load Hindi and English. Languages may specify internally that they want to be loaded with one or more other languages, so the ~ sign is available to override that. E.g. if hin were set to load eng by default, then hin+~eng would force loading only hin. The number of loaded languages is limited only by memory, with the caveat that loading additional languages will impact both speed and accuracy, as there is more work to do to decide on the applicable language, and there is more chance of hallucinating incorrect words.
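
To make the multi-language form described above (for example hin+~eng) more concrete, here is a small Python sketch that splits such a string into language codes and marks the ones carrying the ~ prefix. The function name `parse_languages` is hypothetical and written only for illustration. It is not part of Tesseract or Emgu, and it does not load any language data.

```python
from typing import List, Tuple


def parse_languages(spec: str) -> List[Tuple[str, bool]]:
    """Split a language spec such as "hin+~eng" into (code, suppressed) pairs.

    `suppressed` is True when the code carries the "~" prefix, meaning the
    language should not be loaded even if another language requests it.
    """
    result = []
    for part in spec.split("+"):
        part = part.strip()
        if not part:
            continue  # ignore empty segments such as a trailing "+"
        suppressed = part.startswith("~")
        result.append((part.lstrip("~"), suppressed))
    return result


if __name__ == "__main__":
    print(parse_languages("hin+eng"))
    print(parse_languages("hin+~eng"))
```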
