Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to get text of image using Tesseract



1 Answer


I have a two-step solution:

    1. Apply dilation followed by erosion (closing).
    2. Apply thresholding.

Now why do we apply dilation followed by erosion?

As we can see, the input image has artifacts around each character. Applying a closing operation reduces these artifacts.

[Image: result of the closing operation]
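To illustrate why closing suppresses small artifacts, here is a minimal pure-Python sketch of the operation. It is not OpenCV's implementation: it assumes a 3x3 square structuring element and assumes the artifacts are small dark specks on a lighter background. The helper names (`dilate`, `erode`, `close`) are made up for this example.

```python
def _morph(img, pick):
    """Apply a 3x3 min or max filter; pixels outside the image are ignored."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def dilate(img):
    # Each pixel takes the brightest value in its 3x3 neighborhood.
    return _morph(img, max)


def erode(img):
    # Each pixel takes the darkest value in its 3x3 neighborhood.
    return _morph(img, min)


def close(img):
    # Closing = dilation followed by erosion.
    return erode(dilate(img))


# Light background (255) with one isolated dark speck and a 3x3 dark block
# standing in for part of a character stroke.
image = [[255] * 7 for _ in range(7)]
image[1][1] = 0  # the speck (artifact)
for y in range(3, 6):
    for x in range(3, 6):
        image[y][x] = 0  # the stroke

closed = close(image)
for row in closed:
    print("".join("#" if v == 0 else "." for v in row))
```

The dilation wipes out the one-pixel speck entirely and shrinks the block to its center pixel; the erosion then grows the block back to its original 3x3 size. Features smaller than the structuring element disappear, while larger ones survive, which is why some larger artifacts can remain after this step.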

The artifacts are reduced but not completely gone. Applying an adaptive threshold to this result gives:

[Image: result of adaptive thresholding]

Now the image is suitable for reading:

AOF CIF
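For intuition about the thresholding step, the following is a rough pure-Python sketch of mean-based adaptive thresholding: each pixel is compared with the mean of its neighborhood minus a constant C. It is a simplified illustration, not OpenCV's code. The function name `adaptive_threshold_mean`, the tiny 3x3 block size, and the synthetic image are assumptions chosen so the result can be checked by hand.

```python
def adaptive_threshold_mean(img, block_size, c, max_value=255):
    """Binarize img: a pixel becomes max_value if it is brighter than
    (mean of its block_size x block_size neighborhood) - c, else 0.
    Border pixels are handled by repeating the nearest edge pixel."""
    h, w = len(img), len(img[0])
    r = block_size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            threshold = total / (block_size * block_size) - c
            row.append(max_value if img[y][x] > threshold else 0)
        out.append(row)
    return out


# Background brightens from left (60) to right (220), as under uneven lighting.
image = [[60 + 20 * x for x in range(9)] for _ in range(5)]
# Two "ink" pixels, each 60 levels darker than its local background.
image[2][2] -= 60  # value 40
image[2][6] -= 60  # value 120, brighter than the background on the left

binary = adaptive_threshold_mean(image, block_size=3, c=10)
for row in binary:
    print("".join("#" if v == 0 else "." for v in row))
```

In this synthetic image no single global threshold separates ink from background, because the right-hand ink pixel (120) is brighter than the left-hand background (60). Comparing each pixel with its local mean marks only the two ink pixels as black. On a real image a larger odd block size, such as the 41 used in the answer, is typical.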

Code:


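import cv2
from pytesseract import image_to_string

# Load the image and convert it to grayscale
img = cv2.imread("7UGLJ.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Upscale by 2x so the characters are larger for OCR
(h, w) = gry.shape[:2]
gry = cv2.resize(gry, (w*2, h*2))

# Closing (dilation followed by erosion); with kernel=None OpenCV
# falls back to its default 3x3 rectangular structuring element
cls = cv2.morphologyEx(gry, cv2.MORPH_CLOSE, None)

# Adaptive mean threshold: 41x41 neighborhood, constant C = 10
thr = cv2.adaptiveThreshold(cls, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 41, 10)

# Run Tesseract on the binarized image
txt = image_to_string(thr)
print(txt)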
