I’ve got Tika working with Tesseract on PDF files, but when I give it a PDF that contains both searchable text and images, the text seems to be OCRed twice. Is there a way to avoid this? I’m fine with it taking two passes: one for the plain text and another for just the images.
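Below is a minimal sketch of one possible setup. It assumes Tika's Java API with the PDF parser's `PDFParserConfig`, and it assumes the duplication comes from whole-page OCR (the `OCR_AND_TEXT_EXTRACTION` strategy) running on top of normal text extraction. The idea is to set the OCR strategy to `NO_OCR`, so page text comes only from the PDF's text layer, and to enable `setExtractInlineImages(true)`, so that only the embedded images are handed to Tesseract as embedded documents. The class name `PdfTextPlusImageOcr` is hypothetical, and the option names should be checked against the Tika version in use.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class: keeps text-layer extraction and image OCR separate.
public class PdfTextPlusImageOcr {

    static ParseContext buildContext(Parser parser) {
        PDFParserConfig pdfConfig = new PDFParserConfig();
        // Page text: read only from the PDF's text layer; do not render whole pages for OCR.
        pdfConfig.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.NO_OCR);
        // Images: extract embedded images so they go to the embedded-document parser
        // (Tesseract, assuming it is installed and Tika can find it).
        pdfConfig.setExtractInlineImages(true);

        ParseContext context = new ParseContext();
        context.set(PDFParserConfig.class, pdfConfig);
        // Registering the parser lets Tika recurse into the extracted images.
        context.set(Parser.class, parser);
        return context;
    }

    public static void main(String[] args) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 means no write limit
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            parser.parse(in, handler, new Metadata(), buildContext(parser));
        }
        System.out.println(handler.toString());
    }
}
```

With the Tika parsers on the classpath, this would be run as `java PdfTextPlusImageOcr file.pdf`. The OCR text from images is appended as embedded-document content, so it may not appear in the same position as the image on the page.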