xkcd #743: Infrastructures

Jakylla@sh.itjust.works · 1 month ago

greenskye@lemm.ee · 1 year ago

I really, really hate that so many people still try to share ebooks as PDFs. Why that was ever a thing makes no sense to me. Yes, I absolutely wish to read a 500-page novel on portrait letter-size pages in a tiny font that completely ignores my screen size.

oatscoop@midwest.social · 1 year ago

I’ve given up on trying to find certain books in sane formats. Thankfully Calibre is really good at converting PDFs to actual ebook formats.

There’s a bit of a learning curve, and sometimes I have to do a little semi-automated cleanup – but it works.

greenskye@lemm.ee · 1 year ago

Really? I must have had a particularly troublesome PDF. It was almost like running it through OCR, generating hundreds of weird typos and formatting errors when I tried to convert with calibre.

oatscoop@midwest.social · 1 year ago

The OCR struggles with some PDFs for whatever reason: font, formatting, etc.

There are third-party PDF OCR websites/programs that work better. If I’m having issues, I run it through one of those first.

greenskye@lemm.ee · 1 year ago

Any suggestions? Even the good ones had error rates that might not matter for a couple of pages, but when scaled to a 500-page book, even a 1% error rate results in an annoying level of typos.
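
To put rough numbers on that, here is a minimal sketch. It assumes about 300 words per page and treats the 1% as a per-word error rate; both are illustrative assumptions, not figures from any particular OCR tool.

```python
def expected_typos(pages: int, words_per_page: int, error_rate: float) -> float:
    """Expected number of misread words for a whole book."""
    total_words = pages * words_per_page
    return total_words * error_rate


if __name__ == "__main__":
    pages = 500            # book length mentioned above
    words_per_page = 300   # assumed average, adjust for your book
    error_rate = 0.01      # assumed 1% of words misread

    total = expected_typos(pages, words_per_page, error_rate)
    print(f"{total:.0f} typos total, about {total / pages:.1f} per page")
    # prints: 1500 typos total, about 3.0 per page
```

So under those assumptions a "99% accurate" conversion still leaves a typo every few paragraphs, which is why it feels fine on a sample page and painful over a whole novel.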

oatscoop@midwest.social · 1 year ago

I use gImageReader + Tesseract, but that probably doesn’t meet your criteria. Unfortunately OCR is very rarely perfect unless the input is perfectly clear and uses an “OCR-friendly” font/formatting. There are “AI-powered” OCR tools out there, but I can’t speak to how well they work and I don’t know of any free ones.
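
For anyone who wants to script the Tesseract step instead of clicking through gImageReader, a rough sketch follows. It assumes the `tesseract` command-line tool is installed and on your PATH, and that each page has already been exported as an image. The file names and helper functions here are hypothetical; only the `tesseract <image> <outputbase> -l <lang>` invocation is the tool's own syntax.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build the argument list for one Tesseract run on a single page image."""
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr_page(image_path: str, output_base: str, lang: str = "eng") -> str:
    """Run Tesseract on one page image and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    subprocess.run(build_tesseract_cmd(image_path, output_base, lang), check=True)
    # Tesseract appends ".txt" to the output base for plain-text output.
    return output_base + ".txt"


if __name__ == "__main__":
    # Hypothetical file name: one page of the book exported as an image.
    print(ocr_page("page-001.png", "page-001"))
```

This only automates the OCR pass; the output still needs proofreading, for the reasons above.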