06-11-2021, 10:18 AM | #1 |
Zealot
Posts: 122
Karma: 10
Join Date: Oct 2017
Device: iPhone
|
How do you get rid of all images in an ePub file downloaded from Archive.org?
When I download the ePub version of a book from Archive.org, I get not pure text but text mixed with images of the book pages. Is there a way to get a pure-text version? Or is there a way to delete all the images in an ePub file in Sigil?
|
06-11-2021, 10:38 AM | #2 |
A Hairy Wizard
Posts: 3,119
Karma: 18727091
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Normally Sigil and/or Calibre questions would be asked in their respective forum.
However, to delete images, simply highlight the image(s) on the left side of the screen (the Book Browser in Sigil) and hit the Delete key. You will probably also want to delete the code that references the images from your HTML file(s). That can be done with a regex: search: <img.*?/> replace: nothing/blank
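For anyone who would rather script that search-and-replace than run it in the editor, here is a minimal sketch in Python. It applies the same pattern to a string of XHTML; the function name and the sample markup are made up for illustration and do not come from Sigil or Calibre.

```python
import re

# Same pattern as the search above: a self-closing <img .../> tag.
# DOTALL lets a tag that is wrapped across several lines still match.
IMG_TAG = re.compile(r"<img.*?/>", re.DOTALL)

def strip_img_tags(xhtml: str) -> str:
    """Return xhtml with every self-closing <img .../> tag removed."""
    return IMG_TAG.sub("", xhtml)

# Hypothetical sample markup, just to show the effect.
sample = '<p>Page one</p><div><img alt="scan" src="../Images/p1.jpg"/></div><p>Page two</p>'
cleaned = strip_img_tags(sample)
print(cleaned)
```

Note that the wrapper element around the image (the empty div here) is left behind, and that the pattern assumes every img tag is self-closed, which is the case in well-formed XHTML. An unclosed tag would make the match run on to the next "/>" in the file.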
06-11-2021, 11:28 AM | #3 | |
the rook, bossing Never.
Posts: 11,588
Karma: 87456643
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Quote:
You can do what is suggested in the Calibre editor as well as in Sigil.
|
06-11-2021, 02:27 PM | #4 |
Grand Sorcerer
Posts: 27,602
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I don't think I'd bother, myself. If you delete all those images of text, you'll probably be missing some content. My recommendation would be to delete the epub in question and find an alternative version.
|
06-11-2021, 02:54 PM | #5 |
Resident Curmudgeon
Posts: 74,576
Karma: 129670952
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I agree that it's best to buy the eBook if a retail version exists. If not, go with the pBook version, or forget it and read something else.
|
06-11-2021, 03:56 PM | #6 |
the rook, bossing Never.
Posts: 11,588
Karma: 87456643
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Or do your own OCR if it's really, really important public-domain content that isn't available as a cheap ebook.
|
06-14-2021, 04:40 AM | #7 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
This allows you to see all the images in the EPUB plus little preview thumbnails (so you can tell whether each one is useless or an actually important image). You can then right-click each image and choose "Delete From Book".
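If you want a similar overview without opening the editor, a rough sketch in Python can list the image files inside an EPUB, since an EPUB is a ZIP archive. The function name, the extension list, and the demo file names below are assumptions for illustration; there are no thumbnails here, only names and sizes.

```python
import io
import zipfile

# Assumed list of image extensions; extend it if your book uses others.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg")

def list_images(epub):
    """Return (name, size_in_bytes) for each image inside an EPUB, largest first."""
    with zipfile.ZipFile(epub) as book:
        images = [
            (info.filename, info.file_size)
            for info in book.infolist()
            if info.filename.lower().endswith(IMAGE_EXTENSIONS)
        ]
    return sorted(images, key=lambda item: item[1], reverse=True)

# Demo on a tiny in-memory archive standing in for a real EPUB.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as book:
    book.writestr("OEBPS/Text/ch1.xhtml", "<p>text</p>")
    book.writestr("OEBPS/Images/page1.jpg", b"\x00" * 2048)
    book.writestr("OEBPS/Images/logo.png", b"\x00" * 16)

for name, size in list_images(demo):
    print(f"{size:>8} bytes  {name}")
```

For a real book you would pass the path to the .epub file instead of the in-memory demo. Large, uniformly sized files are likely page scans, while small ones may be ornaments or a cover worth keeping.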
|
06-14-2021, 04:48 PM | #8 | |
A Hairy Wizard
Posts: 3,119
Karma: 18727091
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Quote:
You can also multi-select using Ctrl+click or Shift+click, then press the Delete key to delete all of them at once.
|
06-14-2021, 06:00 PM | #9 |
Bibliophagist
Posts: 36,607
Karma: 146496996
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
I've only gotten books from archive.org a couple of times. In both cases, what was displayed was the scanned image with the text layer hidden. I suspected that this was an artifact of making the scanned PDF searchable, since the text files were fine lessons in how not to do OCR.
|
06-14-2021, 09:19 PM | #10 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Quote:
See the thread "Archive.org ePub". All of Archive.org's text formats are auto-generated OCR from the PDFs: no cleanup, no nothing. In Post #11 of that thread, I even uploaded an EPUB straight out of FineReader 12, and you can see how much cleaner (and more readable) it is compared to the auto-generated junk. This is why I always recommend getting the PDF from Archive.org, then converting it to text on your own if needed.