
MobileRead Forums > E-Book Software > Calibre > Conversion

02-11-2021, 05:47 AM #1
rachalmers
Junior Member
Posts: 2
Karma: 10
Join Date: Feb 2021
Device: Kindle
How do I find the chapter and/or page breaks in a PDF?

Real newbie to Calibre here...
I'm trying to convert a PDF to MOBI. Mostly it works fine, but no matter what I do, I can't get it to insert a page break between chapters, or anywhere else I have a chapter or even a page break.
In the debug output, the input shows an <hr/> and displays a line in the web page output, but after that, it's ignored.
What am I doing wrong?

There is nothing special in the document at all: no fancy formatting, nothing. I just can't get it to insert page breaks at chapters, which I believe it can do?
So where do I start, please?
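A minimal sketch, assuming calibre's command-line tool `ebook-convert` is installed: the structure-detection options `--chapter` (an XPath expression for chapter titles), `--chapter-mark pagebreak` and `--page-breaks-before` are what control page breaks at chapters. The XPath below is a hypothetical example that assumes the chapter titles come out of the PDF conversion as paragraphs starting with "Chapter <word>"; PDF input rarely yields real <h1>/<h2> headings, so check the debug output and adapt it. The script only builds and prints the command.

```python
import shlex

# Hypothetical XPath: assumes each chapter title comes out of the PDF
# conversion as a <p> whose text starts with "Chapter <word>".
# Check the debug output and adapt it to what your book actually contains.
CHAPTER_XPATH = r"//h:p[re:test(., '^\s*chapter\s+\w+', 'i')]"


def build_convert_command(src: str, dst: str,
                          chapter_xpath: str = CHAPTER_XPATH) -> list[str]:
    """Return an ebook-convert argument list that puts page breaks before chapters."""
    return [
        "ebook-convert", src, dst,
        # Structure detection: which elements count as chapter titles.
        "--chapter", chapter_xpath,
        # Mark detected chapters with a page break rather than a horizontal rule.
        "--chapter-mark", "pagebreak",
        # Independent of chapter detection: insert page breaks before these elements.
        "--page-breaks-before", chapter_xpath,
    ]


if __name__ == "__main__":
    # Print a shell-ready command instead of running it.
    print(shlex.join(build_convert_command("book.pdf", "book.mobi")))
```

In the GUI, the same settings appear under Structure detection in the conversion dialog ("Detect chapters at", "Chapter mark", "Insert page breaks before").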
02-11-2021, 07:39 AM #2
jhowell
Grand Sorcerer
Posts: 6,540
Karma: 84500001
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
See the thread "Read this before Posting PDF Questions", especially the section "My PDF has a table of contents or links/bookmarks, but they weren't used during conversion".
01-25-2024, 07:57 PM #3
Shohreh
Zealot
Posts: 148
Karma: 192898
Join Date: Jan 2016
Device: none
Hello,

Indeed, and sadly, adding bookmarks to the source PDF doesn't help Calibre split chapters correctly when converting to EPUB.

Is there no way to help Calibre with this task, without having to mess with the HTML in the EPUB output?


Last edited by Shohreh; 01-25-2024 at 08:42 PM.
01-25-2024, 08:07 PM #4
DNSB
Bibliophagist
Posts: 36,242
Karma: 145735536
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Pretty much no. See the numerous comments about PDF being the worst format to convert from.
01-25-2024, 08:13 PM #5
Shohreh
Zealot
Posts: 148
Karma: 192898
Join Date: Jan 2016
Device: none
I know, but it's odd that Calibre can't just use the PDF's bookmarks to know where new chapters start.

At this point, the conversion went pretty well after I removed the headers in the source PDF with a bit of Python.

I just need to figure out how to have Calibre start a new page with each new chapter.

--
Edit: If Calibre really can't split chapters using the PDF's bookmarks, what about splitting the PDF into multiple files (one chapter = one PDF; for this, check qpdf, cpdf, mutool, etc.), having Calibre convert them into EPUB files, and then merging them into a single EPUB?
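A minimal sketch of the splitting half of that idea, assuming you already know the page on which each chapter starts (for example, read off the bookmarks by hand) and the total page count. It turns the start pages into inclusive page ranges and builds one `qpdf --empty --pages IN RANGE -- OUT` command per chapter. The output names (`chapter_01.pdf`, ...) are hypothetical, and converting and merging the pieces is not covered.

```python
import shlex


def chapter_ranges(starts: list[int], page_count: int) -> list[tuple[int, int]]:
    """Turn 1-based chapter start pages into inclusive (first, last) page ranges.

    Pages before the first start page are dropped; include 1 in `starts`
    to keep any front matter.
    """
    starts = sorted(set(starts))
    if not starts or starts[0] < 1 or starts[-1] > page_count:
        raise ValueError("chapter start pages must lie within 1..page_count")
    # Each chapter ends on the page before the next one starts; the last runs to the end.
    ends = [s - 1 for s in starts[1:]] + [page_count]
    return list(zip(starts, ends))


def qpdf_split_commands(src: str, starts: list[int], page_count: int) -> list[list[str]]:
    """One qpdf invocation per chapter, copying that chapter's pages into a new PDF."""
    commands = []
    for n, (first, last) in enumerate(chapter_ranges(starts, page_count), start=1):
        out = f"chapter_{n:02d}.pdf"
        commands.append(["qpdf", "--empty", "--pages", src, f"{first}-{last}", "--", out])
    return commands


if __name__ == "__main__":
    # Hypothetical example: a 120-page PDF whose chapters start on pages 1, 15, 42 and 88.
    for cmd in qpdf_split_commands("book.pdf", [1, 15, 42, 88], 120):
        print(shlex.join(cmd))
```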
--
Edit: Is there an option to prevent ebook-convert.exe from adding a "Document Outline" at the end of the EPUB?

Last edited by Shohreh; 01-25-2024 at 10:10 PM.
01-26-2024, 10:39 AM #6
Quoth
the rook, bossing Never.
Posts: 11,427
Karma: 87454321
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
If you knew how PDFs work, you wouldn't think it odd.
Only actual paper, vellum and stone tablets are worse conversion sources than PDF, and some scanned PDFs are so bad that photographing the paper with a phone works better.

It's a waste of time, because what you get working for one PDF (if you ever get it to 'work') may not work on the next PDF. My TCL NxtPaper 11 arrived today; it's my latest solution for reading PDFs, and cheaper than a Scribe, Elipsa, Boox, etc.

Last edited by Quoth; 01-26-2024 at 10:41 AM.
01-26-2024, 12:15 PM #7
Shohreh
Zealot
Posts: 148
Karma: 192898
Join Date: Jan 2016
Device: none
Out of curiosity, why can't Calibre use a PDF's bookmarks to know where each chapter starts instead of guessing while reading the XHTML generated by pdftohtml?
01-26-2024, 01:00 PM #8
kovidgoyal
creator of calibre
Posts: 43,963
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Let's count the reasons:

1) There is no guarantee that bookmark entries are chapter starts.
2) A PDF consists of a bunch of font glyphs placed at absolute co-ordinates on the page. A bookmark, or really any link, is also just another co-ordinate on a page. There is no way to map that to a semantic element reliably; one has to use heuristics.
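To make point 2 concrete, here is a hypothetical sketch of the kind of heuristic involved, not calibre's code. It assumes the text has already been extracted into lines, each with a page index and a top y-coordinate (PDF user space, origin at the bottom-left), and that the bookmark's destination has been reduced to a page and a top coordinate. `TextLine` and `line_for_bookmark` are invented names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextLine:
    page: int    # 0-based page index
    top: float   # y of the line's top edge in PDF user space (origin bottom-left)
    text: str


def line_for_bookmark(lines: list[TextLine], page: int, top: float,
                      tolerance: float = 2.0) -> Optional[TextLine]:
    """Guess which text line a bookmark points at.

    A bookmark destination is only a page plus coordinates. This heuristic
    assumes the target is the first line at or just below the destination's
    top coordinate, and gives up if there is no such line on that page.
    """
    candidates = [ln for ln in lines if ln.page == page and ln.top <= top + tolerance]
    if not candidates:
        return None
    # y grows upward, so the highest remaining line is the nearest one below the bookmark.
    return max(candidates, key=lambda ln: ln.top)
```

With lines at y=770 (a running header), 700 ("Chapter 2") and 680 on the same page, a bookmark at y=705 picks "Chapter 2", but a bookmark that simply points at the top of the page (y=792) picks the running header, which is why such a guess can't be relied on.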
01-26-2024, 01:47 PM #9
Quoth
the rook, bossing Never.
Posts: 11,427
Karma: 87454321
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
3) Other mad stuff too.
01-27-2024, 08:25 PM #10
Shohreh
Zealot
Posts: 148
Karma: 192898
Join Date: Jan 2016
Device: none
Thanks for the info.

Suggestion: if it's the user who added the bookmarks to the PDF, they're reliable, and Calibre could offer an option to use them to find chapters.

Anyway, an alternative, simpler solution than adding and removing bookmarks is to:
1) open the PDF in a reader, e.g. SumatraPDF;
2) make a list of the page ranges that make up each chapter;
3) use it to split the source PDF into sub-PDFs (e.g. with cpdf; one chapter = one PDF);
4) run Calibre to convert them into EPUBs; and
5) finally join them into a single EPUB.
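Steps 2 to 4 can be scripted. A minimal sketch, assuming the hand-made list from step 2 is written as (title, first page, last page) tuples, 1-based and inclusive, in reading order, and that cpdf and calibre's `ebook-convert` are on the PATH. It checks that the slices are in order and don't overlap, then builds one `cpdf IN RANGE -o OUT` split command and one `ebook-convert ... --title TITLE` command per chapter. File names are hypothetical, and step 5 (joining the EPUBs) is left out.

```python
import shlex


def chapter_pipeline(src: str, slices: list[tuple[str, int, int]]) -> list[list[str]]:
    """Build split (cpdf) and convert (ebook-convert) commands for each chapter slice."""
    commands = []
    previous_last = 0
    for n, (title, first, last) in enumerate(slices, start=1):
        # Each slice must start after the previous one ends and be non-empty.
        if not (previous_last < first <= last):
            raise ValueError(f"slice {n} ({title!r}) overlaps or is out of order")
        previous_last = last
        pdf, epub = f"chapter_{n:02d}.pdf", f"chapter_{n:02d}.epub"
        # Step 3: extract this chapter's pages into its own PDF.
        commands.append(["cpdf", src, f"{first}-{last}", "-o", pdf])
        # Step 4: convert the chapter PDF; --title sets the EPUB's metadata title.
        commands.append(["ebook-convert", pdf, epub, "--title", title])
    return commands


if __name__ == "__main__":
    # Hypothetical slices read off the PDF in a viewer.
    slices = [("Chapter 1", 1, 14), ("Chapter 2", 15, 41), ("Chapter 3", 42, 60)]
    for cmd in chapter_pipeline("book.pdf", slices):
        print(shlex.join(cmd))
```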

It does nothing to help convert PDFs into clean EPUBs, but at least Calibre will know where each chapter starts and ends.

Last edited by Shohreh; 01-27-2024 at 08:58 PM.




All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.