09-30-2008, 12:58 PM | #16 | |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200, EBW1150, T1, NSTG, iLiad_v2, NC, Asus_TF, Next1, WPDN
Quote:
Dumb question time: What good is the .xml if I already have the .prc and .opf with .html from the Mobipocket Creator import? I've never used .xml files (my Word is 'stuck' at Word 2003). Which program converts these and/or reads them for further processing? Would you say they are better for storage, portability or something else? I'm not in the know here. Any info would be appreciated! Thanks!
09-30-2008, 01:43 PM | #17 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
You can add XML import and export to your Word 2003 (actually all the way back to Word 2000) via a free download from Microsoft. They are pushing their new docx format. See the wiki.
Dale
10-03-2008, 09:20 AM | #18 |
Connoisseur
Posts: 75
Karma: 14
Join Date: Jun 2008
Location: Australia
Device: iPad Pro 12"; Kindle Paperwhite
For the iLiad users...
I found a nice, easy way to output PDF files that are readable on the iLiad. You can create your own PDF format styles for the PDF printer, so I created one with a paper size and margins that fit my iLiad and a font I like to read. Then all you have to do is print whatever format you have to the PDF printer using that style, and it will create a file that works on your iLiad.
05-04-2009, 12:59 PM | #19 |
Junior Member
Posts: 9
Karma: 10
Join Date: Aug 2008
Device: Kindle
Complex PDF to HTML
I wrote a Python script that converts the output of pdf2xml to HTML and attempts to preserve the formatting of complex PDFs. I then use calibre to generate the ebook format (MOBI in my case). It seems to work pretty well. You can read more about it on my blog at http://talkings.org/2009/05/03/complex-pdf-html/.
05-04-2009, 06:38 PM | #20 |
creator of calibre
Posts: 44,001
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Cool, it was always in the back of my mind to write a script to implement column detection and a few other goodies from the output of pdf2xml, but I never found the time/motivation.
I'd be willing to integrate this into calibre (after the 0.6 release), so open a ticket and attach your script. Integration will depend on how easy it is to compile pdf2xml on various platforms.
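Column detection of the kind mentioned above can be sketched roughly as follows. This is a minimal illustration, not code from calibre, pdf2xml or cxpdfhtml.py: it assumes each text line on a page has already been reduced to a hypothetical `(left, right)` pair of x-coordinates in pixels, and it treats any vertical strip at least `min_gutter` pixels wide that no line crosses as the gap between two columns.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # hypothetical (left, right) x-coordinates in pixels


def detect_columns(boxes: List[Span], min_gutter: float = 10.0) -> List[Span]:
    """Group text-line extents into column spans separated by empty gutters."""
    spans: List[Span] = []
    for left, right in sorted(boxes):
        if spans and left - spans[-1][1] < min_gutter:
            # Overlaps, or starts too close to, the current span: same column.
            spans[-1] = (spans[-1][0], max(spans[-1][1], right))
        else:
            # A gap of at least min_gutter pixels: start a new column.
            spans.append((left, right))
    return spans


# Two-column page: left-hand lines sit at x 72-290, right-hand lines at x 310-540.
page = [(72, 290), (80, 285), (310, 540), (320, 530), (72, 288)]
print(detect_columns(page))  # [(72, 290), (310, 540)]
```

One obvious weakness of this simple approach: a full-width heading or footer on the page bridges the gutter and merges the columns, so a real implementation would need to set such lines aside first.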
05-04-2009, 10:35 PM | #21 |
Junior Member
Posts: 9
Karma: 10
Join Date: Aug 2008
Device: Kindle
That sounds good. What time frame are you looking at? I still need to do some work on it to automate detection of more aspects of the content.
05-04-2009, 10:54 PM | #22 |
creator of calibre
Posts: 44,001
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
0.6 will take another couple of months, so there's no rush.
05-06-2009, 01:21 PM | #23 |
Junior Member
Posts: 9
Karma: 10
Join Date: Aug 2008
Device: Kindle
I am pretty happy with the progress I've made in the last couple of days. It seems to be working with almost anything I throw at it. I am adding a lot of options to customize how it handles the formatting. I'll post again when I have a new version up. I wish I had a better name than cxpdfhtml.py...
05-06-2009, 01:28 PM | #24 |
creator of calibre
Posts: 44,001
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The name I used for my abortive attempt was pdfreflow.py.
05-08-2009, 03:39 AM | #25 |
Zealot
Posts: 140
Karma: 50288
Join Date: Feb 2009
Device: KK 3G, iPad
My interest is just getting better reflowable paragraphs in fiction. I tried cxpdfhtml.py on a novel and was surprised at how well the "break on short lines" approach worked, although I haven't read it closely enough to spot any lines that weren't short enough to trigger a break.
I was wondering whether you are considering (or anyone else has implemented) detection of paragraphs based on indentation?
05-10-2009, 12:01 AM | #26 |
Junior Member
Posts: 9
Karma: 10
Join Date: Aug 2008
Device: Kindle
Actually, it does use indentation to detect paragraphs. Basically, if a line is indented and the next line is not, it is considered the beginning of a paragraph block. A short-line break is detected if no other type of block/code is detected and the line is indented and stops short of the end of the line (by 10 pixels).
That 10-pixel threshold could easily be made a configuration option as well. Thanks for the feedback. I hope it proves useful.
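The two rules described above can be sketched in a few lines of Python. This is a simplified illustration, not the actual cxpdfhtml.py code: the `Line` record, the `body_left`/`body_right` margins and the `indent_tol` tolerance are hypothetical, and the short-line test here applies to any line (it skips the "no other block detected" and "line is indented" conditions from the post). Only the 10-pixel `short_gap` default comes from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    left: float   # hypothetical: x where the line's text starts (pixels)
    right: float  # hypothetical: x where the line's text ends (pixels)
    text: str


def group_paragraphs(lines: List[Line], body_left: float, body_right: float,
                     indent_tol: float = 2.0, short_gap: float = 10.0) -> List[str]:
    """Join PDF text lines into paragraphs using indentation and short lines."""
    def indented(line: Line) -> bool:
        return line.left > body_left + indent_tol

    paragraphs: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Close the paragraph being built, if any.
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else None
        # Indented line followed by a non-indented one: a new paragraph starts here.
        if indented(line) and (next_line is None or not indented(next_line)):
            flush()
        current.append(line.text)
        # Line ends more than short_gap pixels before the right margin: it is
        # treated as the last line of its paragraph.
        if body_right - line.right > short_gap:
            flush()
    flush()
    return paragraphs


page = [
    Line(90, 500, "It was a dark and"),
    Line(72, 420, "stormy night."),
    Line(90, 500, "The rain fell"),
    Line(72, 500, "in torrents."),
]
print(group_paragraphs(page, body_left=72, body_right=500))
# ['It was a dark and stormy night.', 'The rain fell in torrents.']
```

Using both rules lets either one catch what the other misses: the indentation rule splits paragraphs whose last line happens to run nearly full width, and the short-line rule closes paragraphs in books that don't indent.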
05-10-2009, 08:43 PM | #27 | |
Resident Curmudgeon
Posts: 74,576
Karma: 129670952
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
05-10-2009, 09:06 PM | #28 |
Junior Member
Posts: 9
Karma: 10
Join Date: Aug 2008
Device: Kindle
It is cxpdfhtml. See my earlier post and my blog for details and download links: http://talkings.org/2009/05/07/cxpdfhtml/
05-11-2009, 10:48 AM | #29 |
Resident Curmudgeon
Posts: 74,576
Karma: 129670952
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Thank you. I'll give it a go later when I have a chance and find a PDF I want to convert.
09-10-2009, 06:03 AM | #30 |
Austrian Economist
Posts: 20
Karma: 16
Join Date: Jun 2009
Device: X51v
Hi all,
Like you guys, I have a lot of purchased PDF files, none of which are DRMed (I refuse to purchase from any store that DRMs anything). All I can say is that it is virtually impossible to convert everything successfully. Like two earlier posters, I like Nuance PDF Converter Pro. Nuance PDF Converter 6.0 Pro just came out, and its converter is the most accurate so far, but it still chokes on some books.