02-29-2024, 11:00 AM | #1 |
Connoisseur
Posts: 59
Karma: 10
Join Date: Mar 2019
Device: Kindle 3 Paperwhite
|
Space character before Soft Hyphen missed in epub>docx conversion
Hello.
I ran into what is probably a rare conversion problem: a soft hyphen placed before a word (i.e. between a space character and the first character of the word) can cause the space character to be dropped in the epub > docx conversion. Naturally, I am aware that this is a very strange place for a soft hyphen, and I don't know how such soft hyphens got into the epub. In any case, I have seen several documents exported this way, and they are very difficult to fix without access to the source epub file…
Sample epub + docx attached.
Calibre version: 6.26 |
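To check whether a piece of text contains the pattern described above (whitespace immediately followed by a soft hyphen), something like the following Python sketch might help. The function names are made up for illustration, and the sketch works on a plain string; it does not read epub or docx files and says nothing about how Calibre's converter handles the character internally.

```python
import re

# A run of soft hyphens (U+00AD) that directly follows a whitespace character,
# i.e. sits before the first letter of a word.
MISPLACED_SHY = re.compile(r"(?<=\s)\u00ad+")


def find_misplaced_soft_hyphens(text):
    """Return the offsets of soft hyphens that directly follow whitespace."""
    return [m.start() for m in MISPLACED_SHY.finditer(text)]


def drop_misplaced_soft_hyphens(text):
    """Remove only the misplaced soft hyphens; the space and in-word hyphens stay."""
    return MISPLACED_SHY.sub("", text)


# Hypothetical sample: one misplaced soft hyphen, one legitimate in-word one.
sample = "first \u00adword and hy\u00adphen"
offsets = find_misplaced_soft_hyphens(sample)
cleaned = drop_misplaced_soft_hyphens(sample)
```

Only the soft hyphen after the space is reported and removed; the one inside "hyphen" is left alone.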
02-29-2024, 11:09 AM | #2 |
Resident Curmudgeon
Posts: 74,455
Karma: 129358310
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
The soft hyphens should be removed as they don't work well in ePub. Most software won't display them properly and searching will not work.
|
02-29-2024, 11:26 AM | #3 |
Connoisseur
Posts: 59
Karma: 10
Join Date: Mar 2019
Device: Kindle 3 Paperwhite
|
OK, I'm also not a fan of using soft hyphens in epub. But that doesn't change the fact that Calibre's conversion behavior appears to be flawed, and that this error is very problematic, especially if you have no control over the epub and have only the conversion output to work with.
|
02-29-2024, 11:32 AM | #4 |
the rook, bossing Never.
Posts: 11,428
Karma: 87454321
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Soft hyphens are for websites. Ebooks should leave it to the renderer to hyphenate.
There should never be a space on either side of a soft hyphen, as its only purpose is to hint where a word may be broken. Unexpected, never-to-be-encountered formatting can break conversions. Someone has used a crazy tool or crazy formatting to make the epub. Fix the epub by deleting all soft hyphens in the Calibre editor before converting to docx, which should be easy.
Automatic hyphenation also needs to be off in the word processor source when creating an ebook and on for PDF, because the word processor has no knowledge of the page width of an ebook, but does have the page size set for a PDF.
Last edited by Quoth; 02-29-2024 at 11:36 AM. |
02-29-2024, 11:48 AM | #5 |
Connoisseur
Posts: 59
Karma: 10
Join Date: Mar 2019
Device: Kindle 3 Paperwhite
|
Quoth: Thank you, I agree, but once again: the source epub file is not always available. In that case you have to work with the conversion output, and this is quite difficult without the space characters.
|
02-29-2024, 12:12 PM | #6 |
Bibliophagist
Posts: 36,242
Karma: 145735536
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Since AFAIR, a soft hyphen is an invisible format character indicating a possible hyphenation location, placing a soft hyphen between a space character and a word is pretty much a garbage in, garbage out situation.
Since you appear to have the epub to be able to convert it, why not simply remove the soft hyphens and reconvert? To me, this would be the more sensible approach and allows those who use soft hyphens properly to continue using them. If you are only given the converted output, time to punt the garbage back to the originator and tell them to fix it. |
02-29-2024, 12:12 PM | #7 |
Well trained by Cats
Posts: 29,904
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
The EPUB is the source or you could not convert (DRM infested).
Try using the built-in 'Polish' tool (you may need to add it to a toolbar): remove soft hyphens. If that does not get them all, you will need to use the editor (and some regex). |
02-29-2024, 12:41 PM | #8 | |
Connoisseur
Posts: 59
Karma: 10
Join Date: Mar 2019
Device: Kindle 3 Paperwhite
|
Quote:
Of course, the problem can _easily_ be solved in the epub. But conversion should _never_ remove spaces, and the source epub file is not _always_ available. Sorry, but suggestions to solve it by editing the epub are not very useful/relevant.
Last edited by quinta@ebf.cz; 02-29-2024 at 12:44 PM. |
|
02-29-2024, 12:50 PM | #9 | |
the rook, bossing Never.
Posts: 11,428
Karma: 87454321
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Quote:
The source is broken. There should never ever be a space next to a soft hyphen. A soft hyphen should only ever be inside a word, and shouldn't be in an epub anyway. No regex needed. Simply replace every soft hyphen with nothing. It's not like non-breaking spaces, which are needed in ebooks, for example between a number and a street name or between a number and a unit. |
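Outside the editor, "replace every soft hyphen with nothing" could be sketched in Python roughly as below. This is an illustration only, not Calibre's own code: the function name is hypothetical, and the pattern also covers the character references (`&shy;`, `&#173;`, `&#xAD;`) on the assumption that the markup may spell the soft hyphen that way rather than as a literal U+00AD.

```python
import re

# The literal soft hyphen plus the character references that can stand for it
# in (X)HTML source. Assumption: no other spelling occurs in the file.
SHY_ANY_FORM = re.compile(r"\u00ad|&shy;|&#0*173;|&#[xX]0*[aA][dD];")


def strip_soft_hyphens(markup):
    """Return markup with every soft hyphen (literal or escaped) removed."""
    return SHY_ANY_FORM.sub("", markup)


# Hypothetical sample mixing all spellings.
dirty = "hy\u00adphen&shy;ation &#173;x &#xAD;y &#x00ad;z"
clean = strip_soft_hyphens(dirty)
```

Because nothing but the soft hyphen itself is matched, surrounding spaces are left exactly as they were.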
|
02-29-2024, 01:03 PM | #10 | |
Bibliophagist
Posts: 36,242
Karma: 145735536
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
Out of a perhaps morbid curiosity, was this a commercially available, public domain or freely distributable epub? If so, you might want to complain to the source. |
|
02-29-2024, 01:08 PM | #11 |
Resident Curmudgeon
Posts: 74,455
Karma: 129358310
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
The only place I know for sure that soft hyphens work and do not break searching is on a Kindle with KF8 format eBooks.
|
02-29-2024, 01:12 PM | #12 | |
Resident Curmudgeon
Posts: 74,455
Karma: 129358310
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
I know two ways soft hyphens can get into an ePub. One is with calibre's polish and the other is with the Hyphenate This! plugin. But neither puts soft hyphens outside of words as in your example. If you really do not have the source ePub, remove all of the soft hyphens and do a spell check. |
|
02-29-2024, 01:23 PM | #13 | |
Connoisseur
Posts: 59
Karma: 10
Join Date: Mar 2019
Device: Kindle 3 Paperwhite
|
OK. I know how to remove SH from an epub. This is quite easy; no need for more explanation about that.
EDIT: Key information for this sort of task:
- SH can be searched for/replaced with the regular expression \xad (or \u00ad), because SH is character U+00AD (decimal 173)
- be aware: the soft hyphen character itself is not visible in the Calibre editor
- removing soft hyphens is also part of Calibre's "Polish ebook" tool
Quote:
BTW, an interesting fact: it seems that not every space in the "space+SH" combos is dropped. I don't know what that means. Just interesting. : )
Last edited by quinta@ebf.cz; 03-01-2024 at 04:45 AM. |
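The regex note in the EDIT above can be tried out in plain Python. This is a small sketch using the standard `re` module, which may not behave identically to the regex engine in the Calibre editor; the helper names and the `[SHY]` marker are made up for illustration. Since the character is invisible, swapping it for a visible marker is one way to see where it sits.

```python
import re

# Both escapes name the same code point: U+00AD, decimal 173.
assert "\xad" == "\u00ad" and ord("\xad") == 173


def count_soft_hyphens(s):
    """Count soft hyphens using the \\xad form of the pattern."""
    return len(re.findall(r"\xad", s))


def reveal_soft_hyphens(s, marker="[SHY]"):
    """Replace each invisible soft hyphen with a visible marker (\\u00ad form)."""
    # A function replacement avoids any backslash processing of the marker.
    return re.sub(r"\u00ad", lambda m: marker, s)


# Hypothetical sample word with two soft hyphens.
text = "con\u00adver\u00adsion"
total = count_soft_hyphens(text)
visible = reveal_soft_hyphens(text)
```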
|
02-29-2024, 01:39 PM | #14 | |
Connoisseur
Posts: 59
Karma: 10
Join Date: Mar 2019
Device: Kindle 3 Paperwhite
|
Quote:
Even though it would be much better not to have to use it. So I actually hope the conversion will be fixed, sooner or later. |
|
02-29-2024, 03:01 PM | #15 | |
the rook, bossing Never.
Posts: 11,428
Karma: 87454321
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Quote:
Your input file has a serious mistake. Fix the broken input. Computer programs are famous for Garbage In, Garbage Out. I'm sure there are other stupid things that should never ever be in an epub that will break conversion, and Amazon's conversions break more easily than Calibre's. But Amazon produces perfect mobi, azw3 and KFX from epub uploads to KDP that are correct. This is a broken epub. It's a format error I've never seen in ten years.
Last edited by Quoth; 02-29-2024 at 03:05 PM. |
|
|