Old 05-09-2024, 12:19 PM   #1
MGA
Member
Posts: 22
Karma: 10
Join Date: May 2024
Device: Kobo, Clara
Special characters and conversion

Dear friends,

I have a problem converting a book from docx to epub.

The book in question contains some words in the ancient Indian language Pali, which have a few special characters that are difficult to get properly displayed. Some examples of the characters in question are these:

When I view the epub on my e-reader (Kobo Clara 2E), each of these characters is simply ignored and the remaining letters are merged together. For example: the word tomato would turn into toato. Strangely enough, when I open the file in the Calibre ebook reader, it works fine. It only malfunctions on the Kobo.

I have some other books on the Kobo where these characters display fine, so I know that it is not beyond the capabilities of the device itself.

Would be very grateful for any advice on this!

Last edited by MGA; 05-09-2024 at 12:22 PM.
Old 05-09-2024, 12:31 PM   #2
MGA
Member
Posts: 22
Karma: 10
Join Date: May 2024
Device: Kobo, Clara
In relation to this statement from the original post:
Quote:
For example: the word tomato would turn into toato.
I also noticed that this letter sometimes turns into ñ instead of disappearing, though not consistently.
Old 05-09-2024, 01:06 PM   #3
DNSB
Bibliophagist
Posts: 36,169
Karma: 145735366
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Sounds as if you are trying to display glyphs that are not supported by the current font used on your Kobo ereader.
Old 05-09-2024, 01:12 PM   #4
kovidgoyal
creator of calibre
Posts: 43,953
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You need to embed a font in the book that has those characters. There are three ways to do that: embed it in the Word document itself, in which case conversion should preserve it; embed it after conversion using the editor and apply the right font styles; or use the conversion option to embed all referenced fonts. The last option works only if you have the font on your system somewhere and the docx file actually references the correct font by name but just doesn't embed it.
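
The conversion option mentioned above can also be driven from calibre's command-line tool. The following is a minimal sketch, assuming `ebook-convert` is installed and on the PATH; it uses the `--embed-all-fonts` and `--embed-font-family` options. The helper names and the file names are made up for illustration.

```python
import shutil
import subprocess


def build_convert_command(source, target, base_family=None):
    """Return the ebook-convert argument list for a font-embedding conversion."""
    # --embed-all-fonts embeds fonts the input references but does not embed,
    # provided calibre can find them on this system.
    command = ["ebook-convert", source, target, "--embed-all-fonts"]
    if base_family:
        # --embed-font-family embeds one named family as the book's base font.
        command += ["--embed-font-family", base_family]
    return command


def convert(source, target, base_family=None):
    """Run the conversion; raises if calibre's CLI is missing or the run fails."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is calibre installed?")
    subprocess.run(build_convert_command(source, target, base_family), check=True)


if __name__ == "__main__":
    # Placeholder file names; this only prints the command, it does not run it.
    print(" ".join(build_convert_command("book.docx", "book.epub")))
```

Calling `convert("book.docx", "book.epub")` would run the conversion itself. Whether the missing characters then appear on the device still depends on the embedded font containing those glyphs.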
Old 05-09-2024, 02:15 PM   #5
theducks
Well trained by Cats
Posts: 29,900
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Those seem to be language-related.
Conversions from earlier TXT files sometimes needed you to set the CHARSET at the beginning of the Add. What language was the DOCX using?

I was guessing ligatures (ffi, ll...) but I don't know of one that includes an M, so that rules out ligatures as the reason for the missing letters.
Old 05-10-2024, 03:09 AM   #6
Julie Paradise
Member
Posts: 12
Karma: 1000
Join Date: May 2024
Location: Berlin
Device: KindlePW Palma Poke5 NovaAir2 NA3C TUC Max2 TabX A6X2
This is a two-step problem: First you need to make sure that the correct Unicode character is used in the source/text itself (so not DIY combination of letter-x+point-below), and then you need to make sure that the system/platform/device that displays this text and its diacritical special glyphs also has a font that contains this Unicode glyph.
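
The first step can be checked with a few lines of Python. This is only a sketch using the standard `unicodedata` module: it detects "DIY" letter-plus-combining-mark sequences, converts them to single precomposed characters where Unicode defines one (NFC normalization), and names the non-ASCII characters a font would have to cover. The sample word is an assumption chosen for illustration, since the thread's own examples did not survive; the function names are hypothetical.

```python
import unicodedata


def has_combining_marks(text):
    """True if the text contains separate combining marks (e.g. U+0323 dot below)."""
    return any(unicodedata.combining(ch) for ch in text)


def to_precomposed(text):
    """Normalize to NFC so letter + mark pairs become single code points where possible."""
    return unicodedata.normalize("NFC", text)


def non_ascii_report(text):
    """Map each distinct non-ASCII character to its Unicode name."""
    return {ch: unicodedata.name(ch, "UNKNOWN") for ch in text if ord(ch) > 127}


if __name__ == "__main__":
    # Illustrative sample: "m" + combining dot below, "a" + combining macron.
    sample = "sam\u0323sa\u0304ra"
    fixed = to_precomposed(sample)
    print(has_combining_marks(sample), has_combining_marks(fixed))
    for ch, name in non_ascii_report(fixed).items():
        print(f"U+{ord(ch):04X} {name}")
```

The names in the report are what to look for in a font's character coverage for the second step.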
Old 05-10-2024, 09:41 AM   #7
MGA
Member
Posts: 22
Karma: 10
Join Date: May 2024
Device: Kobo, Clara
Quote:
Originally Posted by Julie Paradise View Post
This is a two-step problem: First you need to make sure that the correct Unicode character is used in the source/text itself (so not DIY combination of letter-x+point-below), and then you need to make sure that the system/platform/device that displays this text and its diacritical special glyphs also has a font that contains this Unicode glyph.
Thank you so much for taking the time and making the effort. As for the first part of the problem, whether the correct characters are used in the original input file: yes, they are, and they display very well there, using the ordinary Arial font.

And as for the second part of the problem: what would happen if I downloaded a font, added it to my own device, and then shared the file in question with someone else? Would they end up with the same error, or is it somehow possible to "attach" a font to the file?

Sorry if my questions are on a very elementary level.
Old 05-10-2024, 09:53 AM   #8
theducks
Well trained by Cats
Posts: 29,900
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by MGA View Post
Thank you so much for taking the time and making the effort. As for the first part of the problem, whether the correct characters are used in the original input file: yes, they are, and they display very well there, using the ordinary Arial font.

And as for the second part of the problem: what would happen if I downloaded a font, added it to my own device, and then shared the file in question with someone else? Would they end up with the same error, or is it somehow possible to "attach" a font to the file?

Sorry if my questions are on a very elementary level.
The proper way to avoid the issue is to EMBED the font into the book.
Depending on the font face, get out your wallet if you want to distribute the book. Font licenses can get expensive. Most fonts are copyrighted, and it gets worse: a license might be needed for up to 4 of them (normal, bold, italic, bold-italic).
Old 05-10-2024, 10:00 AM   #9
Julie Paradise
Member
Posts: 12
Karma: 1000
Join Date: May 2024
Location: Berlin
Device: KindlePW Palma Poke5 NovaAir2 NA3C TUC Max2 TabX A6X2
Quote:
Originally Posted by theducks View Post
The proper way to avoid the issue is to EMBED the font into the book.
Depending on the font face, get out your wallet if you want to distribute the book. Font licenses can get expensive. Most fonts are copyrighted, and it gets worse: a license might be needed for up to 4 of them (normal, bold, italic, bold-italic).
Yes, either embed the font (or at least the glyphs used) or share it as a PDF, depending on the circumstances.

Regarding the licence: Yes, licensing is another thing to consider. Thus the safest way would be not to use a fancy font that needs a licence, but to take one of the few well-known fonts that cover basically everything in the Unicode chart. That leaves you with (admittedly not so pretty) fonts like Times New Roman, Arial, Noto, etc.
Old 05-10-2024, 10:21 AM   #10
MGA
Member
Posts: 22
Karma: 10
Join Date: May 2024
Device: Kobo, Clara
Quote:
Originally Posted by theducks View Post
Those seem to be language-related.
Conversions from earlier TXT files sometimes needed you to set the CHARSET at the beginning of the Add. What language was the DOCX using?

I was guessing ligatures (ffi, ll...) but I don't know of one that includes an M, so that rules out ligatures as the reason for the missing letters.

Thanks for taking the time and making the effort!

I am quite a novice in this, but I managed to access the "document.xml", and there I found this information:

encoding="UTF-8"

It seems to me that this might be very relevant...?
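
The `encoding="UTF-8"` declaration only says how the XML file is stored; it does not say which characters the text uses or which fonts it asks for. A docx file is a zip archive, so both can be listed with the standard library. This is a rough sketch, assuming the usual `word/document.xml` layout; the function name is hypothetical, and fonts set only in `styles.xml` would not show up here.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def inspect_docx(docx):
    """Return (non-ASCII characters, font names) found in word/document.xml.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    # Text runs live in <w:t> elements.
    text = "".join(node.text or "" for node in root.iter(f"{{{W_NS}}}t"))
    non_ascii = sorted({ch for ch in text if ord(ch) > 127})

    # Fonts are referenced by name on <w:rFonts> elements.
    fonts = set()
    for rfonts in root.iter(f"{{{W_NS}}}rFonts"):
        for attr in ("ascii", "hAnsi", "cs", "eastAsia"):
            name = rfonts.get(f"{{{W_NS}}}{attr}")
            if name:
                fonts.add(name)
    return non_ascii, sorted(fonts)
```

For example, `inspect_docx("book.docx")` (a placeholder file name) returns the special characters in the text and the font names the document references, which are the names calibre would look for when asked to embed referenced fonts.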
Old 05-10-2024, 10:28 AM   #11
MGA
Member
Posts: 22
Karma: 10
Join Date: May 2024
Device: Kobo, Clara
Quote:
Originally Posted by theducks View Post
The proper way to avoid the issue is to EMBED the font into the book. (...)
Excellent, I will try to examine that option.
Old 05-10-2024, 10:34 AM   #12
MGA
Member
Posts: 22
Karma: 10
Join Date: May 2024
Device: Kobo, Clara
Quote:
Originally Posted by Julie Paradise View Post
Yes, either embed the font (or at least the glyphs used)
Excellent. I hope this is not too complicated, but I will examine it. And from what I read previously, it would perhaps be best to do this in the DOCX, before converting it to EPUB?


Quote:
Regarding the licence: Yes, licensing is another thing to consider. Thus the safest way would be not to use a fancy font that needs a licence, but to take one of the few well-known fonts that cover basically everything in the Unicode chart. That leaves you with (admittedly not so pretty) fonts like Times New Roman, Arial, Noto, etc.
That is completely fine, thanks for explaining.
Old 05-10-2024, 11:54 AM   #13
Quoth
the rook, bossing Never.
Posts: 11,389
Karma: 87013929
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
No need to embed in the docx, which makes opening times terrible.
As long as the same fonts are on the PC running Calibre (always true if the same computer), it will find and embed the fonts if asked.

Obviously, if this is for publication, you need either free fonts or a licence that covers ebook distribution (which might be incompatible with Kindle Publishing if the fonts have to be obfuscated or encrypted).
It doesn't matter for personal use.
Old 05-10-2024, 02:53 PM   #14
DNSB
Bibliophagist
Posts: 36,169
Karma: 145735366
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
It would be easier to embed the font in your ePub after the conversion. You can use the calibre ebook-editor to do this. As @Quoth mentioned, if this is not for your personal use, you would have to check the licensing on the font.
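
Whichever route is taken, it is easy to confirm afterwards that the EPUB really carries font files, since an EPUB is a zip archive. This is a small sketch using only the standard library; the function name and the extension list are assumptions for illustration, and it checks only that font files are present, not that they contain the needed glyphs.

```python
import zipfile

# File extensions commonly used for fonts inside EPUBs (an illustrative list).
FONT_EXTENSIONS = (".ttf", ".otf", ".woff", ".woff2")


def embedded_fonts(epub):
    """Return the sorted names of font files stored in the EPUB archive.

    `epub` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(FONT_EXTENSIONS)
        )
```

An empty list from `embedded_fonts("book.epub")` (a placeholder file name) would mean the device is falling back to its own fonts.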
Old 05-10-2024, 04:44 PM   #15
theducks
Well trained by Cats
Posts: 29,900
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
calibre's Polish books tool can do it also.
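
Polishing is also available from the command line as `ebook-polish`. This is a minimal sketch, assuming the tool is on the PATH and using its `--embed-fonts` and `--subset-fonts` options; the helper names and file names are placeholders. Subsetting keeps only the glyphs the book actually uses, which matches the "at least the glyphs used" suggestion earlier in the thread.

```python
import subprocess


def build_polish_command(source, target, subset=True):
    """Return the ebook-polish argument list that embeds referenced fonts."""
    command = ["ebook-polish", "--embed-fonts"]
    if subset:
        # Reduce each embedded font to the glyphs used in the book.
        command.append("--subset-fonts")
    return command + [source, target]


def polish(source, target, subset=True):
    """Run ebook-polish; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_polish_command(source, target, subset), check=True)
```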
All times are GMT -4.