Old 12-28-2013, 08:09 AM   #1
cybmole
Wizard
 
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
regex - issue with spaces?

I wanted to tweak the layout of a short story (a library book loan, unlocked just for read-once layout tweaking).

It was using a lot of constructs like this for spacing:

<p class"something"> </p>

(where the gap seemed to be a simple space)

but I could not get the regex engine to find any of those lines, no matter how I tweaked the options.
It could find the first part OK, the bit before the "space", i.e. it could find
<p class"something">
but that was all, and that is useless as every line in the story starts like that!
Maybe it was some weird "space" character, but when I opened the book in Sigil instead, pasted in the offending line and hit Replace All, it zapped them, no problems.

So is it a bug, or is it a character issue?
As it is an unconverted retail epub loan, I would not expect any weird characters to be in it.


NB: I did what I usually do in such cases: I copied & pasted the offending line into the regex find box, so any strange character should have copied over.

I think it is more likely to be a bug in the book editor's regex.
Old 12-28-2013, 09:47 AM   #2
mrmikel
Color me gone
 
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
If you load the section into a hex editor, does it show any unusual characters? Kovid may have trapped out undisplayable but present characters.
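If no hex editor is to hand, the same inspection can be sketched in a few lines of Python. This is only an illustration: `describe_chars` and the sample line are hypothetical, and the sample assumes the "space" is U+00A0.

```python
import unicodedata

def describe_chars(text):
    """Return (char, code point, Unicode name) for every character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# Hypothetical sample: a paragraph whose "space" is assumed to be U+00A0.
sample = '<p class="body">\u00a0</p>'

for ch, code, name in describe_chars(sample):
    # Report only characters outside printable ASCII.
    if ord(ch) < 0x20 or ord(ch) > 0x7E:
        print(repr(ch), code, name)
```

Pasting the suspect line in place of `sample` lists whatever is hiding in it, rather than leaving it to guesswork.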
Old 12-28-2013, 10:49 AM   #3
cybmole
I could extract & post a single xhtml file, if that is allowed. But if something works in Sigil yet not in his editor, that seems strange.

If I open it in Notepad++ & select encode with ANSI, I see this, so it seems there is a non-displayable character which Sigil can handle but the calibre editor cannot:
<p class="body">Â*</p>

I will risk attaching a zipped single-chapter extract, purely for testing/debugging, but will remove it if told to.
Attached Files
File Type: zip blood-and-roses-1.zip (5.4 KB, 356 views)
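A short sketch of why an ANSI view can show a stray "Â" in that paragraph, assuming the file is UTF-8 and that "ANSI" means Windows-1252 (both are assumptions, not confirmed in the thread). A literal no-break space is two bytes in UTF-8, and reading those bytes as Windows-1252 gives two separate characters.

```python
nbsp = "\u00a0"  # NO-BREAK SPACE

# In UTF-8 the single character is stored as two bytes.
utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa0'

# Misreading those bytes as Windows-1252 yields "Â" followed by a no-break space.
misread = utf8_bytes.decode("cp1252")
print([f"U+{ord(c):04X}" for c in misread])  # ['U+00C2', 'U+00A0']
```

If that assumption holds, the file contains an ordinary literal no-break space and nothing more exotic.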
Old 12-28-2013, 11:36 AM   #4
Perkin
Guru
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
Try a regex search using the dot char to represent the odd char (and maybe add a 1 or 2 character repeat).

It may find other odd unwanted matches, so check first.

Code:
<p class"something">.{1,2}</p>
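The warning about unwanted matches can be illustrated with Python's standard `re` module. This is a sketch only: the pattern uses `class="body"` from the sample shown in post #3, and the test lines are made up.

```python
import re

# Dot-based pattern in the spirit of the suggestion above,
# written against the class name seen in the attached sample.
pattern = re.compile(r'<p class="body">.{1,2}</p>')

# Hypothetical lines for illustration.
lines = [
    '<p class="body">\u00a0</p>',     # spacer holding only a no-break space
    '<p class="body">I.</p>',         # real content that is two characters long
    '<p class="body">Some text</p>',  # ordinary paragraph
]

matches = [line for line in lines if pattern.fullmatch(line)]
for line in matches:
    print(repr(line))
```

Both the spacer and the two-character paragraph match, which is why the results are worth checking before a Replace All.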
Old 12-28-2013, 11:41 AM   #5
theducks
Well trained by Cats
Posts: 29,900
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Perkin View Post
Try a regex search using the dot char to represent the odd char (and maybe add a 1 or 2 character repeat).

It may find other odd unwanted matches, so check first.

Code:
<p class="something">.{1,2}</p>
the above code is invalid, which will cause a search to fail if that was what was used
Old 12-28-2013, 12:08 PM   #6
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
It might also be an &nbsp; that is rendered as a normal space. In my installation I cannot see whether it is a normal space or a non-breaking one.
Old 12-28-2013, 12:11 PM   #7
theducks
Quote:
Originally Posted by Toxaris View Post
It might also be an &nbsp; that is rendered as a normal space. In my installation I cannot see whether it is a normal space or a non-breaking one.
wouldn't \s have found that single whitespace?
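Whether \s catches a no-break space depends on the regex engine and its Unicode settings. As a hedged illustration using Python 3's standard `re` module (which may not behave the same as the editor's own search):

```python
import re

nbsp = "\u00a0"

# str pattern: \s is Unicode-aware by default in Python 3, so it covers U+00A0.
unicode_hit = re.fullmatch(r"\s", nbsp) is not None

# Same pattern restricted to ASCII whitespace only.
ascii_hit = re.fullmatch(r"\s", nbsp, flags=re.ASCII) is not None

# bytes pattern: \s only covers ASCII whitespace, so the raw byte 0xA0 is missed.
bytes_hit = re.fullmatch(rb"\s", b"\xa0") is not None

print(unicode_hit, ascii_hit, bytes_hit)  # True False False
```

So \s can find it, but only when the engine is matching Unicode text with Unicode-aware character classes.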
Old 12-28-2013, 12:25 PM   #8
Perkin
Quote:
Originally Posted by theducks View Post
the above code is invalid, which will cause a search to fail if that was what was used
Good spot, I just copied from OP's post.
Old 12-28-2013, 12:40 PM   #9
cybmole
I think the issue here, guys, is not how to write a clever regex solution, though it is always fun to watch great minds at work on a technical challenge.

The issue for me is this: if the calibre editor is to become the new improved Sigil, then it needs to match or better Sigil's regex ease of use.

In Sigil I just pasted the unwanted line from the xhtml code view into the "Find" box, set Replace to blank, and clicked Replace All. Job done. There was no need to faff around guessing at invisible characters, or puzzling out why it didn't work; it just worked first time.

Calibre is using the same regex engine, yes? So why does the same quick & easy method fail when it works in Sigil?
Old 12-28-2013, 01:12 PM   #10
mrmikel
The character is A0, a non-breaking space, but that is NOT how non-breaking spaces are encoded in HTML; they should be written as a numeric reference or a named entity.

Kovid might want to scan for these and replace them.

Last edited by mrmikel; 12-28-2013 at 01:16 PM.
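For anyone who wants the scan-and-replace suggested above, a minimal Python sketch follows. The function name `nbsp_to_entity` is hypothetical, and the numeric form `&#160;` is used because it needs no DTD to resolve, whereas `&nbsp;` relies on the XHTML entity definitions.

```python
def nbsp_to_entity(markup):
    """Replace each literal no-break space with its numeric character reference."""
    return markup.replace("\u00a0", "&#160;")

# Hypothetical sample line.
sample = '<p class="body">\u00a0</p>'
converted = nbsp_to_entity(sample)
print(converted)  # <p class="body">&#160;</p>
```

Ordinary spaces are left alone, so the visible layout should not change, only how the character is written in the source.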
Old 12-28-2013, 01:31 PM   #11
cybmole
Well, it's by a "New York Times #1 best selling author" (but isn't everyone these days? I've still yet to find any of the runner-up authors),
but it could be self-published, which could explain the non-standard coding.

I can't see any credits for any big-name publisher on the copyright page; there are credits for cover design & interior design, but that is all.

Anyway, what a silly place to put a non-breaking space character (as you can't "break" a line which has only one character in it!)
Old 12-28-2013, 03:17 PM   #12
jackie_w
Grand Sorcerer
 
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
Using a <p> </p> paragraph enclosing a single non-breaking space character to create vertical whitespace or 'pseudo blank lines' is extremely common practice in my experience. I prefer using css top margins myself but hoping for the former to go away is pointless.

But this is beside the point, because I don't understand in what way you think the calibre editor is not handling them properly. Importing the html in cybmole's attached zip, I see the unicode version of the nbsp (the \xa0 char) syntax-highlighted in yellow rather than in the editor's default background colour, which is used to display the normal space char. See attached. Some other unicode special chars (e.g. mdash, ndash but not hellip or smart quotes) are also syntax-highlighted. Admittedly, the easy visibility of these unicode chars depends on which editor theme you have selected. The default theme (pyte-light) highlights them in yellow. Unfortunately, if you're currently using one of the dark themes, the syntax-highlighting of these chars is difficult to see, so I'm sticking with the default pyte-light until Kovid has time to implement full customisation of themes.

Moving on to Find/Replace: if I select one of these 'empty paragraphs' and use it in the Find box, I have no problem at all finding or replacing with Mode set to either Normal or Regex. Alternatively, if I want to search only for the unicode nbsp char, I type \xa0 in the Find box and search in Regex Mode.
Attached Thumbnails
nbsp.jpg (32.6 KB, 443 views)
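The \xa0 search described in this post can be tried outside the editor as well. The sketch below uses Python 3's standard `re` module rather than the editor's Find box; the `class="body"` name comes from the sample in post #3, and `strip_spacer_paragraphs` is a hypothetical helper.

```python
import re

# \xa0 inside the pattern is the regex escape for U+00A0 (no-break space).
# The optional \n also removes the line break that followed the spacer.
EMPTY_PARA = re.compile(r'<p class="body">\xa0</p>\n?')

def strip_spacer_paragraphs(markup):
    """Delete paragraphs that hold nothing but a single no-break space."""
    return EMPTY_PARA.sub("", markup)

# Hypothetical chapter fragment.
sample = (
    '<p class="body">One</p>\n'
    '<p class="body">\u00a0</p>\n'
    '<p class="body">Two</p>\n'
)
cleaned = strip_spacer_paragraphs(sample)
print(cleaned)
```

Naming the character explicitly keeps the match narrow, so paragraphs with real one-character content are not touched.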
Old 12-28-2013, 04:05 PM   #13
Toxaris
I would love to see a difference between a normal space and a non-breaking space in the code. That would make things a lot clearer...
Old 12-28-2013, 05:55 PM   #14
jackie_w
Quote:
Originally Posted by Toxaris View Post
I would love to see a difference between a normal space and a non-breaking space in the code. That would make things a lot clearer...
Isn't that exactly what you already see in the pic I attached above?
Old 12-28-2013, 07:50 PM   #15
mrmikel
That is, the non-breaking spaces are highlighted, making a thin yellow line. But if you don't know that, it is not so obvious.

Once you are aware of it, you can regex for \xA0 and replace all the instances.

This discussion has been useful, as I have encountered other hidden characters in Sigil, and once they can be identified, they can be gotten rid of the same way.

Last edited by mrmikel; 12-28-2013 at 07:53 PM.
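For the "other hidden characters" mentioned in this post, one possible way to identify them is to count characters by Unicode category. This is a rough sketch: `hidden_characters` is a hypothetical helper, and the choice of categories (format controls, plus space separators other than the plain space) is an assumption about what counts as "hidden".

```python
import unicodedata
from collections import Counter

def hidden_characters(text):
    """Count format controls (Cf) and non-ASCII space separators (Zs) in text."""
    counts = Counter()
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Cf" or (category == "Zs" and ch != " "):
            label = f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
            counts[label] += 1
    return counts

# Hypothetical text containing a soft hyphen, a zero width space and a no-break space.
sample = "soft\u00adhyphen, zero\u200bwidth, no\u00a0break"

for label, count in hidden_characters(sample).items():
    print(count, label)
```

Each code point reported this way can then be searched for with its \x or \u escape in regex mode and replaced, as described above for \xA0.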