03-01-2023, 05:48 PM   #1
Tillomar
Bookworm
Tillomar began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Mar 2023
Location: Germany
Device: Kindle Keyboard + Paperwhite 1 + Paperwhite 3
Calibre RegExp search: unexpected results

Hi there...

Sorry to jump in with this rather esoteric question on my first post to this forum -- but everything less complicated has had solutions already, so there was never a need to register... [My thanks for that!]

Caveat: I'm using a localized German version of Calibre 6.13, so my English links/names/... to certain calibre functions and dialogues may not be fully accurate.

I'm trying to subselect my library with a regexp search in order to then work the remaining books with a "metadata/search&replace" operation.

Context:
When importing books, the title often contains information about the series and the series number. It would be convenient to separate these attributes using the regexp available at "add books/read metadata". However, there are so many different formattings of those attributes that I was unable to come up with a regexp that catches at least most of them. As this dialogue has no way to use more than one regexp, I have to do that myself.
Additionally, I want to shorten series information like "A ... series book 15" to "... 15" while letting "A ... book 23" stand as "A ... 23", because in the former case the "A" is not part of the series title.
Obviously, after extracting the series name, I will also extract the series number, and then remove the series name from the title...

One of my regexps to search for a specific class of titles is this:

Code:
title:"~\((?:(An?|The)\s+)(?P<sname>[^\)]*?)(?:[,-:]?\s*)(?:(Small\s+Town|Trilogy|Series|Roman(ce|tic)|Cozy|Crime|Thrillers?|Suspense|Myster(y|ies))([\s:,]*))*Series\s*(?:(No.?|Number|Volume|Book|(Book\s*)\#)\s*)\#?(?P<sno>\d+([.,]\d+)?)\)"
As you can see, the expression tries to match text surrounded by round brackets. In this case, I search for series information in the form
Code:
(The ... Series Book 1)
(A ... Series Book 2)
(An ... Series Book 3)
which I will later shorten to
Code:
... #
From my current library, the result from this search is 1209 books, and there are a lot of names which should not be matched. Some examples of name classes which should not be matched:

Code:
Once Upon A Death (Days Of Death Series Book 1)
BloodGifted: The Dantonville Legacy Series Book 1 (A Paranormal Romance)
Poor Boy Road: A Gritty Hard-Hitting Thriller Series Book # 1 (JAKE CALDWELL)
Alexa O'Brien Huntress Series Book 1-4 Box Set
The Trouble with Bree: The Spotlight Series Book 1.5
#1 does not have "A", "An" or "The" after the opening bracket.
#2 + #3 do not have a series number in front of the closing bracket.
#4 + #5 have no brackets at all.

When I test my expression against the names found by calibre, those names (name classes) are correctly not matched.
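For reference, a minimal sketch of such a test outside calibre, using Python's `re` module (calibre itself is written in Python). The pattern is copied unchanged from the search above; the `re.IGNORECASE` flag and the title in `EXAMPLE` are assumptions added for illustration only.

```python
import re

# The pattern from the search above, as raw strings so no backslash is lost.
PATTERN = re.compile(
    r"\((?:(An?|The)\s+)(?P<sname>[^\)]*?)(?:[,-:]?\s*)"
    r"(?:(Small\s+Town|Trilogy|Series|Roman(ce|tic)|Cozy|Crime|Thrillers?"
    r"|Suspense|Myster(y|ies))([\s:,]*))*Series\s*"
    r"(?:(No.?|Number|Volume|Book|(Book\s*)\#)\s*)\#?(?P<sno>\d+([.,]\d+)?)\)",
    re.IGNORECASE,  # assumption: the search is case-insensitive
)

# The five titles listed above that calibre returned but that should not match.
SHOULD_NOT_MATCH = [
    "Once Upon A Death (Days Of Death Series Book 1)",
    "BloodGifted: The Dantonville Legacy Series Book 1 (A Paranormal Romance)",
    "Poor Boy Road: A Gritty Hard-Hitting Thriller Series Book # 1 (JAKE CALDWELL)",
    "Alexa O'Brien Huntress Series Book 1-4 Box Set",
    "The Trouble with Bree: The Spotlight Series Book 1.5",
]

# Hypothetical title in the "(The ... Series Book 1)" form, for illustration.
EXAMPLE = "Some Title (The Example Series Book 3)"

hits = [t for t in SHOULD_NOT_MATCH if PATTERN.search(t)]
print(hits)  # [] : none of the five titles match in plain Python

m = PATTERN.search(EXAMPLE)
print(m.group("sname"), m.group("sno"))  # Example 3
```

Since the expression behaves as intended here, the difference must lie in how the search string reaches the regexp engine inside calibre.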

Can anyone help me to understand what's going wrong here?

Tnx,
Tillomar
03-03-2023, 07:54 PM   #2
Tillomar
simplified...

I simplified my regexp to gain insight into what is causing the problem (one of several); so far, this is the simplest expression that also finds titles without any round bracket at all:
Code:
title:"~\([^()]*?\s+\d+\)"
(of course, the result is somewhat different from my original query...)

Currently, I feel that this must be some bug in calibre, but I would hate to bother the bug list if the problem sits in front of the keyboard.

Anyone?
03-03-2023, 08:17 PM   #3
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,996
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
This will almost certainly be escaping issues. IIRC you need double backslashes. Start with something simple like

title:"~\\("

check that this finds titles with a ( in them.
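
To illustrate why a lost backslash would produce exactly these symptoms, here is a rough Python sketch. It assumes that the search parser consumes a single backslash in front of a parenthesis, quote, or backslash before the text reaches the regexp engine; that behaviour is an assumption for illustration, not something confirmed in this thread, and `strip_search_escapes` is a hypothetical stand-in, not a calibre function.

```python
import re

def strip_search_escapes(text: str) -> str:
    """Hypothetical model of the parser: drop one backslash before ( ) " or \\."""
    return re.sub(r'\\([()\\"])', r"\1", text)

intended = r"\([^()]*?\s+\d+\)"        # what was typed into the search box
doubled = r"\\([^()]*?\\s+\\d+\\)"     # the same pattern with doubled backslashes

degraded = strip_search_escapes(intended)
print(degraded)  # ([^()]*?\s+\d+)  : the parentheses now form a group

title = "Alexa O'Brien Huntress Series Book 1-4 Box Set"  # no brackets at all
print(re.search(intended, title))              # None: a literal "(" is required
print(re.search(degraded, title) is not None)  # True: whitespace + digits suffice

# With doubled backslashes the intended pattern survives the (assumed) stripping.
print(strip_search_escapes(doubled) == intended)  # True
```

Under that assumption, the degraded pattern only asks for whitespace followed by digits, which fits the observation that titles without any bracket were found.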
03-04-2023, 05:57 AM   #4
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 11,774
Karma: 7029857
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by Tillomar View Post
I simplified my regexp to gain insight into what is causing the problem (one of several); so far, this is the simplest expression that also finds titles without any round bracket at all:
Code:
title:"~\([^()]*?\s+\d+\)"
(of course, the result is somewhat different from my original query...)

Currently, I feel that this must be some bug in calibre, but I would hate to bother the bug list if the problem sits in front of the keyboard.

Anyone?
Quote:
Originally Posted by kovidgoyal View Post
This will almost certainly be escaping issues. IIRC you need double backslashes. Start with something simple like

title:"~\\("

check that this finds titles with a ( in them.
Or try super-quotes. From the calibre manual:
Quote:
It is sometimes hard to get all the escapes right so the result is what you want, especially in regular expression and template searches. In these cases use the super-quote: """sequence of characters""". Super-quoted characters are used unchanged: no escape processing is done.
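
As a small sketch of the two workarounds mentioned in this thread, the helpers below build the search text from an unescaped regexp. Both function names are made up for illustration, and placing the `~` inside the super-quotes is my assumption based on how the ordinary quoted form is written.

```python
def double_escaped_query(field: str, pattern: str) -> str:
    """Ordinary quotes: double every backslash so one survives escape processing."""
    return f'{field}:"~' + pattern.replace("\\", "\\\\") + '"'

def super_quoted_query(field: str, pattern: str) -> str:
    """Super-quotes: the pattern is passed through unchanged."""
    return f'{field}:"""~{pattern}"""'

pattern = r"\([^()]*?\s+\d+\)"

print(double_escaped_query("title", pattern))
# title:"~\\([^()]*?\\s+\\d+\\)"
print(super_quoted_query("title", pattern))
# title:"""~\([^()]*?\s+\d+\)"""
```

The doubling helper does not handle a double quote inside the pattern; the super-quoted form avoids that question, since no escape processing is done there.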
03-07-2023, 11:11 PM   #5
Tillomar
Many thanks: double (or super-) escaping did it!