10-12-2023, 08:36 PM | #1 |
Connoisseur
Posts: 93
Karma: 2136220
Join Date: May 2019
Device: Kindle
|
The Spectator - only title and synopsis
The last two downloads of "The Spectator" fetched only the title and synopsis; this started in October. The article body content is missing.
|
10-14-2023, 11:11 AM | #2 |
Evangelist
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
|
You can use the attached recipe. It will load all articles, but it is still a temporary solution (it might fail due to too many requests).
Time for someone to figure out and add login code to the recipe. |
10-15-2023, 01:43 AM | #3 |
Evangelist
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
|
@kovidgoyal How can I make use of the Wayback Machine? Is it exclusive to nytimes?
|
10-15-2023, 02:49 AM | #4 |
creator of calibre
Posts: 43,994
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yes, I would need to add support for Spectator to it. What is the URL scheme for Spectator? If it has a decent URL scheme, I might be able to do it.
|
10-15-2023, 03:13 AM | #5 |
Evangelist
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
|
https://web.archive.org/web/20231013...support-hamas/
It looks like the Wayback Machine doesn't have access to these articles. https://archive.today/ works, but it has a different URL format and captcha checks. Can we do something for archive.today? https://archive.ph/K6f5r |
10-15-2023, 03:39 AM | #6 |
creator of calibre
Posts: 43,994
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The archive.org entries are paywalled as well, so there is no point there. As for archive.today, no idea; I have never used it.
|
10-15-2023, 03:58 AM | #7 |
creator of calibre
Posts: 43,994
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I took a brief look at archive.is. Changing the recipe to use it should be as simple as replacing the article URLs with URLs of the form
https://archive.is/latest/original_url
I don't know what their rate limiting and captcha policies are; that will require experimentation. |
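A minimal sketch of the URL rewrite described above, assuming the recipe holds its articles as dicts with a 'url' key. The helper names and the example.com URL are hypothetical, chosen for illustration; only the https://archive.is/latest/ prefix comes from the post.

```python
# Prefix taken from the post above; everything else is illustrative.
ARCHIVE_PREFIX = 'https://archive.is/latest/'


def to_archive_url(original_url):
    """Return the archive.is 'latest snapshot' URL for an article URL."""
    return ARCHIVE_PREFIX + original_url.strip()


def rewrite_article_urls(articles):
    """Return copies of the article dicts with 'url' pointing at archive.is.

    The input list and its dicts are left unmodified.
    """
    return [dict(a, url=to_archive_url(a['url'])) for a in articles]


if __name__ == '__main__':
    # Hypothetical article entry, as a recipe's index parser might build it.
    sample = [{'title': 'Example', 'url': 'https://example.com/article/some-slug/'}]
    print(rewrite_article_urls(sample)[0]['url'])
```

Whether archive.is then serves the page to a non-browser client is a separate question that this sketch does not address.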
10-15-2023, 07:05 AM | #8 |
Evangelist
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
|
Although it loads content in the browser, there's no response in calibre for these URLs.
Code:
Traceback (most recent call last):
  File "mechanize\_urllib2_fork.py", line 1238, in do_open
  File "http\client.py", line 1374, in getresponse
  File "http\client.py", line 318, in begin
  File "http\client.py", line 287, in _read_status
http.client.RemoteDisconnected: Remote end closed connection without response
If we can get a response, we can also fix the WSJ recipe. |
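For reference, a hedged sketch of catching the RemoteDisconnected error shown above and retrying with a growing delay. The function name, the injected fetch callable, and the delay values are all assumptions made for illustration. Retrying only helps with transient failures such as rate limiting; it will not help if the server refuses the client outright, which may well be what is happening here.

```python
import http.client
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), retrying on dropped connections with exponential backoff.

    fetch is any callable that takes a URL and returns the page content.
    Waits base_delay, then 2 * base_delay, and so on, between attempts.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except (http.client.RemoteDisconnected, ConnectionError) as err:
            last_error = err
            # Do not sleep after the final failed attempt.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Passing the fetch function in as an argument keeps the sketch independent of whichever downloader the recipe ends up using.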
10-15-2023, 10:34 AM | #9 |
creator of calibre
Posts: 43,994
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Does it work if you use the read_url() function from calibre.scraper.simple?
|
10-15-2023, 01:41 PM | #10 |
Evangelist
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
|
Code:
from calibre.scraper.simple import read_url
from calibre.ptempfile import PersistentTemporaryFile
...
    storage = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        # Fetch the latest archive.is snapshot of the article
        raw = read_url(self.storage, 'https://archive.is/latest/' + url)
        # Save the HTML to a temp file and hand its path back to calibre
        pt = PersistentTemporaryFile('.html')
        pt.write(raw.encode('utf-8'))
        pt.close()
        return pt.name
It works, but is there a simpler way? |
10-15-2023, 10:40 PM | #11 |
creator of calibre
Posts: 43,994
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Easier in what sense?
|
10-15-2023, 11:01 PM | #12 |
Evangelist
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
|
I don't know. Is this the right method, though? I am a noob here.
Can we do it without writing to a temp file in get_obfuscated_article? |
10-16-2023, 01:58 AM | #13 |
creator of calibre
Posts: 43,994
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The cost of creating a temp file is insignificant compared to actually downloading, so it doesn't matter, but I added some code to allow avoiding the temp file: https://github.com/kovidgoyal/calibr...6689de07213fbe
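A rough sketch of what the temp-file-free variant might look like. This rests on an assumption: as I understand the calibre recipe API documentation, get_obfuscated_article() may return a dictionary of the form {'data': ..., 'url': ...} instead of a file path. The linked commit URL is truncated here, so that behaviour is not confirmed from it. The standalone function and the injected fetch callable are hypothetical stand-ins for the recipe method and read_url().

```python
ARCHIVE_PREFIX = 'https://archive.is/latest/'


def obfuscated_article_as_dict(url, fetch):
    """Fetch the archived page and return it in memory, with no temp file.

    fetch is a stand-in for the real downloader (a callable taking a URL and
    returning HTML). The returned dict shape is an assumption based on the
    calibre documentation, not something shown in this thread.
    """
    archive_url = ARCHIVE_PREFIX + url
    raw = fetch(archive_url)
    # 'url' records where the data actually came from, so relative links
    # in the snapshot can be resolved against it.
    return {'data': raw, 'url': archive_url}
```

In a real recipe this body would sit inside get_obfuscated_article(self, url), with fetch replaced by a call such as read_url(self.storage, ...), on a calibre version that includes the change.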
|
10-16-2023, 12:13 PM | #14 |
Evangelist
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
|
I will be able to use this in the next update, I guess. Thanks.
|