Please add https://wizards.com sites for MTG stories #1300

Darthagnon · 2024-05-05T01:07:09Z

I'm starting an archival project to convert Magic: The Gathering web fiction to EPUB (here and on my HDD), as it is slowly disappearing from the website through slapdash updates. WebToEpub is the best tool for the job, and I have been using it successfully with the default parser and the following settings:

New MTGStory:
URL structure: https://magic.wizards.com/en/news/magic-story/hero-iroas-2014-03-05
Include: #article-body
Title: #article-body > div > article > header > h1
Exclude: #article-body > div > aside, #article-body > div > article > div.css-AerwF

Old MTGStory:
URL structure: https://magic.wizards.com/en/articles/archive/magic-story/zendikars-last-stand-2016-02-17
Include: #main-content > article
Include (alternative, to exclude the author): #content-detail-page-of-an-article
Title: #main-content > h1
Exclude: #content > aside

Really old Magic Uncharted Realms:
URL structure: http://www.wizards.com/Magic/Magazine/Article.aspx?x=mtg/daily/ur/263
In the chapter list, unselect everything except "skip to content", as that link is the article
Include: #content > div.center-content
Title: #ctl00_ctl00_ContentPlaceHolder1_mainContent_Article_headerPanel > div.description > h4
Exclude: #topNav, #leftColumn, #footerWrap, #ctl00_ctl00_ContentPlaceHolder1_mainContent_Article_footerPanel, #ctl00_ctl00_ContentPlaceHolder1_MagicTopNavigation_topNavigation, #ctl00_ctl00_ContentPlaceHolder1_mainContent_Article_socialbar, #ctl00_ctl00_ContentPlaceHolder1_mainContent_Article_HeadingLinks

mtglore.com:
Include: #content > div
Title: #content > div > div

The story is spread across 4 different URL/article structures, half of which survive only on the Internet Archive. Different chapters can exist under different structures, and the TOCs (if they exist) are not comprehensive.
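The four selector sets listed above could be kept in a single lookup table and chosen per chapter by URL. The sketch below is only an illustration: the `LAYOUTS` table, its `name` and `urlPattern` fields, and the `layoutFor` function are hypothetical names, not part of WebToEpub. It also assumes that an archived snapshot keeps the layout its original URL path suggests, which may not hold for every capture.

```javascript
// Hypothetical lookup table: one entry per page layout described in this issue.
// Patterns are tested against the whole URL string, so they also match
// Wayback Machine links that embed the original URL.
const LAYOUTS = [
    {
        name: "new-mtgstory",
        urlPattern: /magic\.wizards\.com\/en\/news\//,
        include: "#article-body",
        title: "#article-body > div > article > header > h1",
        exclude: "#article-body > div > aside, #article-body > div > article > div.css-AerwF",
    },
    {
        name: "old-mtgstory",
        urlPattern: /magic\.wizards\.com\/en\/articles\/archive\//,
        include: "#main-content > article",
        title: "#main-content > h1",
        exclude: "#content > aside",
    },
    {
        name: "uncharted-realms",
        urlPattern: /www\.wizards\.com\/Magic\/Magazine\/Article\.aspx/,
        include: "#content > div.center-content",
        title: "#ctl00_ctl00_ContentPlaceHolder1_mainContent_Article_headerPanel > div.description > h4",
        // Shortened for the sketch; the full exclude list is given above.
        exclude: "#topNav, #leftColumn, #footerWrap",
    },
    {
        name: "mtglore",
        urlPattern: /mtglore\.com/,
        include: "#content > div",
        title: "#content > div > div",
        exclude: "",
    },
];

// Return the first layout whose pattern matches the URL, or null if none does.
function layoutFor(url) {
    return LAYOUTS.find((layout) => layout.urlPattern.test(url)) || null;
}
```

For example, `layoutFor("https://magic.wizards.com/en/news/magic-story/hero-iroas-2014-03-05")` returns the "new-mtgstory" entry, and an unrecognised URL returns `null` so the caller can fall back to manual settings.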

My current workflow is to collect Archive.org links for as many chapters as possible that share one website structure, since mixing and matching Includes and Excludes doesn't seem to work very well (or maybe I should just use more commas?). I test, then edit the chapter list and paste in the links to what I actually want to download, e.g.

<a href="https://web.archive.org/web/20230208082604/https://magic.wizards.com/en/news/making-magic/planeswalkers-guide-theros-part-1-2013-08-21">01 Planeswalker's Guide to Theros, Part 1: The Plane of Theros</a>
<a href="https://web.archive.org/web/20230330201809/https://magic.wizards.com/en/news/making-magic/planeswalkers-guide-theros-part-2-2013-08-28">02 Planeswalker's Guide to Theros, Part 2: The Poleis</a>
<!--<a href="https://web.archive.org/web/20140302084755/http://www.wizards.com/Magic/Magazine/Article.aspx?x=mtg/daily/ur/263">03 The Lost Confession</a>-->
<a href="https://web.archive.org/web/20230620224550/https://magic.wizards.com/en/news/making-magic/planeswalkers-guide-theros-part-3-2013-09-04">04 Planeswalker's Guide to Theros, Part 3: Nonhuman Creatures</a>
<a href="https://web.archive.org/web/20230128080100/https://magic.wizards.com/en/news/feature/prince-anax-part-1-2013-09-18">05 Prince Anax, Part 1</a>
<a href="https://magic.wizards.com/en/news/feature/prince-anax-part-2-2013-09-23">06 Prince Anax, Part 2</a>
<a href="https://web.archive.org/web/20231208032539/https://magic.wizards.com/en/news/feature/nymphs-theros-2013-10-02">07 Nymphs of Theros</a>
<a href="https://web.archive.org/web/20230923144610/https://magic.wizards.com/en/news/feature/consequences-attraction-2013-10-09">08 The Consequences of Attraction</a>
<a href="https://web.archive.org/web/20230929173716/https://magic.wizards.com/en/news/feature/tragedy-2013-10-23">09 Tragedy</a>
<a href="https://magic.wizards.com/en/news/making-magic/unanswered-questions-theros-2013-11-04">10 Unanswered Questions: Theros</a>
<a href="https://magic.wizards.com/en/news/feature/i-iroan-2013-11-04">11 I Iroan</a>
<a href="https://web.archive.org/web/20230203095421/https://magic.wizards.com/en/news/feature/sea-gods-labyrinth-part-1-2013-11-13">12 The Sea God's Labyrinth, Part 1</a>
<a href="https://web.archive.org/web/20230208082828/https://magic.wizards.com/en/news/feature/sea-gods-labyrinth-part-2-2013-11-20">13 The Sea God's Labyrinth, Part 2</a>
<a href="https://magic.wizards.com/en/news/feature/building-toward-dream-part-1-2013-11-27">14 Building Toward a Dream, Part 1</a>
<a href="https://web.archive.org/web/20230127023202/https://magic.wizards.com/en/news/feature/building-toward-dream-part-2-2013-12-04">15 Building Toward a Dream, Part 2</a>
<a href="https://web.archive.org/web/20230202181952/https://magic.wizards.com/en/news/feature/asphodel-2013-12-11">16 Asphodel</a>
<a href="https://web.archive.org/web/20230924005808/https://magic.wizards.com/en/news/feature/planeswalkers-guide-born-gods-2014-01-08">17 Planeswalker's Guide to Born of the Gods</a>
<a href="https://magic.wizards.com/en/news/magic-story/cowardice-hero-2014-01-22">18 Cowardice of the Hero</a>
<a href="https://magic.wizards.com/en/news/magic-story/emonberry-red-2014-02-05">19 Emonberry Red</a>
<a href="https://magic.wizards.com/en/news/magic-story/kioras-followers-2014-02-14">20 Kiora's Followers</a>
<a href="https://magic.wizards.com/en/news/magic-story/dance-flitterstep-2014-02-19">21 Dance of the Flitterstep</a>
<a href="https://magic.wizards.com/en/news/magic-story/walls-akros-2014-02-26">22 The Walls of Akros</a>
<a href="https://magic.wizards.com/en/news/magic-story/hero-iroas-2014-03-05">23 The Hero of Iroas</a>
<a href="https://magic.wizards.com/en/news/magic-story/oracle-ephara-2014-03-19">24 The Oracle of Ephara</a>
<a href="https://web.archive.org/web/20211205115729/https://magic.wizards.com/en/articles/archive/uncharted-realms/seasons-setessa-2014-03-26">25 Seasons in Setessa</a>
<a href="https://web.archive.org/web/20230922141754/https://magic.wizards.com/en/news/feature/planeswalkers-guide-journey-nyx-2014-04-02">26 Planeswalker's Guide to Journey into Nyx</a>
<!--<a href="https://magic.wizards.com/en/news/magic-story/ajani-mentor-heroes-2014-12-17">27 Ajani, Mentor of Heroes</a>-->
<a href="https://web.archive.org/web/20211023194705/https://magic.wizards.com/en/articles/archive/uncharted-realms/labyrinth-labors-2014-04-16">28 The Labyrinth of Labors</a>
<a href="https://web.archive.org/web/20231204094100/https://magic.wizards.com/en/news/feature/desperate-stand-2014-04-16-0">29 Desperate Stand</a>
<!--<a href="https://magic.wizards.com/en/news/magic-story/dreams-city-2014-04-23">30 Dreams of the City</a>-->
<a href="https://web.archive.org/web/20230226010657/https://magic.wizards.com/en/news/magic-story/thank-gods-2014-04-30">31 Thank the Gods</a>
<a href="https://web.archive.org/web/20230926130458/https://magic.wizards.com/en/news/making-magic/journeys-end-2014-05-26">32 Journey's End</a>
<a href="https://web.archive.org/web/20230130113433/https://magic.wizards.com/en/news/magic-story/kruphixs-insight-2014-06-11">33 Kruphix's Insight</a>
<a href="https://web.archive.org/web/20221204012503/https://magic.wizards.com/en/news/feature/ajanis-vengeance-2014-07-23">34 Ajani's Vengeance</a>
<a href="https://web.archive.org/web/20230205144542/https://magic.wizards.com/en/news/magic-story/drop-drop-2015-05-20">35 Drop for Drop</a>
<a href="https://web.archive.org/web/20230131203628/https://magic.wizards.com/en/news/magic-story/its-time-talk-commander-2016-edition-2016-10-26">36 It's Time to Talk Commander (2016 Edition)!</a>
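Because the list above mixes live links with Wayback Machine snapshots, it can help to separate the snapshot timestamp from the original URL before deciding which site structure a chapter belongs to. The following is a minimal sketch; `splitWaybackUrl` and `WAYBACK_PATTERN` are hypothetical names, and the pattern only covers the `https://web.archive.org/web/<timestamp>/<original URL>` form seen in this list (plus an optional two-letter modifier such as `id_`).

```javascript
// Matches Wayback Machine snapshot URLs of the form
// https://web.archive.org/web/<timestamp>[modifier]/<original URL>
const WAYBACK_PATTERN = /^https?:\/\/web\.archive\.org\/web\/(\d+)(?:[a-z]{2}_)?\/(.+)$/;

// Split a link into its snapshot timestamp and original URL.
// Live (non-archived) links are returned unchanged with archived = false.
function splitWaybackUrl(url) {
    const match = WAYBACK_PATTERN.exec(url);
    if (match === null) {
        return { archived: false, timestamp: null, original: url };
    }
    return { archived: true, timestamp: match[1], original: match[2] };
}
```

The `original` field can then be compared against the URL structures described earlier, regardless of whether the chapter was fetched live or from the archive.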

I am starting work on the parser, but was wondering whether there is a way for it to target different sites, ignore the TOC, and only request a manual list. My workflow would be improved if there were just a box for URLs and it could extract the titles from them, rather than my having to write an HTML chapter list of <a href="">Title here</a> entries. Is there a way to force this with a new parser?


dteviot · 2024-05-07T07:54:19Z

@Darthagnon

I'm not quite sure what you're asking for.
Is it something that walks https://magic.wizards.com/en/news/archive, treating each article as a chapter to collect?

Darthagnon · 2024-05-08T13:32:45Z

Apologies, my explanation was rather confusing.

"Edit chapter URLs" is the view I use to collect chapter lists to convert to EPUB, because

- the Wizards website is broken/useless/missing chapters, so there is no auto-parser that could work (EDIT: without too much work). An auto-parser would need to process https://magic.wizards.com/en/news/archive (2024), https://web.archive.org/web/2023mmddetc/https://magic.wizards.com/en/articles/archive (unreliable infinite scroller), https://web.archive.org/web/2020mmddetc/http://www.wizards.com/Magic/Magazine/Archive.aspx (paginated, mostly 404s), and https://web.archive.org/web/2014mmddetc/http://www.wizards.com/Magic/Magazine/Article.aspx
- a lot of chapters are not story-related, so they are less useful for EPUB.

Questions

Parser: Is it possible for a single parser to target multiple domains (e.g. archive.org, multiple versions of wizards.com) with separate filters on a per-chapter basis? Or are there any example site templates that are similar to this use case?
Extension: Would it be possible to (optionally) open "Edit chapter URLs" by default, rather than auto-parsing all chapters? For the default parser and manual listing on weird random websites, auto-parsing doesn't seem to work very well, gathering lots of irrelevant navbar links. It would help my workflow if I could paste story links directly, rather than fiddling with the UI to eliminate irrelevant chapters.
Extension: "Edit chapter URLs" requires a title for each chapter, wrapped in HTML tags, e.g. <a href="">Title here</a>. Could it be changed to take just a list of URLs? E.g. instead of

<a href="https://magic.wizards.com/en/news/magic-story/cowardice-hero-2014-01-22">18 Cowardice of the Hero</a>
<a href="https://magic.wizards.com/en/news/magic-story/emonberry-red-2014-02-05">19 Emonberry Red</a>
<a href="https://magic.wizards.com/en/news/magic-story/kioras-followers-2014-02-14">20 Kiora's Followers</a>

we could have

https://magic.wizards.com/en/news/magic-story/cowardice-hero-2014-01-22
https://magic.wizards.com/en/news/magic-story/emonberry-red-2014-02-05
https://magic.wizards.com/en/news/magic-story/kioras-followers-2014-02-14

... with the titles read according to the filter template into editable fields in the chapter list.
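Until something like this exists in the extension, the conversion from a bare URL list to the chapter-list format can be done outside it. The sketch below is a standalone helper, not a WebToEpub feature; `urlsToChapterList`, `placeholderTitle`, and `escapeHtml` are hypothetical names. The titles are only placeholders derived from the URL slug (which omits words such as "of the"), so they would still need editing by hand.

```javascript
// Build a placeholder title from the last path segment of a URL:
// drop a trailing -YYYY-MM-DD date (and optional -N suffix), then capitalise words.
function placeholderTitle(url) {
    const segments = new URL(url).pathname.split("/").filter((s) => s !== "");
    const slug = segments.length > 0 ? segments[segments.length - 1] : url;
    return slug
        .replace(/-\d{4}-\d{2}-\d{2}(-\d+)?$/, "")
        .split("-")
        .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
        .join(" ");
}

// Escape characters that are special inside HTML text and attribute values.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Convert newline-separated URLs into numbered <a href="...">NN Title</a> lines.
// Blank lines are skipped; an invalid URL makes `new URL` throw a TypeError.
function urlsToChapterList(text) {
    return text
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line !== "")
        .map((url, index) => {
            const number = String(index + 1).padStart(2, "0");
            return `<a href="${escapeHtml(url)}">${number} ${escapeHtml(placeholderTitle(url))}</a>`;
        })
        .join("\n");
}
```

The output can be pasted into "Edit chapter URLs" as-is, with the placeholder titles corrected afterwards.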

Many thanks for any advice or help!

Darthagnon · 2024-09-13T20:54:51Z

I have started implementation here: https://github.com/Darthagnon/web2epub-tidy-script/blob/master/MagicWizardsParser.js

dteviot · 2024-09-13T21:54:31Z

@Darthagnon

> Parser: Is it possible for a single parser to target multiple domains (e.g. archive.org, multiple versions of wizards.com) with separate filters on a per-chapter basis? Or are there any example site templates that are similar to this use case?

Yes, see https://github.com/dteviot/WebToEpub/blob/ExperimentalTabMode/plugin/js/parsers/NoblemtlParser.js (although it's not obvious how it works). An example is:

    findCoverImageUrl(dom) {
        return util.getFirstImgSrc(dom, ".thumbook, .sertothumb");
    }

This looks for an image using two CSS selectors, ".thumbook" and ".sertothumb", and picks the first it finds. As the two sites have different layouts, only one will succeed.

An alternative way to handle multiple sites: the "dom" parameter holds the URL of the page in dom.baseURI. You could extract the hostname from that URL and then switch the logic based on it.
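A minimal sketch of that hostname switch follows. Only `dom.baseURI` comes from the description above; the function name `contentSelectorFor` and the mapping of hostnames to selectors are assumptions for illustration, reusing the Include selectors from the first comment in this issue.

```javascript
// Hypothetical helper: pick a content selector based on the page's own URL.
// Works with any object exposing a `baseURI` string, such as a Document.
function contentSelectorFor(dom) {
    const hostname = new URL(dom.baseURI).hostname;
    switch (hostname) {
        case "magic.wizards.com":
            // Two layouts share this host, so list both selectors;
            // only the one present on the page will match.
            return "#article-body, #main-content > article";
        case "www.wizards.com":
            return "#content > div.center-content";
        case "mtglore.com":
        case "www.mtglore.com":
            return "#content > div";
        default:
            // Unknown host: let the caller decide on a fallback.
            return null;
    }
}
```

Note that a page loaded from the Wayback Machine would presumably report `web.archive.org` as its hostname, so the original URL would have to be extracted from the path before a switch like this could apply.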

That said, WebToEpub is supposed to check the URL for each page, and then select the appropriate parser even if the Table of Contents is a mixture of sites. So, you might not need a combined parser. Just write one for each site.

gamebeaker · 2024-09-26T11:23:58Z

Fixed in #1500
@Darthagnon
The updated version (1.0.0.0) has been submitted to the Firefox and Chrome stores.
The Firefox/Chrome version is available now.

Darthagnon changed the title ~~Please add site https://wizards.com~~ Please add https://wizards.com sites for MTG stories May 5, 2024

Darthagnon mentioned this issue Aug 31, 2024

URL parser improvements #1452

Closed

gamebeaker closed this as completed Sep 26, 2024

gamebeaker added the Status: Completed label Sep 26, 2024
