Wrong title extracted for readallcomics.com URLs with numeric slugs (e.g. /030-2026/) #7

Open
opened 2026-03-12 17:58:56 +00:00 by bryan · 0 comments
Owner

Bug Report

URL: https://readallcomics.com/030-2026/

Expected behavior: The comic is downloaded with its actual title (the real series name from the page).

Actual behavior: The downloaded archive is named 030.cbz instead of using the real comic title.

Root Cause

extractTitleFromMarkup in comic/comic.go parses the HTML <title> tag using a regex that expects Title (YYYY...). For this URL, the page title renders as something like 030 (2026), so the regex extracts 030.

The fallback titleFromSlug also produces 030: the slug 030-2026 has its trailing -2026 stripped, leaving just 030.

Neither path recovers the real series title because the page's <title> tag doesn't include it, and the slug itself is purely numeric.
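The two paths can be reproduced in isolation. This is a minimal sketch, not the code from comic/comic.go: the pattern in titleRe and the helpers titleFromPageTitle and titleFromSlugSketch are hypothetical stand-ins, written only from the behavior described above (a regex expecting Title (YYYY...) and a slug fallback that strips a trailing -YYYY).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// titleRe is an ASSUMED stand-in for the real pattern: it captures
// everything before a parenthesised four-digit year, e.g. "Title (2026".
var titleRe = regexp.MustCompile(`^\s*(.+?)\s*\(\d{4}`)

// trailingYearRe matches a trailing "-YYYY" on a slug.
var trailingYearRe = regexp.MustCompile(`-\d{4}$`)

// titleFromPageTitle mimics the <title>-based extraction (hypothetical helper).
func titleFromPageTitle(pageTitle string) string {
	m := titleRe.FindStringSubmatch(pageTitle)
	if m == nil {
		return ""
	}
	return m[1]
}

// titleFromSlugSketch mimics the slug fallback (hypothetical helper):
// it trims the surrounding slashes and drops a trailing "-YYYY".
func titleFromSlugSketch(slug string) string {
	slug = strings.Trim(slug, "/")
	return trailingYearRe.ReplaceAllString(slug, "")
}

func main() {
	fmt.Println(titleFromPageTitle("030 (2026)"))  // prints "030"
	fmt.Println(titleFromSlugSketch("/030-2026/")) // prints "030"
}
```

Under these assumptions both paths agree on 030, so neither can act as a sanity check on the other.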

Steps to Reproduce

go run main.go https://readallcomics.com/030-2026/

Observe the output file is named 030.cbz.

Possible Fix

When the extracted title is purely numeric (or matches the bare slug), fall back to a more descriptive page element — e.g. the <h1> or og:title meta tag — which may contain the full series name.
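One way the fallback could look, as a rough sketch only. The names pickTitle, isNumericTitle, and cleanText are hypothetical and do not exist in the repository, and the sample markup is invented for illustration; the real readallcomics.com page may structure its og:title and <h1> differently. The regexes assume double-quoted attributes with property appearing before content, so a proper HTML parser would be more robust in practice.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	// ASSUMPTION: double-quoted attributes, property before content.
	ogTitleRe = regexp.MustCompile(`(?is)<meta[^>]+property="og:title"[^>]*content="([^"]*)"`)
	h1Re      = regexp.MustCompile(`(?is)<h1[^>]*>(.*?)</h1>`)
	tagRe     = regexp.MustCompile(`<[^>]+>`)
)

// isNumericTitle reports whether s consists only of ASCII digits.
func isNumericTitle(s string) bool {
	s = strings.TrimSpace(s)
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// cleanText strips nested tags, decodes HTML entities, and trims whitespace.
func cleanText(s string) string {
	s = tagRe.ReplaceAllString(s, "")
	return strings.TrimSpace(html.UnescapeString(s))
}

// pickTitle keeps candidate unless it is purely numeric or equal to the
// bare slug; in that case it tries og:title, then <h1>, and finally
// falls back to the candidate itself. (Hypothetical helper.)
func pickTitle(candidate, slug, page string) string {
	if !isNumericTitle(candidate) && !strings.EqualFold(candidate, slug) {
		return candidate
	}
	for _, re := range []*regexp.Regexp{ogTitleRe, h1Re} {
		if m := re.FindStringSubmatch(page); m != nil {
			if t := cleanText(m[1]); t != "" && !isNumericTitle(t) {
				return t
			}
		}
	}
	return candidate
}

func main() {
	// Invented markup; "Example Series" is a placeholder, not the real title.
	page := `<head><meta property="og:title" content="Example Series 030 (2026)" /></head>`
	fmt.Println(pickTitle("030", "030-2026", page)) // prints "Example Series 030 (2026)"
}
```

The result would still need the existing year-stripping and filename sanitizing applied before it is used as the .cbz name.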

Reference: bryan/yoink-go#7