20 Commits

Author SHA1 Message Date
c3c8eedf72 build: add --no-cache to docker-build to prevent stale layer reuse 2026-03-12 10:47:18 -04:00
aab0ad796d feat(batcave): add batcave.biz support, delete UI, and FlareSolverr bypass — fixes #6 2026-03-12 10:32:48 -04:00
0925d5ca63 docs: update README for batcave.biz support, delete feature, and FlareSolverr 2026-03-12 10:27:01 -04:00
89a5013fb2 fix(web): add comic delete UI and fix container Cloudflare bypass for #6
- Add delete button (SVG X, hover-reveal) and confirmation modal to comic cards
- Add DELETE /api/comics/delete endpoint with path traversal protection (see the example call after this list)
- Fix container downloads: delegate Cloudflare-blocked requests to FlareSolverr
  (headless Chrome sidecar) instead of retrying with the Go HTTP client, whose Linux
  TCP fingerprint is flagged by Cloudflare even with network_mode: host
- Add FlareSolverr service to docker-compose; inject FLARESOLVERR_URL env var
- Add diagnostic logging to BatcaveBizMarkup request flow
- Trim URL whitespace before storing in download job
- Guard Archive() against empty filelist; fix runJob error-check ordering
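
A hypothetical client call against the new delete endpoint (the localhost address and comic title are placeholders; the JSON body shape and the 204 response match the handler added in web/server.go below):

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// DELETE /api/comics/delete expects {"title": "..."} and replies 204 No Content.
	body := strings.NewReader(`{"title": "Some Comic 001"}`)
	req, err := http.NewRequest(http.MethodDelete, "http://localhost:8080/api/comics/delete", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status) // expect "204 No Content"
}
```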
2026-03-12 09:41:03 -04:00
d2c715e973 feat: add batcave.biz support, closes #6
## What changed

- `BatcaveBizMarkup` now accepts a `clientChan chan *http.Client` and
  sends the authenticated cookie jar client back to the caller after
  completing the Cloudflare challenge flow. All error paths send nil so
  the caller never blocks.

- `Comic` struct gains a `Client *http.Client` field. `NewComic` wires
  up the channel, receives the client, and stores it so downstream code
  can reuse the same authenticated session.

- `downloadFile` branches on `c.Client`: when set it builds the request
  manually and only attaches a `Referer: https://batcave.biz/` header
  when the image URL is actually on batcave.biz. Some issues host images
  on third-party CDNs (e.g. readcomicsonline.ru) that actively block
  requests with a batcave Referer, returning 403 — omitting the header
  fixes those.

- `ParseBatcaveBizTitle` extracts the chapter title from the
  `__DATA__.chapters` JSON array by matching the chapter ID in the URL's
  last path segment. The HTML `<title>` on batcave.biz is prefixed with
  "Read " and suffixed with "comics online for free", making it
  unsuitable as a filename. Using the chapter data gives clean titles
  like "Nightwing (1996) 153". "Issue #" and bare "#" are stripped since
  the hash character causes problems on some filesystems and tools.

- `ParseBatcaveBizImageLinks` now unescapes `\/` → `/` in extracted
  URLs. The `__DATA__` JSON often contains forward-slash-escaped URLs
  that would otherwise be stored verbatim.

- `archive.go`: `filepath.Walk` was called on `filepath.Dir(sourcePath)`
  (the library root) instead of `sourcePath` (the comic's own folder).
  This caused any leftover image files from previous downloads in sibling
  directories to be included in every new CBZ. Fixed by walking
  `sourcePath` directly.

- `BatcaveBizMarkup` client now has a 30s `Timeout`. Without it, a
  single stalled CDN connection would hang the worker goroutine
  indefinitely, causing `Download()` to block forever waiting for a
  result that never arrives.

- Fixed `for e := range err` in `cli/root.go` — ranging over `[]error`
  with one variable yields the index, not the error value.
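
A standalone illustration of the range gotcha from the last bullet: with a single loop variable, `range` over a slice yields indices, not elements.

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	errs := []error{errors.New("first"), errors.New("second")}

	// Buggy form: e is the index (0, 1), so this prints integers.
	for e := range errs {
		fmt.Println(e)
	}

	// Fixed form: discard the index, bind the element.
	for _, e := range errs {
		fmt.Println(e)
	}
}
```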
2026-03-11 20:55:03 -04:00
9cb26f27ec build: keep latest git tag in sync with each versioned release 2026-03-11 18:35:18 -04:00
855e97f72f chore: bump version to 1.2.1 2026-03-11 18:31:40 -04:00
ca891fc6c0 build: skip --note flag in gitea-release when NOTES is empty 2026-03-11 18:30:29 -04:00
9ec1301317 Merge pull request 'fix: extract title from h1 or URL slug when page title starts with #' (#5) from feat/title-h1-fallback into main
Reviewed-on: #5
2026-03-11 22:16:25 +00:00
dcb41deea9 fix: extract title from h1 or URL slug when page title starts with #
When readallcomics.com pages have a <title> containing only the issue
number (e.g. '#018 (2026)'), fall back to the h1 element first, then
derive the title from the URL slug by stripping the trailing year and
title-casing the hyphen-separated segments.

Closes #4
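
A condensed sketch of the slug fallback described above (the full implementation, `titleFromSlug`, appears in the comic/comic.go diff below):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	url := "https://readallcomics.com/absolute-batman-018-2026/"

	// Last path segment, minus the trailing year, title-cased word by word.
	slug := strings.TrimRight(url, "/")
	slug = slug[strings.LastIndex(slug, "/")+1:]
	slug = regexp.MustCompile(`-\d{4}$`).ReplaceAllString(slug, "")
	words := strings.Split(slug, "-")
	for i, w := range words {
		if len(w) > 0 {
			words[i] = strings.ToUpper(w[:1]) + w[1:]
		}
	}
	fmt.Println(strings.Join(words, " ")) // Absolute Batman 018
}
```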
2026-03-11 18:13:14 -04:00
a7c3b632a5 docs: add local packaging screenshot to README 2026-03-09 22:34:35 -04:00
d53af6b84f build: add release pipeline targets to Makefile
Add tag, gitea-release, and release targets to encode the corrected
versioning process (no v-prefix). VERSION is now overridable via the
command line for use in make release VERSION=x.y.z.
2026-03-09 22:22:38 -04:00
9cd4af9bb6 Merge pull request 'feat(web): local image packaging — drag-and-drop or folder picker to CBZ' (#2) from feature/upload into main
2026-03-10 01:49:55 +00:00
96f9301b32 feat(web): add local image packaging — drag-and-drop or folder picker to CBZ 2026-03-09 21:41:40 -04:00
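
A hypothetical client for the packaging endpoint (the localhost address and file names are placeholders; the `title` and `images` multipart fields match the handleUpload handler in web/server.go below):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

func main() {
	var buf bytes.Buffer
	mw := multipart.NewWriter(&buf)
	mw.WriteField("title", "My Comic")
	// Files are re-sorted by name server-side, so upload order doesn't matter.
	for _, name := range []string{"001.jpg", "002.jpg"} {
		f, err := os.Open(name)
		if err != nil {
			panic(err)
		}
		part, _ := mw.CreateFormFile("images", name)
		io.Copy(part, f)
		f.Close()
	}
	mw.Close()

	resp, err := http.Post("http://localhost:8080/api/upload", mw.FormDataContentType(), &buf)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```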
16b7545757 docs: add UI screenshot to README; simplify port binding in compose 2026-03-09 14:58:43 -04:00
412438fa22 feat(web): SVG download overlay, toast icons, initials placeholder, empty state hint, footer 2026-03-09 10:47:52 -04:00
551a5b2b2a fix(web): replace literal en-dashes with HTML entities in sort buttons 2026-03-09 09:04:06 -04:00
1a567a19fe feat(web): add pagination and fix port binding for Tailscale access
- Paginate comic grid at 48 per page with smart page number controls (windowing sketched below)
- Bind container port to 0.0.0.0 so Tailscale traffic can reach WSL2
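
The page controls themselves live in web/static/index.html (diff suppressed below); as a hypothetical Go sketch, "smart" page numbers are just a window of fixed width around the current page, clamped to the valid range:

```go
package main

import "fmt"

// pageWindow returns the page numbers to render: current plus up to
// radius neighbors on each side, shifted as needed to stay in [1, total].
func pageWindow(current, total, radius int) []int {
	lo, hi := current-radius, current+radius
	if lo < 1 {
		hi += 1 - lo
		lo = 1
	}
	if hi > total {
		lo -= hi - total
		hi = total
	}
	if lo < 1 {
		lo = 1
	}
	pages := make([]int, 0, hi-lo+1)
	for p := lo; p <= hi; p++ {
		pages = append(pages, p)
	}
	return pages
}

func main() {
	fmt.Println(pageWindow(1, 10, 2))  // [1 2 3 4 5]
	fmt.Println(pageWindow(5, 10, 2))  // [3 4 5 6 7]
	fmt.Println(pageWindow(10, 10, 2)) // [6 7 8 9 10]
}
```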
2026-03-09 08:53:26 -04:00
9d1ca16704 feat(web): improve UI responsiveness, polish, and update docs
- Add mobile/tablet responsive breakpoints to web UI
- Redesign cards as full-bleed poster layout with gradient overlay
- Add skeleton loading state, comic count badge, and search icon
- Switch to Docker image format for registry compatibility
- Add docker-build and docker-push Makefile targets with versioned tags
- Update README to document web UI, Docker deployment, and serve command
2026-03-08 23:06:50 -04:00
25eee6f76a feat(web): add dockerized web UI with comic library browser
Adds a `yoink serve` command that starts an HTTP server with a
Sonarr/MeTube-inspired dark UI. Features a URL input bar for
triggering downloads, a 150x300 cover grid with filter and sort
controls, a live download queue strip, and toast notifications.

Includes Dockerfile (multi-stage, distroless runtime) and
docker-compose.yml for easy deployment.
2026-03-08 22:02:38 -04:00
22 changed files with 3392 additions and 32 deletions

5
.dockerignore Normal file

@@ -0,0 +1,5 @@
.git
.github
*.md
library/
*_test.go

11
.gitignore vendored

@@ -20,3 +20,14 @@ go.work.sum
# env file
.env
# Built binary
yoink
yoink.exe
# Comic library (downloaded content)
library/
# IDE
.vscode/
.idea/

37
Dockerfile Normal file

@@ -0,0 +1,37 @@
# ── Build stage ────────────────────────────────────────────────────────────
FROM mcr.microsoft.com/oss/go/microsoft/golang:1.22-bullseye AS builder
WORKDIR /app
# Restore modules in a separate layer so it's cached until go.mod/go.sum change
COPY go.mod go.sum ./
RUN go mod download && go mod verify
# Copy source and build a fully static binary
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go build -ldflags="-s -w" -trimpath -o yoink .
# ── Runtime stage ──────────────────────────────────────────────────────────
# distroless/base-debian12:nonroot — minimal attack surface, non-root by default
FROM gcr.io/distroless/base-debian12:nonroot
LABEL org.opencontainers.image.title="yoink" \
org.opencontainers.image.description="Comic downloader web UI" \
org.opencontainers.image.source="https://git.brizzle.dev/bryan/yoink-go"
WORKDIR /app
COPY --from=builder --chown=nonroot:nonroot /app/yoink .
ENV YOINK_LIBRARY=/library
VOLUME ["/library"]
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
CMD ["/app/yoink", "healthcheck"]
USER nonroot
CMD ["/app/yoink", "serve"]

Makefile

@@ -1,7 +1,10 @@
BIN := yoink
BUILD_DIR := build
REGISTRY := git.brizzle.dev/bryan/yoink-go
VERSION ?= $(shell git describe --tags --always --dirty)
NOTES ?= ""
.PHONY: all windows linux darwin clean
.PHONY: all windows linux darwin clean docker-build docker-push tag gitea-release release
all: windows linux darwin
@@ -16,5 +19,40 @@ darwin:
GOOS=darwin GOARCH=amd64 go build -o $(BUILD_DIR)/$(BIN)-darwin-amd64
GOOS=darwin GOARCH=arm64 go build -o $(BUILD_DIR)/$(BIN)-darwin-arm64
docker-build:
podman build --no-cache --format docker \
-t $(REGISTRY):$(VERSION) \
-t $(REGISTRY):latest \
.
docker-push: docker-build
podman push $(REGISTRY):$(VERSION)
podman push $(REGISTRY):latest
tag:
@if [ -z "$(VERSION)" ]; then echo "Usage: make tag VERSION=1.2.0"; exit 1; fi
git tag $(VERSION)
git tag -f latest
git push origin $(VERSION)
git push origin -f latest
gitea-release:
tea release create \
--tag $(VERSION) \
--title "$(VERSION)" \
$(if $(NOTES),--note $(NOTES),) \
--asset $(BUILD_DIR)/$(BIN)-windows-amd64.exe \
--asset $(BUILD_DIR)/$(BIN)-linux-amd64 \
--asset $(BUILD_DIR)/$(BIN)-linux-arm64 \
--asset $(BUILD_DIR)/$(BIN)-darwin-amd64 \
--asset $(BUILD_DIR)/$(BIN)-darwin-arm64
release:
@if [ -z "$(VERSION)" ]; then echo "Usage: make release VERSION=1.3.0 NOTES='...'"; exit 1; fi
$(MAKE) tag VERSION=$(VERSION)
$(MAKE) clean all
$(MAKE) gitea-release VERSION=$(VERSION) NOTES=$(NOTES)
$(MAKE) docker-push VERSION=$(VERSION)
clean:
rm -rf $(BUILD_DIR)

116
README.md

@@ -1,6 +1,6 @@
# yoink
A CLI tool for downloading comics from readallcomics.com and packaging them as `.cbz` archives.
A tool for downloading comics from [readallcomics.com](https://readallcomics.com) and [batcave.biz](https://batcave.biz), packaging them as `.cbz` archives. Available as a CLI command or a self-hosted web application. The web UI also lets you package local image folders into `.cbz` archives directly from your browser.
## How it works
@@ -9,44 +9,148 @@ A CLI tool for downloading comics from readallcomics.com and packaging them as `
3. Packages the images into a `.cbz` (Comic Book Zip) archive
4. Cleans up downloaded images, keeping only the cover (`001`)
---
## Installation
Build from source (requires Go 1.22.3+):
### From source
Requires Go 1.22.3+:
```shell
go build -o yoink
```
### Pre-built binaries
Pre-built binaries for Linux (amd64, arm64), Windows (amd64), and macOS (amd64, arm64) are available on the [releases page](https://git.brizzle.dev/bryan/yoink-go/releases).
## Usage
### Docker
```shell
docker pull git.brizzle.dev/bryan/yoink-go:latest
```
---
## CLI
Download a single comic issue:
```shell
yoink <url>
```
**Example:**
**Examples:**
```shell
yoink https://readallcomics.com/ultraman-x-avengers-001-2024/
yoink https://batcave.biz/ultraman-x-avengers-1-2025/
```
The comic title is extracted from the page and used to name the archive. Output is saved to:
```
```text
<library>/<Title>/<Title>.cbz
```
---
## Web UI
Yoink includes a self-hosted web interface for browsing and downloading comics from your browser.
![Yoink Web UI](Screenshot_01.png)
### Running directly
```shell
yoink serve
```
By default the server listens on port `8080`. Use the `-p` flag to change it:
```shell
yoink serve -p 3000
```
### Running with Docker
A `docker-compose.yml` is included for quick deployment:
```shell
docker compose up -d
```
Or with Podman:
```shell
podman compose up -d
```
The web UI is then available at `http://localhost:8080`.
### Features
- **Download queue** — paste a comic URL into the input bar and track download progress in real time
- **Local packaging** — drag and drop a folder of images (or use the file picker) to package them as a `.cbz` archive and add it to your library without downloading anything
- **Library grid** — browse your comics as a 150×300 cover grid with title-initial placeholders for missing covers
- **Filter & sort** — filter by title and sort by newest, oldest, AZ, or ZA
- **One-click download** — click any cover to download the `.cbz` archive directly
- **Delete** — remove a comic from your library with the × button on each card (confirmation required)
#### Packaging local images
![Local packaging panel](Screenshot_02.png)
Click the upload icon (↑) in the header to open the packaging panel. Enter a title, then either:
- **Drag and drop** a folder or image files onto the drop zone
- **Select folder** to pick an entire directory at once
- **Select files** to pick individual images
Images are sorted by filename, the first image is used as the cover, and the result is saved to your library as `<Title>/<Title>.cbz`.
### Library volume
Downloaded comics are stored at the path set by `YOINK_LIBRARY`. When using Docker, mount this as a volume to persist your library across container restarts:
```yaml
# docker-compose.yml
services:
flaresolverr:
image: ghcr.io/flaresolverr/flaresolverr:latest
restart: unless-stopped
yoink:
image: git.brizzle.dev/bryan/yoink-go:latest
ports:
- "8080:8080"
volumes:
- ./library:/library
environment:
- YOINK_LIBRARY=/library
- FLARESOLVERR_URL=http://flaresolverr:8191
restart: unless-stopped
depends_on:
- flaresolverr
```
---
## Configuration
| Variable | Default | Description |
|-----------------|--------------|--------------------------------------|
| --- | --- | --- |
| `YOINK_LIBRARY` | `~/.yoink` | Directory where comics are stored |
| `FLARESOLVERR_URL` | *(unset)* | URL of a [FlareSolverr](https://github.com/FlareSolverr/FlareSolverr) instance for Cloudflare-protected sites (e.g. batcave.biz). Required when running in Docker. |
```shell
YOINK_LIBRARY=/mnt/media/comics yoink https://readallcomics.com/some-comic-001/
```
---
## Dependencies
- [goquery](https://github.com/PuerkitoBio/goquery) — HTML parsing

BIN
Screenshot_01.png Normal file

Binary file not shown. Size: 560 KiB

BIN
Screenshot_02.png Normal file

Binary file not shown. Size: 270 KiB

28
cli/healthcheck.go Normal file

@@ -0,0 +1,28 @@
package cli
import (
"fmt"
"net/http"
"os"
"github.com/spf13/cobra"
)
var healthcheckCmd = &cobra.Command{
Use: "healthcheck",
Short: "Check if the web server is running (used by Docker HEALTHCHECK)",
Args: cobra.NoArgs,
Hidden: true,
Run: func(cmd *cobra.Command, args []string) {
port, _ := cmd.Flags().GetString("port")
resp, err := http.Get(fmt.Sprintf("http://localhost:%s/health", port))
if err != nil || resp.StatusCode != http.StatusOK {
os.Exit(1)
}
},
}
func init() {
healthcheckCmd.Flags().StringP("port", "p", "8080", "Port the server is listening on")
cli.AddCommand(healthcheckCmd)
}

cli/root.go

@@ -40,14 +40,14 @@ var cli = &cobra.Command{
fmt.Println(comic.Title)
err := comic.Download(len(comic.Filelist))
for e := range err {
for _, e := range err {
fmt.Println(e)
}
comic.Archive()
comic.Cleanup()
},
Version: "1.1.0",
Version: "1.2.1",
}
func Execute() error {

36
cli/serve.go Normal file

@@ -0,0 +1,36 @@
package cli
import (
"fmt"
"log"
"os"
"path/filepath"
"github.com/spf13/cobra"
"yoink/web"
)
var serveCmd = &cobra.Command{
Use: "serve",
Short: "Start the Yoink web UI",
Args: cobra.NoArgs,
Run: func(cmd *cobra.Command, args []string) {
library, ok := os.LookupEnv("YOINK_LIBRARY")
if !ok {
userHome, _ := os.UserHomeDir()
library = filepath.Join(userHome, ".yoink")
}
port, _ := cmd.Flags().GetString("port")
addr := fmt.Sprintf(":%s", port)
if err := web.Listen(addr, library); err != nil {
log.Fatal(err)
}
},
}
func init() {
serveCmd.Flags().StringP("port", "p", "8080", "Port to listen on")
cli.AddCommand(serveCmd)
}

comic/archive.go

@@ -23,6 +23,9 @@ func (a ArchiveError) Error() string {
// It takes no parameters.
// Returns an error if the operation fails.
func (c *Comic) Archive() error {
if len(c.Filelist) == 0 {
return nil
}
outputPath := filepath.Join(c.LibraryPath, c.Title, c.Title+".cbz")
err := os.MkdirAll(filepath.Dir(outputPath), os.ModePerm)
@@ -45,7 +48,7 @@ func (c *Comic) Archive() error {
sourcePath := filepath.Join(c.LibraryPath, c.Title)
err = filepath.Walk(
filepath.Dir(sourcePath),
sourcePath,
func(path string, info os.FileInfo, err error) error {
if err != nil {
return ArchiveError{

110
comic/archive_test.go Normal file

@@ -0,0 +1,110 @@
package comic
import (
"archive/zip"
"os"
"path/filepath"
"testing"
)
func TestArchiveError(t *testing.T) {
err := ArchiveError{Message: "archive failed", Code: 1}
if err.Error() != "archive failed" {
t.Errorf("Error() = %q, want %q", err.Error(), "archive failed")
}
}
func TestArchive(t *testing.T) {
t.Run("creates cbz with image files", func(t *testing.T) {
tmpDir := t.TempDir()
title := "TestComic"
comicDir := filepath.Join(tmpDir, title)
os.MkdirAll(comicDir, os.ModePerm)
// Create fake image files
for _, name := range []string{"TestComic 001.jpg", "TestComic 002.jpg", "TestComic 003.png"} {
os.WriteFile(filepath.Join(comicDir, name), []byte("fake image"), 0644)
}
c := &Comic{
Title: title,
LibraryPath: tmpDir,
Filelist: []string{"TestComic 001.jpg", "TestComic 002.jpg", "TestComic 003.png"},
}
err := c.Archive()
if err != nil {
t.Fatalf("Archive() unexpected error: %v", err)
}
archivePath := filepath.Join(comicDir, title+".cbz")
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Fatalf("expected archive %s to exist", archivePath)
}
// Verify the zip contains the image files
reader, err := zip.OpenReader(archivePath)
if err != nil {
t.Fatalf("failed to open archive: %v", err)
}
defer reader.Close()
if len(reader.File) != 3 {
t.Errorf("archive contains %d files, want 3", len(reader.File))
}
})
t.Run("excludes non-image files from archive", func(t *testing.T) {
tmpDir := t.TempDir()
title := "TestComic"
comicDir := filepath.Join(tmpDir, title)
os.MkdirAll(comicDir, os.ModePerm)
// Create mixed files
os.WriteFile(filepath.Join(comicDir, "page-001.jpg"), []byte("image"), 0644)
os.WriteFile(filepath.Join(comicDir, "readme.txt"), []byte("text"), 0644)
os.WriteFile(filepath.Join(comicDir, "data.json"), []byte("json"), 0644)
c := &Comic{
Title: title,
LibraryPath: tmpDir,
Filelist: []string{"page-001.jpg"},
}
err := c.Archive()
if err != nil {
t.Fatalf("Archive() unexpected error: %v", err)
}
archivePath := filepath.Join(comicDir, title+".cbz")
reader, err := zip.OpenReader(archivePath)
if err != nil {
t.Fatalf("failed to open archive: %v", err)
}
defer reader.Close()
if len(reader.File) != 1 {
t.Errorf("archive contains %d files, want 1 (only .jpg)", len(reader.File))
}
})
t.Run("creates nothing when filelist is empty", func(t *testing.T) {
tmpDir := t.TempDir()
title := "EmptyComic"
c := &Comic{
Title: title,
LibraryPath: tmpDir,
}
err := c.Archive()
if err != nil {
t.Fatalf("Archive() unexpected error: %v", err)
}
archivePath := filepath.Join(tmpDir, title, title+".cbz")
if _, err := os.Stat(archivePath); !os.IsNotExist(err) {
t.Fatalf("expected no archive to be created for empty filelist")
}
})
}

93
comic/cleanup_test.go Normal file

@@ -0,0 +1,93 @@
package comic
import (
"os"
"path/filepath"
"testing"
)
func TestCleanup(t *testing.T) {
t.Run("keeps cover image 001 and removes others", func(t *testing.T) {
tmpDir := t.TempDir()
title := "TestComic"
comicDir := filepath.Join(tmpDir, title)
os.MkdirAll(comicDir, os.ModePerm)
files := map[string]bool{
"TestComic 001.jpg": true, // should be kept
"TestComic 002.jpg": false, // should be removed
"TestComic 003.jpg": false, // should be removed
}
for name := range files {
os.WriteFile(filepath.Join(comicDir, name), []byte("fake"), 0644)
}
c := &Comic{
Title: title,
LibraryPath: tmpDir,
}
err := c.Cleanup()
if err != nil {
t.Fatalf("Cleanup() unexpected error: %v", err)
}
for name, shouldExist := range files {
path := filepath.Join(comicDir, name)
_, err := os.Stat(path)
exists := !os.IsNotExist(err)
if shouldExist && !exists {
t.Errorf("expected %s to be kept, but it was removed", name)
}
if !shouldExist && exists {
t.Errorf("expected %s to be removed, but it still exists", name)
}
}
})
t.Run("keeps non-image files", func(t *testing.T) {
tmpDir := t.TempDir()
title := "TestComic"
comicDir := filepath.Join(tmpDir, title)
os.MkdirAll(comicDir, os.ModePerm)
os.WriteFile(filepath.Join(comicDir, "TestComic.cbz"), []byte("archive"), 0644)
os.WriteFile(filepath.Join(comicDir, "metadata.json"), []byte("data"), 0644)
c := &Comic{
Title: title,
LibraryPath: tmpDir,
}
err := c.Cleanup()
if err != nil {
t.Fatalf("Cleanup() unexpected error: %v", err)
}
for _, name := range []string{"TestComic.cbz", "metadata.json"} {
path := filepath.Join(comicDir, name)
if _, err := os.Stat(path); os.IsNotExist(err) {
t.Errorf("expected non-image file %s to be kept", name)
}
}
})
t.Run("handles empty directory", func(t *testing.T) {
tmpDir := t.TempDir()
title := "EmptyComic"
comicDir := filepath.Join(tmpDir, title)
os.MkdirAll(comicDir, os.ModePerm)
c := &Comic{
Title: title,
LibraryPath: tmpDir,
}
err := c.Cleanup()
if err != nil {
t.Fatalf("Cleanup() unexpected error for empty dir: %v", err)
}
})
}

comic/comic.go

@@ -1,6 +1,7 @@
package comic
import (
"net/http"
"path/filepath"
"regexp"
"strings"
@@ -18,6 +19,7 @@ type Comic struct {
Next *Comic
Prev *Comic
LibraryPath string
Client *http.Client
}
// extractTitleFromMarkup extracts the title from the comic's markup.
@@ -26,21 +28,52 @@ type Comic struct {
// Returns the extracted title as a string.
func extractTitleFromMarkup(c Comic) string {
yearFormat := `^(.*?)\s+\(\d{4}(?:\s+.+)?\)`
selection := c.Markup.Find("title")
if selection.Length() == 0 {
return "Untitled"
}
content := selection.First().Text()
regex := regexp.MustCompile(yearFormat)
matches := regex.FindStringSubmatch(content)
extractFrom := func(text string) string {
matches := regex.FindStringSubmatch(text)
if len(matches) != 2 {
return "Untitled"
return ""
}
return strings.ReplaceAll(matches[1], ":", "")
}
return strings.ReplaceAll(matches[1], ":", "")
title := extractFrom(c.Markup.Find("title").First().Text())
if strings.HasPrefix(title, "#") {
if h1 := extractFrom(c.Markup.Find("h1").First().Text()); h1 != "" && !strings.HasPrefix(h1, "#") {
return h1
}
if slug := titleFromSlug(c.URL); slug != "" {
return slug
}
}
if title != "" {
return title
}
return "Untitled"
}
// titleFromSlug derives a comic title from the last path segment of a URL.
// It strips a trailing year (-YYYY), replaces hyphens with spaces, and title-cases the result.
func titleFromSlug(url string) string {
slug := strings.TrimRight(url, "/")
if i := strings.LastIndex(slug, "/"); i >= 0 {
slug = slug[i+1:]
}
slug = regexp.MustCompile(`-\d{4}$`).ReplaceAllString(slug, "")
if slug == "" {
return ""
}
words := strings.Split(slug, "-")
for i, w := range words {
if len(w) > 0 {
words[i] = strings.ToUpper(w[:1]) + w[1:]
}
}
return strings.Join(words, " ")
}
// NewComic creates a new Comic instance from the provided URL and library path.
@@ -61,13 +94,25 @@ func NewComic(
LibraryPath: libraryPath,
}
go Markup(c.URL, markupChannel)
if strings.Contains(url, "batcave.biz") {
clientChan := make(chan *http.Client, 1)
go BatcaveBizMarkup(url, markupChannel, clientChan)
markup := <-markupChannel
c.Markup = markup
c.Client = <-clientChan
if t := ParseBatcaveBizTitle(markup, url); t != "" {
c.Title = t
} else {
c.Title = extractTitleFromMarkup(*c)
}
go ParseBatcaveBizImageLinks(markup, imageChannel)
} else {
go Markup(url, markupChannel)
markup := <-markupChannel
c.Markup = markup
c.Title = extractTitleFromMarkup(*c)
go ParseImageLinks(markup, imageChannel)
}
links := <-imageChannel
c.Filelist = links

170
comic/comic_test.go Normal file

@@ -0,0 +1,170 @@
package comic
import (
"strings"
"testing"
"github.com/PuerkitoBio/goquery"
)
func newDocFromHTML(html string) *goquery.Document {
doc, _ := goquery.NewDocumentFromReader(strings.NewReader(html))
return doc
}
func TestExtractTitleFromMarkup(t *testing.T) {
tests := []struct {
name string
html string
url string
expected string
}{
{
name: "standard title with year",
html: `<html><head><title>Ultraman X Avengers 001 (2024)</title></head></html>`,
expected: "Ultraman X Avengers 001",
},
{
name: "title with year and extra text",
html: `<html><head><title>Batman 042 (2023 Digital)</title></head></html>`,
expected: "Batman 042",
},
{
name: "title with colon removed",
html: `<html><head><title>Spider-Man: No Way Home 001 (2022)</title></head></html>`,
expected: "Spider-Man No Way Home 001",
},
{
name: "no title tag",
html: `<html><head></head></html>`,
expected: "Untitled",
},
{
name: "title without year pattern",
html: `<html><head><title>Some Random Page</title></head></html>`,
expected: "Untitled",
},
{
name: "empty title",
html: `<html><head><title></title></head></html>`,
expected: "Untitled",
},
{
name: "title starts with # falls back to h1",
html: `<html><head><title>#018 (2026)</title></head><body><h1>Absolute Batman #018 (2026)</h1></body></html>`,
expected: "Absolute Batman #018",
},
{
name: "title starts with # but h1 also starts with #, falls back to slug",
html: `<html><head><title>#018 (2026)</title></head><body><h1>#018 (2026)</h1></body></html>`,
url: "https://readallcomics.com/absolute-batman-018-2026/",
expected: "Absolute Batman 018",
},
{
name: "title starts with # falls back to slug when no h1",
html: `<html><head><title>#018 (2026)</title></head></html>`,
url: "https://readallcomics.com/absolute-batman-018-2026/",
expected: "Absolute Batman 018",
},
{
name: "title starts with # no h1 no url",
html: `<html><head><title>#018 (2026)</title></head></html>`,
expected: "#018",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
doc := newDocFromHTML(tt.html)
c := Comic{Markup: doc, URL: tt.url}
result := extractTitleFromMarkup(c)
if result != tt.expected {
t.Errorf("extractTitleFromMarkup() = %q, want %q", result, tt.expected)
}
})
}
}
func TestTitleFromSlug(t *testing.T) {
tests := []struct {
name string
url string
expected string
}{
{
name: "standard comic URL",
url: "https://readallcomics.com/absolute-batman-018-2026/",
expected: "Absolute Batman 018",
},
{
name: "no trailing slash",
url: "https://readallcomics.com/absolute-batman-018-2026",
expected: "Absolute Batman 018",
},
{
name: "no year in slug",
url: "https://readallcomics.com/absolute-batman-018/",
expected: "Absolute Batman 018",
},
{
name: "single word slug",
url: "https://readallcomics.com/batman/",
expected: "Batman",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := titleFromSlug(tt.url)
if result != tt.expected {
t.Errorf("titleFromSlug() = %q, want %q", result, tt.expected)
}
})
}
}
func TestCover(t *testing.T) {
tests := []struct {
name string
filelist []string
wantSuffix string
expectErr bool
}{
{
name: "finds cover ending in 001.jpg",
filelist: []string{"https://example.com/image-002.jpg", "https://example.com/image-001.jpg", "https://example.com/image-003.jpg"},
wantSuffix: "image-001.jpg",
},
{
name: "finds cover ending in 000.jpg",
filelist: []string{"https://example.com/image-000.jpg", "https://example.com/image-001.jpg"},
wantSuffix: "image-000.jpg",
},
{
name: "returns error when no cover found",
filelist: []string{"https://example.com/image-002.jpg", "https://example.com/image-003.jpg"},
expectErr: true,
},
{
name: "returns error for empty filelist",
filelist: []string{},
expectErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
c := &Comic{Filelist: tt.filelist}
cover, err := c.Cover()
if tt.expectErr && err == nil {
t.Error("Cover() expected error, got nil")
}
if !tt.expectErr && err != nil {
t.Errorf("Cover() unexpected error: %v", err)
}
if tt.wantSuffix != "" && !strings.HasSuffix(cover, tt.wantSuffix) {
t.Errorf("Cover() = %q, want path ending in %q", cover, tt.wantSuffix)
}
})
}
}

comic/download.go

@@ -6,6 +6,7 @@ import (
"net/http"
"os"
"path/filepath"
"strings"
"time"
cloudflarebp "github.com/DaRealFreak/cloudflare-bp-go"
@@ -39,13 +40,33 @@ func downloadFile(url string, page int, c *Comic) error {
}
}
res, err := handleRequest(url)
var res *http.Response
var err error
if c.Client != nil {
req, reqErr := http.NewRequest("GET", url, nil)
if reqErr != nil {
return ComicDownloadError{Message: "invalid request", Code: 1}
}
req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")
if strings.Contains(url, "batcave.biz") {
req.Header.Set("Referer", "https://batcave.biz/")
}
res, err = c.Client.Do(req)
} else {
res, err = handleRequest(url)
}
if err != nil {
return ComicDownloadError{
Message: "invalid request",
Code: 1,
}
}
if res.StatusCode != http.StatusOK {
return ComicDownloadError{
Message: "bad response",
Code: 1,
}
}
defer res.Body.Close()
imageFile, err := os.Create(imageFilepath)

145
comic/download_test.go Normal file

@@ -0,0 +1,145 @@
package comic
import (
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"testing"
)
func TestComicDownloadError(t *testing.T) {
err := ComicDownloadError{Message: "download failed", Code: 1}
if err.Error() != "download failed" {
t.Errorf("Error() = %q, want %q", err.Error(), "download failed")
}
}
func TestHandleRequest(t *testing.T) {
t.Run("successful request", func(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Header.Get("User-Agent") == "" {
t.Error("expected User-Agent header to be set")
}
w.WriteHeader(http.StatusOK)
w.Write([]byte("image data"))
}))
defer server.Close()
resp, err := handleRequest(server.URL)
if err != nil {
t.Fatalf("handleRequest() unexpected error: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
t.Errorf("handleRequest() status = %d, want %d", resp.StatusCode, http.StatusOK)
}
})
t.Run("non-200 response", func(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusNotFound)
}))
defer server.Close()
_, err := handleRequest(server.URL)
if err == nil {
t.Error("handleRequest() expected error for 404 response, got nil")
}
})
t.Run("invalid URL", func(t *testing.T) {
_, err := handleRequest("http://invalid.localhost:0/bad")
if err == nil {
t.Error("handleRequest() expected error for invalid URL, got nil")
}
})
}
func TestDownloadFile(t *testing.T) {
t.Run("successful download", func(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte("fake image content"))
}))
defer server.Close()
tmpDir := t.TempDir()
c := &Comic{
Title: "TestComic",
LibraryPath: tmpDir,
}
err := downloadFile(server.URL+"/image.jpg", 1, c)
if err != nil {
t.Fatalf("downloadFile() unexpected error: %v", err)
}
expectedPath := filepath.Join(tmpDir, "TestComic", "TestComic 001.jpg")
if _, err := os.Stat(expectedPath); os.IsNotExist(err) {
t.Errorf("expected file %s to exist", expectedPath)
}
})
t.Run("formats page number with leading zeros", func(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte("fake image content"))
}))
defer server.Close()
tmpDir := t.TempDir()
c := &Comic{
Title: "TestComic",
LibraryPath: tmpDir,
}
err := downloadFile(server.URL+"/image.jpg", 42, c)
if err != nil {
t.Fatalf("downloadFile() unexpected error: %v", err)
}
expectedPath := filepath.Join(tmpDir, "TestComic", "TestComic 042.jpg")
if _, err := os.Stat(expectedPath); os.IsNotExist(err) {
t.Errorf("expected file %s to exist", expectedPath)
}
})
t.Run("server error returns error", func(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusInternalServerError)
}))
defer server.Close()
tmpDir := t.TempDir()
c := &Comic{
Title: "TestComic",
LibraryPath: tmpDir,
}
err := downloadFile(server.URL+"/image.jpg", 1, c)
if err == nil {
t.Error("downloadFile() expected error for server error, got nil")
}
})
t.Run("empty response body returns error", func(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
// write nothing
}))
defer server.Close()
tmpDir := t.TempDir()
c := &Comic{
Title: "TestComic",
LibraryPath: tmpDir,
}
err := downloadFile(server.URL+"/image.jpg", 1, c)
if err == nil {
t.Error("downloadFile() expected error for empty body, got nil")
}
})
}

comic/parser.go

@@ -1,9 +1,18 @@
package comic
import (
"bytes"
"encoding/json"
"fmt"
"io"
"log"
"net/http"
"net/http/cookiejar"
"net/url"
"os"
"regexp"
"strings"
"time"
"github.com/PuerkitoBio/goquery"
)
@@ -47,6 +56,216 @@ func Markup(url string, c chan *goquery.Document) *goquery.Document {
return markup
}
// fetchViaFlareSolverr fetches a URL through FlareSolverr (headless Chrome),
// returning the final page HTML as a Document. Cookies from the browser session
// are written into jar for use in subsequent requests (e.g. image downloads).
func fetchViaFlareSolverr(targetURL string, jar *cookiejar.Jar) (*goquery.Document, error) {
fsURL := os.Getenv("FLARESOLVERR_URL")
if fsURL == "" {
return nil, fmt.Errorf("FLARESOLVERR_URL not set")
}
payload, _ := json.Marshal(map[string]interface{}{
"cmd": "request.get",
"url": targetURL,
"maxTimeout": 60000,
})
resp, err := http.Post(fsURL+"/v1", "application/json", bytes.NewReader(payload))
if err != nil {
return nil, err
}
defer resp.Body.Close()
var result struct {
Status string `json:"status"`
Solution struct {
Response string `json:"response"`
Cookies []struct {
Name string `json:"name"`
Value string `json:"value"`
Domain string `json:"domain"`
Path string `json:"path"`
Secure bool `json:"secure"`
} `json:"cookies"`
} `json:"solution"`
}
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
return nil, err
}
if result.Status != "ok" {
return nil, fmt.Errorf("flaresolverr: %s", result.Status)
}
parsed, _ := url.Parse(targetURL)
var cookies []*http.Cookie
for _, c := range result.Solution.Cookies {
cookies = append(cookies, &http.Cookie{
Name: c.Name,
Value: c.Value,
Domain: c.Domain,
Path: c.Path,
Secure: c.Secure,
})
}
jar.SetCookies(parsed, cookies)
return goquery.NewDocumentFromReader(strings.NewReader(result.Solution.Response))
}
func BatcaveBizMarkup(referer string, c chan *goquery.Document, clientChan chan *http.Client) *goquery.Document {
sendErr := func() *goquery.Document {
if c != nil {
c <- &goquery.Document{}
}
if clientChan != nil {
clientChan <- nil
}
return &goquery.Document{}
}
jar, _ := cookiejar.New(nil)
client := &http.Client{
Jar: jar,
Timeout: time.Second * 30,
CheckRedirect: func(req *http.Request, via []*http.Request) error {
return nil
},
}
headers := map[string]string{
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Accept-Language": "en-US,en;q=0.9",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
}
// GET the challenge page to obtain cookies and any necessary tokens
req, err := http.NewRequest("GET", referer, nil)
if err != nil {
return sendErr()
}
for k, v := range headers {
req.Header.Set(k, v)
}
res, err := client.Do(req)
if err != nil {
log.Printf("[batcave] initial GET failed: %v", err)
return sendErr()
}
log.Printf("[batcave] initial GET status: %d", res.StatusCode)
// Cloudflare challenge — use FlareSolverr (headless Chrome) to fetch the
// full page and solve any JS challenges. cf_clearance is stored in jar for
// subsequent image downloads.
if res.StatusCode == 403 || res.StatusCode == 503 {
res.Body.Close()
log.Printf("[batcave] Cloudflare challenge detected, fetching via FlareSolverr")
doc, err := fetchViaFlareSolverr(referer, jar)
if err != nil {
log.Printf("[batcave] FlareSolverr failed: %v", err)
return sendErr()
}
if c != nil {
c <- doc
}
if clientChan != nil {
clientChan <- client
}
return doc
}
defer res.Body.Close()
body, err := io.ReadAll(res.Body)
if err != nil {
return sendErr()
}
tokenRegex := regexp.MustCompile(`token:\s*"([^"]+)"`)
matches := tokenRegex.FindSubmatch(body)
if matches == nil {
// no challenge, parse directly
doc, err := goquery.NewDocumentFromReader(strings.NewReader(string(body)))
if err != nil {
return sendErr()
}
if c != nil {
c <- doc
}
if clientChan != nil {
clientChan <- client
}
return doc
}
encodedToken := string(matches[1])
token, err := url.QueryUnescape(encodedToken)
if err != nil {
token = encodedToken
}
// POST to /_v with fake browser metrics
params := url.Values{}
params.Set("token", token)
params.Set("mode", "modern")
params.Set("workTime", "462")
params.Set("iterations", "183")
params.Set("webdriver", "0")
params.Set("touch", "0")
params.Set("screen_w", "1920")
params.Set("screen_h", "1080")
params.Set("screen_cd", "24")
postReq, err := http.NewRequest("POST", "https://batcave.biz/_v", strings.NewReader(params.Encode()))
if err != nil {
return sendErr()
}
for k, v := range headers {
postReq.Header.Set(k, v)
}
postReq.Header.Set("Content-Type", "application/x-www-form-urlencoded")
postReq.Header.Set("Referer", referer)
postRes, err := client.Do(postReq)
if err != nil {
log.Printf("[batcave] POST to /_v failed: %v", err)
return sendErr()
}
defer postRes.Body.Close()
log.Printf("[batcave] POST to /_v status: %d", postRes.StatusCode)
io.ReadAll(postRes.Body)
// GET the real page with the set cookie
realReq, err := http.NewRequest("GET", referer, nil)
if err != nil {
return sendErr()
}
for k, v := range headers {
realReq.Header.Set(k, v)
}
realRes, err := client.Do(realReq)
if err != nil {
log.Printf("[batcave] final GET failed: %v", err)
return sendErr()
}
log.Printf("[batcave] final GET status: %d", realRes.StatusCode)
defer realRes.Body.Close()
doc, err := goquery.NewDocumentFromReader(realRes.Body)
if err != nil {
return sendErr()
}
if c != nil {
c <- doc
}
if clientChan != nil {
clientChan <- client
}
return doc
}
// ParseImageLinks parses a goquery document to extract image links.
//
// markup is the goquery document to parse for image links.
@@ -69,3 +288,83 @@ func ParseImageLinks(markup *goquery.Document, c chan []string) ([]string, error
return links, ImageParseError{Message: "No images found", Code: 1}
}
func ParseReadAllComicsLinks(markup *goquery.Document, c chan []string) ([]string, error) {
var links []string
markup.Find("img").Each(func(_ int, image *goquery.Selection) {
link, _ := image.Attr("src")
if !strings.Contains(link, "logo") && (strings.Contains(link, "bp.blogspot.com") || strings.Contains(link, "blogger.googleusercontent") || strings.Contains(link, "covers")) {
links = append(links, link)
}
})
c <- links
if len(links) > 0 {
return links, nil
}
return links, ImageParseError{Message: "No images found", Code: 1}
}
// ParseBatcaveBizTitle extracts the chapter title from the __DATA__.chapters array
// by matching the chapter id to the last path segment of the provided URL.
func ParseBatcaveBizTitle(markup *goquery.Document, chapterURL string) string {
slug := strings.TrimRight(chapterURL, "/")
if i := strings.LastIndex(slug, "/"); i >= 0 {
slug = slug[i+1:]
}
var title string
markup.Find("script").Each(func(_ int, s *goquery.Selection) {
if title != "" {
return
}
text := s.Text()
if !strings.Contains(text, "__DATA__") {
return
}
chapterRegex := regexp.MustCompile(`"id"\s*:\s*` + regexp.QuoteMeta(slug) + `[^}]*?"title"\s*:\s*"([^"]+)"`)
m := chapterRegex.FindStringSubmatch(text)
if len(m) >= 2 {
title = strings.ReplaceAll(m[1], `\/`, "/")
title = strings.ReplaceAll(title, "Issue #", "")
title = strings.ReplaceAll(title, "#", "")
}
})
return title
}
// ParseBatcaveBizImageLinks extracts image URLs from the __DATA__.images JavaScript
// variable embedded in a batcave.biz page.
func ParseBatcaveBizImageLinks(markup *goquery.Document, c chan []string) ([]string, error) {
var links []string
markup.Find("script").Each(func(_ int, s *goquery.Selection) {
text := s.Text()
if !strings.Contains(text, "__DATA__") {
return
}
arrayRegex := regexp.MustCompile(`"images"\s*:\s*\[([^\]]+)\]`)
arrayMatch := arrayRegex.FindStringSubmatch(text)
if len(arrayMatch) < 2 {
return
}
urlRegex := regexp.MustCompile(`"([^"]+)"`)
for _, m := range urlRegex.FindAllStringSubmatch(arrayMatch[1], -1) {
if len(m) >= 2 {
links = append(links, strings.ReplaceAll(m[1], `\/`, "/"))
}
}
})
c <- links
if len(links) > 0 {
return links, nil
}
return links, ImageParseError{Message: "No images found", Code: 1}
}
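
A hedged, minimal example of the `__DATA__.chapters` shape that ParseBatcaveBizTitle's regex targets. The id/title field layout is taken from the regex above; a real batcave.biz payload may carry more fields, and a URL whose last segment equals the chapter id is assumed:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/PuerkitoBio/goquery"
	"yoink/comic"
)

func main() {
	html := `<html><body><script>
window.__DATA__ = {"chapters":[{"id":153,"title":"Nightwing (1996) Issue #153"}]};
</script></body></html>`
	doc, _ := goquery.NewDocumentFromReader(strings.NewReader(html))
	// The chapter id (153) must match the last path segment of the URL.
	fmt.Println(comic.ParseBatcaveBizTitle(doc, "https://batcave.biz/153"))
	// Output: Nightwing (1996) 153 ("Issue #" and bare "#" stripped)
}
```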

192
comic/parser_test.go Normal file

@@ -0,0 +1,192 @@
package comic
import (
"strings"
"testing"
"github.com/PuerkitoBio/goquery"
)
func TestParseBatcaveBizImageLinks(t *testing.T) {
tests := []struct {
name string
html string
expectCount int
expectErr bool
expectURLs []string
}{
{
name: "extracts images from __DATA__",
html: `<html><body><script>
var __DATA__ = {"images":["https://cdn.batcave.biz/img/001.jpg","https://cdn.batcave.biz/img/002.jpg"]};
</script></body></html>`,
expectCount: 2,
expectErr: false,
expectURLs: []string{"https://cdn.batcave.biz/img/001.jpg", "https://cdn.batcave.biz/img/002.jpg"},
},
{
name: "unescapes forward slashes in URLs",
html: `<html><body><script>
var __DATA__ = {"images":["https:\/\/cdn.batcave.biz\/img\/001.jpg"]};
</script></body></html>`,
expectCount: 1,
expectErr: false,
expectURLs: []string{"https://cdn.batcave.biz/img/001.jpg"},
},
{
name: "extracts images with spaces around colon and bracket",
html: `<html><body><script>
var __DATA__ = {"images" : [ "https://cdn.batcave.biz/img/001.jpg" ]};
</script></body></html>`,
expectCount: 1,
expectErr: false,
expectURLs: []string{"https://cdn.batcave.biz/img/001.jpg"},
},
{
name: "no __DATA__ script",
html: `<html><body><script>
var foo = "bar";
</script></body></html>`,
expectCount: 0,
expectErr: true,
},
{
name: "__DATA__ present but no images key",
html: `<html><body><script>
var __DATA__ = {"title":"Nightwing"};
</script></body></html>`,
expectCount: 0,
expectErr: true,
},
{
name: "no script tags",
html: `<html><body><p>nothing here</p></body></html>`,
expectCount: 0,
expectErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
doc, _ := goquery.NewDocumentFromReader(strings.NewReader(tt.html))
ch := make(chan []string, 1)
links, err := ParseBatcaveBizImageLinks(doc, ch)
if tt.expectErr && err == nil {
t.Error("ParseBatcaveBizImageLinks() expected error, got nil")
}
if !tt.expectErr && err != nil {
t.Errorf("ParseBatcaveBizImageLinks() unexpected error: %v", err)
}
if len(links) != tt.expectCount {
t.Errorf("ParseBatcaveBizImageLinks() returned %d links, want %d", len(links), tt.expectCount)
}
for i, expected := range tt.expectURLs {
if i >= len(links) {
t.Errorf("missing link at index %d: want %q", i, expected)
continue
}
if links[i] != expected {
t.Errorf("links[%d] = %q, want %q", i, links[i], expected)
}
}
channelLinks := <-ch
if len(channelLinks) != tt.expectCount {
t.Errorf("channel received %d links, want %d", len(channelLinks), tt.expectCount)
}
})
}
}
func TestImageParseError(t *testing.T) {
err := ImageParseError{Message: "test error", Code: 1}
if err.Error() != "test error" {
t.Errorf("Error() = %q, want %q", err.Error(), "test error")
}
}
func TestParseImageLinks(t *testing.T) {
tests := []struct {
name string
html string
expectCount int
expectErr bool
}{
{
name: "extracts blogspot images",
html: `<html><body>
<img src="https://bp.blogspot.com/page-001.jpg" />
<img src="https://bp.blogspot.com/page-002.jpg" />
</body></html>`,
expectCount: 2,
expectErr: false,
},
{
name: "extracts blogger googleusercontent images",
html: `<html><body>
<img src="https://blogger.googleusercontent.com/page-001.jpg" />
</body></html>`,
expectCount: 1,
expectErr: false,
},
{
name: "extracts covers images",
html: `<html><body>
<img src="https://example.com/covers/cover-001.jpg" />
</body></html>`,
expectCount: 1,
expectErr: false,
},
{
name: "excludes logo images",
html: `<html><body>
<img src="https://bp.blogspot.com/logo-site.jpg" />
<img src="https://bp.blogspot.com/page-001.jpg" />
</body></html>`,
expectCount: 1,
expectErr: false,
},
{
name: "excludes non-matching images",
html: `<html><body>
<img src="https://other-site.com/image.jpg" />
<img src="https://cdn.example.com/banner.png" />
</body></html>`,
expectCount: 0,
expectErr: true,
},
{
name: "no images at all",
html: `<html><body><p>No images here</p></body></html>`,
expectCount: 0,
expectErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
doc, _ := goquery.NewDocumentFromReader(strings.NewReader(tt.html))
ch := make(chan []string, 1)
links, err := ParseImageLinks(doc, ch)
if tt.expectErr && err == nil {
t.Error("ParseImageLinks() expected error, got nil")
}
if !tt.expectErr && err != nil {
t.Errorf("ParseImageLinks() unexpected error: %v", err)
}
if len(links) != tt.expectCount {
t.Errorf("ParseImageLinks() returned %d links, want %d", len(links), tt.expectCount)
}
// Verify the channel also received the links
channelLinks := <-ch
if len(channelLinks) != tt.expectCount {
t.Errorf("channel received %d links, want %d", len(channelLinks), tt.expectCount)
}
})
}
}

17
docker-compose.yml Normal file

@@ -0,0 +1,17 @@
services:
flaresolverr:
image: ghcr.io/flaresolverr/flaresolverr:latest
restart: unless-stopped
yoink:
build: .
ports:
- "8080:8080"
volumes:
- ./library:/library
environment:
- YOINK_LIBRARY=/library
- FLARESOLVERR_URL=http://flaresolverr:8191
restart: unless-stopped
depends_on:
- flaresolverr

402
web/server.go Normal file

@@ -0,0 +1,402 @@
package web
import (
"archive/zip"
"embed"
"encoding/json"
"fmt"
"io"
"io/fs"
"net/http"
"net/url"
"os"
"path/filepath"
"sort"
"strings"
"sync"
"time"
"github.com/PuerkitoBio/goquery"
"yoink/comic"
)
//go:embed static
var staticFiles embed.FS
type JobStatus string
const (
StatusPending JobStatus = "pending"
StatusRunning JobStatus = "running"
StatusComplete JobStatus = "complete"
StatusError JobStatus = "error"
)
type Job struct {
ID string `json:"id"`
URL string `json:"url"`
Title string `json:"title"`
Status JobStatus `json:"status"`
Error string `json:"error,omitempty"`
CreatedAt time.Time `json:"created_at"`
}
type ComicEntry struct {
Title string `json:"title"`
CoverURL string `json:"cover_url"`
FileURL string `json:"file_url"`
DownloadedAt time.Time `json:"downloaded_at"`
}
type Server struct {
libraryPath string
jobs map[string]*Job
mu sync.RWMutex
}
func NewServer(libraryPath string) *Server {
return &Server{
libraryPath: libraryPath,
jobs: make(map[string]*Job),
}
}
func (s *Server) Handler() http.Handler {
mux := http.NewServeMux()
// Embedded static assets
staticFS, _ := fs.Sub(staticFiles, "static")
mux.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.FS(staticFS))))
// Library files: covers (inline) and cbz downloads (attachment)
mux.Handle("/covers/", http.StripPrefix("/covers/", http.FileServer(http.Dir(s.libraryPath))))
mux.Handle("/files/", http.StripPrefix("/files/", s.downloadHandler()))
// API
mux.HandleFunc("/api/download", s.handleDownload)
mux.HandleFunc("/api/upload", s.handleUpload)
mux.HandleFunc("/api/comics", s.handleComics)
mux.HandleFunc("/api/comics/delete", s.handleDeleteComic)
mux.HandleFunc("/api/jobs", s.handleJobs)
mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
})
// SPA root
mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/" {
http.NotFound(w, r)
return
}
data, _ := staticFiles.ReadFile("static/index.html")
w.Header().Set("Content-Type", "text/html; charset=utf-8")
w.Write(data)
})
return mux
}
// downloadHandler wraps the library file server to force Content-Disposition: attachment.
func (s *Server) downloadHandler() http.Handler {
fs := http.FileServer(http.Dir(s.libraryPath))
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Disposition", "attachment")
fs.ServeHTTP(w, r)
})
}
func (s *Server) handleDownload(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
return
}
var req struct {
URL string `json:"url"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil || strings.TrimSpace(req.URL) == "" {
http.Error(w, "invalid request", http.StatusBadRequest)
return
}
req.URL = strings.TrimSpace(req.URL)
job := &Job{
ID: fmt.Sprintf("%d", time.Now().UnixNano()),
URL: req.URL,
Status: StatusPending,
CreatedAt: time.Now(),
}
s.mu.Lock()
s.jobs[job.ID] = job
s.mu.Unlock()
go s.runJob(job)
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(job)
}
func (s *Server) runJob(job *Job) {
s.mu.Lock()
job.Status = StatusRunning
s.mu.Unlock()
markupCh := make(chan *goquery.Document)
imageCh := make(chan []string)
c := comic.NewComic(job.URL, s.libraryPath, imageCh, markupCh)
s.mu.Lock()
job.Title = c.Title
s.mu.Unlock()
if len(c.Filelist) == 0 {
s.mu.Lock()
job.Status = StatusError
job.Error = "no images found"
s.mu.Unlock()
return
}
errs := c.Download(len(c.Filelist))
if err := c.Archive(); err != nil {
c.Cleanup()
s.mu.Lock()
job.Status = StatusError
job.Error = err.Error()
s.mu.Unlock()
return
}
c.Cleanup()
if len(errs) > 0 {
s.mu.Lock()
job.Status = StatusError
job.Error = errs[0].Error()
s.mu.Unlock()
return
}
s.mu.Lock()
job.Status = StatusComplete
s.mu.Unlock()
}
func (s *Server) handleComics(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
entries := []ComicEntry{}
dirs, err := os.ReadDir(s.libraryPath)
if err != nil {
json.NewEncoder(w).Encode(entries)
return
}
for _, dir := range dirs {
if !dir.IsDir() {
continue
}
title := dir.Name()
dirPath := filepath.Join(s.libraryPath, title)
var coverURL, fileURL string
var downloadedAt time.Time
files, _ := os.ReadDir(dirPath)
for _, f := range files {
name := f.Name()
if strings.HasSuffix(name, ".cbz") {
fileURL = "/files/" + url.PathEscape(title) + "/" + url.PathEscape(name)
if info, err := f.Info(); err == nil {
downloadedAt = info.ModTime()
}
}
// Cover kept by Cleanup: "<Title> 001.jpg"
stripped := strings.TrimSpace(strings.TrimPrefix(name, title))
if strings.HasPrefix(strings.ToLower(stripped), "001") {
coverURL = "/covers/" + url.PathEscape(title) + "/" + url.PathEscape(name)
}
}
if fileURL != "" {
entries = append(entries, ComicEntry{
Title: title,
CoverURL: coverURL,
FileURL: fileURL,
DownloadedAt: downloadedAt,
})
}
}
// Default: newest first
sort.Slice(entries, func(i, j int) bool {
return entries[i].DownloadedAt.After(entries[j].DownloadedAt)
})
json.NewEncoder(w).Encode(entries)
}
func (s *Server) handleJobs(w http.ResponseWriter, r *http.Request) {
s.mu.RLock()
jobs := make([]*Job, 0, len(s.jobs))
for _, j := range s.jobs {
jobs = append(jobs, j)
}
s.mu.RUnlock()
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(jobs)
}
func (s *Server) handleUpload(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
return
}
// 500 MB limit
if err := r.ParseMultipartForm(500 << 20); err != nil {
http.Error(w, "request too large", http.StatusRequestEntityTooLarge)
return
}
title := strings.TrimSpace(r.FormValue("title"))
if title == "" {
http.Error(w, "title required", http.StatusBadRequest)
return
}
// Sanitize: no path separators or shell-special characters
title = filepath.Base(title)
title = strings.Map(func(r rune) rune {
if strings.ContainsRune(`/\:*?"<>|`, r) {
return '_'
}
return r
}, title)
fileHeaders := r.MultipartForm.File["images"]
if len(fileHeaders) == 0 {
http.Error(w, "no images provided", http.StatusBadRequest)
return
}
// Sort by original filename so page order is preserved
sort.Slice(fileHeaders, func(i, j int) bool {
return fileHeaders[i].Filename < fileHeaders[j].Filename
})
dir := filepath.Join(s.libraryPath, title)
if err := os.MkdirAll(dir, 0o755); err != nil {
http.Error(w, "failed to create directory", http.StatusInternalServerError)
return
}
cbzPath := filepath.Join(dir, title+".cbz")
cbzFile, err := os.Create(cbzPath)
if err != nil {
http.Error(w, "failed to create archive", http.StatusInternalServerError)
return
}
defer cbzFile.Close()
zw := zip.NewWriter(cbzFile)
defer zw.Close()
imageExts := map[string]bool{".jpg": true, ".jpeg": true, ".png": true, ".webp": true}
idx := 1
for _, fh := range fileHeaders {
ext := strings.ToLower(filepath.Ext(fh.Filename))
if !imageExts[ext] {
continue
}
if ext == ".jpeg" {
ext = ".jpg"
}
entryName := fmt.Sprintf("%03d%s", idx, ext)
src, err := fh.Open()
if err != nil {
continue
}
// Save first image as cover: "<Title> 001.jpg"
if idx == 1 {
coverPath := filepath.Join(dir, title+" "+entryName)
if cf, err := os.Create(coverPath); err == nil {
io.Copy(cf, src)
cf.Close()
src.Close()
src, err = fh.Open()
if err != nil {
continue
}
}
}
ze, err := zw.Create(entryName)
if err != nil {
src.Close()
continue
}
io.Copy(ze, src)
src.Close()
idx++
}
if idx == 1 {
// Nothing was written — no valid images
os.RemoveAll(dir)
http.Error(w, "no valid images in upload", http.StatusBadRequest)
return
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(map[string]string{"title": title, "status": "complete"})
}
func (s *Server) handleDeleteComic(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodDelete {
http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
return
}
var req struct {
Title string `json:"title"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil || strings.TrimSpace(req.Title) == "" {
http.Error(w, "invalid request", http.StatusBadRequest)
return
}
// Sanitize: prevent path traversal
title := filepath.Base(strings.TrimSpace(req.Title))
comicDir := filepath.Join(s.libraryPath, title)
// Ensure the resolved path is still under the library
if !strings.HasPrefix(comicDir, filepath.Clean(s.libraryPath)+string(filepath.Separator)) {
http.Error(w, "invalid title", http.StatusBadRequest)
return
}
if err := os.RemoveAll(comicDir); err != nil {
http.Error(w, "failed to delete comic", http.StatusInternalServerError)
return
}
w.WriteHeader(http.StatusNoContent)
}
func Listen(addr string, libraryPath string) error {
srv := NewServer(libraryPath)
fmt.Printf("Yoink web server listening on %s\n", addr)
return http.ListenAndServe(addr, srv.Handler())
}

1604
web/static/index.html Normal file

File diff suppressed because it is too large