The Engineer’s Familiar Stranger

Adventures with Text

We write code every day, but “text” has always been a hidden minefield.

Self Introduction

PosetMage

Outline

  1. PDF Layout Hell → Content vs Layout
  2. Writing a Book in Markdown → Pipeline
  3. Blogging → Jekyll + Web Components
  4. Japanese Game OCR → Screenshot Translation
  5. Web Articles → TTS
  6. Video Subtitles → Whisper

Part 1: PDF Layout Hell

We’ve all been there:

The Pitfall: Print to PDF

You think “just print to PDF” is easy?

The moment you realize “rendering text” is not trivial

Solution: Separate Content from Layout

Split “what you write” from “how it looks”

Key insight: write content in plain text (Markdown), let template engines handle the rest
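
A minimal sketch of that split, using only Python's standard library. The two layouts and the `render` helper are hypothetical names for illustration, not part of any tool mentioned in this talk. The same content is fed to both layouts without being edited:

```python
from string import Template

# Content: what you write. No styling decisions live here.
content = {
    "title": "Adventures with Text",
    "body": "Text seems simple until you try to process it.",
}

# Layout: how it looks. Two hypothetical layouts for the same content.
HTML_LAYOUT = Template("<article><h1>$title</h1><p>$body</p></article>")
PLAIN_LAYOUT = Template("$title\n\n$body\n")


def render(layout: Template, data: dict) -> str:
    """Fill a layout with content; the content itself is never modified."""
    return layout.substitute(data)


html_page = render(HTML_LAYOUT, content)
plain_page = render(PLAIN_LAYOUT, content)
```

Changing how the page looks means editing a layout; changing what it says means editing the content. Neither change touches the other.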

Part 2: Writing a Book in Markdown?

Markdown is great for writing, but a book is more than one article.

You need a pipeline:

  1. Multiple .md files → merge in chapter order
  2. Convert formats → PDF / EPUB / HTML
  3. Auto-generate TOC, page numbers, cross-references
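
Steps 1 and 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how markbook works: chapter files are assumed to sort correctly by filename (for example `01-intro.md`, `02-setup.md`), headings are assumed to be ATX style (`# Title`), and `merge_chapters` and `build_toc` are hypothetical names.

```python
import re
from pathlib import Path

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def merge_chapters(chapter_dir: Path) -> str:
    """Concatenate all .md files in filename order (assumes zero-padded names)."""
    parts = [
        path.read_text(encoding="utf-8").strip()
        for path in sorted(chapter_dir.glob("*.md"))
    ]
    return "\n\n".join(parts) + "\n"


def build_toc(markdown: str) -> str:
    """Build an indented TOC from headings, ignoring fenced code blocks."""
    entries = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # a '#' inside a fence is a comment, not a heading
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            entries.append("  " * (level - 1) + "- " + title)
    return "\n".join(entries)
```

Even this toy version already has to handle one edge case: a `#` comment inside a code fence must not end up in the table of contents.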

The Pitfalls of Book Pipeline

Sounds simple? Here’s what actually happens:

Every “simple conversion” hides 10 edge cases

So I Built markbook

A Python pipeline that handles the mess for me:

Part 3: The Article Management Journey

Google Blogger → Medium → Obsidian → VS Code Foam → Jekyll

The path from renting a platform to owning your own words.

Phase 1: Platform Era

But problems appeared quickly:

I wanted to OWN my content, not rent a platform.

Phase 2: Note-Taking Tools

So I moved content to local tools:

Ownership problem solved! But new problem:

Phase 3: Jekyll — Own Everything

The answer: Markdown files in a git repo, Jekyll renders them into a website.

Google Blogger → Medium → Obsidian → Foam → Jekyll: the path to owning your own words.

Why I use Jekyll + Web Components

A decade of trial and error to find the right architecture

The Beginning: Blog Platforms

Like most people, I started with popular platforms:

But problems appeared quickly:

I wanted my own land, my own house.

First Attempt: Pure GitHub

Map routes directly to file system structure:

home/README.md        → homepage
about/README.md       → about me
tech/linux/README.md  → tech articles
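
The mapping above can be written as one small function. A sketch only: `route_to_file` is a hypothetical name, and serving the root route `/` from `home/` is an assumption based on the first line of the listing.

```python
from pathlib import PurePosixPath


def route_to_file(route: str) -> str:
    """Map a URL route to the README.md that serves it."""
    parts = [part for part in route.strip("/").split("/") if part]
    if not parts:
        parts = ["home"]  # assumption: "/" is served from home/README.md
    return str(PurePosixPath(*parts, "README.md"))


homepage = route_to_file("/")
about = route_to_file("/about")
linux = route_to_file("/tech/linux/")
```

There is no router and no config: the folder tree is the site map.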

The Pain of Pure GitHub

“Raw” quickly became painful:

Entering Jekyll

Jekyll gave me my first taste of the power of templates

But as my needs grew, so did the pain…

Jekyll’s Bottleneck

I wanted interactive charts and dynamic code demos in my articles

In Jekyll’s world, this is a disaster:

Changing a chart’s color means opening Markdown, HTML layout, and CSS files simultaneously

Trying Other Frameworks

Framework | Pros                      | Cons
Hugo      | Extremely fast            | Template syntax complex, Liquid is easier
Gatsby    | React ecosystem           | Too heavy, Webpack + GraphQL setup drains all energy
MDsveX    | Svelte integration        | Only works in specific folders, no flexible structure
SvelteKit | Closest to auto md → html | Folder structure still less flexible than Jekyll

SvelteKit’s closest approach:

src/
├── routes/
│   ├── blog/[slug]/
│   └── docs/[slug]/
├── content/
│   ├── blog/
│   └── docs/
└── lib/

But the folder structure is still not flexible enough

Discovering Web Components + Svelte

My first time experiencing the real power of separation of concerns

Write a clean component in Svelte, then compile it to a standard Web Component

In Markdown, just write:

<my-chart data="[1,2,3]"></my-chart>

Jekyll + WC = Islands Architecture

Jekyll becomes a content routing shell

Svelte takes over all interactive logic

This is the modern Islands Architecture concept. Astro is the ultimate expression of this idea.

But in the end, Jekyll + WC is the most flexible.

So I Built Jekyll Layouts

This talk you’re watching right now is rendered by this system.

Part 4: I Just Want to Play Japanese Games

I’m playing a Japanese visual novel and the story looks amazing, but I can’t read it.

So I thought: what if I just screenshot and OCR it?

OCR Pitfalls Nobody Warns You About

So I Built JP_OCR_translate

After weeks of tuning: a screenshot → OCR → translate pipeline that actually works
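
The shape of such a pipeline, sketched with injected stages. This is not JP_OCR_translate's actual code. `make_pipeline`, `capture`, `ocr`, and `translate` are hypothetical names; the real stages would wrap whatever screenshot, OCR, and translation tools you choose. Stripping all whitespace is an assumption that only suits Japanese text, which has no spaces between words.

```python
from typing import Callable


def make_pipeline(
    capture: Callable[[], bytes],
    ocr: Callable[[bytes], str],
    translate: Callable[[str], str],
) -> Callable[[], str]:
    """Compose screenshot -> OCR -> translate into one callable."""

    def run() -> str:
        image = capture()
        raw_text = ocr(image)
        # Assumption: OCR wraps dialogue across lines, and Japanese has no
        # inter-word spaces, so removing all whitespace rejoins the sentence.
        cleaned = "".join(raw_text.split())
        if not cleaned:
            return ""  # nothing recognized: skip the translation call
        return translate(cleaned)

    return run


# Stub stages stand in for real tools so the flow can be run as-is.
demo = make_pipeline(
    capture=lambda: b"fake-image-bytes",
    ocr=lambda image: "こん\nにちは ",
    translate=lambda text: f"[en] {text}",
)
result = demo()
```

Keeping each stage swappable makes it possible to tune or replace one step without touching the others.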

Part 5: My Eyes Are Dying

I read a lot of long-form articles — blog posts, documentation, research.

I just wanted someone to read the article to me. So I built it.

TTS Pitfalls: Text Is Not What You Think

The real challenge was never “text to speech” — it was “web page to clean text”
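
A sketch of the “web page to clean text” step using Python's built-in `html.parser`. This is not Browser-TTS's implementation. The `SKIP` and `BLOCK` tag sets are assumptions about what counts as noise and what ends a paragraph, and real pages need far more rules than this.

```python
from html.parser import HTMLParser

# Assumption: these elements never contain text worth reading aloud.
SKIP = {"script", "style", "nav", "header", "footer", "aside", "noscript"}
# Assumption: these elements mark paragraph boundaries.
BLOCK = {"p", "div", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6",
         "article", "section"}


class TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._buffer = []
        self.paragraphs = []

    def flush(self) -> None:
        """Turn buffered inline text into one whitespace-normalized paragraph."""
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag in BLOCK:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buffer.append(data)


def page_to_text(html: str) -> str:
    """Return one readable paragraph per line, with page chrome removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n".join(parser.paragraphs)
```

Inline tags such as `<b>` stay inside their sentence, so the speech engine receives whole sentences instead of fragments.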

So I Built Browser-TTS

Part 6: I Want Subtitles on My Videos

I record talks and tutorials, but adding subtitles manually is torture.

Whisper changed everything — but it’s not magic

Whisper Pitfalls: Close But Not Perfect

Still needs post-processing: sentence splitting, timestamp correction, formatting
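
A sketch of two of those steps, timestamp correction and formatting, as SRT output. It assumes each segment is a dict with `start` and `end` in seconds plus a `text` string. `srt_timestamp` and `to_srt` are hypothetical names, and sentence splitting is left out.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    ms = max(0, round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list) -> str:
    """Convert transcription segments into SRT cues."""
    cues = []
    prev_end = 0.0
    for segment in segments:
        text = " ".join(segment["text"].split())  # trim stray whitespace
        if not text:
            continue  # drop empty segments
        # Timestamp correction: a cue may not start before the previous one ends.
        start = max(segment["start"], prev_end)
        end = max(segment["end"], start)
        prev_end = end
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text}\n"
        )
    return "\n".join(cues)


segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello everyone. "},
    {"start": 2.4, "end": 5.0, "text": "Welcome to   the talk."},
    {"start": 5.0, "end": 5.5, "text": "  "},
]
subtitles = to_srt(segments)
```

The second segment starts 0.1 s before the first one ends, so its start is pushed to 2.5 s. The empty third segment is dropped and the cue numbering stays contiguous.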

Connecting the Dots

Every problem boils down to text transformation and flow

Audio ──Whisper──→ Text
Text ──Markdown──→ Book / Blog / Slides
Text ──TTS──→ Audio
Image ──OCR──→ Text ──Translate──→ Another Language

Text seems simple until you actually try to process it. As engineers, we can automate these flows — and that’s our superpower.

Thank You!