The Engineer’s Familiar Stranger
Adventures with Text
We write code every day, but “text” has always been a hidden minefield.
Self Introduction
PosetMage
- Software Engineer
- Personal website, blog, publishing — all DIY
- Fell into every text-processing trap so you don’t have to
Outline
- PDF Layout Hell → Content vs Layout
- Writing a Book in Markdown → Pipeline
- Blogging → Jekyll + Web Components
- Japanese Game OCR → Screenshot Translation
- Web Articles → TTS
- Video Subtitles → Whisper
Part 1: PDF Layout Hell
We’ve all been there:
- A perfectly formatted Word doc explodes on another machine
- Missing fonts, broken spacing, shifted images
- Mixing “content” with “layout” = disaster
The Pitfall: Print to PDF
You think “just print to PDF” is easy?
- Linux server has no Chinese fonts installed → PDF full of tofu boxes
  □□□ `apt-get install fonts-noto-cjk`: now you know this by heart
- Different OSes break lines differently → pagination goes completely wrong
- WeasyPrint, wkhtmltopdf, Puppeteer — each has its own quirks
The moment you realize “rendering text” is not trivial
Solution: Separate Content from Layout
Split “what you write” from “how it looks”
- LaTeX — academic standard, precise but steep learning curve
- HTML/CSS — web tech for layout, the tool engineers know best
Key insight: write content in plain text (Markdown), let template engines handle the rest
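A minimal sketch of the idea, using only Python's standard `string.Template`. The content dictionary and both layouts are made up for illustration; a real setup would use Markdown plus a proper template engine.

```python
from string import Template

# The content: plain text, with no layout decisions in it.
CONTENT = {"title": "Adventures with Text", "body": "Text is a hidden minefield."}

# Two hypothetical layouts for the same content.
HTML_LAYOUT = Template("<article><h1>$title</h1><p>$body</p></article>")
LATEX_LAYOUT = Template("\\section{$title}\n$body\n")


def render(layout: Template, content: dict) -> str:
    """Fill a layout with content; the content itself never changes."""
    return layout.substitute(content)


html = render(HTML_LAYOUT, CONTENT)
latex = render(LATEX_LAYOUT, CONTENT)
```

Changing how the page looks means editing a layout; the words stay untouched.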
Part 2: Writing a Book in Markdown?
Markdown is great for writing, but a book is more than one article.
You need a pipeline:
- Multiple `.md` files → merge in chapter order
- Convert formats → PDF / EPUB / HTML
- Auto-generate TOC, page numbers, cross-references
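The merge step could look roughly like this. It is a sketch, not markbook's actual code, and it assumes chapters are named so that sorting by filename gives the reading order (for example `01-intro.md`, `02-pipeline.md`).

```python
from pathlib import Path


def merge_chapters(chapter_dir: Path, output: Path) -> list[str]:
    """Concatenate chapter files in sorted filename order.

    Returns the filenames in the order they were merged.
    """
    chapters = sorted(chapter_dir.glob("*.md"))
    parts = [path.read_text(encoding="utf-8").strip() for path in chapters]
    # A blank line between chapters keeps a trailing paragraph from
    # running into the next chapter's heading.
    output.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return [path.name for path in chapters]
```

Write the merged file outside the chapter directory, otherwise the next run would pick it up as a chapter.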
The Pitfalls of a Book Pipeline
Sounds simple? Here’s what actually happens:
- Pandoc converts `.md` → PDF, but CJK fonts break again
- Page breaks land in the wrong places: a heading sits alone at the bottom of a page
- Image paths work locally but break after merge
- Cross-references between chapters? Good luck with relative links
Every “simple conversion” hides 10 edge cases
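Two of these edge cases can be sketched in a few lines. The function names are hypothetical. The Pandoc options used (`--pdf-engine`, `-V CJKmainfont=...`, `--toc`) are real, but which font is installed on your machine is an assumption you have to check yourself.

```python
import posixpath
import re

# Matches the opening of a Markdown image, e.g. "![alt](" plus its target.
IMAGE_LINK = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)")


def rebase_images(markdown: str, chapter_dir: str) -> str:
    """Prefix relative image paths with the chapter's own directory,
    so they still resolve after chapters are merged into one file."""
    def fix(match: re.Match) -> str:
        prefix, target = match.group(1), match.group(2)
        # Leave absolute paths and URLs alone.
        if target.startswith(("/", "http://", "https://", "data:")):
            return match.group(0)
        return prefix + posixpath.normpath(posixpath.join(chapter_dir, target))
    return IMAGE_LINK.sub(fix, markdown)


def pandoc_pdf_command(source: str, output: str, cjk_font: str) -> list[str]:
    """Build (but do not run) a Pandoc command that names a CJK font explicitly."""
    return [
        "pandoc", source, "-o", output,
        "--pdf-engine=xelatex",          # an engine that can load system fonts
        "-V", f"CJKmainfont={cjk_font}",  # assumption: this font is installed
        "--toc",
    ]
```

For example, `rebase_images("![fig](img/a.png)", "chapters/01")` rewrites the target to `chapters/01/img/a.png`, and the command list can be passed to `subprocess.run` once Pandoc and the font are in place.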
So I Built markbook
A Python pipeline that handles the mess for me:
- Define chapter structure, auto-merge
- Output to multiple formats
- Writing a book like writing code: version control + CI/CD
- markbook
Part 3: The Article Management Journey
Google Blogger → Medium → Obsidian → VS Code Foam → Jekyll
The path from renting a platform to owning your own words.
Phase 1: Platform Era
- Google Blogger — easy, one-click publish, but zero control over layout
- Medium — great writing experience, beautiful typography
But problems appeared quickly:
- Content not portable, formatting breaks when migrating
- SEO has to start from scratch every time you move
- Locked in by the platform — feels like “renting a house”
I wanted to OWN my content, not rent a platform.
Phase 2: Note-Taking Tools
So I moved content to local tools:
- Obsidian — Markdown + local files, finally I own the data
- VS Code + Foam — wiki-style linking, lives in a git repo
Ownership problem solved! But new problem:
- These are personal knowledge bases, not publishing tools
- No public-facing output — how do readers see it?
- Still need a separate system for the actual website
Phase 3: Jekyll — Own Everything
The answer: Markdown files in a git repo, Jekyll renders them into a website.
- Content is plain `.md` files: you own them forever
- Version control with git: full history, no lock-in
- GitHub Pages deploys for free — no server to manage
Google Blogger → Medium → Obsidian → Foam → Jekyll: the path to owning your own words.
Why I use Jekyll + Web Components
A decade of trial and error to find the right architecture
The Beginning: Blog Platforms
Like most people, I started with popular platforms:
- Google Blogger — easy, one-click publish
- Medium — great writing experience
But problems appeared quickly:
- Content not portable, formatting breaks when migrating
- SEO has to start from scratch
- Locked in by the platform — feels like “renting a house”
I wanted my own land, my own house.
First Attempt: Pure GitHub
Map routes directly to file system structure:
home/README.md → homepage
about/README.md → about me
tech/linux/README.md → tech articles
- No backend, no render loop
- Raw HTTP file serving
- Simple, direct, ugly but functional
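The mapping above is simple enough to write down. This is only an illustration of the convention, with one assumption taken from the first example: the site root is served from `home/README.md`.

```python
from pathlib import PurePosixPath


def route_to_file(url_path: str) -> str:
    """Map a URL path such as '/tech/linux/' to its README.md file."""
    # Drop empty segments and anything that could climb out of the site root.
    parts = [p for p in url_path.split("/") if p and p not in (".", "..")]
    if not parts:
        parts = ["home"]  # assumption: the root route is the homepage folder
    return str(PurePosixPath(*parts, "README.md"))
```

`route_to_file("/tech/linux/")` gives `tech/linux/README.md`. There is no template step anywhere, which is exactly why every page has to carry its own header.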
The Pain of Pure GitHub
“Raw” quickly became painful:
- No shared navbar — copy-paste HTML header for every article
- No style management — the whole site looked like a 1990s bulletin board
- Maintenance cost grows linearly with number of articles
Entering Jekyll
Jekyll gave me my first taste of the power of templates
- Apply a theme, site instantly looks professional
- Write content in Markdown, Jekyll handles structure
- Deploy on GitHub Pages for free
But as my needs grew, so did the pain…
Jekyll’s Bottleneck
I wanted interactive charts and dynamic code demos in my articles
In Jekyll’s world, this is a disaster:
- Must create HTML in `_includes/`, filled with `<script>` tags and Liquid logic
- Use awkward syntax to call it from Markdown
- Project structure becomes spaghetti:
- Markdown mixed with Liquid tags
- HTML mixed with JavaScript logic
- JS depends on Liquid-injected variables
Changing a chart’s color means opening Markdown, HTML layout, and CSS files simultaneously
Trying Other Frameworks
| Framework | Pros | Cons |
|---|---|---|
| Hugo | Extremely fast | Template syntax complex, Liquid is easier |
| Gatsby | React ecosystem | Too heavy, Webpack + GraphQL setup drains all energy |
| MDsveX | Svelte integration | Only works in specific folders, no flexible structure |
| SvelteKit | Closest to auto md → html | Folder structure still less flexible than Jekyll |
SvelteKit’s closest approach:
src/
├── routes/
│ ├── blog/[slug]/
│ └── docs/[slug]/
├── content/
│ ├── blog/
│ └── docs/
└── lib/
But folder structure is still not flexible enough
Discovering Web Components + Svelte
First time experiencing the real power of Separation of Concerns
- Tried Lit for WC — felt extremely unnatural to write
- Discovered that Svelte can also compile to WC
Write a clean component in Svelte, compile to standard Web Component
In Markdown, just write:
<my-chart data="[1,2,3]"></my-chart>
Jekyll + WC = Islands Architecture
Jekyll becomes a content routing shell
Svelte takes over all interactive logic
- Markdown handles storytelling
- Components handle the magic
- They don’t interfere with each other
This is the modern Islands Architecture concept. Astro is the ultimate expression of this idea.
But in the end, Jekyll + WC is the most flexible.
So I Built Jekyll Layouts
- Jekyll: file system as routing, Markdown for content, zero-cost deployment
- Web Components: standard browser API, framework-agnostic, reusable
- Svelte compiled to WC: great dev experience, outputs standard components
- Supports slides, notes, mindmaps — one content, multiple outputs
This talk you’re watching right now is rendered by this system.
Part 4: I Just Want to Play Japanese Games
Playing a Japanese visual novel, the story looks amazing — but I can’t read it.
- Copy text? Impossible — it’s rendered in the game engine
- Google Translate camera? Unstable, slow, breaks immersion
- Fan translation patches? Not available for most games
So I thought: what if I just screenshot and OCR it?
OCR Pitfalls Nobody Warns You About
- Tesseract? Great for English, garbage for Japanese vertical text
- Game fonts are stylized — OCR engines choke on anti-aliased text
- Furigana (small reading hints above kanji) confuses the model
- Screenshot timing matters — dialogue boxes have animations
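The timing problem can be sketched without any OCR library: keep capturing until the picture stops changing. This is not JP_OCR_translate's actual code. `capture` is a hypothetical callable that returns the raw bytes of the dialogue-box region and handles any delay between captures itself.

```python
import hashlib
from typing import Callable, Optional


def wait_for_stable_frame(
    capture: Callable[[], bytes],
    stable_count: int = 3,
    max_frames: int = 60,
) -> Optional[bytes]:
    """Return a frame once `stable_count` consecutive captures are identical.

    Returns None if the image never settles within `max_frames` captures.
    """
    last_digest = None
    streak = 0
    for _ in range(max_frames):
        frame = capture()
        digest = hashlib.sha256(frame).hexdigest()
        # Extend the streak if nothing changed, otherwise start over.
        streak = streak + 1 if digest == last_digest else 1
        last_digest = digest
        if streak >= stable_count:
            return frame
    return None
```

Exact byte equality is a strong assumption. A game with an animated background or a blinking cursor would likely need a tolerance-based comparison on a cropped region instead.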
So I Built JP_OCR_translate
After weeks of tuning: a screenshot → OCR → translate pipeline that actually works
- OCR engine reads Japanese text from screen captures
- Auto-translates to Chinese
- Works for games, novels, and web pages
- JP_OCR_translate
Part 5: My Eyes Are Dying
I read a lot of long-form articles — blog posts, documentation, research.
- Sitting at a screen for 12 hours coding, then reading MORE text?
- I wanted to go for a walk and listen to articles like a podcast
- Browser read-aloud extensions? Robotic, can’t handle code blocks, reads nav menus
I just wanted someone to read the article to me. So I built it.
TTS Pitfalls: Text Is Not What You Think
- HTML is not text — you need to strip nav, footer, ads, code blocks
- What about `<pre>` tags? You don’t want TTS reading `console.log("hello")`
- CJK punctuation vs ASCII punctuation → different pause lengths
- Article behind a paywall or SPA? Good luck extracting content
The real challenge was never “text to speech” — it was “web page to clean text”
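A rough sketch of the "web page to clean text" step with the standard library's `html.parser`. It is not Browser-TTS's actual code, and the list of tags to skip is an assumption that real pages will quickly outgrow.

```python
from html.parser import HTMLParser

# Assumption: text inside these tags is never worth reading aloud.
SKIPPED_TAGS = {"nav", "footer", "header", "aside", "script", "style", "pre"}


class ArticleText(HTMLParser):
    """Collect visible text, ignoring everything inside skipped tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside any skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_speech_text(html: str) -> str:
    """Return the readable text of an HTML fragment as one string."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

This only works when the article is already in the HTML. A paywalled page or an SPA that renders on the client needs a different approach entirely.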
So I Built Browser-TTS
- Extract article content from web pages, strip the junk
- Convert clean text to audio via TTS engine
- Play directly in the browser, no app install needed
- Browser-TTS
Part 6: I Want Subtitles on My Videos
Recording a talk or tutorial, but adding subtitles manually is torture.
- YouTube auto-captions? Decent for English, chaos for mixed-language content
- Professional subtitling services? Expensive for personal projects
- Manual transcription? Life is too short
Whisper changed everything — but it’s not magic
Whisper Pitfalls: Close But Not Perfect
- Mixed language (English terms in Chinese speech) → model gets confused
- Speaker changes not detected — who said what?
- Timestamps drift on long recordings — subtitles slowly go out of sync
- Raw transcript has no punctuation, no paragraphs — a wall of text
Still needs post-processing: sentence splitting, timestamp correction, formatting
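The formatting part of that post-processing can be sketched as below. It assumes each segment is a dictionary with `start` and `end` in seconds and a `text` string, which is the shape Whisper's Python package returns under `result["segments"]`. It does not correct drift or split sentences.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Turn transcript segments into the text of an .srt subtitle file."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)
```

Timestamp correction and punctuation repair would run before this step, on the segment list itself.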
Connecting the Dots
Every problem boils down to text transformation and flow
Audio ──Whisper──→ Text
Text ──Markdown──→ Book / Blog / Slides
Text ──TTS──→ Audio
Image ──OCR──→ Text ──Translate──→ Another Language
Text seems simple until you actually try to process it. As engineers, we can automate these flows — and that’s our superpower.