Sigul · Internal Architecture Reference

Universal capture from any source,
any device, any modality.

The substrate is the product. Everything else — archetypes, Board responses, the connections engine — is the layer that comes after.

Version Pass 2.6 · source-artifact model Date 2026-05-17 Owner Brad MacLean + Claude Status Draft, Bible v1.2 patch owed
Read this first

This page is downstream of Product Bible v1.1 and reflects a May 16-17, 2026 conversation in which Brad re-scoped Gen 1 to universal capture only — deferring 12 archetypes, Board responses, and L3 enrichment to Gen 1.5+. The substrate itself becomes the Gen 1 product.

Pass 2.6 ratifies the source-artifact / capture data model: source artifacts (one per unique URL — episode, video, article, book) are the deduplicable substrate; captures (one per moment of resonance) are the units of attention. Many captures point to one source artifact. This unlocks proper multi-moment capture, transcript reuse, book passages, and creator attribution — and is foundational to every other Gen 1 close-out arc.

Where this page disagrees with the Bible, this page is more recent — but a Bible v1.2 patch is owed documenting the re-scope before this becomes the canonical reference. Treat as authoritative draft until that patch lands.

The promise, broken down

Three guarantees.

Universal capture is one promise made of three concrete commitments. Each guarantee has a clear close-out path. None compromised by deferring archetypes.

01

Any source

The URL classifier recognizes content from any platform and routes it to a working resolver — specialized when it matters, universal when it doesn't, honest when it can't.

02

Any device

Capture lands cleanly regardless of where it originated — smartphone, tablet, laptop. Plus cross-device routes that ride on platforms the user already has.

03

Any modality

URL-origin and text-origin in Gen 1. Image-origin (OCR) and audio-origin (voice memos) land in future Gen via the same substrate.

What Sigul captures

Five capture types.

Canonical taxonomy from Product Bible v1.1 and the multi-type capture spec. Each type covers many sources — the share sheet does the routing.

Link
capture_type · link

Apple Podcasts, Spotify, YouTube, Substack, Medium, NYT, Waking Up, arXiv, Notion, articles from anywhere with a URL.

Gen 1 · shipped
Thought
capture_type · thought

My own thought, voice-captured reflection, typed insight. The user is the source.

Gen 1 · shipped
Email
capture_type · email

Forwarded newsletter, friend's recommendation, anything in your inbox. Text above the forward delimiter becomes the reflection.

Gen 1 · arc 4
Screen
capture_type · screen

Screenshot of paywall'd content, photo of a sign, OCR'd image of physical text. Future Gen.

Future Gen
Book
capture_type · book

Physical book passage, Kindle highlight, audiobook clip. Photo OCR or manual entry. Future Gen.

Future Gen
Any device, layer one

Front doors by device class.

Hardware reality. What the user is physically holding when they capture. Each class has one or more surfaces — software where capture originates.

Smartphone
iOS share sheet Shipped
capture.sigul.app Shipped
Native iOS app Future
Android Future
Tablet
iPad inherits iOS share sheet Smoke test
Pencil / handwriting Future
Android tablet Future
Laptop / desktop
Chrome extension Arc 3
sigul.app/capture web Partial
Vault UI (localhost) Shipped
MCP / agent endpoint Future
Future hardware
Apple Watch Future
Vision Pro Future
Siri / voice intent Post WWDC
Apple Intelligence MCP Post WWDC
Any device, layer two

Cross-device routes.

Capture paths that ride on platforms the user already has. Work from any device, regardless of class. Email forward is the high-leverage Gen 1 addition.

Email forward
Gen 1 · arc 4

Forward any email to capture@sigul.ai. Text above the forward delimiter becomes the "what hit you" reflection. Universal across every device and email client. High leverage for newsletter readers — zero device-install friction.

Bot integrations
Future Gen

Current Discord setup is Brad-personal dogfood — hardcoded channel, no install flow, not productized. Productizing means public install flow, per-user OAuth, channel scoping. Slack, Telegram, Teams follow same pattern.

Native iOS app
Future Gen

Share extension, deeper iOS integration, native UI. Gen 1 path is iOS Shortcut → Safari per Bible. Native app deferred until post-WWDC June 8 to see what Apple ships.

MCP and agent endpoints
Future Gen

S-024 three-tier rollout. Sigul as intelligence substrate accessible to other agents. Different distribution model — Sigul as service, not just product.

How it actually works

The pipeline.

Every front door, every cross-device route, every modality lands in the same funnel. One ingress, one classifier, one substrate.

Worker · KV · poller · mail receiver

Edge plumbing. Every front door deposits a payload here.

IH server · /ingest/url

The single front door inside Sigul. Classifies, routes, writes L2.

Resolver cascade · Tier 1 → 4

Specialized parser, universal fallback, honest stub. Whisper as last resort.

L2 substrate · captures.json

The moat. Structured metadata, transcript, source identity, user reflection.

vault.sigul.app

The Gen 1 demo surface. The substrate is the demo.

Any source, the resolver model

Four tiers, one cascade.

Each capture tries Tier 1 first (specialized for known platforms), falls to Tier 2 (platform mappers, mostly pending), falls to Tier 3 (universal Readability + OG), or surfaces honestly at Tier 4 (closed gardens) with an explicit message.

Tier 1 · Specialized
Specialized parser

Apple Podcasts (orchestrator + WhisperKit), YouTube (yt-dlp, S-028 complete), Waking Up (HTML parser). Spotify pending.

Tier 2 · Mapper
Platform mapper

Substack, Medium, WordPress, arXiv, Notion get proper platform + publication identity. Arc 1. Not yet built.

Tier 3 · Universal
Universal floor

Readability + OG + JSON-LD. D1 hardening shipped May 9 (6 baseline bugs closed). The honest baseline.

Tier 4 · Boundary
Honest boundary

Closed gardens — Spotify (today), Audible, Facebook, TikTok, Patreon. Honest stub message. Sub-bucket refinement pending.

Cascade order

Specialized parser → YouTube captions → RSS direct → Podcast Index → WhisperKit → honest stub. WhisperKit-as-last-resort is the architectural goal — Option E1 (YouTube captions canonical) and the Podcast 2.0 cascade are how that goal becomes practice.

Substrate quality, today

The capture-time UX guard.

Even with archetypes deferred, the substrate Sigul builds during the Gen 1 window is the substrate it has forever for those captures. Don't silently degrade L3-readiness.

"What hit you?" prompt prominence

Per Product Bible v1.1, reflection is never optional in UX prominence — the field stays optional, but the prompt stays unmissable. A small UX pass on the iOS Shortcut + capture worker keeps tomorrow's substrate L3-ready when archetypes ship. Cost of skipping: every capture made between now and L3 launch loses the affective layer that makes Sigul irreplaceable.

Candidate work, not ratified roadmap

Gen 1 close-out · five arcs.

Five arcs complete the universal-capture promise. Arc 5 ships first — it's foundational data-model work that the other four arcs write against. Then Arc 1 and Arc 4 in parallel (substrate breadth), then Arc 2 (transcript coverage), then Arc 3 (any-device closeout) in parallel with anything. Scope estimates only — none codebase-validated. Each requires a pre-arc spike before dispatch per the May 11 failure-pattern protocol.

Arc 5 · Foundational · ships first
Source-artifact model + multi-moment capture + UX feedback
~10–15 hr Cort across 4 sub-arcs

Source artifacts are the deduplicable substrate (one per unique URL — episode, video, article, book). Captures are the units of attention (one per moment of resonance — has user_note, position, captured_at). Many captures → one source artifact. Unlocks multi-moment capture, transcript reuse, book passages, Reading Sessions, creator attribution. UX Model 3: automatic routing with transparent feedback ("Added to your earlier capture. 3 moments now").

Caveat — bigger commitment than the other four arcs. Schema migration of existing 120 captures + new source-artifacts collection. Splittable into 4 sub-arcs: 5a schema design + migration spec, 5b migration + ingestion idempotency, 5c capture-time UX feedback, 5d vault UI for multi-capture rendering. Pre-arc spike: confirm URL collisions in existing corpus, canonical-URL normalization scheme, source-artifact primary key per resolver.

Arc 1 · Any source
Tier 2 mapper architecture
~6–8 hr Cort, estimate

Substack, Medium, WordPress, arXiv, Notion get proper platform and publication identity instead of collapsing to generic_article. May 8 audit scoped this; design pending. The biggest remaining L2 breadth gap.

Caveat — scope not validated by codebase reconnaissance. Pre-arc spike required: confirm current state of platform-classifier code, identify mapper interface, confirm what D1 hardening already partially closed. Ships after Arc 5 so new mappers write to the source-artifact model.

Arc 2 · Full transcript
Option E1 + Podcast 2.0 disposition
~3–4 hr Cort, plus decision

E1: YouTube auto-captions land in canonical content.passage_text. yt-dlp already extracts them. Podcast 2.0 cascade: 3.7% transcript hit rate, near-zero non-transcript metadata per D-553 and D-485. Likely shelve, let E1 carry the transcript coverage win.

Caveat — Podcast 2.0 disposition decision still open. Default position pre-pressure-test: shelve A and B with revival triggers, ship E1 narrow. Transcript lives at source-artifact level per Arc 5 model.

Arc 3 · Any device
Chrome ext, Discord disposition, iPad smoke test
multi-session

capture.sigul.app DNS shipped, cellular and wifi both working. Remaining: Chrome extension MVP per Bible Gen 1 scope (multi-session), Discord disposition (clarify it's Brad-personal dogfood, productization is Future Gen), iPad smoke test (10 min).

Caveat — Chrome extension is the dominant sub-arc and isn't really 6–8 hr Cort. Likely a multi-session build with its own design pass. Could split into Arc 3a (DNS + smoke test, near done) and Arc 3b (Chrome ext MVP, real build). UI work; can dispatch in parallel with other arcs once Arc 5 ships.

Arc 4 · Email forward
capture@sigul.ai inbound
~6–10 hr Cort, plus infra

Inbound mail receiver (Postmark / SendGrid / Mailgun), parser extracts URLs and forwarded text, writes to /ingest/url. Universal cross-device route. Strategically the highest-leverage Gen 1 addition — newsletter readers onboard with zero device-install friction.

Caveat — inbound mail infra may add hours beyond the parser work. Pre-arc spike required: pick mail provider, verify MX record path, scope spam/attachment policy. Ships after Arc 5 so email captures write to the source-artifact model (one newsletter issue = one source-artifact).

If this page lands canonical

Follow-on docs owed.

This page is downstream of three canonical docs. Locking this page without updating them creates internal inconsistency in the doc hierarchy.

Shipped, working today
Gen 1 close-out arc · candidate
Future Gen · dotted border
UX guard · do not skip