The substrate is the product. Everything else — archetypes, Board responses, the connections engine — is the layer that comes after.
This page is downstream of Product Bible v1.1 and reflects a May 16-17, 2026 conversation in which Brad re-scoped Gen 1 to universal capture only — deferring 12 archetypes, Board responses, and L3 enrichment to Gen 1.5+. The substrate itself becomes the Gen 1 product.
Pass 2.6 ratifies the source-artifact / capture data model: source artifacts (one per unique URL — episode, video, article, book) are the deduplicable substrate; captures (one per moment of resonance) are the units of attention. Many captures point to one source artifact. This unlocks proper multi-moment capture, transcript reuse, book passages, and creator attribution — and is foundational to every other Gen 1 close-out arc.
Where this page disagrees with the Bible, this page is more recent — but a Bible v1.2 patch is owed documenting the re-scope before this becomes the canonical reference. Treat as authoritative draft until that patch lands.
Universal capture is one promise made of three concrete commitments. Each guarantee has a clear close-out path. None is compromised by deferring archetypes.
The URL classifier recognizes content from any platform and routes it to a working resolver — specialized when it matters, universal when it doesn't, honest when it can't.
Capture lands cleanly regardless of where it originated — smartphone, tablet, laptop. Plus cross-device routes that ride on platforms the user already has.
URL-origin and text-origin ship in Gen 1. Image-origin (OCR) and audio-origin (voice memos) land in a future Gen via the same substrate.
Canonical taxonomy from Product Bible v1.1 and the multi-type capture spec. Each type covers many sources — the share sheet does the routing.
Apple Podcasts, Spotify, YouTube, Substack, Medium, NYT, Waking Up, arXiv, Notion, articles from anywhere with a URL. Gen 1 · shipped.
My own thought, voice-captured reflection, typed insight. The user is the source. Gen 1 · shipped.
Forwarded newsletter, friend's recommendation, anything in your inbox. Text above the forward delimiter becomes the reflection. Gen 1 · arc 4.
Screenshot of paywall'd content, photo of a sign, OCR'd image of physical text. Future Gen.
Physical book passage, Kindle highlight, audiobook clip. Photo OCR or manual entry. Future Gen.
Hardware reality. What the user is physically holding when they capture. Each class has one or more surfaces: the software where capture originates.
Capture paths that ride on platforms the user already has. Work from any device, regardless of class. Email forward is the high-leverage Gen 1 addition.
Forward any email to capture@sigul.ai. Text above the forward delimiter becomes the "what hit you" reflection. Universal across every device and email client. High leverage for newsletter readers — zero device-install friction.
The current Discord setup is Brad-personal dogfood: hardcoded channel, no install flow, not productized. Productizing means a public install flow, per-user OAuth, and channel scoping. Slack, Telegram, and Teams follow the same pattern.
Share extension, deeper iOS integration, native UI. Gen 1 path is iOS Shortcut → Safari per Bible. Native app deferred until post-WWDC June 8 to see what Apple ships.
S-024 three-tier rollout. Sigul as intelligence substrate accessible to other agents. Different distribution model — Sigul as service, not just product.
Every front door, every cross-device route, every modality lands in the same funnel. One ingress, one classifier, one substrate.
Edge plumbing. Every front door deposits a payload here.
The single front door inside Sigul. Classifies, routes, writes L2.
Specialized parser, universal fallback, honest stub. Whisper as last resort.
The moat. Structured metadata, transcript, source identity, user reflection.
The Gen 1 demo surface. The substrate is the demo.
Each capture tries Tier 1 first (specialized for known platforms), falls to Tier 2 (platform mappers, mostly pending), falls to Tier 3 (universal Readability + OG), or surfaces honestly at Tier 4 (closed gardens) with an explicit message.
Apple Podcasts (orchestrator + WhisperKit), YouTube (yt-dlp, S-028 complete), Waking Up (HTML parser). Spotify pending.
Substack, Medium, WordPress, arXiv, Notion get proper platform + publication identity. Arc 1. Not yet built.
Readability + OG + JSON-LD. D1 hardening shipped May 9 (6 baseline bugs closed). The honest baseline.
Closed gardens — Spotify (today), Audible, Facebook, TikTok, Patreon. Honest stub message. Sub-bucket refinement pending.
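The tier fall-through above can be sketched as a host lookup with a universal fallback. This is a minimal illustration in Python, not the Sigul classifier: the `Tier` enum, the host tables, the `built_mappers` parameter, and `route()` are hypothetical names, and the host lists are assumptions drawn only from the platforms named on this page. Spotify sits in the closed-garden table because that is where the page places it today.

```python
from enum import Enum
from urllib.parse import urlparse


class Tier(Enum):
    SPECIALIZED = 1      # Tier 1: dedicated parser for a known platform
    PLATFORM_MAPPER = 2  # Tier 2: platform + publication identity (mostly pending)
    UNIVERSAL = 3        # Tier 3: Readability + OG + JSON-LD
    HONEST_STUB = 4      # Tier 4: closed garden, explicit message


# Hypothetical host tables; a real classifier would likely use richer rules.
TIER1_HOSTS = {"podcasts.apple.com", "youtube.com", "youtu.be", "wakingup.com"}
TIER2_HOSTS = {"substack.com", "medium.com", "arxiv.org", "notion.so"}
TIER4_HOSTS = {"spotify.com", "audible.com", "facebook.com", "tiktok.com", "patreon.com"}


def _matches(host: str, table: set[str]) -> bool:
    """True if host equals a table entry or is a subdomain of one."""
    return any(host == h or host.endswith("." + h) for h in table)


def route(url: str, built_mappers: frozenset[str] = frozenset()) -> Tier:
    """Pick the first tier that can handle the URL.

    built_mappers lists the Tier 2 hosts whose mappers actually exist;
    anything not yet built falls through to the universal resolver.
    """
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, TIER1_HOSTS):
        return Tier.SPECIALIZED
    if _matches(host, TIER2_HOSTS & built_mappers):
        return Tier.PLATFORM_MAPPER
    if _matches(host, TIER4_HOSTS):
        return Tier.HONEST_STUB
    return Tier.UNIVERSAL
```

With no mappers built, a Substack URL resolves through Tier 3, which mirrors the current state described above; passing `frozenset({"substack.com"})` moves it to Tier 2 without touching the other tiers.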
Specialized parser → YouTube captions → RSS direct → Podcast Index → WhisperKit → honest stub. WhisperKit-as-last-resort is the architectural goal — Option E1 (YouTube captions canonical) and the Podcast 2.0 cascade are how that goal becomes practice.
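One way to read the cascade above is as an ordered list of transcript sources, tried until one returns text. The sketch below assumes each source is a plain function from URL to optional transcript; `first_transcript` and the source labels are hypothetical, and the real resolvers are not shown.

```python
from typing import Callable, Optional

# A transcript source takes a URL and returns text, or None if it has nothing.
TranscriptSource = Callable[[str], Optional[str]]


def first_transcript(
    url: str, sources: list[tuple[str, TranscriptSource]]
) -> tuple[str, Optional[str]]:
    """Try each named source in order and return the first non-empty transcript.

    Returns (source_name, text). If every source comes back empty or fails,
    returns ("honest_stub", None) so the caller can surface an explicit message.
    """
    for name, fetch in sources:
        try:
            text = fetch(url)
        except Exception:
            text = None  # a failing source should not stop the cascade
        if text and text.strip():
            return name, text
    return "honest_stub", None
```

Placing the WhisperKit source last in the list is what makes it the last resort: it only runs when every cheaper source has returned nothing.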
Even with archetypes deferred, the substrate Sigul builds during the Gen 1 window is the substrate it has forever for those captures. Don't silently degrade L3-readiness.
Per Product Bible v1.1, reflection is never optional in UX prominence — the field stays optional, but the prompt stays unmissable. A small UX pass on the iOS Shortcut + capture worker keeps tomorrow's substrate L3-ready when archetypes ship. Cost of skipping: every capture made between now and L3 launch loses the affective layer that makes Sigul irreplaceable.
Five arcs complete the universal-capture promise. Arc 5 ships first — it's foundational data-model work that the other four arcs write against. Then Arc 1 and Arc 4 in parallel (substrate breadth), then Arc 2 (transcript coverage), then Arc 3 (any-device closeout) in parallel with anything. Scope estimates only — none codebase-validated. Each requires a pre-arc spike before dispatch per the May 11 failure-pattern protocol.
Source artifacts are the deduplicable substrate (one per unique URL — episode, video, article, book). Captures are the units of attention (one per moment of resonance — has user_note, position, captured_at). Many captures → one source artifact. Unlocks multi-moment capture, transcript reuse, book passages, Reading Sessions, creator attribution. UX Model 3: automatic routing with transparent feedback ("Added to your earlier capture. 3 moments now").
Caveat — bigger commitment than the other four arcs. Schema migration of existing 120 captures + new source-artifacts collection. Splittable into 4 sub-arcs: 5a schema design + migration spec, 5b migration + ingestion idempotency, 5c capture-time UX feedback, 5d vault UI for multi-capture rendering. Pre-arc spike: confirm URL collisions in existing corpus, canonical-URL normalization scheme, source-artifact primary key per resolver.
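The many-captures-to-one-source-artifact relationship can be sketched with two record types and a canonical-URL key. Everything here is an assumption for illustration: the class names, the `Vault` container, the tracking-parameter list, and the first-capture message `"Captured."` are hypothetical, and the normalization shown is only one candidate for the scheme the pre-arc spike still has to pick. The capture fields `user_note`, `position`, and `captured_at` and the multi-moment feedback string come from the text above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query parameters treated as tracking noise. Not a ratified list.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                   "utm_term", "utm_content", "fbclid"}


def canonical_url(url: str) -> str:
    """One possible normalization: lowercase scheme and host, drop port,
    tracking parameters, fragment, and trailing slash."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING_PARAMS]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


@dataclass
class Capture:
    """One moment of resonance."""
    user_note: Optional[str]
    position: Optional[str]
    captured_at: datetime


@dataclass
class SourceArtifact:
    """One per unique URL; the transcript lives here so captures can reuse it."""
    canonical_url: str
    transcript: Optional[str] = None
    captures: list[Capture] = field(default_factory=list)


class Vault:
    """In-memory stand-in for the source-artifacts collection."""

    def __init__(self) -> None:
        self.artifacts: dict[str, SourceArtifact] = {}

    def add_capture(self, url: str, user_note: Optional[str] = None,
                    position: Optional[str] = None) -> str:
        key = canonical_url(url)
        artifact = self.artifacts.setdefault(key, SourceArtifact(key))
        artifact.captures.append(
            Capture(user_note, position, datetime.now(timezone.utc)))
        count = len(artifact.captures)
        if count == 1:
            return "Captured."  # hypothetical first-capture message
        return f"Added to your earlier capture. {count} moments now"
```

Running the same `canonical_url` over the existing corpus and grouping by its result would be one way to check for URL collisions before migration. A per-resolver primary key (for example, a video ID rather than a URL) would replace this function where a platform offers one.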
Substack, Medium, WordPress, arXiv, Notion get proper platform and publication identity instead of collapsing to generic_article. May 8 audit scoped this; design pending. The biggest remaining L2 breadth gap.
Caveat — scope not validated by codebase reconnaissance. Pre-arc spike required: confirm current state of platform-classifier code, identify mapper interface, confirm what D1 hardening already partially closed. Ships after Arc 5 so new mappers write to the source-artifact model.
E1: YouTube auto-captions land in canonical content.passage_text. yt-dlp already extracts them. Podcast 2.0 cascade: 3.7% transcript hit rate, near-zero non-transcript metadata per D-553 and D-485. Likely disposition: shelve the cascade and let E1 carry the transcript-coverage win.
Caveat — Podcast 2.0 disposition decision still open. Default position pre-pressure-test: shelve A and B with revival triggers, ship E1 narrow. Transcript lives at source-artifact level per Arc 5 model.
capture.sigul.app DNS shipped, cellular and wifi both working. Remaining: Chrome extension MVP per Bible Gen 1 scope (multi-session), Discord disposition (clarify it's Brad-personal dogfood, productization is Future Gen), iPad smoke test (10 min).
Caveat — Chrome extension is the dominant sub-arc and isn't really 6–8 hr Cort. Likely a multi-session build with its own design pass. Could split into Arc 3a (DNS + smoke test, near done) and Arc 3b (Chrome ext MVP, real build). UI work; can dispatch in parallel with other arcs once Arc 5 ships.
Inbound mail receiver (Postmark / SendGrid / Mailgun), parser extracts URLs and forwarded text, writes to /ingest/url. Universal cross-device route. Strategically the highest-leverage Gen 1 addition — newsletter readers onboard with zero device-install friction.
Caveat — inbound mail infra may add hours beyond the parser work. Pre-arc spike required: pick mail provider, verify MX record path, scope spam/attachment policy. Ships after Arc 5 so email captures write to the source-artifact model (one newsletter issue = one source-artifact).
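The parser step described above (reflection above the forward delimiter, URLs from the forwarded text) can be sketched as follows. The delimiter patterns cover only three common client formats and are an assumption, not a complete list; `parse_forward` and `ParsedForward` are hypothetical names. Treating a body with no delimiter as forwarded text with an empty reflection is also an assumption the spike may reverse. The mail-provider webhook and the write to /ingest/url are out of scope here.

```python
import re
from dataclasses import dataclass

# Assumed delimiter lines: Gmail-style, Apple Mail-style, Outlook-style.
FORWARD_DELIMITER = re.compile(
    r"^\s*(-{2,}\s*Forwarded message\s*-{2,}"
    r"|Begin forwarded message:"
    r"|-{2,}\s*Original Message\s*-{2,})\s*$",
    re.IGNORECASE | re.MULTILINE,
)
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


@dataclass
class ParsedForward:
    reflection: str       # what the user typed above the delimiter
    forwarded_text: str   # everything below the delimiter
    urls: list[str]       # unique URLs from the forwarded text, in order


def parse_forward(body: str) -> ParsedForward:
    """Split a plain-text email body at the first forward delimiter."""
    match = FORWARD_DELIMITER.search(body)
    if match is None:
        # Assumption: no delimiter means there is no separate reflection.
        reflection, forwarded = "", body
    else:
        reflection, forwarded = body[:match.start()], body[match.end():]
    found = (u.rstrip(".,;") for u in URL_PATTERN.findall(forwarded))
    urls = list(dict.fromkeys(found))  # dedupe while keeping first-seen order
    return ParsedForward(reflection.strip(), forwarded.strip(), urls)
```

Each extracted URL could then be submitted to the ingest endpoint with the reflection attached, so that one newsletter issue resolves to one source artifact regardless of how many times it is forwarded.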
This page is downstream of three canonical docs. Locking this page without updating them creates internal inconsistency in the doc hierarchy.
Section 10 (Gen 1 Scope) re-scoped: universal capture only; archetypes, Board responses, L3 deferred to Gen 1.5+. Section 4 (Data Schema) extended with source-artifact / capture model per Arc 5. Section 17 cross-reference to this doc as the canonical architecture map.
Type 5 Email Forward currently scoped as Gen 2 in spec v3. Re-scope to Gen 1 per May 16 decision. Mobile entry-point section needs Arc 3 status update (Chrome ext still pending, capture.sigul.app DNS shipped). Reading Sessions design re-anchored to source-artifact model.
Five close-out arcs added (Arc 5 foundational, then Arc 1/4 parallel, Arc 2, Arc 3). Revival triggers documented for deferred Gen 1.5 work (archetypes, L3, Spotify Tier 1, S-027, Tier 3.5 SSR). Podcast 2.0 A and B formally shelved.
Add: "Gen 1 re-scoped 2026-05-16 to universal capture only; source-artifact + capture model per Arc 5; see sigul-universal-capture-architecture.html." Frees future-Claude from reconstructing the May 16-17 conversation.