ADR-007: Replace Stripe FDW with API-Driven Sync Pipeline
Status
Accepted
Context
The bookkeeping ledger views depend on joining Stripe data across balance_transactions, charges, payment_intents, invoices, refunds, and transfers. These are accessed via a Supabase Stripe Foreign Data Wrapper (FDW) that queries the Stripe API live on every SQL query.
Two problems surfaced:
-
The FDW is incompatible with Stripe's basil API version. Our Stripe account uses
2025-06-30.basil, which removedcharge.invoice,invoice.payment_intent,invoice.charge, andpayment_intent.invoice— the canonical fields for linking charges to invoices. The FDW has not been updated for basil (last Stripe-specific code change was March 2025). These columns silently return NULL, forcing us to join throughpayment_details.order_reference— a field Stripe explicitly truncates to 25 characters for L2/L3 card data and documents as non-canonical. -
The FDW is architecturally wrong for analytics. Every query hits the live Stripe API. Materialized view refreshes are slow and subject to rate limits. No local indexes. Dashboard queries depend on external API availability.
The Supabase Stripe Sync Engine (webhook-driven) was evaluated but is missing balance_transactions and transfers — the two most critical objects for bookkeeping.
Decision
Replace the Stripe FDW with a periodic API-driven sync pipeline that writes to local Postgres tables in the stripe schema.
Architecture — three layers:
- Layer 0 (Sync): Scripts fetch from Stripe API endpoints, store raw responses in local tables. Each table has typed columns for indexed/filterable fields and an
attrsjsonb column containing the complete, unaltered API response. No transformation. The sync controls which API version it requests — pre-basil for objects where basil removed fields we need, or basil where the new structure is preferred. - Layer 1 (Materialized view): Same as today — joins the
stripe.*tables intoanalytics.bookkeeping_stripe_ledger. The SQL changes minimally because the table shape is the same as the FDW, just with data that actually populates the canonical join fields (charge.invoice, etc.). - Layer 2+ (Views): Unchanged.
Sync strategy:
- Initial backfill: paginate through full history using Stripe list endpoints
- Incremental: fetch records with
created[gte]since last sync timestamp - Cadence: daily cron (bookkeeping doesn't need real-time)
- Objects synced:
balance_transactions,charges,payment_intents,invoices,refunds,transfers,payouts
Alternative considered: Pin the FDW to a pre-basil API version (2024-06-20). The FDW supports an api_version server option. This would restore charge.invoice etc. Rejected because it's technical debt with an expiration date — Stripe will eventually drop old API versions, and we'd still have the live-API-on-every-query problem.
Alternative considered: Supabase Stripe Sync Engine. Webhook-driven, near real-time, Postgres-native. Missing balance_transactions and transfers. A hybrid (sync engine + custom scripts for gaps) adds two sync mechanisms with different freshness characteristics. Rejected in favor of one consistent approach.
Alternative considered: Stripe Data Pipeline. First-party, most complete coverage. Does not support Postgres as a destination (Snowflake/Redshift only). Rejected — would require S3 intermediate storage and additional ETL.
Alternative considered: Airbyte. Open-source ETL with Stripe source and Postgres destination. Pinned to API version 2022-11-15 — no basil support, no invoice_payments. Heavy infrastructure for what we need. Rejected.
Consequences
Benefits:
- Canonical join fields (
charge.invoice,invoice.payment_intent) work because we control the API version per request - Local tables with proper indexes — dashboards don't depend on external API availability
- Full raw Stripe response stored in
attrs— never locked out of fields when the API evolves - Matview refresh queries local data — fast, no rate limits
- Simple infrastructure — scripts + cron, no new services to deploy
- Clean layer separation: Layer 0 is pure ingestion, no opinion
Tradeoffs:
- We build and maintain sync code (pagination, incremental tracking, error handling)
- Data freshness is daily instead of live — acceptable for bookkeeping, would not be acceptable for real-time application logic
- Initial backfill may take time for accounts with large history
- Storage cost for local copies of Stripe data (negligible at our scale)
Risks:
- Stripe API rate limits during backfill — mitigated by respecting rate limit headers and backing off
- Schema drift if Stripe changes response shape — mitigated by storing full
attrsjsonb (typed columns are a convenience layer, not the source of truth) - If we ever need real-time Stripe data for application logic (not just analytics), this pipeline doesn't solve that — we'd still need webhooks or the sync engine for that use case