Operational review of WICHI's backend on Railway and Supabase, covering real-world incidents like connection pooling and migration conflicts, plus criteria for infrastructure migration.
Why This Stack
WICHI’s backend is a FastAPI + PostgreSQL stack. When choosing infrastructure to host it, we had three criteria:
- Deployment must be simple — a GitHub push should be all it takes to deploy. We didn’t want to spend time configuring a separate CI/CD pipeline.
- DB + auth in one place — PostgreSQL, user authentication, and Row Level Security (RLS) should all be available within a single platform, without combining multiple services.
- Minimal initial cost — at a pre-revenue stage, fixed costs had to be minimized. Free tiers or usage-based billing were mandatory.
Railway satisfied the first and third criteria. Supabase satisfied the second and third. Combining the two covers everything from FastAPI app deployment to PostgreSQL operations, authentication, and RLS — without any additional external services.
This post covers both services in detail based on real operational experience, highlights things to know when running them in combination, and outlines the criteria for knowing when to migrate away from this stack.
Architecture Overview
```mermaid
graph TB
    subgraph Client
        A[Web App - Vercel]
    end
    subgraph Railway
        B[FastAPI Server]
        C[Health Check Endpoint]
    end
    subgraph Supabase
        D[PostgreSQL DB]
        E[Connection Pooler - PgBouncer]
        F[Auth Service]
        G[REST API - PostgREST]
        H[Realtime]
        I[Storage]
    end
    subgraph External
        J[GitHub Repository]
    end
    A -->|HTTPS| B
    B -->|port 6543 - Transaction Mode| E
    E --> D
    A -->|Direct| F
    J -->|push trigger| B
    C -->|DB connection check| E
    style E fill:#f9f,stroke:#333
    style B fill:#bbf,stroke:#333
```
The critical detail in this diagram is that the FastAPI server does not connect directly to the Supabase DB. It routes through the Connection Pooler (PgBouncer). Why this matters is covered in detail in the Supabase section.
Railway: In Detail
Core Feature: GitHub Auto-Deploy
Railway’s greatest strength is GitHub-integrated automatic deployment. Once you connect a repository, every push to the main branch automatically triggers a build and deploy. No separate GitHub Actions workflow required.
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub
    participant RW as Railway
    participant Svc as Service
    Dev->>GH: git push origin main
    GH->>RW: Webhook trigger
    RW->>RW: Detect Dockerfile / Nixpacks
    RW->>RW: Build image
    RW->>Svc: Deploy new version
    RW->>RW: Health check pass
    RW-->>Svc: Route traffic to new version
    Note over RW,Svc: Zero-downtime rolling deploy
```
The deployment process works as follows:
- Push to GitHub triggers Railway via webhook
- If a Dockerfile exists, Docker build runs; otherwise, Nixpacks auto-configures the build environment
- After image build completes, deployment goes to a new container
- Traffic switches to the new version once health check passes
For a typical FastAPI app, build time (including dependency installation) runs 1-3 minutes, and traffic switchover completes within tens of seconds. Writing your own Dockerfile enables layer caching, which dramatically reduces build time when dependencies haven’t changed.
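A minimal Dockerfile sketch of that layering, for reference (the `app.main:app` module path is an assumption, not WICHI's actual layout; Railway injects a `PORT` variable at runtime):

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Copy only the dependency manifest first: this layer stays cached
# as long as requirements.txt is unchanged, so rebuilds skip pip install.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes invalidate only the layers from here down.
COPY . .

# Shell form so the PORT variable Railway injects is expanded.
CMD uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8000}
```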
Preview Deploy
Opening a PR automatically creates a Preview Deploy for that branch. It gets its own independent URL, so you can verify real behavior during PR review. This feature is similar to Vercel’s Preview Deploy but applied to backend services, which makes it particularly useful.
One caveat: you need to decide upfront how to manage the DB connection string for Preview Deploys. Using the production DB is risky, and connecting a separate staging DB requires environment variable separation.
Usage-Based Billing
Railway’s billing is calculated on three axes:
| Billing Item | Unit | Notes |
|---|---|---|
| vCPU | Per hour | Based on actual CPU time used |
| Memory | GB-hours | Based on usage, not allocation |
| Network Egress | GB | Only outbound traffic is billed |
| Disk | GB-hours | When persistent storage is used |
The advantage of this structure is low costs when traffic is low. The disadvantage is that it’s hard to know the exact cost until you open the month-end invoice. The dashboard shows a real-time Estimated Cost, but this is a running total, not a month-end projection.
Railway’s billing model is a “pay for what you use” structure. If there’s no traffic, costs are nearly zero — but a traffic spike can generate unexpected charges. Setting up Spending Alerts is not optional; it’s essential.
Developer Experience (DX)
Railway’s DX ranks among the best in its class. Here’s a breakdown by category:
| DX Area | Rating | Details |
|---|---|---|
| Initial setup | Excellent | First deploy within 5 minutes of GitHub connection |
| Dashboard | Excellent | Visual service topology, intuitive |
| Logs | Good | Real-time streaming supported; search and filtering are limited |
| Environment variable management | Excellent | Per-service separation, reference variable (${{}} syntax) support |
| CLI | Good | Basic commands like railway run, railway logs |
| Documentation | Fair | Core content is there, but edge case docs are lacking |
| Community | Good | Active Discord, fast response times |
A note on logging: Railway’s built-in log viewer is sufficient for debugging, but falls short for structured log searching or long-term retention. For production, integrating an external log service (Axiom, Better Stack, etc.) is recommended. Railway supports log drain, so integration isn’t difficult.
Limitations
Limitations observed while using Railway:
- Cold starts: When there’s no traffic, instances enter a sleep state. The first request can take several seconds. For FastAPI apps, this includes Uvicorn startup time, making the perceived delay noticeable.
- Limited regions: Region options are fewer than AWS or GCP. Asia regions exist, but there’s no Seoul region for Korean users. This directly impacts response latency.
- Scaling control: Horizontal scaling (running multiple instances) is possible, but you can’t configure the fine-grained auto-scaling policies available with AWS ECS or Kubernetes. This is a limitation for services with irregular traffic patterns.
- Cron job limitations: You can spin up a separate cron service on Railway, but there's no built-in scheduler. You need to embed a library like APScheduler directly in your app or separate it into a dedicated service (a minimal in-app sketch follows this list).
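For the embedded route, a minimal sketch of APScheduler inside FastAPI's lifespan (the job and interval are illustrative, not WICHI's actual tasks):

```python
from contextlib import asynccontextmanager

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from fastapi import FastAPI

scheduler = AsyncIOScheduler()

async def cleanup_expired_sessions() -> None:
    """Illustrative periodic task; replace with real logic."""
    ...

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Start the scheduler with the app, stop it on shutdown. Note that
    # with multiple Railway instances, each instance runs its own copy
    # of every job.
    scheduler.add_job(cleanup_expired_sessions, "interval", hours=1)
    scheduler.start()
    yield
    scheduler.shutdown()

app = FastAPI(lifespan=lifespan)
```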
Supabase: In Detail
PostgreSQL as a Service
Supabase’s core offering is managed PostgreSQL. Instance provisioning, backups, and upgrades are handled by Supabase, so you rarely need to directly manage DB operations.
Supabase is architecturally a set of services layered on top of PostgreSQL. You don’t have to use all of them. Here’s the breakdown of what WICHI actually uses versus what it doesn’t:
| Feature | Used? | Purpose / Reason for Not Using |
|---|---|---|
| PostgreSQL DB | Yes | Core data store |
| Auth | Yes | Email/OAuth authentication, JWT issuance |
| Row Level Security (RLS) | Yes | Per-user data access control |
| Connection Pooler | Yes | Railway→DB connection management |
| SQL Editor | Yes | Ad-hoc queries, data verification |
| REST API (PostgREST) | No | FastAPI handles queries directly |
| Realtime | No | No real-time features needed currently |
| Storage | No | No file upload functionality |
| Edge Functions | No | Server logic handled by Railway |
Looking at just the features we use, Supabase is effectively “managed PostgreSQL + Auth + RLS.” This combination alone significantly reduces management overhead compared to operating separate authentication (Auth0, Firebase Auth) and database services (RDS, Cloud SQL).
Auth Service
Supabase Auth is a GoTrue-based authentication service. It supports email/password authentication and OAuth (Google, GitHub, etc.) login out of the box, and handles JWT token issuance and refresh automatically.
On the FastAPI server, we verify the JWT sent by the client and execute DB queries based on the user ID in the token. Not having to build the auth service from scratch is the biggest advantage.
There’s an important caveat, though. Supabase Auth’s JWT includes role and sub (user ID) by default. RLS policies can identify the current user via auth.uid(), but this only works automatically when accessing through the Supabase client library. When querying PostgreSQL directly from FastAPI, RLS is not automatically applied — you need to write separate access control logic on the server side.
RLS does not mean "configured and therefore safe." The scope of RLS enforcement depends on how the DB is accessed. Accessing with the `service_role` key bypasses RLS entirely, so managing this key is the crux of security.
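As a concrete sketch of that server-side verification (assumptions: PyJWT is installed, the project's JWT secret is exposed as `SUPABASE_JWT_SECRET`, and the project uses Supabase's default HS256 signing with the `authenticated` audience):

```python
import os

import jwt  # PyJWT
from fastapi import Depends, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

bearer = HTTPBearer()

def current_user_id(
    creds: HTTPAuthorizationCredentials = Depends(bearer),
) -> str:
    """Verify the Supabase-issued JWT and return the user ID (sub claim)."""
    try:
        claims = jwt.decode(
            creds.credentials,
            os.environ["SUPABASE_JWT_SECRET"],
            algorithms=["HS256"],
            audience="authenticated",
        )
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid or expired token")
    # RLS is not applied on this path, so every query must filter
    # by this user ID explicitly.
    return claims["sub"]
```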
Connection Pooling: A Deep Dive
This is the most important configuration in Supabase operations, warranting its own section.
```mermaid
graph LR
    subgraph Railway
        A1[FastAPI Instance 1]
        A2[FastAPI Instance 2]
        A3[FastAPI Instance 3]
    end
    subgraph Supabase Connection Pooler
        B[PgBouncer<br/>port 6543<br/>Transaction Mode]
    end
    subgraph PostgreSQL
        C[DB Instance<br/>Max Connections Limited]
    end
    A1 -->|conn 1| B
    A1 -->|conn 2| B
    A2 -->|conn 3| B
    A2 -->|conn 4| B
    A3 -->|conn 5| B
    B -->|pooled conn A| C
    B -->|pooled conn B| C
    B -->|pooled conn C| C
    style B fill:#f96,stroke:#333
```
PostgreSQL has a physical limit on concurrent connections. On Supabase’s Free tier, this limit is even stricter. If the FastAPI server creates a new DB connection per request, connection exhaustion occurs under concurrent load.
The Connection Pooler (PgBouncer) solves this. It accepts connection requests from multiple clients and reuses a small number of actual PostgreSQL connections.
Supabase offers two Pooler modes:
| Mode | Port | Behavior | Best For |
|---|---|---|---|
| Transaction | 6543 | Allocates/returns connections per transaction | Serverless patterns, short queries |
| Session | 5432 (Pooler) | Holds connection until session ends | Cases requiring persistent connections |
WICHI’s backend uses Transaction mode. Each FastAPI request is an independent transaction, so Transaction mode — which returns the connection immediately after query execution — is the right fit.
Configuration caveats:
- The `DATABASE_URL` environment variable must use the Pooler address (port 6543). Using a direct connection (port 5432) means no connection pooling, and you'll hit the concurrent connection limit quickly.
- Transaction mode does not support `PREPARE` statements or `LISTEN`/`NOTIFY`. If using SQLAlchemy, you may need to set `prepared_statement_cache_size=0` (see the sketch after this list).
- Migrations must use the direct connection. DDL commands (`CREATE TABLE`, `ALTER TABLE`, etc.) can behave unexpectedly when run through the Transaction mode Pooler.
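A minimal sketch of what this looks like with SQLAlchemy's async engine (the `DATABASE_URL` variable name, pool sizes, and the example host are assumptions):

```python
import os

from sqlalchemy.ext.asyncio import create_async_engine

# DATABASE_URL is assumed to point at the Supabase pooler (port 6543)
# with the prepared statement cache disabled for transaction mode, e.g.:
# postgresql+asyncpg://user:pass@xxx.pooler.supabase.com:6543/postgres?prepared_statement_cache_size=0
engine = create_async_engine(
    os.environ["DATABASE_URL"],
    pool_pre_ping=True,  # detect connections the pooler has already dropped
    pool_size=5,         # keep the client-side pool small; PgBouncer multiplexes
    max_overflow=5,
)
```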
Free Tier vs Pro Plan
Supabase pricing based on publicly available information:
| Item | Free | Pro ($25/mo) |
|---|---|---|
| DB size | 500 MB | 8 GB (expandable) |
| Auth MAU | 50,000 | 100,000 |
| Storage | 1 GB | 100 GB |
| Edge Function invocations | 500K/month | 2M/month |
| Daily backups | Not included | 7-day retention |
| Branching | Not included | Available |
| Pausing | Auto-pause after 7 days of inactivity | None |
The biggest constraint on the free tier is auto-pausing after 7 days of inactivity. A paused project must be manually restored from the dashboard, taking several minutes. This is fine for personal projects or early development, but for any service with real users, upgrading to Pro is mandatory.
The free tier’s 500 MB DB limit is reached faster than you’d expect. If you’re storing log-type data or history tables in the DB, you can hit the ceiling within weeks. It’s worth designing early on which data lives in the DB and which gets offloaded elsewhere.
Migration Management
Supabase supports a migration workflow via its CLI. Commands like `supabase migration new` and `supabase db push` let you manage schema changes.
The temptation is the dashboard’s SQL Editor. “Let me just quickly add one table” via a direct CREATE TABLE in the dashboard creates drift between your local migration files and the actual DB schema. This drift gets harder to resolve the more it accumulates.
Recommended workflow:
- Always write schema changes as local migration files
- Apply to the remote DB with `supabase db push`
- Use the dashboard SQL Editor for data queries only
- Urgent data modifications (DELETE, UPDATE) can go through the dashboard, but schema changes (DDL) should never be done there
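In CLI terms, the loop looks like this (the migration name is illustrative):

```bash
# 1. Create a timestamped migration file under supabase/migrations/
supabase migration new add_profiles_table

# 2. Write the DDL in the generated SQL file, then apply it to the
#    linked remote DB (per the caveat earlier, over the direct connection)
supabase db push
```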
Operational Experience
Deployment Speed and Stability
Railway’s deployments are stable. From GitHub push to the new version receiving traffic, it generally takes 2-4 minutes. Zero-downtime deploy is supported, so there’s no service interruption during deployment.
However, when a build fails, the previous version keeps running, which means a failed deployment can go unnoticed. Without separate deployment failure alerts, you won't know until you manually check the dashboard. Setting up Slack or Discord webhook integration early is recommended.
Uptime
Both Railway and Supabase reserve formal SLAs for their higher-tier plans; in the Free-to-Pro range, observed uptime has been in the high 99% range. We haven't experienced any major service outages, but intermittent delays and brief anomalies did occur.
Incident Log
Key incidents experienced during the operational period:
| # | Type | Symptoms | Root Cause | Detection Method | Resolution Time | Action Taken |
|---|---|---|---|---|---|---|
| 1 | DB connection exhaustion | Spike in API 500 errors | Direct connection (no pooler) under concurrent load | Manual error log review | ~30 min | Switched to Pooler (port 6543) |
| 2 | Cold start delay | First request timeout | Heavy model load during startup after sleep | User report | ~10 min | Lightened startup event |
| 3 | Migration conflict | Deploy failure | Dashboard DDL change conflicted with CLI migration | Deploy log review | ~1 hour | Manual migration file sync |
| 4 | Missing env variable | Service startup failure | Typo in Railway environment variable | Deploy log review | ~5 min | Added env variable validation script |
| 5 | Supabase auto-pause | DB connection failure | Free tier 7-day inactivity auto-pause | Health check failure alert | ~5 min (manual restore) | Upgraded to Pro plan |
The highest-impact incident was #1 (connection exhaustion). Operating with direct connections instead of the Pooler, we hit the concurrent connection limit as users grew. The API intermittently returned 500 errors, and diagnosing the cause took time.
Most incidents were caused by deferring configuration we already knew we should do. Connection Pooler setup, environment variable validation, and alert configuration are things that should be completed alongside the first deployment.
Logging and Monitoring
Railway’s logging system is sufficient for debugging but falls short for production-grade monitoring.
| Item | Railway Built-in | Additional Tooling Needed |
|---|---|---|
| Real-time log streaming | Provided | — |
| Log search/filter | Basic level | When structured log searching is needed |
| Log retention period | Limited | When long-term retention is needed |
| CPU/Memory metrics | Dashboard provided | Alert threshold configuration |
| Custom metrics | Not provided | Prometheus, Datadog, etc. |
| Error tracking | Not provided | Sentry, etc. |
| APM | Not provided | Separate tooling if needed |
The same goes for Supabase. The dashboard shows basic metrics like DB size, connection count, and API request count, but query performance analysis and slow query tracking require separate configuration.
The realistic approach: start with just the Railway and Supabase built-in dashboards in the early stage, then incrementally add Sentry (error tracking) and log drain (long-term log retention) as users grow. Setting up a full monitoring stack from day one is overkill.
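The Sentry step in particular is small enough to do early. A minimal sketch (the `SENTRY_DSN` variable name is an assumption):

```python
import os

import sentry_sdk

# Initialize before the FastAPI app is created; recent sentry-sdk
# versions pick up FastAPI/Starlette automatically.
sentry_sdk.init(
    dsn=os.environ["SENTRY_DSN"],
    traces_sample_rate=0.1,  # sample 10% of requests for performance tracing
)
```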
Performance Benchmarks
Approximate figures measured in the actual production environment. Since exact numbers vary by service characteristics, they’re expressed as ranges.
| Item | Range | Conditions |
|---|---|---|
| API response time (warm) | 50-200 ms | Including DB query, via Pooler |
| API response time (cold start) | 3-8 sec | First request after sleep |
| Deployment time | 2-4 min | Dockerfile build, dependency cache hit |
| DB query (simple SELECT) | 5-20 ms | Indexed table |
| DB query (with JOINs) | 20-100 ms | Varies by table size and index design |
| Pooler overhead | ~5 ms | Additional latency vs. direct connection |
The 3-8 second cold start is not negligible for a production service. However, it only occurs after a period with no traffic, so it’s not an issue during active hours. Users accessing intermittently during nighttime or early morning hours may notice it.
Cost Structure
Specific dollar amounts are not disclosed, but here is the structural overview.
Railway Cost Characteristics
Railway uses usage-proportional billing. Costs are low in months with little traffic and rise in high-traffic months. This unpredictability is the defining characteristic of Railway's billing model.
Methods for controlling costs:
- Spending Alert: Configure notifications when estimated monthly cost exceeds a threshold
- Sleep policy adjustment: Allowing instances to sleep during zero-traffic periods reduces cost (at the trade-off of cold starts)
- Resource caps: Setting CPU and memory ceilings indirectly limits cost ceilings
Supabase Cost Characteristics
Supabase uses tier-based billing. Tiers progress from Free → Pro ($25/mo) → Team ($599/mo) → Enterprise (custom negotiation), with overage charges within each tier.
The typical trigger for upgrading from Free to Pro is one of the following:
- Hitting the 500 MB DB limit
- Needing to avoid the 7-day auto-pause
- Requiring daily backups
- Needing branching functionality
Total Cost Structure Comparison
| Stage | Railway | Supabase | Overall Cost Profile |
|---|---|---|---|
| Development/testing | Free to minimal | Free ($0) | Nearly free |
| Small-scale operations | Usage-proportional | Pro ($25/mo) | Fixed + variable blend |
| Medium-scale operations | Usage-proportional (increasing) | Pro + overage | Growing variability |
| Large-scale | Hard to predict | Team/Enterprise | Migration evaluation zone |
Alternative Comparisons
Server Hosting: Railway vs Render vs Fly.io
| Item | Railway | Render | Fly.io |
|---|---|---|---|
| Deployment method | GitHub push auto | GitHub push auto | CLI (fly deploy) primarily |
| Build | Dockerfile / Nixpacks | Dockerfile / auto-detect | Dockerfile / Buildpacks |
| Cold starts | Yes (on sleep) | Yes (Free tier) | No (minimum 1 machine always on) |
| Pricing model | Usage-based | Instance-based + usage | Instance-based + usage |
| Preview Deploy | Supported | Supported | Not supported (manual setup needed) |
| Region selection | Limited | US/EU | 30+ regions |
| DX | High (intuitive dashboard) | High | Medium (CLI proficiency needed) |
| Docker support | Native | Native | Native |
| Scaling | Vertical + horizontal (limited) | Vertical + horizontal | Vertical + horizontal (fine-grained) |
Why Railway was chosen: It had the best DX, and the fact that deployment was complete with just a GitHub push was decisive. Fly.io is superior in performance and region coverage but its CLI-based workflow increases initial setup cost. Render is similar to Railway, but Railway’s dashboard UX was better at the time of evaluation.
Database: Supabase vs PlanetScale vs Neon
| Item | Supabase | PlanetScale | Neon |
|---|---|---|---|
| DB engine | PostgreSQL | MySQL (Vitess) | PostgreSQL |
| Built-in auth | Auth included | Not included | Not included |
| RLS | Supported | Not supported (MySQL limitation) | Supported (PostgreSQL) |
| Branching | Pro and above | Supported (core feature) | Supported (core feature) |
| Connection pooling | PgBouncer built-in | Custom proxy | Custom proxy |
| Serverless driver | supabase-js | @planetscale/database | @neondatabase/serverless |
| Free tier | 500 MB, 50K MAU | Free tier discontinued in 2025 | 512 MB, 190 compute hours |
| Additional services | Realtime, Storage, Edge Functions | None (DB only) | None (DB only) |
Why Supabase was chosen: Being able to get Auth + RLS + PostgreSQL in one place was decisive. Neon is PostgreSQL-based and similar to Supabase, but lacks Auth, requiring a separate authentication service. PlanetScale is MySQL-based, which excluded it for a PostgreSQL-preferring project.
Migration Decision Criteria
Conditions Where This Stack Can Be Maintained
- Monthly active users (MAU) in the low thousands or fewer
- Traffic is not centered on Korea/Asia (regional latency is not critical)
- Response time requirements are not strict (intermittent cold starts are acceptable)
- Team size is small (1-3 people) — no dedicated infrastructure personnel
- Infrastructure costs are contained within a monthly fixed budget
Signals That Warrant Migration Evaluation
```mermaid
graph TD
    A[Maintain Current Stack] --> B{Cold starts impacting<br/>the business?}
    B -->|No| C{Costs within<br/>predictable range?}
    B -->|Yes| G[Need always-on environment]
    C -->|Yes| D{Multi-region<br/>needed?}
    C -->|No| H[Evaluate fixed-rate plans or<br/>reserved instances]
    D -->|No| E{DB size within<br/>Supabase Pro limits?}
    D -->|Yes| I[Evaluate AWS/GCP migration]
    E -->|Yes| F[Maintain current stack]
    E -->|No| J[Evaluate self-hosted PostgreSQL<br/>or RDS]
    G --> K[Evaluate Fly.io / AWS ECS]
    H --> K
    I --> L[Full cloud migration]
    J --> L
    style F fill:#9f9,stroke:#333
    style K fill:#ff9,stroke:#333
    style L fill:#f99,stroke:#333
```
Specifically, when any one of these conditions applies, it’s time to evaluate migration:
- Cold starts are impacting the business: If real-time response becomes a core requirement, you need an always-on environment.
- Costs become unpredictable: If Railway’s usage-based billing produces high month-to-month variance and cost control becomes difficult, switching to a service with fixed pricing is more rational from a business management perspective.
- Multi-region is needed: If you need to reduce latency for a global user base, a cloud provider with CDN and flexible region selection is advantageous.
- DB size exceeds Supabase Pro limits: If you’re handling data beyond 8 GB or need complex query optimization, self-managed PostgreSQL or AWS RDS should be evaluated.
- Compliance requirements emerge: If regulations apply to data storage location, encryption standards, audit logging, etc., managed services may not provide sufficient control.
Why Leaving Supabase Is Hard
Migrating from Supabase to another PostgreSQL service is not just a data migration problem. Each Supabase feature has a different migration difficulty level.
| Feature | Migration Difficulty | Reason |
|---|---|---|
| PostgreSQL data | Low | Can migrate via pg_dump / pg_restore |
| Schema + RLS policies | Medium | RLS policies depend on Supabase-specific functions (auth.uid()) |
| Auth | High | User tables, JWT structure, and OAuth configuration all need to be rebuilt |
| Storage | Medium | S3-compatible so data transfer is possible, but URL references need updating |
| Realtime | High | Requires building a replacement solution from scratch |
| Edge Functions | Medium | Deno-based; requires rewriting for another serverless platform |
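For scale, the "Low" row is a standard dump-and-restore (a sketch; the env variable names are assumptions, and the dump should run against the direct connection, not the pooler):

```bash
# Dump from Supabase using the direct connection string
pg_dump "$SUPABASE_DIRECT_URL" --format=custom --no-owner -f wichi_backup.dump

# Restore into the new PostgreSQL instance
pg_restore --no-owner --dbname "$NEW_DATABASE_URL" wichi_backup.dump
```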
The key is Auth. If you’re dependent on Supabase Auth, migration means redesigning your entire authentication system. This isn’t a simple infrastructure swap — it’s closer to a product architecture change. It’s important to recognize this lock-in at the outset when choosing Supabase.
Stack migration should not be “we’ll switch when problems arise.” Define migration criteria upfront and check them periodically. When the migration moment arrives, you’re already under operational load — so set the criteria while you still have bandwidth.
Combined Operations Checklist
When setting up Railway + Supabase for the first time, the following items should be completed before the first deployment:
- Set the Supabase Pooler address (port 6543) in the Railway service
- Separate the `service_role` key and `anon` key in environment variables
- Configure Railway Spending Alert
- Set up deployment failure alerts (Slack/Discord webhook)
- Implement health check endpoint (including DB connection status; a sketch follows this list)
- Write RLS policies and verify with test data
- Decide on migration workflow (CLI only, no dashboard DDL)
- Decide on Preview Deploy DB connection strategy (whether to separate a staging DB)
- Confirm Supabase free tier auto-pause policy
- Decide on log retention strategy (built-in vs. log drain)
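For the health check item, a minimal sketch that exercises the pooler path (the `/health` route and the `engine` import path are assumptions; `engine` is the async engine from the pooling section):

```python
from fastapi import FastAPI
from fastapi.responses import JSONResponse
from sqlalchemy import text

from app.db import engine  # assumption: the async engine configured earlier

app = FastAPI()

@app.get("/health")
async def health() -> JSONResponse:
    """Railway polls this endpoint; it verifies DB reachability via the pooler."""
    try:
        async with engine.connect() as conn:
            await conn.execute(text("SELECT 1"))
    except Exception:
        # A non-2xx response fails Railway's health check, so a broken
        # instance never receives traffic. This also surfaces Supabase
        # auto-pause (incident #5) as an alert instead of user errors.
        return JSONResponse(status_code=503, content={"status": "degraded", "db": "unreachable"})
    return JSONResponse(content={"status": "ok", "db": "ok"})
```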
Conclusion
Railway + Supabase is a stack well-suited for early-stage products. It delivers deployment automation, managed PostgreSQL, and built-in authentication with minimal configuration. The trade-off is accepting cold starts, unpredictable costs, and monitoring limitations.
The essence of this stack is “choosing not to spend time on infrastructure so you can focus on the product.” Once the product is validated and traffic stabilizes, evaluate migration — but until then, maximizing this stack’s development speed advantage is the rational approach.
That said, starting with AWS/GCP from day one is over-engineering. Define your migration criteria in advance, but maintain the current stack until real bottlenecks appear.