Case Study: PsyStat Nexus | dev://portfolio

The Problem

The researchers the software industry forgot

SPSS costs $99 a month. Stata costs $195. Mplus costs $500+. R is free but has a learning curve that filters out half the people who need it. Every one of them requires a desktop computer. None of them check your assumptions before running the test. None of them tell you in plain English what the result means. And when you're done, you still have to manually format the output for APA 7th edition.

Behavioral science researchers — psychologists, sociologists, education researchers, public health analysts — currently split their workflow across five or more tools: Excel for data entry, SPSS for analysis, Word for writeup, a textbook for interpretation, and maybe R when SPSS can't do what they need. Every tool switch is friction. Every manual formatting step is error-prone. Every $99/month license is a barrier for a grad student on a stipend.

As a senior psychology major, I lived this problem. The tools were designed for statisticians, not for the researchers who actually use statistics to answer questions about human behavior. PsyStat exists because the question "is my data normal?" shouldn't require a Stack Overflow deep-dive.

By the Numbers

240+ Statistical Methods
33 Modules
77 API Router Groups
13 Research Intelligence Tools
6 Self-Paced Courses
$0 Free Tier (all 240 methods)

The Solution

One app, one tap, one workflow

PsyStat consolidates the entire research statistics workflow into a single application. Data preparation with outlier detection and MICE imputation. Test selection via a guided 5-step wizard. 240+ methods from t-tests to Bayesian MCMC. Automatic assumption checking before every analysis. Publication-ready APA 7th edition output. AI interpretation in plain English. Export to PDF, CSV, Excel, PowerPoint, BibTeX.

The scope is genuinely comprehensive: psychometrics (Cronbach's alpha, IRT, measurement invariance), SEM (path analysis, multilevel, cross-level mediation), Bayesian analysis (Bayes factors, MCMC, posterior predictive checks), causal inference (DAGs, instrumental variables, propensity scores, E-values), meta-analysis (fixed/random effects, publication bias detection, trim-and-fill), machine learning, time series, network analysis, survival analysis, text analysis, and more. 33 modules. 240+ endpoints. Everything a behavioral scientist would otherwise reach for R or SPSS to do.

And it runs on your phone. Not a crippled mobile companion — the full platform via Expo and React Native. Start a Bayesian MCMC on the bus, check results at lunch. Your statistics lab fits in your pocket.

Research Intelligence

The tools that make researchers better, not just faster

Speed isn't enough. PsyStat includes 13 research intelligence tools designed to catch the mistakes that ruin studies before they're published.

The Adversarial Reviewer simulates Reviewer 2, the methodological critic every researcher dreads. It reads your analysis and asks the hard questions: "Have you checked for outliers? Your sample is underpowered for this effect. Did you correct for multiple comparisons?" Better to hear it from your app than from a journal rejection six months later.

Multiverse Analysis generates specification curves showing whether your finding survives different analytical choices. Outlier method, missing data strategy, transformation, test variant — researchers make dozens of arbitrary decisions. The specification curve reveals which findings are robust and which are fragile artifacts of one specific path through the decision space.

The Assumption Dashboard runs automated pre-flight checks before every analysis: Shapiro-Wilk, Levene, Mauchly, D'Agostino, Anderson-Darling. Results appear as visual green/amber/red report cards. When violations occur, the system suggests alternative tests. When your data isn't normal, it doesn't just warn you: it recommends the Mann-Whitney U test instead of the independent-samples t-test and explains why.

Most tools let you run invalid analyses without warning. PsyStat won't let bad methodology leave the building unflagged.
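A minimal sketch of that pre-flight logic for a two-group comparison, using scipy's real test functions (`shapiro`, `levene`, `mannwhitneyu`, `ttest_ind`). The function names, the 0.05 cut-off, the amber band (0.05 ≤ p < 0.10), and the Welch's t branch for unequal variances are illustrative assumptions, not PsyStat's actual rules.

```python
from scipy import stats

ALPHA = 0.05


def traffic_light(p: float, alpha: float = ALPHA) -> str:
    """Map a check's p-value to a report-card colour (illustrative thresholds)."""
    if p < alpha:
        return "red"
    if p < 2 * alpha:
        return "amber"
    return "green"


def compare_two_groups(a: list[float], b: list[float], alpha: float = ALPHA) -> dict:
    """Run assumption checks first, then pick a two-group test based on them."""
    checks = {
        "shapiro_a": stats.shapiro(a).pvalue,  # normality of group a
        "shapiro_b": stats.shapiro(b).pvalue,  # normality of group b
        "levene": stats.levene(a, b).pvalue,   # equality of variances
    }
    report = {name: traffic_light(p, alpha) for name, p in checks.items()}

    if min(checks["shapiro_a"], checks["shapiro_b"]) < alpha:
        # Normality violated: fall back to a rank-based test.
        test, result = "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative="two-sided")
    elif checks["levene"] < alpha:
        # Roughly normal but unequal variances: Welch's t-test (assumed branch).
        test, result = "Welch t", stats.ttest_ind(a, b, equal_var=False)
    else:
        test, result = "Student t", stats.ttest_ind(a, b)

    return {"report": report, "test": test,
            "statistic": float(result.statistic), "p": float(result.pvalue)}
```

For example, a group of nine identical scores plus one extreme value fails Shapiro-Wilk, so the sketch switches to Mann-Whitney U and marks that check red.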

The Hard Parts

What made this difficult

240 methods without monolithic code: Each statistical domain is a separate FastAPI router importing its own libraries (scipy, statsmodels, scikit-learn, lifelines, networkx). 77 router groups, independently versioned and testable. The modular architecture is the only reason one developer can maintain 240+ endpoints.

NumPy serialization: NumPy arrays, NumPy scalar types such as int64, and pandas DataFrames, all common in statistics output, can't be serialized by Python's standard JSON encoder. Every API response passes through a custom NumpySafeResponse serializer that converts NumPy types to native Python types before encoding. It sounds trivial until a single int64 crashes your entire response pipeline.

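A minimal sketch of the conversion step such a serializer needs, assuming only NumPy inputs (the `to_native` and `dumps_safe` names are hypothetical). It relies on real NumPy behavior: `ndarray.tolist()` and `generic.item()` return built-in Python values, and `np.int64` is not a subclass of `int`, which is exactly why the standard encoder rejects it. Mapping NaN and infinity to `null` is an assumed policy.

```python
import json
import math
from typing import Any

import numpy as np


def to_native(obj: Any) -> Any:
    """Recursively convert NumPy values into types the json module accepts."""
    if isinstance(obj, dict):
        # Convert NumPy scalar keys too; json only accepts str/int/float/bool/None keys.
        return {(k.item() if isinstance(k, np.generic) else k): to_native(v)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_native(v) for v in obj]
    if isinstance(obj, np.ndarray):
        return to_native(obj.tolist())  # tolist() yields Python scalars
    if isinstance(obj, np.generic):
        return to_native(obj.item())    # np.int64 -> int, np.bool_ -> bool, ...
    if isinstance(obj, float) and not math.isfinite(obj):
        return None                     # NaN/inf are not valid JSON
    return obj


def dumps_safe(payload: Any) -> str:
    # allow_nan=False raises on any non-finite float that slipped through,
    # instead of emitting the non-standard tokens NaN/Infinity.
    return json.dumps(to_native(payload), allow_nan=False)
```

In FastAPI, one option is a `JSONResponse` subclass whose `render(self, content)` returns `dumps_safe(content).encode("utf-8")`. DataFrames would need converting first (for example with `df.to_dict(orient="records")`), which this sketch does not cover.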
Mobile-first with heavy computation: React Native on older phones can't handle scipy-level computation. The solution: offload all numerical work to the FastAPI backend, so the frontend is pure UI. Results are cached client-side (30-minute LRU) to avoid recomputation. The app stays responsive even on budget hardware.

AI context without token bloat: Passing the full analysis history and all 240 methods to Claude blows up token usage. The backend maintains a lightweight context: last 5 messages, current module, 20 most recent analysis summaries. Enough for the AI to be genuinely helpful at ~2K tokens per call instead of 20K.

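A sketch of that trimming rule under assumed data shapes: messages as role/content dicts, analyses as timestamped summaries. Only the window sizes (5 messages, 20 summaries) and the current-module field come from the text; `build_ai_context` and `AnalysisSummary` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AnalysisSummary:
    created_at: float  # e.g. a Unix timestamp
    text: str


def build_ai_context(messages: list[dict], current_module: str,
                     analyses: list[AnalysisSummary],
                     max_messages: int = 5, max_analyses: int = 20) -> dict:
    """Cut conversation state down to a fixed-size window before calling the model."""
    recent = sorted(analyses, key=lambda a: a.created_at, reverse=True)[:max_analyses]
    return {
        "module": current_module,
        # Last N chat turns in chronological order (max() guards the N = 0 case).
        "messages": messages[max(0, len(messages) - max_messages):],
        "analyses": [a.text for a in recent],  # newest summaries first
    }
```

Because both windows are fixed, the prompt size stays bounded no matter how long a user's history grows.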
Multiverse combinatorial explosion: The number of combinations of analytical decisions (outlier method × missing data strategy × transformation × test variant) grows exponentially with the number of decision points. The specification curve engine enumerates the decision points explicitly, so the number of specifications (the product of each point's option count) is known before anything runs. It then runs each combination and renders the range of possible conclusions as a visual. Revealing whether a finding is robust shouldn't itself be computationally intractable.

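A toy specification curve, assuming three hypothetical decision points with two options each and a difference in group means as the estimate. `itertools.product` enumerates every path through the decision space, so the count is the product of the option counts (2 × 2 × 2 = 8 here); adding a fourth two-option decision would double it. The decision names, the |z| > 3 rule, and the `log1p` transform are illustrative choices.

```python
import itertools
import math
import statistics
from typing import Optional

# Hypothetical decision points; each option is one defensible analytic choice.
DECISIONS: dict[str, list[str]] = {
    "missing": ["listwise", "mean_impute"],
    "outliers": ["none", "z>3"],
    "transform": ["none", "log1p"],
}


def prepare(xs: list[Optional[float]], spec: dict[str, str]) -> list[float]:
    """Apply one path through the decision space to a single group's data."""
    observed = [x for x in xs if x is not None]
    if spec["missing"] == "mean_impute":
        fill = statistics.fmean(observed)
        data = [fill if x is None else x for x in xs]
    else:
        data = observed
    if spec["outliers"] == "z>3" and len(data) > 2:
        m, sd = statistics.fmean(data), statistics.stdev(data)
        data = [x for x in data if sd == 0 or abs(x - m) / sd <= 3]
    if spec["transform"] == "log1p":
        data = [math.log1p(x) for x in data]  # assumes non-negative scores
    return data


def specification_curve(a: list[Optional[float]],
                        b: list[Optional[float]]) -> list[tuple[float, dict]]:
    """Estimate mean(b) - mean(a) under every specification, sorted for plotting."""
    names = list(DECISIONS)
    curve = []
    for combo in itertools.product(*(DECISIONS[n] for n in names)):
        spec = dict(zip(names, combo))
        estimate = statistics.fmean(prepare(b, spec)) - statistics.fmean(prepare(a, spec))
        curve.append((estimate, spec))
    curve.sort(key=lambda item: item[0])
    return curve
```

Sorting the estimates gives the curve its left-to-right order; the share of specifications that agree in sign is one simple robustness summary.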
Trusting the libraries: PsyStat doesn't re-implement scipy. That's a feature, not a limitation. Battle-tested algorithms (scipy, statsmodels, scikit-learn, lifelines) produce results that match SPSS and R. Researchers can cite PsyStat output in publications because the math underneath is the same math underneath everything else. The innovation is the interface, not the computation.
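One way to keep that trust verifiable is a cross-check test: compute a statistic from its textbook formula and compare it with the library result. This sketch does that for Welch's t-test (t statistic, Welch-Satterthwaite degrees of freedom, two-sided p-value). The data are made up for illustration, and the helper name `welch_t` is hypothetical.

```python
import math
import statistics

from scipy import stats


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom, by hand."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


a = [5.1, 4.9, 5.6, 5.8, 6.0]
b = [4.2, 4.8, 4.4, 5.0, 4.1, 4.6]
t, df = welch_t(a, b)
p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value from the t distribution
reference = stats.ttest_ind(a, b, equal_var=False)  # scipy's Welch t-test
```

A handful of checks like this, run in CI, catches accidental misuse of a library call (for example, forgetting `equal_var=False`) before it reaches a user.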

Architecture

How it's built

React Native frontend via Expo 55 with React 19. File-based routing via Expo Router. 80+ TypeScript components. FastAPI backend with Python 3.12, running on Railway with Uvicorn. Supabase PostgreSQL with row-level security — every user sees only their own data. JWT authentication. Claude AI (Sonnet default, Opus for Scholar tier) proxied through the backend so the frontend never contacts Anthropic directly.

Web deployment via Vercel. Mobile builds via EAS (Expo Application Services) for iOS and Android. Rate limiting: 60 req/min authenticated, 20 req/min unauthenticated. Max body 10 MB. Max array 100K items. 30-minute LRU result cache on the client.
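A sketch of the request guards listed above: a per-client sliding-window limiter (60 requests per minute when authenticated, 20 when not) plus body-size and array-length checks. The sliding-window algorithm, keying by a client ID string, and reading "10 MB" as 10 MiB are assumptions; the text doesn't say how PsyStat implements these limits.

```python
import json
import time
from collections import defaultdict, deque
from typing import Callable

LIMITS_PER_MINUTE = {True: 60, False: 20}  # authenticated? -> limit (from the text)
MAX_BODY_BYTES = 10 * 1024 * 1024          # "10 MB", read here as 10 MiB
MAX_ARRAY_ITEMS = 100_000


class SlidingWindowLimiter:
    """Allow at most N requests per client in any trailing 60-second window."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 window: float = 60.0) -> None:
        self.clock = clock
        self.window = window
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, authenticated: bool) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # forget requests that fell out of the window
        if len(hits) >= LIMITS_PER_MINUTE[authenticated]:
            return False
        hits.append(now)
        return True


def check_payload(raw: bytes) -> object:
    """Reject oversized bodies, then any JSON array longer than the cap."""
    if len(raw) > MAX_BODY_BYTES:
        raise ValueError("body too large")
    data = json.loads(raw)
    stack = [data]
    while stack:  # iterative walk, so deeply nested input can't blow the call stack
        node = stack.pop()
        if isinstance(node, list):
            if len(node) > MAX_ARRAY_ITEMS:
                raise ValueError("array too long")
            stack.extend(node)
        elif isinstance(node, dict):
            stack.extend(node.values())
    return data
```

An in-memory limiter like this only holds within a single process; several backend workers would each keep their own counts.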

Three-tier pricing: Free (all 240 methods, 10 AI messages/month), Researcher ($9.99/mo, unlimited saves, 100 AI messages), Scholar ($49.99/mo, everything including Opus access and priority support). Compare: SPSS $99/month.
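The tier rules can be expressed as a small entitlement table. This sketch encodes only what the text states: prices, AI message quotas of 10 and 100, every method on every tier, and Opus access for Scholar. Treating Scholar's AI quota as uncapped is an assumption, and "sonnet"/"opus" are placeholder labels, not real model identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    price_usd_month: float
    ai_messages_per_month: Optional[int]  # None = no monthly cap (assumed for Scholar)
    models: tuple[str, ...]               # placeholder labels, not real model IDs
    all_methods: bool = True              # every tier includes all 240 methods


TIERS = {
    "free": Tier("Free", 0.0, 10, ("sonnet",)),
    "researcher": Tier("Researcher", 9.99, 100, ("sonnet",)),
    "scholar": Tier("Scholar", 49.99, None, ("sonnet", "opus")),
}


def can_send_ai_message(tier_key: str, used_this_month: int) -> bool:
    """True while the user is under their tier's monthly AI message quota."""
    cap = TIERS[tier_key].ai_messages_per_month
    return cap is None or used_this_month < cap
```

Keeping entitlements in one table means the backend, not the client, decides which model and quota apply to each request.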

React Native · Expo 55 · TypeScript · FastAPI · Python · Supabase · Claude AI · scipy · statsmodels · scikit-learn · Railway · Vercel

Why It Matters

Democratize rigor, not just access

The free tier includes all 240 methods because statistical rigor shouldn't be behind a paywall. A grad student on a $20K stipend deserves the same assumption checking, the same specification curves, the same publication-ready output as a tenured professor with a site license. The tools that make research better should be available to everyone doing research.

But access without understanding is just a different kind of problem. PsyStat includes 6 self-paced courses, interactive simulations (CLT visualizer, p-value explorer, power/error tradeoff), a 62-term glossary, 27 module help guides in both plain and technical language, and a guided wizard that recommends tests based on your research design. The goal isn't to make statistics easier to run. It's to make statistics easier to understand.
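Here is a stripped-down version of what a CLT visualizer might compute, assuming Uniform(0, 1) data and 2,000 replications (both arbitrary choices). The spread of sample means shrinks like σ/√n. By the central limit theorem, their distribution also approaches normal, so once n is moderately large about 95% of means fall within 1.96 standard errors of the true mean 0.5. At n = 1 the share is 100%, because no single uniform draw can be more than 0.5 from the mean, while 1.96 standard errors is about 0.566.

```python
import random
import statistics

SIGMA = (1 / 12) ** 0.5  # SD of a single Uniform(0, 1) draw, about 0.2887


def simulate_means(n: int, reps: int = 2000, seed: int = 0) -> list[float]:
    """Means of n Uniform(0, 1) draws, repeated reps times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]


def within_196_se(means: list[float], n: int) -> float:
    """Share of sample means within 1.96 standard errors of the true mean 0.5."""
    se = SIGMA / n ** 0.5
    return sum(abs(m - 0.5) <= 1.96 * se for m in means) / len(means)


for n in (1, 4, 25):
    means = simulate_means(n)
    # n, simulated SD of the means, theoretical SE, share within 1.96 SE
    print(n, round(statistics.stdev(means), 4), round(SIGMA / n ** 0.5, 4),
          round(within_196_se(means, n), 3))
```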

The most dangerous analysis is the one that ran without error on data that violated every assumption. PsyStat exists to make sure that doesn't happen — not by restricting what you can do, but by making sure you know what you're doing before you do it.