Case Study
Forked Tongue
Not a teacher. Not a chatbot. A friend who happens to speak the language you want to learn. Voice chat, cultural immersion, and a 3D neural network that lights up as your brain rewires itself around a new tongue.
The Problem
Language apps teach you words. They don't teach you language.
Duolingo can teach you that "the cat is on the table." Cool. Now order ramen in Shinjuku. The gap between a vocabulary drill and a real conversation is the entire distance between studying a language and speaking one — and most apps refuse to cross it.
You can study for months and still freeze when a real person talks to you. The problem isn't effort. It's architecture. Flashcards train recall. Conversations train fluency. Those are different wiring jobs, and no amount of flashcards will finish the second one.
Language isn't a vocabulary list. It's relationships, jokes, cultural references, the rhythm of how a real speaker hedges and interrupts and teases. That can't be gamified into a streak. It can only be practiced with something that behaves like a person.
By the Numbers
- 16 AI characters
- 8 languages
- 16 prompt layers
- 10 external services
- 4 conversation modes
- 3 apps in the monorepo
- $0 free-tier voice
- ~90% prompt cache savings
The Solution
Give the student a friend, not a curriculum
Forked Tongue is 16 AI characters across 8 languages — Korean, Japanese, Spanish, Portuguese, Mandarin, Cantonese, American English, British English. Joon and Soeun in Seoul. Ren and Hina in Tokyo. Diego and Valentina in Mexico City. Each one has a backstory, a city, a personality, opinions, and a way of speaking that is specifically theirs.
They text you first. No placement test. No curriculum. You figure it out together, the same way you would if a real person you just met started talking to you in their language. The AI adapts to how you learn, remembers your weaknesses, and adjusts difficulty in real time.
For free users, voice output runs on-device: Kokoro TTS in the browser, with the Web Speech API for input. Zero cost, and no round-trip to a paid vendor. Premium users get ElevenLabs voices lifelike enough that you forget you're talking to a model. Speak. Listen. Feel it.
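The tier split can be sketched as a small routing function. This is a minimal sketch, assuming a hypothetical `selectVoicePipeline` helper and pipeline labels of our own. The case study says premium voice is gated by tier but not which tiers get it, so treating Plus like Free here is an assumption.

```typescript
// Tier names from the case study; the routing below is a hypothetical sketch.
type Tier = "free" | "plus" | "premium";

interface VoicePipeline {
  output: "kokoro-in-browser" | "elevenlabs"; // text-to-speech
  input: "web-speech-api" | "deepgram";       // speech-to-text
  vendorCost: boolean;                         // does the platform pay per use?
}

function selectVoicePipeline(tier: Tier): VoicePipeline {
  if (tier === "premium") {
    // Paid vendors: lifelike output and vendor-side transcription.
    return { output: "elevenlabs", input: "deepgram", vendorCost: true };
  }
  // Assumption: every non-premium tier gets the browser-side pipeline.
  return { output: "kokoro-in-browser", input: "web-speech-api", vendorCost: false };
}
```

Keeping the decision in one pure function makes the cost consequence (`vendorCost`) explicit wherever a voice session starts.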
Your buddy also knows the weather where you are. Rain in Seoul, heat wave in Mexico City, first snow in Tokyo — the conversation can start with what's actually outside your window. It sounds like a small touch. It's the difference between talking to a chatbot and talking to a friend who just looked up from their phone.
How Conversations Work
16 layers of context, rebuilt every message
Every message you send triggers a dynamic prompt assembly from 16 distinct sources. The character's persona and speaking style. The conversation mode you picked — Chill, Nudge, Immersion, or Tutor — which controls correction density and how much target language versus English gets mixed in. A summary of the conversation so far. Cross-session character memory, so the character "remembers" you.
On top of that: a pronunciation profile specific to your native-language-to-target-language pair. Past correction discoveries — your recurring error patterns. Teaching strategies that have worked for you before. Emotional state guidance that detects frustration or confidence and adjusts encouragement accordingly. Cross-user difficulty patterns that flag universally hard concepts.
Layered in probabilistically: a 10% chance of an immersion moment (brief full-target-language burst, memory-personalized), a 5% chance of a contextual scenario ("let's pretend you're ordering at a cafe"), a 20% chance of an active recall challenge on something you learned earlier. The randomness is the point. Real conversations aren't scheduled.
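A minimal sketch of the per-message rolls, assuming the three triggers are rolled independently (the case study gives the probabilities but not whether they can co-occur). `rollTriggers` and the layer names are hypothetical; the random source is injectable so the behavior can be tested deterministically.

```typescript
// Probabilities from the case study; names and structure are hypothetical.
const TRIGGERS = [
  { layer: "immersionMoment", probability: 0.1 },
  { layer: "contextualScenario", probability: 0.05 },
  { layer: "activeRecall", probability: 0.2 },
] as const;

type TriggerLayer = (typeof TRIGGERS)[number]["layer"];

// Roll each trigger independently for one incoming message and return
// the layers that fired, in declaration order.
function rollTriggers(rng: () => number = Math.random): TriggerLayer[] {
  const fired: TriggerLayer[] = [];
  for (const t of TRIGGERS) {
    if (rng() < t.probability) fired.push(t.layer);
  }
  return fired;
}
```

Under the independence assumption, at least one trigger fires on 1 − 0.90 × 0.95 × 0.80 = 1 − 0.684 ≈ 32% of messages, so roughly one message in three carries a surprise.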
All of it is stitched together and sent to Claude. The reply streams back from the API, and the server relays it to the client token-by-token over WebSocket. Prompt caching cuts repeated-prefix costs by roughly 90%, which is the only reason sending this much context on every free-tier message is economically survivable.
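Prompt caching in the Anthropic Messages API works on an exact prefix: a `cache_control` marker on a system block caches everything up to and including that block, and later cache hits are billed at a small fraction of the normal input price, which is where savings on the order of 90% come from. A minimal sketch of cache-friendly ordering, assuming a hypothetical `stable` flag on each layer (the case study doesn't say which of its layers repeat between turns):

```typescript
// Shape of a text block in the Messages API `system` array.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical layer record produced by the prompt assembler.
interface PromptLayer {
  name: string;
  text: string;    // empty when the layer didn't fire this turn
  stable: boolean; // assumption: true if the text repeats across turns
}

// Stable layers go first; the last stable block gets the cache breakpoint,
// so only the volatile tail is billed at the full input price.
function buildSystemBlocks(layers: PromptLayer[]): SystemBlock[] {
  const present = layers.filter((l) => l.text.length > 0);
  const stable = present.filter((l) => l.stable);
  const volatile = present.filter((l) => !l.stable);

  const blocks: SystemBlock[] = stable.map((l): SystemBlock => ({ type: "text", text: l.text }));
  if (blocks.length > 0) {
    blocks[blocks.length - 1].cache_control = { type: "ephemeral" };
  }
  for (const l of volatile) {
    blocks.push({ type: "text", text: l.text });
  }
  return blocks;
}
```

Order is the whole trick. If a per-turn layer such as the emotional-state read sat ahead of the persona, it would change the prefix on every turn and nothing after it could be reused.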
Cultural Immersion
Mistakes, songs, and proverbs that sound unhinged
The characters make mistakes in English — on purpose. The same mistakes a real Korean, Japanese, or Spanish speaker would actually make. When you catch them, you earn points. You're not being tested. You're catching your friend slipping up, which is exactly the social dynamic that makes language stick.
They break into K-pop songs mid-conversation. Translate menus with way too much enthusiasm. Teach you proverbs that sound absolutely unhinged when translated literally ("even a monkey can fall from a tree"). These moments are randomized and memory-personalized. They're the reason you'll remember a word six months later.
Your progress renders as a 3D neural network visualization. Nodes light up as concepts are learned. Weak areas glow red. Mastery spreads like electricity through a graph of your own mind. It's a visualization that maps to real learning state, not an XP bar with confetti.
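One way to get "mastery spreads" behavior is a decaying propagation step run once per animation frame. This is a hypothetical sketch: the thresholds, the 0 to 1 mastery scale, the decay factor, and the names `classify` and `spreadGlow` are assumptions, not the app's renderer.

```typescript
// Hypothetical concept node in the visualization graph.
interface ConceptNode {
  id: string;
  mastery: number;     // assumed scale: 0 = unknown, 1 = mastered
  neighbors: string[]; // ids of related concepts
}

type NodeState = "weak" | "learning" | "mastered";

// Assumed thresholds; "weak" nodes are the ones rendered red.
function classify(mastery: number): NodeState {
  if (mastery < 0.4) return "weak";
  if (mastery < 0.8) return "learning";
  return "mastered";
}

// One propagation step: each node keeps its current glow or takes a decayed
// share of its brightest neighbor's glow, whichever is larger.
function spreadGlow(
  nodes: ConceptNode[],
  glow: Record<string, number>,
  decay = 0.5,
): Record<string, number> {
  const next: Record<string, number> = {};
  for (const node of nodes) {
    let brightest = 0;
    for (const id of node.neighbors) {
      brightest = Math.max(brightest, glow[id] ?? 0);
    }
    next[node.id] = Math.max(glow[node.id] ?? 0, decay * brightest);
  }
  return next;
}
```

Seeding `glow` with each node's mastery and calling `spreadGlow` repeatedly makes a newly mastered concept light its neighbors at half strength, then theirs at a quarter, without touching the underlying mastery values.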
Buddies also know each other. Joon in Seoul and Ren in Tokyo are friends. Put them in a conversation together and Korean and Japanese bounce off each other — while you learn both at once. Spanish and Portuguese. Mandarin and Cantonese. American and British English. Dual-language immersion that uses linguistic similarity as a teaching tool instead of fighting it.
Learn through relationship, not curriculum. That's the design philosophy.
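The language pairings behind dual-language conversations are small enough to live as a lookup table in a shared package. A minimal sketch: the four pairs come from the case study, while `DUO_PAIRS` and `duoPartner` are hypothetical names.

```typescript
// The eight languages named in the case study.
type Language =
  | "Korean" | "Japanese"
  | "Spanish" | "Portuguese"
  | "Mandarin" | "Cantonese"
  | "American English" | "British English";

// The four dual-language pairings named in the case study.
const DUO_PAIRS: ReadonlyArray<readonly [Language, Language]> = [
  ["Korean", "Japanese"],
  ["Spanish", "Portuguese"],
  ["Mandarin", "Cantonese"],
  ["American English", "British English"],
];

// Return the partner language for a duo conversation, if one exists.
function duoPartner(lang: Language): Language | undefined {
  for (const [a, b] of DUO_PAIRS) {
    if (a === lang) return b;
    if (b === lang) return a;
  }
  return undefined;
}
```

Because every language appears in exactly one pair, the lookup is symmetric: whichever buddy you start with, the other half of the duo is determined.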
The Hard Parts
What made this difficult
- Building a per-user linguistic model that actually adapts
  The Linguistic Neural Network (LNN) is a custom engine that maps what a user knows, finds gaps, and drives adaptive difficulty per message. Not a static curriculum: a live graph of competence and weakness that the conversation layer reads from and writes to every turn. Weak nodes, error-prone concepts, and suggested practice targets all feed back into the next prompt.
- Assembling 16 prompt layers on every request without falling over
  Character persona. Mode instructions. Adaptive fragment. Conversation memory. Character memory. Language triangulation across the user's other languages. Teaching notes. Linguistic insights. Cross-user patterns. Pronunciation profile. Emotional state. Immersion trigger. Scenario suggestion. Active recall. Content filter. All composed in order, all cached intelligently, all shipped before the user notices latency.
- Free-tier voice that costs the platform nothing
  Premium voice uses ElevenLabs and Deepgram. Free voice has to cost zero. Solution: Kokoro TTS running in the browser as WASM for output, and the Web Speech API for input. Output stays on-device and private; input is free to the platform, though some browsers (Chrome among them) send Web Speech API audio to their own recognition servers. Either way, no paid vendor sits in the loop. The free tier is actually usable instead of being a paywall in disguise.
- One codebase, three deliverables
  A monorepo shared across an Express API, a React 19 + Vite 6 web app, and an Expo SDK 52 React Native mobile app. Characters, types, AI wrappers, and the LNN engine live in shared packages. Web and mobile both consume the same character definitions and the same conversation engine, so a new feature lands everywhere at once instead of three times.
- Subscription plumbing across web, iOS, and Android
  Stripe for web credit-card subscriptions. RevenueCat wrapping Apple StoreKit and Google Play Billing for mobile. Webhook verification on both sides, unified subscription state in Postgres, and a tier model (Free / Plus / Premium) that gates message counts, scenarios, and premium voice independently of platform.
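The case study doesn't describe the LNN's update rule, so the sketch below swaps in a plain exponential moving average per concept to show the read-and-write loop: each observed use of a concept nudges its mastery toward 1 (correct) or 0 (mistake), and the weakest or most error-prone concepts become practice targets for the next prompt. `CompetenceGraph`, its methods, and the default `alpha` are all hypothetical.

```typescript
// Hypothetical per-concept record; not the real LNN's data model.
interface ConceptState {
  mastery: number; // assumed 0..1 scale
  errors: number;  // running count of observed mistakes
}

class CompetenceGraph {
  private readonly concepts = new Map<string, ConceptState>();
  private readonly alpha: number;

  // alpha: weight given to the newest observation (assumed default).
  constructor(alpha = 0.3) {
    this.alpha = alpha;
  }

  // Write path: record one observed use of a concept in the user's message.
  observe(concept: string, correct: boolean): void {
    const prev = this.concepts.get(concept) ?? { mastery: 0, errors: 0 };
    const target = correct ? 1 : 0;
    this.concepts.set(concept, {
      mastery: prev.mastery + this.alpha * (target - prev.mastery),
      errors: prev.errors + (correct ? 0 : 1),
    });
  }

  mastery(concept: string): number {
    return this.concepts.get(concept)?.mastery ?? 0;
  }

  // Read path: the n lowest-mastery concepts, as practice-target candidates.
  weakest(n: number): string[] {
    return Array.from(this.concepts.entries())
      .sort((a, b) => a[1].mastery - b[1].mastery)
      .slice(0, n)
      .map(([name]) => name);
  }

  // Concepts the user has missed at least `minErrors` times.
  errorProne(minErrors: number): string[] {
    return Array.from(this.concepts.entries())
      .filter(([, s]) => s.errors >= minErrors)
      .map(([name]) => name);
  }
}
```

A higher `alpha` makes the graph react faster to the latest message; a lower one smooths over a single slip.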
Architecture
How it's built
A TypeScript monorepo managed with npm workspaces. The API is Express on Node 20, talking to PostgreSQL 17 via Prisma 6. Redis 7 handles rate limiting, token blacklisting, teaching-notes cache, pronunciation profiles, and cross-user pattern caching. REST endpoints for everything transactional. WebSocket for streaming conversation tokens in real time.
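Redis-backed rate limiting is commonly done as a fixed-window counter: `INCR` a key scoped to the user and the current time window, set a TTL on the first hit, and reject once the count passes the limit. A minimal sketch, assuming a hypothetical `CounterStore` interface that mirrors those two Redis commands (a real Redis client would sit behind it) and an in-memory stand-in for testing. The key scheme and limits are illustrative, not the app's.

```typescript
// Minimal slice of Redis behavior the limiter needs (mirrors INCR and EXPIRE).
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<void>;
}

// Fixed-window limiter: at most `limit` requests per `windowSeconds` per user.
async function allowRequest(
  store: CounterStore,
  userId: string,
  limit: number,
  windowSeconds: number,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const windowIndex = Math.floor(nowMs / 1000 / windowSeconds);
  const key = `ratelimit:${userId}:${windowIndex}`; // hypothetical key scheme
  const count = await store.incr(key);
  if (count === 1) {
    await store.expire(key, windowSeconds); // first hit starts the TTL
  }
  return count <= limit;
}

// In-memory stand-in for tests; skips expiry because keys are window-scoped.
class MemoryStore implements CounterStore {
  private readonly counts = new Map<string, number>();

  async incr(key: string): Promise<number> {
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return n;
  }

  async expire(_key: string, _seconds: number): Promise<void> {}
}
```

Fixed windows can admit a burst of up to twice the limit across a window boundary; a sliding window or token bucket tightens that at the cost of more Redis work per request.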
The web app is React 19 on Vite 6, a client-side SPA. The mobile app is Expo SDK 52 on React Native 0.76, managed workflow. Shared packages hold the 16 character definitions and system prompts, the Claude / ElevenLabs / Deepgram wrappers, the common type library, and the LNN engine itself — so the phone app and the web app run the same brain.
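Sharing one definition across web and mobile mostly means exporting typed data and pure helpers from a workspace package that both apps depend on. A hypothetical sketch: the field names and `charactersFor` are assumptions; the four characters and their cities are the ones the case study names.

```typescript
// Hypothetical shape of an entry in the shared character package.
interface CharacterDefinition {
  id: string;
  name: string;
  city: string;
  language: string;
}

const CHARACTERS: readonly CharacterDefinition[] = [
  { id: "joon", name: "Joon", city: "Seoul", language: "Korean" },
  { id: "soeun", name: "Soeun", city: "Seoul", language: "Korean" },
  { id: "ren", name: "Ren", city: "Tokyo", language: "Japanese" },
  { id: "hina", name: "Hina", city: "Tokyo", language: "Japanese" },
];

// Pure helper: behaves identically whether the web app or the Expo app calls it.
function charactersFor(language: string): CharacterDefinition[] {
  return CHARACTERS.filter((c) => c.language === language);
}
```

Both apps would import from the same workspace package (for example, a hypothetical `@forked-tongue/characters`), so adding a character is a single edit.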
Claude Haiku 4.5 handles the bulk of conversational turns with aggressive prompt caching. Sonnet 4.6 is reserved for vision, scenario generation, and complex correction analysis. ElevenLabs (multilingual v2) and Deepgram (Nova-2) serve premium voice. Stripe handles web billing, RevenueCat handles mobile IAP. AdSense and AdMob serve free-tier ads. Docker Compose runs the stack in development; the web app is live in production.
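The model split reads naturally as a routing function. A minimal sketch: the task kinds mirror the case study, the labels are display names rather than exact API model IDs (substitute whatever IDs you deploy with), and both the `complex` flag and sending simpler corrections to Haiku are assumptions.

```typescript
// Display labels, not API model IDs.
type ModelLabel = "Claude Haiku 4.5" | "Claude Sonnet 4.6";

// Task kinds named in the case study; `complex` is a hypothetical flag.
type Task =
  | { kind: "conversation" }
  | { kind: "vision" }
  | { kind: "scenarioGeneration" }
  | { kind: "correctionAnalysis"; complex: boolean };

function pickModel(task: Task): ModelLabel {
  if (task.kind === "vision" || task.kind === "scenarioGeneration") {
    return "Claude Sonnet 4.6";
  }
  if (task.kind === "correctionAnalysis" && task.complex) {
    return "Claude Sonnet 4.6";
  }
  // Everyday conversational turns and (by assumption) simple corrections.
  return "Claude Haiku 4.5";
}
```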
Roadmap
Launched on web, mobile next
Forked Tongue is live at forked-tongue.vercel.app in alpha. All 16 characters, all 8 languages, the free tier, the pricing ladder, the voice pipeline, the brain visualization — all of it ships now, on the web, in a browser, no download required.
The Expo build is in development for iOS and Android distribution. Same characters. Same conversation engine. Same LNN driving adaptation. The mobile app is a different surface, not a different product — which is the whole point of the shared-packages monorepo architecture.
Pick a language. Someone will text you. In their language. No placement test, no setup. You'll figure it out together. That's what a friend does.