EduHam — Coding Learning Platform from Zero to Scale

A coding learning platform built from scratch in twelve months. Custom multi-language sandbox, instructor authoring tools, real-time classrooms, and a closed-loop analytics stack — running 5,000 concurrent learners by handover.

The Challenge

A B2C learn-to-code platform with two product modes — self-paced courses and instructor-led cohorts — and one hard requirement: every lesson ends in code the learner actually runs, not a multiple-choice question. Over twelve months we built the platform from an empty repository: identity, courses, lesson authoring, quiz and assignment builders, a multi-language code-execution sandbox, real-time classrooms, billing, and a closed-loop analytics pipeline. By month twelve the platform served 5,000 concurrent learners across ten UI languages and executed 50,000+ code submissions a day.

Four Builder Surfaces, One Platform

Every course is assembled from four kinds of unit. Treating them as separate apps would have given instructors four logins, four content models, and four analytics surfaces. Treating them as one platform required a shared content model and a single submission pipeline. That choice paid off the first time an instructor dropped a code assignment into the middle of a live classroom in three clicks, with no release needed.

Dimension | Lesson Builder | Quiz Builder | Assignment Builder | Live Classroom
Primary author | Curriculum lead | Instructor | Instructor | Instructor
Learner surface | Self-paced reader | Self-paced or cohort | Self-paced or cohort | Cohort only
Sandbox required | Optional (snippets) | Optional (code questions) | Always | Always
Completion measure | Time-on-page + scroll depth | Score + attempts | Tests passed | Attendance + submissions
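
As an illustration of what a shared content model and a single submission pipeline could look like, here is a minimal sketch. The class names, fields, and the `submit` helper are hypothetical and are not taken from the platform's codebase; only the sandbox rules per unit kind come from the comparison table above.

```python
from dataclasses import dataclass
from enum import Enum


class UnitKind(Enum):
    LESSON = "lesson"
    QUIZ = "quiz"
    ASSIGNMENT = "assignment"
    LIVE_CLASSROOM = "live_classroom"


# Unit kinds that always need the sandbox, per the comparison table.
SANDBOX_ALWAYS = {UnitKind.ASSIGNMENT, UnitKind.LIVE_CLASSROOM}


@dataclass(frozen=True)
class Unit:
    """One course unit in a hypothetical shared content model."""
    unit_id: str
    kind: UnitKind
    title: str
    has_code: bool = False  # lessons and quizzes may opt in to runnable code

    def needs_sandbox(self) -> bool:
        # Assignments and live classrooms always run code; lessons and
        # quizzes only when the author attached a snippet or code question.
        return self.kind in SANDBOX_ALWAYS or self.has_code


@dataclass(frozen=True)
class Submission:
    """A single submission shape shared by every unit kind."""
    unit_id: str
    learner_id: str
    language: str
    source: str


def submit(unit: Unit, learner_id: str, language: str, source: str) -> Submission:
    """Build a submission for any unit kind that carries runnable code."""
    if not unit.needs_sandbox():
        raise ValueError(f"unit {unit.unit_id} has no runnable code")
    return Submission(unit.unit_id, learner_id, language, source)
```

Because every unit kind produces the same `Submission` shape in this sketch, an assignment dropped into a live classroom needs no classroom-specific submission path.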

When the curriculum team launched an early language track in month seven, they assembled it from existing unit pools in eleven business days — no new components shipped.

Solution Architecture

Microservice decomposition

A handful of Laravel services with hard domain boundaries. Most own their own Postgres database; tightly-coupled domains share a connection rather than forcing a boundary that was not real. Each service exposes a versioned contract — HTTP for synchronous calls, a message queue for async paths.

One frontend, three surfaces

A single Vue (TypeScript) codebase split into three surfaces — learner app, instructor authoring, operator dashboard — sharing the same component library and i18n bundle. Pusher carries real-time presence, live submissions, and instructor notifications.

Isolated code sandbox

The sandbox is a separate domain on its own Kubernetes node pool. A Go orchestrator schedules submissions onto pre-warmed language containers — seccomp-locked, egress-denied, ephemeral rootfs. Ten programming languages by month twelve, p99 cold start under 500 ms.
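
The pre-warmed pool idea can be sketched as per-language queues of idle containers. The production orchestrator is written in Go; the Python below is only an illustration, and `WarmPool`, `Container`, and their methods are hypothetical names rather than the orchestrator's API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class Container:
    container_id: str
    language: str


class WarmPool:
    """Per-language queues of pre-warmed containers (illustrative only)."""

    def __init__(self) -> None:
        self._idle: Dict[str, Deque[Container]] = {}

    def add(self, container: Container) -> None:
        # Called off the hot path, after a container has finished warming up.
        self._idle.setdefault(container.language, deque()).append(container)

    def acquire(self, language: str) -> Optional[Container]:
        # Hot path: hand out an idle container, never pull an image here.
        queue = self._idle.get(language)
        if not queue:
            return None  # caller decides: wait, shed load, or cold start
        return queue.popleft()

    def idle_count(self, language: str) -> int:
        return len(self._idle.get(language, ()))
```

Since the containers are described as ephemeral, a used container would be destroyed rather than returned to the queue, and a background task would call `add` with a fresh one to keep the pool topped up.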

Closed-loop analytics

Managed change-data-capture streams operational databases into a columnar warehouse. The data team and instructors query learner behaviour, completion curves, and content health on near-real-time dashboards. Reporting is live views with notification triggers — not a morning email.

Domain microservices, one Vue frontend, an isolated sandbox node pool, and a CDC→warehouse analytics spine.

Code Sandbox: Request to Result in Under 500 ms p99

The sandbox is the load-bearing piece of the product. A learner who waits four seconds for their first code run drops; one who waits 200 ms keeps going. Four stages run in sequence on the orchestrator's hot path, each with its own latency budget.

Stage 1: Validate (< 30 ms, 6%)

Auth, rate limits, payload size, language whitelist, lightweight syntax pre-check. Hard cut on obvious abuse.

Stage 2: Schedule (< 80 ms, 18%)

Pick a pre-warmed container from the language-specific pool, attach the submission rootfs, apply the security profile. No image pull on the hot path.

Stage 3: Execute (< 320 ms typical, 62%)

Run the learner's code with a wall-clock timeout, memory cap, CPU quota, and egress-denied network namespace. The hot pool keeps the cold-start tail off the p99.

Stage 4: Capture and score (< 70 ms, 14%)

Stream stdout/stderr, diff against expected output for assignments, persist artifacts, emit a submission event to the analytics bus.

The cheapest win was deleting the image-pull step from the hot path: a 3.2-second median cold start in week six became a 90 ms hot start by month nine.
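
The four stage budgets listed above add up to 30 + 80 + 320 + 70 = 500 ms, which matches the p99 target. A minimal sketch of per-stage budget tracking follows; the function names and stage keys are assumptions made for illustration, and the real orchestrator is written in Go.

```python
import time
from typing import Callable, Dict, List, Tuple

# Per-stage budgets in milliseconds, as listed in the stage descriptions.
BUDGETS_MS: Dict[str, float] = {
    "validate": 30,
    "schedule": 80,
    "execute": 320,
    "capture_and_score": 70,
}


def run_pipeline(stages: List[Tuple[str, Callable[[], None]]]) -> Dict[str, float]:
    """Run stages in order and return elapsed milliseconds per stage."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        stage()
        timings[name] = (time.perf_counter() - start) * 1000.0
    return timings


def over_budget(timings: Dict[str, float]) -> List[str]:
    """Names of stages whose elapsed time exceeded their budget."""
    return [name for name, ms in timings.items() if ms > BUDGETS_MS[name]]
```

Feeding `over_budget` into an alert or a dashboard is one way to see which stage is consuming the tail when the end-to-end p99 drifts.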

Learner Funnel: Month 6 MVP vs Month 12 at Scale

The MVP shipped at month six with a single programming language, no real-time features, and a single UI language. By month twelve the platform offered ten programming languages, real-time classrooms, and instructor-built assignments. The funnel compares cohorts of 1,000 newly signed-up learners on the month-6 MVP against the month-12 production platform, with the same acquisition channel and the same onboarding email.

Step | Before | After
Signed up | 1,000 | 1,000
Completed onboarding | 61% | 84%
Ran first code submission | 44% | 78%
Passed first assignment | 21% | 52%
Active on day 30 | 11% | 22%

The largest single lift came from cutting the first-code-run delay. When the median time from signup to first successful code execution dropped from 4 minutes 12 seconds to 38 seconds, the day-7 retention curve moved with it.
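
To see where in the funnel the gain sits, the cumulative percentages can be converted into step-to-step conversion rates. This sketch assumes each percentage in the funnel is a share of the original 1,000 signups, which the source does not state explicitly.

```python
from typing import List

STEPS = [
    "Signed up",
    "Completed onboarding",
    "Ran first code submission",
    "Passed first assignment",
    "Active on day 30",
]
# Cumulative share of the signed-up cohort at each step, in percent.
BEFORE_PCT = [100, 61, 44, 21, 11]
AFTER_PCT = [100, 84, 78, 52, 22]


def step_conversion(cumulative_pct: List[float]) -> List[float]:
    """Percent of learners at each step who reach the next one."""
    pairs = zip(cumulative_pct, cumulative_pct[1:])
    return [round(100 * nxt / cur, 1) for cur, nxt in pairs]


before = step_conversion(BEFORE_PCT)
after = step_conversion(AFTER_PCT)
for name, b, a in zip(STEPS[1:], before, after):
    print(f"{name}: {b}% -> {a}%")
```

Under that assumption, the onboarding-to-first-run step moves from 72.1% to 92.9%, the largest step-level gain in percentage points, which is consistent with the first-code-run delay being the main lever. The final step (passed first assignment to active on day 30) goes from 52.4% to 42.3%, so the day-30 improvement comes from more learners reaching that step rather than from better conversion at it.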

Technical Stack

Customer + operator UI

Vue 3, TypeScript, Pinia

One frontend codebase for learner, instructor, and operator surfaces

Domain services

Laravel (PHP), per-service Postgres

Domain-rich CRUD that ships weekly across identity, courses, billing, notifications

Sandbox orchestrator

Go, Docker, custom language images

Submission scheduler in Go for predictable latency; seccomp-locked, egress-denied containers per submission

Real-time

Pusher Channels

Managed because the team was small and the SLA was tight

Analytics

Managed CDC, columnar warehouse

Bought instead of built — connector quality and schema-drift handling outpaced what an eight-person team could maintain

Infrastructure

Managed Kubernetes, two node pools (general + sandbox)

Burst capacity for cohort launches; the sandbox node pool runs under tighter isolation than general workloads

Key Technical Decisions

Custom sandbox over a hosted runner

Tradeoff: Six engineering weeks of upfront work and a node pool to operate from day one

Why: The hosted runners we evaluated either capped at three of the languages we cared about, charged per execution at a multiple of our own pool cost at scale, or sat behind a public egress we could not lock down. Owning the sandbox kept p99 cold start under 500 ms, opened the door to ten languages within twelve months, and gave us a single security profile to reason about.

Pusher Channels over a self-hosted WebSocket fleet

Tradeoff: Per-message vendor cost; channel sharding strategy had to be designed early

Why: Real-time would have eaten two engineers for three months. Pusher gave us sub-100 ms fan-out on day one. Channel namespaces by tenant kept the vendor cost linear with paying cohorts, not with total signups.
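
One way to namespace channels by tenant is to encode the tenant in the channel name, so usage can be grouped per tenant. The naming scheme and helper functions below are hypothetical; only the `presence-` prefix is a Pusher convention (it marks a presence channel).

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical scheme: presence-tenant-<tenant>-classroom-<classroom>.
# IDs are limited to letters, digits, and underscores so that the hyphen
# can act as an unambiguous separator.
_SAFE_ID = re.compile(r"[A-Za-z0-9_]+")
_CHANNEL = re.compile(r"presence-tenant-([A-Za-z0-9_]+)-classroom-([A-Za-z0-9_]+)")


def classroom_channel(tenant_id: str, classroom_id: str) -> str:
    """Build the channel name for one classroom of one tenant."""
    for value in (tenant_id, classroom_id):
        if not _SAFE_ID.fullmatch(value):
            raise ValueError(f"unsafe channel component: {value!r}")
    return f"presence-tenant-{tenant_id}-classroom-{classroom_id}"


def tenant_of(channel: str) -> str:
    """Recover the tenant ID from a classroom channel name."""
    match = _CHANNEL.fullmatch(channel)
    if match is None:
        raise ValueError(f"not a classroom channel: {channel!r}")
    return match.group(1)


def messages_per_tenant(channels: Iterable[str]) -> Counter:
    """Count one message per channel name seen, grouped by tenant."""
    return Counter(tenant_of(name) for name in channels)
```

With a scheme like this, a log of published channel names is enough to attribute message volume, and therefore vendor cost, to individual tenants.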

Managed CDC + warehouse over building our own ETL

Tradeoff: Vendor cost on the connector; less control over edge-case schema changes

Why: An in-house ETL would have absorbed a data engineer for six months and produced a worse result. The data team shipped instructor dashboards in month nine instead of month fifteen.

Implementation Timeline

M1–M2

Discovery, architecture, core hires

Domain map, service boundaries, hiring loop designed. First three engineers on board by end of month two.

M3–M4

Identity, courses, authoring shell

Laravel skeleton for identity and courses, Vue shell for instructor authoring, first lesson rendered end-to-end.

M5–M6

MVP — sandbox v1, quiz, assignment, billing

Code sandbox v1 in production with one language, quiz and assignment builders shipped, Stripe live. Private beta opens at end of month six.

M7–M8

Real-time classrooms, i18n to 10 UI languages

Pusher integration for classrooms, first cohort runs, i18n bundle to 10 UI languages, more programming languages added.

M9–M10

Analytics warehouse, instructor dashboards

Managed CDC into the warehouse, learner-event schema modelled, near-real-time instructor and CRO dashboards.

M11–M12

Scale hardening and handover

Channel sharding by tenant, sandbox pool sized for peak, observability hardened, on-call handed to the in-house team. Platform exits at 5,000 concurrent and steady.

Challenges and How We Solved Them

The Problem

The first sustained-load stress came at ~3,000 concurrent learners. A paid-acquisition push in month eight ran longer than the sandbox pool had been sized for, and the real-time fan-out and the Postgres read pressure on the courses service peaked in the same fifteen-minute window.

Approach

Sharded real-time channels by tenant so one noisy classroom could not starve the fleet, added read replicas behind the courses service, moved hot reads for course catalogues into Redis, and resized the sandbox pre-warm pool to track the previous hour's submission rate.

Outcome

Latency stayed inside SLA for the remaining 90 minutes of the campaign. The same architecture absorbed 5,000 concurrent two months later without a second incident. The catalogue cache cut p95 lesson-load from 1.4 s to 320 ms.
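
Sizing the pre-warm pool from the previous hour's submission rate can be sketched with Little's law (containers in flight = arrival rate × time each container is held). The source states only that the pool tracks the previous hour's rate; the hold time, headroom factor, and minimum below are assumed parameters for illustration.

```python
import math


def prewarm_pool_size(
    submissions_last_hour: int,
    avg_hold_seconds: float,
    headroom: float = 1.5,  # assumed safety factor for bursts
    minimum: int = 2,       # assumed floor so a quiet hour never empties the pool
) -> int:
    """Estimate how many warm containers to keep for one language."""
    rate_per_second = submissions_last_hour / 3600.0
    # Little's law: average number of containers busy at any moment.
    in_flight = rate_per_second * avg_hold_seconds
    return max(minimum, math.ceil(in_flight * headroom))


# 36,000 submissions in the last hour, each holding a container for 2 s:
# 10 per second * 2 s = 20 in flight, times 1.5 headroom = 30 containers.
print(prewarm_pool_size(36_000, 2.0))
```

A rate-tracking rule like this lags a sudden spike by up to an hour, so the headroom factor carries the burst until the next resize.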

The Problem

Sandbox abuse — shortly after opening a lower-level language runner, two learners tested the boundaries. One ran an outbound port scanner from inside the container; another launched a long-lived background process that consumed compute for forty minutes before the wall-clock budget killed it.

Approach

Locked the egress namespace to DNS plus the internal package mirror, tightened the security profile to deny raw sockets, and added a per-container CPU monitor that auto-kills jobs whose wall-clock-to-CPU ratio matches a sustained-abuse profile.

Outcome

Zero confirmed escape attempts since. The abuse-detection rule has fired three times in nine months — all genuine. The egress-denied default became a documented platform invariant.
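
The CPU-monitor rule from the approach above can be sketched as a check on the ratio of wall-clock time to CPU time: a job that has run for a long time with a ratio close to 1 has been burning CPU almost continuously. The thresholds and names here are hypothetical; the source does not give the actual abuse profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    wall_seconds: float  # elapsed wall-clock time since the container started
    cpu_seconds: float   # cumulative CPU time consumed by the container


def should_kill(
    sample: Sample,
    min_wall_seconds: float = 30.0,  # assumed: ignore short, legitimate bursts
    max_wall_to_cpu: float = 1.25,   # assumed: ratio near 1 means CPU-bound
) -> bool:
    """Flag a job that has been CPU-bound for a sustained period."""
    if sample.wall_seconds < min_wall_seconds or sample.cpu_seconds <= 0:
        return False
    return sample.wall_seconds / sample.cpu_seconds <= max_wall_to_cpu
```

For example, `should_kill(Sample(60.0, 58.0))` is true because the job has used CPU for nearly all of its 60 seconds, while a job that is mostly idle, such as `Sample(60.0, 5.0)`, is left alone.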

Numbers That Moved

Metric | Result | Was
Concurrent learners (peak) | 5,000+ | 0
Code submissions executed | 50,000+ / day | 900 / day
Sandbox cold start, p99 | 480 ms | 3.2 s
Programming languages supported | 10 | 1
UI languages shipped | 10 | 1
Day-7 learner retention | 41% | 28%
MTTR on production incidents | 22 min | 3h 40min
Engineers on the team | 8 | 0

Engagement Team

Oleksandr Kotliarov, Engineering Lead
Backend (domain services): 3 engineers
Sandbox / Platform: 1 engineer
Frontend: 2 engineers
Data / Analytics: 1 engineer
SRE / DevOps: 1 engineer (joined month 10)

Lessons Learned

Designing for scalability on day one is cheaper than retrofitting it in month nine. The architectural decisions made before the second engineer joined paid back every time the platform absorbed a step-change in concurrent learners.

Observability is non-negotiable in a code-execution platform. Every container had structured telemetry from the first commit; sandbox abuse incidents were caught and closed in days, not weeks.

Analytics velocity is bought, not built, at this stage. Managed CDC plus a managed warehouse replaced six months of platform engineering — the data engineer instead shipped dashboards the curriculum team used every day.
