Cloud Sandbox System Design

Scaling & Messaging Systems

Fault-tolerant cloud sandbox platform for isolated code execution, queue resilience, and high-throughput payload processing.

Fargate Spot FinOps + DLQ Recovery | 15k+ Req/Min Burst Tests

Why This Project Matters

Demonstrates SRE-first backend platform engineering in which sandbox isolation, autoscaling, queue durability, and cost efficiency are first-class requirements.

Tech + Architecture Summary

Tech: Node.js, AWS Fargate, EventBridge, Terraform (IaC), FinOps
Architecture: ALB execution ingress -> queue and DLQ lanes -> Fargate Spot worker pool -> result store -> recovery scheduler via EventBridge.

Impact Metrics

Architected an elastic worker pool on AWS Fargate Spot capacity, provisioned with Terraform, reducing compute costs for asynchronous payload processing by 70%.
Engineered a self-healing queue using Redis-backed dead-letter queues (DLQs) and AWS EventBridge scheduled (cron) triggers, achieving 100% payload recovery during staged network-partition drills.
Tuned Node.js V8 garbage collection and libuv thread-pool sizing to keep memory use stable during sustained payload spikes of 15,000+ req/min.
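The DLQ recovery path can be pictured with a minimal sketch like the one below. It is not the project's code: plain arrays stand in for the Redis-backed queue and DLQ, and `DeadLetter`, `redriveDeadLetters`, and `maxRedrives` are hypothetical names. The assumption is that a scheduled trigger (an EventBridge cron rule, per the description) calls one sweep at a time, and that a payload is parked for manual review once it has been redriven too often.

```typescript
// Hypothetical shapes; arrays stand in for the Redis-backed queue and DLQ.
interface DeadLetter {
  id: string;
  payload: string;
  redrives: number; // how many times this payload has already been redriven
}

interface SweepReport {
  requeued: string[]; // ids moved back to the main queue
  parked: string[];   // ids left in the DLQ for manual review
}

// One recovery sweep, as a scheduled trigger might invoke it.
function redriveDeadLetters(
  dlq: DeadLetter[],
  mainQueue: DeadLetter[],
  maxRedrives: number
): SweepReport {
  const report: SweepReport = { requeued: [], parked: [] };
  const stillDead: DeadLetter[] = [];

  for (const letter of dlq) {
    if (letter.redrives < maxRedrives) {
      // Give the payload another chance and record the redrive.
      mainQueue.push({ ...letter, redrives: letter.redrives + 1 });
      report.requeued.push(letter.id);
    } else {
      // Redrive budget exhausted: keep it in the DLQ.
      stillDead.push(letter);
      report.parked.push(letter.id);
    }
  }

  // Rewrite the DLQ in place so callers holding the array see the result.
  dlq.length = 0;
  dlq.push(...stillDead);
  return report;
}
```

Capping redrives is a design choice in this sketch: without it, a payload that always fails would cycle between the queue and the DLQ on every scheduled run.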

Core Problem

Execute untrusted user code safely while controlling runtime limits, output size, and request-level isolation.
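A minimal sketch of how runtime and output limits might be resolved per request, under stated assumptions: the `ExecutionLimits` shape, the default values, and the ceilings are all hypothetical and not taken from the project. The sketch only computes the limits and caps captured output; actually enforcing them (container CPU and memory settings, process timeouts) happens elsewhere.

```typescript
// Hypothetical limit shape; real values depend on the deployment.
interface ExecutionLimits {
  timeoutMs: number;
  memoryMb: number;
  maxOutputChars: number;
}

// Assumed defaults and hard ceilings, for illustration only.
const DEFAULT_LIMITS: ExecutionLimits = { timeoutMs: 2000, memoryMb: 128, maxOutputChars: 10000 };
const CEILING_LIMITS: ExecutionLimits = { timeoutMs: 10000, memoryMb: 512, maxOutputChars: 65536 };

// Merge a caller's request with the defaults, never exceeding the ceilings.
function resolveLimits(requested: Partial<ExecutionLimits>): ExecutionLimits {
  const pick = (key: keyof ExecutionLimits): number => {
    const value = requested[key] ?? DEFAULT_LIMITS[key];
    // Reject NaN, Infinity, zero, and negative values by falling back to the default.
    if (!Number.isFinite(value) || value <= 0) return DEFAULT_LIMITS[key];
    return Math.min(value, CEILING_LIMITS[key]);
  };
  return {
    timeoutMs: pick("timeoutMs"),
    memoryMb: pick("memoryMb"),
    maxOutputChars: pick("maxOutputChars"),
  };
}

// Cap captured output so one noisy job cannot produce an unbounded response.
function capOutput(output: string, maxChars: number): { text: string; truncated: boolean } {
  if (output.length <= maxChars) return { text: output, truncated: false };
  return { text: output.slice(0, maxChars), truncated: true };
}
```

In this sketch a request asking for more than the ceiling is clamped rather than rejected; rejecting outright is an equally valid policy and is the stricter of the two.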

Build Notes

What I Owned

This project is where I practiced separating request intake from execution work so one slow or unsafe job does not control the whole service path.

Hard Lesson

The important lesson was that execution platforms are mostly about isolation and backpressure; the language runner matters less than the safety boundary around it.

Next Enhancement

Next I would add a visible job timeline with queued, running, completed, failed, and DLQ states so reviewers can watch the lifecycle instead of only seeing the API response.
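One way such a timeline could be modeled is a small state machine that records each transition. The sketch below is illustrative only: the five states come from the paragraph above, but the transition table, `JobTimeline`, and `TimelineEntry` are assumptions rather than part of the project.

```typescript
type JobState = "queued" | "running" | "completed" | "failed" | "dlq";

// Assumed transitions: a failed job is either retried or dead-lettered,
// and a dead-lettered job can be redriven back onto the queue.
const ALLOWED_TRANSITIONS: Record<JobState, JobState[]> = {
  queued: ["running"],
  running: ["completed", "failed"],
  failed: ["queued", "dlq"],
  completed: [],
  dlq: ["queued"],
};

interface TimelineEntry {
  state: JobState;
  at: number; // timestamp in milliseconds, supplied by the caller
}

class JobTimeline {
  private readonly entries: TimelineEntry[] = [];
  private state: JobState = "queued";

  constructor(queuedAt: number) {
    this.entries.push({ state: "queued", at: queuedAt });
  }

  get current(): JobState {
    return this.state;
  }

  // Record a transition, rejecting any move the table does not allow.
  advance(next: JobState, at: number): void {
    if (!ALLOWED_TRANSITIONS[this.state].includes(next)) {
      throw new Error(`illegal transition ${this.state} -> ${next}`);
    }
    this.state = next;
    this.entries.push({ state: next, at });
  }

  // Copy of the history, suitable for rendering a timeline view.
  history(): TimelineEntry[] {
    return this.entries.map((entry) => ({ ...entry }));
  }
}
```

Passing timestamps in, instead of reading the clock inside the class, keeps the timeline easy to test and replay.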

High-Level Architecture

```mermaid
graph LR
  Client[Web Client]-->Control[Execution Control API]
  Control-->API[Execution API Endpoint]
  API-->Queue[Execution Queue]
  Queue-->Worker[Sandboxed Workers]
  Worker-->Result[Execution Result Store]
  Result-->API
  API-->Control
```

Production-Grade Capabilities

Asynchronous queue-worker execution model with bounded retries and durable result flow.
Tenant-aware API boundary with safer sandbox and runtime guardrails.
Terraform-managed infrastructure topology with explicit control-plane and execution-plane separation.
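The bounded-retry behavior in the first capability can be sketched as follows. This is a simplified, synchronous stand-in with hypothetical names (`QueuedJob`, `drainQueue`, `maxAttempts`); the real workers are asynchronous and the queue is not an in-memory array. It shows a failing job being retried behind newer work until its attempt budget runs out, at which point it is dead-lettered instead of blocking the queue.

```typescript
interface QueuedJob {
  id: string;
  attempts: number; // attempts already made before this run
}

// A handler signals failure by throwing.
type JobHandler = (job: QueuedJob) => void;

interface DrainResult {
  completed: string[];
  deadLettered: string[];
}

// Process the queue until it is empty, retrying each job at most maxAttempts times.
function drainQueue(queue: QueuedJob[], handler: JobHandler, maxAttempts: number): DrainResult {
  const result: DrainResult = { completed: [], deadLettered: [] };

  while (queue.length > 0) {
    const job = queue.shift();
    if (job === undefined) break;

    const attempt = job.attempts + 1;
    try {
      handler(job);
      result.completed.push(job.id);
    } catch {
      if (attempt < maxAttempts) {
        // Retry later, behind any work that is already waiting.
        queue.push({ id: job.id, attempts: attempt });
      } else {
        // Attempt budget exhausted: hand off to the dead-letter lane.
        result.deadLettered.push(job.id);
      }
    }
  }
  return result;
}
```

Re-enqueueing at the tail rather than retrying immediately is what keeps one failing job from delaying the jobs behind it.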

Engineering Decisions

Strict sandbox limits improve safety but can reject edge-case workloads that need higher resource ceilings.
Queue-based execution stabilizes throughput but adds latency compared with direct synchronous execution.
Splitting web and API deployments improves scalability isolation, but increases operational surface area.

Behavioral + Impact Signals

Designed around safe defaults for sandboxing and bounded retries.
Prioritized service isolation to protect user-facing workflows from backend spikes.
Added operational observability and auditability for execution lifecycle events.

Quality Guarantees

Every execution request runs with bounded CPU and memory limits.
Execution output is returned in a deterministic response format.
Failed runs do not block subsequent queue processing.
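A deterministic response format could mean that every outcome, whether success, timeout, or internal error, maps to the same set of fields. The sketch below illustrates that idea; `RunOutcome`, `ExecutionResponse`, and the field names are hypothetical and do not describe the project's actual API contract.

```typescript
// Hypothetical internal outcomes a worker might report.
type RunOutcome =
  | { kind: "ok"; stdout: string; exitCode: number }
  | { kind: "timeout"; afterMs: number }
  | { kind: "error"; message: string };

// Assumed response envelope: every field is always present.
interface ExecutionResponse {
  jobId: string;
  status: "completed" | "failed";
  exitCode: number | null;
  stdout: string;
  error: string | null;
}

// Normalize any outcome into the same fixed-shape envelope.
function toResponse(jobId: string, outcome: RunOutcome): ExecutionResponse {
  switch (outcome.kind) {
    case "ok":
      return {
        jobId,
        status: outcome.exitCode === 0 ? "completed" : "failed",
        exitCode: outcome.exitCode,
        stdout: outcome.stdout,
        error: null,
      };
    case "timeout":
      return {
        jobId,
        status: "failed",
        exitCode: null,
        stdout: "",
        error: `timed out after ${outcome.afterMs} ms`,
      };
    case "error":
      return { jobId, status: "failed", exitCode: null, stdout: "", error: outcome.message };
  }
}
```

Using `null` for absent values, instead of omitting fields, lets clients parse every response with one schema.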

Recent Upgrades

Introduced Terraform-governed dual-endpoint model for control plane and execution API traffic separation.
Expanded Cloud Sandbox scope into a mini Replit/Judge0-style platform with async queue-worker execution and tenant quota controls.
Added stronger sandbox controls: bounded runtime resources, idempotent job handling, and audit visibility.
Clarified the live ALB endpoint as the execution API proof path and documented queue/DLQ recovery in the production upgrade log.
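Idempotent job handling and tenant quotas, both mentioned in the upgrades above, might combine at the intake step roughly as sketched here. Everything in the block is an assumption made for illustration: the `JobIntake` class, the in-memory maps (a real deployment would need shared storage), the per-tenant active-job cap, and the idempotency-key scheme.

```typescript
interface SubmitResult {
  jobId: string | null; // null when the request is rejected
  reason: "new" | "duplicate" | "quota_exceeded";
}

class JobIntake {
  // Idempotency key (scoped by tenant) -> job id already issued for it.
  private readonly jobsByKey = new Map<string, string>();
  // Tenant id -> number of jobs that have not yet reached a terminal state.
  private readonly activeByTenant = new Map<string, number>();
  private readonly maxActivePerTenant: number;
  private nextId = 1;

  constructor(maxActivePerTenant: number) {
    this.maxActivePerTenant = maxActivePerTenant;
  }

  submit(tenantId: string, idempotencyKey: string): SubmitResult {
    const scopedKey = `${tenantId}:${idempotencyKey}`;

    // A repeated key returns the original job instead of creating a new one.
    const existing = this.jobsByKey.get(scopedKey);
    if (existing !== undefined) {
      return { jobId: existing, reason: "duplicate" };
    }

    const active = this.activeByTenant.get(tenantId) ?? 0;
    if (active >= this.maxActivePerTenant) {
      return { jobId: null, reason: "quota_exceeded" };
    }

    const jobId = `job-${this.nextId++}`;
    this.jobsByKey.set(scopedKey, jobId);
    this.activeByTenant.set(tenantId, active + 1);
    return { jobId, reason: "new" };
  }

  // Call when a job reaches a terminal state to free the tenant's quota.
  release(tenantId: string): void {
    const active = this.activeByTenant.get(tenantId) ?? 0;
    this.activeByTenant.set(tenantId, Math.max(0, active - 1));
  }
}
```

Checking for a duplicate before checking the quota means a client retrying an accepted request is never rejected for a job it already owns.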

Outcome Highlights

Upgraded deployment to separate web app and API endpoints for clearer platform architecture.
Shipped a public cloud API endpoint for real execution requests.
Designed for isolation-first execution behavior under backend constraints.
Implemented execution flow with queue-worker reliability patterns.