Mohammed Vepari

Backend & Infrastructure Engineer

Honours Bachelor of Computer Science (2026) • 83.95% GPA • Architecting fault-tolerant, auto-scaling distributed systems with Go, Node.js, Terraform, and AWS.

Reliability | Observability | FinOps | Operational Excellence

Available for on-site, hybrid, or remote roles (EST/EDT).

Brampton, ON

5-Second Summary

Backend / Systems Engineer building reliable distributed systems, with a focus on concurrency, failure handling, observability, and scalable backend design.

Flagship project: NetPulse. Academic foundation includes Operating Systems, Distributed Systems (85%), Theory of Computing, Java, Python, and Database Systems.

Live Systems: 6 | Design Docs: 6 | Engineering Posts: 5

Enterprise SaaS Alignment Snapshot

Fast scan for new-grad backend and enterprise SaaS teams: fundamentals, office readiness, communication through documentation, and shipped systems.

Enterprise SaaS New Grad

BCS 2026, 83.95% GPA, Java/Python/TypeScript foundation, and Toronto-area availability for office-based teams.

Backend Engineer

Java and Python fundamentals, Node.js APIs, queue-worker execution paths, Postgres/Redis persistence, and clear service boundaries.

Platform Engineer

AWS ECS/Fargate, Terraform-managed topology, ALB deployment paths, and infrastructure-first design decisions.

CS Fundamentals

Operating Systems, Distributed Systems (85%), Theory of Computing, data structures, algorithms, and databases.

[Engineering Identity]


Focused on SRE-grade platform delivery with infrastructure provisioning, cost-aware scaling, and production failure-recovery automation.

// LIVE_CANDIDATE_SIGNAL: PORTFOLIO_NODE_01
[0.0002] CANDIDATE_PROFILE: backend_infrastructure_engineer
[0.0005] ACTIVE_SIGNAL: flagship_systems_live
[0.0008] CORE_STACK: go_node_terraform_fargate
[0.0011] PRINCIPLES: frugality_operational_excellence
[0.0014] STATUS: available_onsite_hybrid_remote_est_edt

System Design Docs

6 Architecture Deep Dives

Dedicated design pages for the systems behind the portfolio: routing, queues, telemetry, AI evidence grounding, full-stack product flows, and NetPulse reliability upgrades.

Flagship Design Track

NetPulse: Distributed Uptime Monitoring SaaS

mTLS regional checkers -> queue -> monitoring engine -> PgBouncer + Postgres/Redis -> status dashboard + incident lifecycle.

Phase 2 (Implemented)

Production Onboarding + Reviewer-Safe Demo Access

Turned NetPulse from an architecture demo into a reviewable SaaS workflow with Cognito registration, email verification, login, and demo-safe access paths.

Phase 3 (Next Build Target)

Real-User Evidence + Public Status Page Upgrade

Moves NetPulse beyond staged-test validation claims toward verifiable live-product evidence: public status pages, timestamped uptime summaries, and safer demo-mode boundaries.

Full-Stack Product Engineering

moveYSplash: Social Platform Prototype

Next.js UI + app APIs -> Supabase Auth/Postgres -> feed composer + indexed search pipeline.

Next.js · TypeScript · Tailwind CSS · Supabase

- Responsive multi-device UX with authenticated user workflows.

- SQL-backed persistence with query optimization and indexing strategy.

Latest: Published a live deployment link to make academic project outcomes directly reviewable.

Open System Design

Scaling & Messaging Systems

Cloud Sandbox

ALB execution ingress -> queue and DLQ lanes -> Fargate Spot worker pool -> result store -> recovery scheduler via EventBridge.

Node.js · AWS Fargate · EventBridge · Terraform (IaC)

- Asynchronous queue-worker execution model with bounded retries and durable result flow.

- Tenant-aware API boundary with safer sandbox and runtime guardrails.

Latest: Introduced a Terraform-managed dual-endpoint model that separates control-plane traffic from execution API traffic.

Open System Design

Scaling & Messaging Systems

Real-Time Transit Telemetry Dashboard

Transit data feeds -> telemetry processor -> event store -> live dashboard + websocket broadcaster + alert hooks.

Dashboard · Telemetry · JavaScript · AWS S3

- Event-time ordering + idempotency dedupe for resilient streaming semantics.

- Adaptive backpressure and queue buffering for burst handling stability.
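A hedged Go sketch of event-time ordering with idempotent dedupe, assuming each feed event carries a unique ID and an event timestamp. `Reorderer`, `allowedLateness`, and the watermark rule (max event time seen minus allowed lateness) are illustrative assumptions, not the dashboard's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is a hypothetical vehicle-position update keyed by a producer-assigned ID.
type Event struct {
	ID        string
	EventTime int64 // seconds since epoch, as reported by the feed
}

// Reorderer buffers events and emits them in event-time order once the
// watermark (max event time seen minus allowedLateness) has passed them.
type Reorderer struct {
	allowedLateness int64
	maxSeen         int64
	seen            map[string]bool // unbounded here; a real processor would expire entries
	buf             []Event
}

func NewReorderer(allowedLateness int64) *Reorderer {
	return &Reorderer{allowedLateness: allowedLateness, seen: map[string]bool{}}
}

// Push ingests one event and returns any events that are now safe to emit.
func (r *Reorderer) Push(e Event) []Event {
	if r.seen[e.ID] {
		return nil // idempotency: duplicate delivery is ignored
	}
	r.seen[e.ID] = true
	r.buf = append(r.buf, e)
	if e.EventTime > r.maxSeen {
		r.maxSeen = e.EventTime
	}
	watermark := r.maxSeen - r.allowedLateness
	sort.Slice(r.buf, func(i, j int) bool { return r.buf[i].EventTime < r.buf[j].EventTime })
	// Everything at or below the watermark is released in event-time order.
	n := sort.Search(len(r.buf), func(i int) bool { return r.buf[i].EventTime > watermark })
	ready := append([]Event(nil), r.buf[:n]...)
	r.buf = r.buf[n:]
	return ready
}

func main() {
	r := NewReorderer(5)
	for _, e := range []Event{{"a", 100}, {"c", 103}, {"b", 101}, {"a", 100}, {"d", 110}} {
		for _, out := range r.Push(e) {
			fmt.Println(out.ID, out.EventTime)
		}
	}
}
```

An event that arrives after the watermark has already passed it is emitted on its next push rather than reordered; handling that case is the job of a separate late-arrival correction step.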

Latest: Added event-time ordering, idempotency dedupe, and late-arrival correction for robust stream semantics.

Open System Design

Distributed Systems & Cloud APIs

Edge Balancer (Go)

miniloadbalancer.io -> ALB/TLS ingress -> regular ECS service running Go proxy + control plane -> Consul discovery -> backend pool -> Prometheus/Grafana telemetry.

AWS ECS · Go · Consul · Prometheus

- Multiple runtime-selectable balancing strategies with control-plane visibility.

- Health-aware failover with hysteresis and graceful draining for safer lifecycle transitions.
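A small Go sketch of health-check hysteresis, similar in spirit to the rise/fall counters used by load balancers such as HAProxy. `HealthTracker` and its thresholds are hypothetical, and graceful draining is not modelled.

```go
package main

import "fmt"

// HealthTracker marks a backend down only after `fall` consecutive failed
// probes and back up only after `rise` consecutive successes, so a single
// flaky probe does not flap the backend in and out of the pool.
type HealthTracker struct {
	rise, fall int
	healthy    bool
	streak     int // consecutive probe results that disagree with the current state
}

func NewHealthTracker(rise, fall int) *HealthTracker {
	return &HealthTracker{rise: rise, fall: fall, healthy: true}
}

// Observe records one probe result and returns the (possibly updated) state.
func (h *HealthTracker) Observe(probeOK bool) bool {
	if probeOK == h.healthy {
		h.streak = 0 // result agrees with the current state; reset the counter
		return h.healthy
	}
	h.streak++
	threshold := h.fall
	if !h.healthy {
		threshold = h.rise
	}
	if h.streak >= threshold {
		h.healthy = probeOK
		h.streak = 0
	}
	return h.healthy
}

func main() {
	h := NewHealthTracker(2, 3) // hypothetical thresholds
	for _, ok := range []bool{false, true, false, false, false, true, true} {
		fmt.Print(h.Observe(ok), " ")
	}
	fmt.Println()
}
```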

Latest: Migrated the public deployment from AWS App Runner to regular ECS and cut over the live endpoint to miniloadbalancer.io.

Open System Design

Full-Stack Product Engineering

AI Gateway Platform

sharedaigateway.com -> ingress -> ECS Express Mode service -> requirement parser -> project evidence retrieval -> prompt orchestration -> structured fit brief UI.

AWS ECS Express Mode · Next.js · TypeScript · LLM Integration

- Requirement parsing and evidence retrieval before prompt execution.

- Structured response generation with a consistent, easy-to-skim layout.
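A simplified Go sketch of "parse requirements, then retrieve evidence before prompting". The keyword vocabulary, project tags, and function names are hypothetical; the real parser may rely on an LLM or richer matching rather than a fixed keyword list.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// skillVocabulary maps surface forms in job text to normalized signals (hypothetical).
var skillVocabulary = map[string]string{
	"go": "go", "golang": "go",
	"terraform": "terraform",
	"postgres":  "postgresql", "postgresql": "postgresql",
	"kubernetes": "kubernetes",
}

// parseRequirements lowercases and tokenizes the text, then keeps known skills.
func parseRequirements(jobText string) []string {
	isSep := func(r rune) bool { return !unicode.IsLetter(r) && !unicode.IsDigit(r) }
	found := map[string]bool{}
	for _, tok := range strings.FieldsFunc(strings.ToLower(jobText), isSep) {
		if skill, ok := skillVocabulary[tok]; ok {
			found[skill] = true
		}
	}
	reqs := make([]string, 0, len(found))
	for s := range found {
		reqs = append(reqs, s)
	}
	sort.Strings(reqs)
	return reqs
}

// retrieveEvidence returns, per requirement, the projects tagged with it.
// Requirements with no evidence are reported as gaps instead of being passed
// to the prompt, so the model is never asked to invent support.
func retrieveEvidence(reqs []string, projects map[string][]string) (evidence map[string][]string, gaps []string) {
	evidence = map[string][]string{}
	for _, req := range reqs {
		for name, tags := range projects {
			for _, t := range tags {
				if t == req {
					evidence[req] = append(evidence[req], name)
				}
			}
		}
		if len(evidence[req]) == 0 {
			gaps = append(gaps, req)
		} else {
			sort.Strings(evidence[req])
		}
	}
	return evidence, gaps
}

func main() {
	reqs := parseRequirements("Backend role: Golang, Terraform, and PostgreSQL; Kubernetes a plus.")
	projects := map[string][]string{ // hypothetical tags
		"Edge Balancer": {"go"},
		"NetPulse":      {"postgresql"},
		"Cloud Sandbox": {"terraform"},
	}
	ev, gaps := retrieveEvidence(reqs, projects)
	fmt.Println(reqs, ev, gaps)
}
```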

Latest: Migrated the public deployment from AWS App Runner to ECS Express Mode and cut over the live endpoint to sharedaigateway.com.

Open System Design

Architecture_And_Runbooks

Engineering writing focused on architecture decisions, incident response, and operating economics. This is the evidence layer behind the project metrics.

Upgrade Log: Portfolio Apps as Production-Style Systems

How NetPulse, Cloud Sandbox, Transit Telemetry, Edge Balancer, AI Gateway Platform, and moveYSplash now map to clearer architecture proof and live evidence paths.

Read Upgrade Log

Migration Notes: App Runner to ECS for AI Gateway Platform and Edge Balancer

Why both services outgrew App Runner, why AI Gateway Platform moved to ECS Express Mode while Edge Balancer moved to regular ECS, and which AWS alternatives remained viable.

Read Migration Notes

ADR: Fargate Spot and Firecracker Isolation Strategy

Why I selected AWS Fargate Spot + isolation-first execution boundaries over EC2 worker fleets for asynchronous payload workloads.

Read ADR

Post-Mortem: Surviving a 15k Req/Min Payload Spike

Failure timeline, root-cause analysis, and queue-decoupling response path used to stabilize throughput and memory pressure.

Read Post-Mortem

FinOps Report: 70% Compute Reduction with Spot

Cost and scaling analysis for elastic worker execution using spot capacity, queue-depth triggers, and recovery guardrails.

Read FinOps Report
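The queue-depth trigger idea can be expressed as a small formula: desired workers = ceil(queue depth / target backlog per worker), clamped between a floor and a ceiling. A Go sketch follows; the function name and the parameter values in the example are assumptions, not figures from the FinOps report.

```go
package main

import "fmt"

// desiredWorkers converts queue depth into a worker count using a
// backlog-per-worker target, clamped to [minWorkers, maxWorkers].
// All parameter values used below are illustrative assumptions.
func desiredWorkers(queueDepth, targetPerWorker, minWorkers, maxWorkers int) int {
	if targetPerWorker <= 0 {
		return maxWorkers // misconfiguration: fail toward capacity, not toward zero
	}
	n := (queueDepth + targetPerWorker - 1) / targetPerWorker // ceiling division
	if n < minWorkers {
		n = minWorkers
	}
	if n > maxWorkers {
		n = maxWorkers
	}
	return n
}

func main() {
	for _, depth := range []int{0, 45, 250, 5000} {
		fmt.Println(depth, "->", desiredWorkers(depth, 50, 1, 20))
	}
}
```

The ceiling keeps a runaway backlog from scaling spend without bound, while the floor keeps one worker warm for new arrivals.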

Engineering_Notes

Incident-style writeups that show architecture decisions, load behavior, bottleneck analysis, and measurable outcomes.

View All Posts

2026-06-08 · 12 min

Portfolio App Upgrade Log: Queues, ECS, Pooling, and Evidence-Grounded AI

How I upgraded the portfolio apps from isolated demos into reviewable production-style systems with clearer runtime boundaries, system design docs, and live evidence paths.

- Problem, architecture, stress, resolution, impact

- Target teams: Veeva, Amazon, Stripe, Canonical

2026-04-01 · 11 min

Migrating from AWS App Runner to ECS: Why I Split AI Gateway Platform and Edge Balancer by Workload Fit

Why App Runner was useful for first delivery, where it stopped fitting these workloads, how I moved AI Gateway Platform to ECS Express Mode and Edge Balancer to regular ECS, and which AWS alternatives remained viable.

- Problem, architecture, stress, resolution, impact

- Target teams: Amazon, Canonical, Veeva, Stripe

2026-03-07 · 9 min

Queue-First Cloud Sandbox: Preventing Worker Starvation Under Burst Load

How I shifted from request-coupled execution to queue-worker isolation to keep execution throughput stable under burst traffic.

- Problem, architecture, stress, resolution, impact

- Target teams: Amazon, Stripe, DoorDash

Core Infrastructure Engineering

Production Systems Portfolio

Core infrastructure and reliability engineering projects. Systems are provisioned with Infrastructure as Code (Terraform), instrumented with observability pipelines, and tested through chaos drills and load validation.

Engineering Behavioral Signals

- Built systems with explicit failure handling, retries, and reliability controls.

- Documented architecture tradeoffs and scaling decisions in each project deep dive.

- Demonstrated deployment + observability readiness with public demos and live metrics.

Distributed Systems & Cloud APIs

Reliability-focused backend systems with routing, failover, and API-driven operations.

PgBouncer + mTLS | 10k+ Regional Write Spike Validation

Distributed Systems & Cloud APIs

NetPulse: Distributed Uptime Monitoring SaaS

Demonstrates a secure, high-concurrency monitoring architecture whose reliability controls stay stable under aggressive regional write spikes.

Tech: Next.js, Node.js, PostgreSQL, PgBouncer, mTLS, Docker

Architecture Snapshot

mTLS regional checkers
  -> queue
  -> monitoring engine
  -> PgBouncer + Postgres/Redis
  -> status dashboard + incident lifecycle

- Added PgBouncer connection pooling in front of PostgreSQL, preventing database connection exhaustion during load tests with 10,000+ concurrent regional worker writes.

- Enforced a zero-trust boundary with mutual TLS (mTLS) authentication and encryption between distributed regional checkers and the central monitoring engine.

Update: Added dedicated registration with Cognito email verification and a full login flow for production-style onboarding.

Roadmap: Phase 2 + Phase 3 improvements documented

App Runner -> Regular ECS | Go pprof + Consul | Prometheus/Grafana

Distributed Systems & Cloud APIs

Edge Balancer (Go)

Shows when an edge-routing service outgrows App Runner and needs regular ECS service-level control for proxying, health management, and deployment behavior.

Tech: AWS ECS, Go, Consul, Prometheus, Grafana, pprof

Architecture Snapshot

miniloadbalancer.io
  -> ALB/TLS ingress
  -> regular ECS service running Go proxy + control plane
  -> Consul discovery
  -> backend pool
  -> Prometheus/Grafana telemetry

- Migrated the service from AWS App Runner to regular ECS so service rollout policy, health-probe cadence, task behavior, and ingress wiring could be controlled directly.

- Profiled the Go runtime with pprof (heap and CPU) to identify and eliminate memory-allocation bottlenecks and reduce goroutine scheduling overhead for high-throughput TCP proxying.
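For reference, a minimal Go sketch of capturing CPU and heap profiles programmatically with the standard `runtime/pprof` package; a long-running proxy would more commonly expose the `net/http/pprof` endpoints instead. `profileWorkload` is a hypothetical helper, and the allocation-heavy workload is illustrative.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// profileWorkload captures a CPU profile while fn runs and a heap profile
// afterwards. The buffers can be written to files and opened with
// `go tool pprof` to look for allocation hot spots.
func profileWorkload(fn func()) (cpu, heap *bytes.Buffer, err error) {
	cpu, heap = new(bytes.Buffer), new(bytes.Buffer)
	if err = pprof.StartCPUProfile(cpu); err != nil {
		return nil, nil, err
	}
	fn()
	pprof.StopCPUProfile()
	runtime.GC() // refresh allocation statistics before the heap snapshot
	if err = pprof.WriteHeapProfile(heap); err != nil {
		return nil, nil, err
	}
	return cpu, heap, nil
}

func main() {
	var sink [][]byte
	cpu, heap, err := profileWorkload(func() {
		for i := 0; i < 100000; i++ {
			sink = append(sink, make([]byte, 64)) // deliberately allocation-heavy
		}
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("cpu profile bytes:", cpu.Len(), "heap profile bytes:", heap.Len(), "kept:", len(sink))
}
```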

Update: Migrated the public deployment from AWS App Runner to regular ECS and cut over the live endpoint to miniloadbalancer.io.

Scaling & Messaging Systems

Queue-based and real-time data pipelines designed for throughput, isolation, and safe execution.

Fargate Spot FinOps + DLQ Recovery | 15k+ Req/Min Burst Tests

Scaling & Messaging Systems

Cloud Sandbox

Shows SRE-first backend platform engineering where sandbox isolation, autoscaling, queue durability, and cost-efficiency are first-class requirements.

Tech: Node.js, AWS Fargate, EventBridge, Terraform (IaC), FinOps

Architecture Snapshot

ALB execution ingress
  -> queue and DLQ lanes
  -> Fargate Spot worker pool
  -> result store
  -> recovery scheduler via EventBridge

- Architected an elastic worker pool on AWS Fargate Spot tasks provisioned via Terraform, reducing compute costs for asynchronous payload processing by 70%.

- Engineered a self-healing queue pipeline with Redis dead-letter queues (DLQs) and AWS EventBridge cron triggers, achieving 100% payload recovery during staged network-partition drills.
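A Go sketch of a capped DLQ redrive sweep of the kind a scheduled (cron) trigger could invoke. Slices stand in for the Redis lists, and `redriveDLQ` and the per-run cap are hypothetical.

```go
package main

import "fmt"

// redriveDLQ moves up to maxPerRun dead-lettered payload IDs back onto the
// main queue, oldest first, and returns what remains parked. The cap keeps a
// single recovery sweep from re-flooding workers right after an outage.
func redriveDLQ(dlq, mainQueue []string, maxPerRun int) (remainingDLQ, newMainQueue []string) {
	if maxPerRun < 0 {
		maxPerRun = 0
	}
	n := len(dlq)
	if maxPerRun < n {
		n = maxPerRun
	}
	// Copy so callers' slices are never mutated through shared backing arrays.
	newMainQueue = append(append([]string(nil), mainQueue...), dlq[:n]...)
	remainingDLQ = append([]string(nil), dlq[n:]...)
	return remainingDLQ, newMainQueue
}

func main() {
	dlq := []string{"p1", "p2", "p3", "p4", "p5"}
	var queue []string
	// Each loop iteration stands in for one scheduled sweep.
	for run := 1; len(dlq) > 0; run++ {
		dlq, queue = redriveDLQ(dlq, queue, 2)
		fmt.Println("run", run, "requeued so far:", len(queue), "still parked:", len(dlq))
	}
}
```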

Update: Introduced a Terraform-managed dual-endpoint model that separates control-plane traffic from execution API traffic.

Live Dashboard | Real-Time Signal View

Scaling & Messaging Systems

Real-Time Transit Telemetry Dashboard

Demonstrates real-time data engineering, stream correctness, and observability-first operations reporting.

Tech: Dashboard, Telemetry, JavaScript, AWS S3, Data Visualization

Architecture Snapshot

Transit data feeds
  -> telemetry processor
  -> event store
  -> live dashboard + websocket broadcaster + alert hooks

- WebSocket telemetry updates delivered route refreshes in ~1 second windows under normal load.

- Idempotency + late-event correction eliminated duplicate state writes in replay testing.
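A Go sketch of idempotent state application with event-time correction, assuming each update carries a unique event ID and an event time. `State`, `Update`, and the newest-event-time-wins rule are illustrative assumptions, not the dashboard's actual logic.

```go
package main

import "fmt"

// Update is a hypothetical per-vehicle position report.
type Update struct {
	EventID   string
	VehicleID string
	EventTime int64
	Stop      string
}

// State keeps the latest view per vehicle plus the event IDs already applied.
type State struct {
	applied map[string]bool
	latest  map[string]Update
	writes  int // counts actual state mutations, so duplicates are visible
}

func NewState() *State {
	return &State{applied: map[string]bool{}, latest: map[string]Update{}}
}

// Apply is idempotent per EventID; a late event only overwrites the stored
// view if it is newer in event time than what is already there.
func (s *State) Apply(u Update) {
	if s.applied[u.EventID] {
		return // replayed or duplicated delivery: no second write
	}
	s.applied[u.EventID] = true
	if cur, ok := s.latest[u.VehicleID]; ok && cur.EventTime >= u.EventTime {
		return // stale arrival: older than the current view, keep current
	}
	s.latest[u.VehicleID] = u
	s.writes++
}

func main() {
	s := NewState()
	stream := []Update{
		{"e1", "bus-7", 100, "A"},
		{"e3", "bus-7", 120, "C"},
		{"e2", "bus-7", 110, "B"}, // arrives late and is older: ignored
		{"e3", "bus-7", 120, "C"}, // duplicate delivery: ignored
	}
	for _, u := range stream {
		s.Apply(u)
	}
	fmt.Println("current stop:", s.latest["bus-7"].Stop, "writes:", s.writes)
}
```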

Update: Added event-time ordering, idempotency dedupe, and late-arrival correction for robust stream semantics.


Full-Stack Product Engineering

End-to-end applications with product UX, backend workflows, and documented design decisions.

Search Performance Improved by 90%

Full-Stack Product Engineering

moveYSplash: Social Platform Prototype

Demonstrates end-to-end product delivery, responsive UX, and measurable query optimization in a live academic project.

Tech: Next.js, TypeScript, Tailwind CSS, Supabase, PostgreSQL

Architecture Snapshot

Next.js UI + app APIs
  -> Supabase Auth/Postgres
  -> feed composer + indexed search pipeline

- Search latency improved by ~90% after SQL query and indexing optimization.

- Maintained responsive interaction across mobile and desktop breakpoints.

Update: Published a live deployment link to make academic project outcomes directly reviewable.

App Runner -> ECS Express Mode | Structured LLM Workflow

Full-Stack Product Engineering

AI Gateway Platform

Shows applied AI product engineering plus infrastructure judgment: App Runner was fast for first launch, but ECS Express Mode became a better fit once deployment behavior, ingress policy, and service tuning mattered more.

Tech: AWS ECS Express Mode, Next.js, TypeScript, LLM Integration, Prompt Orchestration, Structured Outputs

Architecture Snapshot

sharedaigateway.com
  -> ingress
  -> ECS Express Mode service
  -> requirement parser
  -> project evidence retrieval
  -> prompt orchestration
  -> structured fit brief UI

- Migrated the AI service from AWS App Runner to ECS Express Mode to keep a lighter managed experience while gaining more explicit control over deployment behavior and service tuning.

- Converted unstructured job descriptions into normalized requirement signals and evidence-backed summaries for repeatable analysis.
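A hedged Go sketch of validating the model's structured output before rendering, using `encoding/json` with `DisallowUnknownFields`. The `FitBrief` schema and its validation rules are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// FitBrief is a hypothetical schema for the structured fit brief the UI renders.
type FitBrief struct {
	Role    string   `json:"role"`
	Matches []Match  `json:"matches"`
	Gaps    []string `json:"gaps"`
}

// Match pairs one requirement with the project evidence that supports it.
type Match struct {
	Requirement string `json:"requirement"`
	Evidence    string `json:"evidence"`
}

// parseFitBrief rejects model output with unknown fields, trailing data, or
// matches that lack evidence, so the UI only renders evidence-backed claims.
func parseFitBrief(raw []byte) (*FitBrief, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	var b FitBrief
	if err := dec.Decode(&b); err != nil {
		return nil, fmt.Errorf("decode: %w", err)
	}
	if err := dec.Decode(&struct{}{}); err != io.EOF {
		return nil, errors.New("trailing data after JSON object")
	}
	if b.Role == "" {
		return nil, errors.New("missing role")
	}
	for _, m := range b.Matches {
		if m.Requirement == "" || m.Evidence == "" {
			return nil, errors.New("match without requirement or evidence")
		}
	}
	return &b, nil
}

func main() {
	good := []byte(`{"role":"Backend Engineer","matches":[{"requirement":"go","evidence":"Edge Balancer"}],"gaps":["kubernetes"]}`)
	bad := []byte(`{"role":"Backend Engineer","confidence":0.99}`)
	if b, err := parseFitBrief(good); err == nil {
		fmt.Println("render:", b.Role, len(b.Matches), "match(es)")
	}
	if _, err := parseFitBrief(bad); err != nil {
		fmt.Println("rejected:", err)
	}
}
```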

Update: Migrated the public deployment from AWS App Runner to ECS Express Mode and cut over the live endpoint to sharedaigateway.com.


Recent Engineering Upgrades

Deployed June 2026

Recent platform and portfolio updates with direct proof links so reviewers can verify shipped improvements without hunting through the site.

Architecture Evidence

Production Architecture Upgrade Log Published

New blog post explains the latest app upgrades across NetPulse, Cloud Sandbox, Transit Telemetry, Edge Balancer, AI Gateway Platform, and moveYSplash with proof links and workload-fit tradeoffs.

Read Upgrade Log

Architecture Evidence

Project System Design Library Added

Homepage now surfaces dedicated system design docs for NetPulse, Cloud Sandbox, Transit Telemetry, Edge Balancer, AI Gateway Platform, and moveYSplash.

View Design Docs

Flagship Roadmap

NetPulse Phase 2-3 Track Published

NetPulse now calls out Phase 2 onboarding/demo-access improvements and the Phase 3 real-user evidence plan for public status pages, timestamped uptime summaries, and demo-mode boundaries.

Open NetPulse Design

Deployment Architecture

ECS Migration Reflected for AI Gateway Platform and Edge Balancer

AI Gateway Platform and Edge Balancer now document the App Runner to ECS migration path, public domain cutovers, and why workload fit drove ECS Express Mode for AI Gateway Platform versus regular ECS for Edge Balancer.

Open Migration Deep Dive

Homepage Structure

Core Infrastructure Positioned First

Core infrastructure systems now appear ahead of non-engineering history so reviewers see production work first.

View Core Infrastructure

Technical Positioning

Infrastructure Skills Map Refreshed

Technical skills now foreground Go, AWS, Terraform, Docker, Prometheus, Grafana, and Redis/BullMQ reliability workflows.

View Technical Skills

Access Paths

Live Portfolio Apps Centralized

Contact section now includes direct live links for NetPulse, Cloud Sandbox, Transit Telemetry, Edge Balancer, and AI Gateway Platform.

Open Contact Section

Project Evidence

Project Metrics Upgraded

Project cards now emphasize workload-fit deployment choices, queue/DLQ recovery, PgBouncer/mTLS, event-time telemetry, and pprof-driven optimization outcomes.

View Flagship Systems

Narrative Structure

Runbooks Elevated in Homepage Flow

Architecture runbooks and incident-style writing now sit near the top of the homepage to keep the portfolio centered on engineering proof.

Open Runbooks

Technical Skills

Systems_Stack_Map

Languages & Runtime

Java

Backend fundamentals and object-oriented systems design.

STRONG

Python

Coding challenge preparation, scripting, and systems tooling support.

STRONG

TypeScript

Type-safe backend and frontend platform development.

ACTIVE

Node.js

Distributed API services, queue workers, and async processing.

ACTIVE

Go

Load balancing, concurrency control, and runtime profiling.

ACTIVE

SQL

Schema design, indexing, and query optimization.

ACTIVE

CS Fundamentals

Operating Systems

Relevant coursework for systems and infrastructure roles.

COMPLETE

Distributed Systems

Coursework aligned with monitoring, queues, and reliability tradeoffs.

85%

Theory of Computing

Formal foundations for computation, language, and complexity reasoning.

COMPLETE

Data Structures

Core interview preparation and project data-modeling decisions.

STRONG

Algorithms

Routing visualizer, graph traversal, and complexity tradeoffs.

STRONG

Infrastructure & Cloud

AWS (Fargate, ALB, VPC)

Cloud-native deployment and service networking.

ACTIVE

Terraform (IaC)

Repeatable infrastructure provisioning and change control.

ACTIVE

Docker

Containerized service packaging and runtime consistency.

ACTIVE

Linux / cgroups

Isolation and resource-bound execution constraints.

PROJECT

Observability & Reliability

Prometheus

Service metrics for traffic and health behavior visibility.

ACTIVE

Grafana

Operational dashboards for failure and scaling diagnostics.

ACTIVE

Go pprof

Heap and CPU profiling for runtime bottleneck elimination.

ACTIVE

Redis / BullMQ DLQ

Queue durability and dead-letter recovery workflows.

ACTIVE

mTLS + Incident Controls

Zero-trust communication and alert lifecycle hardening.

ACTIVE

Data & Web Platforms

PostgreSQL / MySQL

Relational data modeling and persistence.

ACTIVE

React / Next.js

Dashboards and technical web interfaces.

ACTIVE

REST / WebSocket APIs

Realtime and request-response service integration.

ACTIVE

Supabase

Rapid full-stack data and auth integration.

PROJECT

Angular

Capstone implementation for complex user workflows.

PROJECT

Experience

Operational ownership, reliability awareness, and delivery discipline carried into production-style project work.

Portfolio Engineering

Independent Systems & Infrastructure Developer

2024 - Present

- Architected and deployed production-style distributed systems with live demos, system design docs, and measurable reliability metrics.

- Provisioned AWS cloud environments via Terraform, including ALB-routed services and queue-worker execution patterns.

- Implemented cost-aware autoscaling patterns, DLQ recovery workflows, and observability instrumentation for failure analysis.

- Published architecture decision records and incident-style post-mortems for public technical review.

Amazon Fulfillment

Fulfillment Associate

2016 - Present

- Managed high-volume inventory processing while consistently meeting critical path deadlines in a fast-paced logistics environment.

- Maintained 99% accuracy in order fulfillment through strict quality checks and proactive defect identification.

- Adapted quickly to shifting operational priorities and helped clear major backlogs during peak seasonal demand.

- Recognized for reliability, punctuality, and safety compliance across long-term tenure.

Academic_Foundation

Algoma University

Honours Bachelor of Computer Science

Expected Graduation: 2026
GPA: 83.95%
// Graduating with Honours

Relevant Coursework

Operating Systems · Distributed Systems (85%) · Theory of Computing · Data Structures and Algorithms · Object-Oriented Programming · Database Systems · Discrete Mathematics · Web Application Development

George Brown College

Computer Programming and Analysis

2022 - 2023
GPA: 3.72/4.0
// Graduated with Honours

Relevant Coursework

Java · Python · SQL · Software Design · Web Development

Engineer_Profile

Systems Mindset with Product Delivery Discipline

I focus on building systems that are both understandable and resilient under stress, then presenting that evidence in ways hiring teams can verify quickly.

Education

Honours BCS, Expected 2026

Primary Focus

Distributed Systems, Platform APIs, Reliability

Operating Principles

- Design for failure first, then optimize for speed.

- Publish measurable impact, not vague implementation claims.

- Document architecture and tradeoffs so teams can reason quickly.

Contact

I am actively seeking full-time New Grad Software Engineer roles (2026). I have shipped public, production-style projects with live demos and source code. I am based in Brampton and available for Toronto-area on-site or hybrid office schedules. The fastest way to reach me is by email.