Skip to main content
Hemanth.
Back to Feed
2025Backend & Systems

Deploy: Distributed Container Orchestration Engine

A custom container orchestration platform, similar to a lightweight Kubernetes, designed to automatically monitor, scale, and recover failing server nodes in under 30ms.

Rationale

Why Deploy.sh? Because Kubernetes is massive, resource-heavy, and often overkill for specialized, resource-constrained environments. I essentially challenged myself to build a mini-cloud platform from scratch. I wrote the backend in Go because I needed it to be super fast. By engineering a custom Go scheduler using a Bin Packing algorithm, I improved hardware utilization by 25% without the heavy operational overhead of managing a full Kubernetes cluster.

Tech Stack

GoPostgreSQLDockerREST APIGrafanagRPCPrometheus
System Architecture

Core Engineering Challenge

The Bottleneck: Streaming live CPU/Memory metrics from a Go backend to a React frontend can easily cause aggressive DOM repaints, freezing the user's browser.

The Backend Architecture

I utilized Go's concurrency model, running a background goroutine ticker that continuously updates system state in SQLite (using custom GORM interfaces to handle JSON arrays). The Go WebSocket Hub then broadcasts this payload instantly.

The Frontend Trade-off

To prevent UI freezing, I bypassed standard React state (useState/useReducer). I intercepted the WebSocket stream and fed the payload directly into the React Query cache (queryClient.setQueryData). This trades some frontend architecture flexibility for a highly optimized, polling-free DOM update cycle.

Key Highlights

  • Reduced node failure detection time to <30ms by engineering a custom distributed control plane in Go.
  • Built a CLI and Dashboard utilizing Grafana to visualize real-time node metrics, exposing cluster state via RESTful API.
  • Ensured cluster consistency and crash recovery by implementing a Write-Ahead Log (WAL) in PostgreSQL, creating a robust failover system for controller outages.
  • Integrated Prometheus for distributed tracing and real-time observability.
  • Improved hardware utilization by 25% by designing a custom scheduler using Bin Packing algorithm to optimize memory allocation across worker nodes.

Architecture Details

A microservices-based distributed control plane built entirely in Go, mimicking core aspects of Kubernetes to understand low-level distributed orchestration.

1. Control Plane & gRPC Heartbeats

  • Worker nodes establish a bidirectional gRPC stream with the Master Node.
  • The heartbeat frequency is extremely tight. If a container crashes, my system detects it and restarts it automatically within a minute.

2. Custom Load Scaling

  • Instead of random placement, the master's scheduling algorithm evaluates current Memory and CPU availability.
  • It tracks CPU usage globally and auto-provisions more nodes dynamically if load spikes above 70%.

3. WAL & Postgres Failover

  • To prevent split-brain and ensure consistency if the Master Node crashes, all cluster state changes are appended to a Write-Ahead Log (WAL) stored in a master/slave PostgreSQL deployment.

Interactive System Design

Click nodes to inspect engineering flow

API & Admission Control

JWTRBACValidating Webhooks
Engineering Insight

Extracts Bearer tokens and authorizes subjects against internal FSM RoleBindings. Intercepts layouts to strip Privileged Containers mapping to PodSecurityStandards.

Technical Walkthrough