Every managed Kubernetes service (EKS, GKE, AKS) runs on bare metal underneath. The control plane runs on physical hardware. Your worker nodes are either virtual machines renting slices of physical servers, or bare metal instances that remove the VM layer entirely. The managed service’s value lies in control plane automation and ecosystem integrations, not in any fundamental infrastructure advantage.
Running K8s on InMotion bare metal or dedicated servers means your pods run directly on physical hardware with no hypervisor overhead, your persistent volumes sit on predictable NVMe storage, and your monthly cost stays fixed instead of scaling with node-hours or API call volume.
The Hypervisor Overhead Problem in Cloud K8s
Cloud Kubernetes worker nodes are virtual machines. KVM, Xen, or Hyper-V sits between your containers and the physical hardware. This introduces two performance taxes that bare metal eliminates:
CPU overhead: Hypervisors typically add 5-15% CPU overhead on system calls and context switches. For workloads with heavy system call activity (network-intensive services, I/O-bound applications), that overhead shows up as measurable latency.
Memory overhead: Hypervisors maintain their own memory structures alongside VM memory. A 16GB cloud worker node has less than 16GB available for Kubernetes system components and pods after hypervisor and guest OS overhead.
On bare metal, a 192GB server gives Kubernetes the full 192GB minus OS kernel overhead (roughly 2-4GB). Every GB of node capacity is real, not nominal.
Cluster Architecture Options
Single-Node K8s for Development and Staging
A single InMotion Hosting Extreme server running k3s or kubeadm with the control plane and worker roles combined is a practical staging environment. k3s is particularly suitable here: it is a production-grade Kubernetes distribution shipped as a single binary, backed by SQLite (or external etcd for HA), with a minimal footprint that leaves more resources for workloads.
Single-node K8s is not production-appropriate for workloads requiring high availability (one node failure takes everything down), but it is ideal for mirroring production configurations in staging without paying for multiple servers.
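For reference, a minimal sketch of a single-node k3s server configuration, assuming k3s’s documented config file at /etc/rancher/k3s/config.yaml (keys mirror the CLI flags); the node label and the choice to disable the bundled Traefik are illustrative:

```yaml
# /etc/rancher/k3s/config.yaml -- read by the k3s server on startup; keys mirror CLI flags
write-kubeconfig-mode: "0644"   # let non-root users read the generated kubeconfig
disable:
  - traefik                     # illustrative: drop the bundled ingress to mirror a production Nginx setup
node-label:
  - "environment=staging"       # hypothetical label for scheduling or reporting
```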
Multi-Node Production Clusters
A highly available production cluster needs at least 3 control plane nodes for etcd quorum. Practically, many teams trade that redundancy for cost and run 1 dedicated control plane server plus 2-3 worker nodes. With InMotion dedicated servers (a kubeadm configuration sketch follows the list below):
Control plane: Advanced tier ($149.99/mo), 64GB RAM is sufficient for K8s control plane components on clusters under 100 nodes
Worker nodes: Extreme tier ($349.99/mo) per worker for memory-intensive workloads; Essential or Advanced for lighter pod profiles
Network: 10Gbps port on worker nodes for inter-pod traffic in high-throughput service meshes
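For the multi-node path, a hedged sketch of a kubeadm configuration under those sizing assumptions; the endpoint DNS name, version pin, and node IP are placeholders, and the pod CIDR must match whatever the CNI plugin is configured to use:

```yaml
# kubeadm-config.yaml -- consumed by `kubeadm init --config kubeadm-config.yaml`
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: "v1.29.0"                      # placeholder: pin the version you intend to run
controlPlaneEndpoint: "cp.example.internal:6443"  # placeholder DNS name; lets you add control plane nodes later
networking:
  podSubnet: "10.244.0.0/16"                      # must match the CNI plugin's pod CIDR
  serviceSubnet: "10.96.0.0/12"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    node-ip: "203.0.113.10"                       # placeholder: the server's primary or private IP
```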
Pod Density Planning on 192GB / 16-Core Hardware
Kubernetes pod density depends on the resource requests and limits defined in pod specs. A rough planning framework: divide node memory by the per-pod memory request. On a 192GB node, 512MB requests put the ceiling around 380 pods, 1GB requests around 190, and 2GB requests around 95.
In practice, Kubernetes reserves resources for system pods (kube-system namespace), the node’s OS, and eviction headroom. Allocatable memory on a 192GB node is typically around 175-180GB after these reservations. The numbers above represent theoretical maximums; real clusters run at 60-70% of maximum density to maintain scheduling headroom.
The 16-core EPYC CPU comfortably handles up to around 500 actively running pods before CPU becomes the constraint (well beyond the kubelet’s default cap of 110 pods per node, which has to be raised via the maxPods setting to reach that density). Most real clusters with 100-300 pods are nowhere near this limit.
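A hedged sketch of how those requests drive density, using a hypothetical api Deployment and a placeholder image; the comments apply the allocatable and density figures from the planning notes above, and the KubeletConfiguration fragment shows the maxPods setting just mentioned:

```yaml
# Illustrative only: the scheduler bin-packs on requests, so the memory request per pod
# sets the ceiling. With ~176GB allocatable and 1Gi requests, the hard ceiling is ~176 pods;
# at a 60-70% density target, plan for roughly 105-120 pods of this shape per node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                                  # hypothetical workload name
spec:
  replicas: 20
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0   # placeholder image
          resources:
            requests:
              memory: "1Gi"                  # what the scheduler counts against allocatable
              cpu: "250m"
            limits:
              memory: "2Gi"                  # hard ceiling; the container is OOM-killed above this
---
# KubeletConfiguration fragment: raise the default 110-pods-per-node cap before
# targeting the higher densities discussed above.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 250
```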
Storage: Persistent Volumes on NVMe
Local Path Provisioner
The simplest persistent volume setup for single-node or per-node storage uses the local-path provisioner (maintained by Rancher, included in k3s by default). It satisfies PersistentVolumeClaims with volumes backed by directories on the node’s NVMe filesystem.
For workloads that do not need storage to survive node failures (stateless applications with external databases, jobs using scratch space), local-path on NVMe delivers the maximum possible storage throughput with zero network overhead. A PostgreSQL pod on a local-path NVMe volume performs essentially the same as PostgreSQL running directly on that NVMe volume.
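A minimal sketch of a claim against that provisioner, assuming the local-path StorageClass name that k3s ships by default; the claim name and size are illustrative:

```yaml
# A PVC backed by a directory on the node's NVMe filesystem via the local-path provisioner.
# The data lives and dies with the node -- no replication, maximum throughput.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: build-cache                # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce                # local volumes are inherently single-node
  storageClassName: local-path
  resources:
    requests:
      storage: 50Gi
```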
Longhorn for Replicated Storage
Longhorn (also from Rancher) is a cloud-native distributed block storage system that replicates volumes across multiple cluster nodes. For multi-node clusters where pod scheduling should be independent of storage placement, Longhorn replicates PVC data to 2 or 3 nodes.
The replication overhead on NVMe is acceptable: Longhorn’s data path adds roughly 10-20% latency vs. local-path, which is still faster than cloud block storage attached over the network. For production databases in Kubernetes, Longhorn provides the resilience that local-path cannot.
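A sketch of a replicated StorageClass under those assumptions (Longhorn already installed in the cluster); the class name is illustrative and the parameter names follow Longhorn’s documented StorageClass options:

```yaml
# Two-replica Longhorn StorageClass: every volume provisioned from it keeps a copy on two nodes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated        # hypothetical class name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"            # survive the loss of one node holding the volume
  staleReplicaTimeout: "30"        # minutes before an errored replica is cleaned up and rebuilt
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
```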
Storage Class Selection by Workload
local-path: Stateless pods, CI/CD build caches, scratch volumes for batch jobs. Maximum performance, no replication.
Longhorn (1 replica): Single-node deployments wanting PVC management without node affinity pinning.
Longhorn (2-3 replicas): Production databases, stateful services requiring high availability across node failures.
Networking: CNI Plugin Selection
Cloud Kubernetes uses vendor-specific CNI plugins (VPC CNI for EKS, etc.) that integrate with cloud networking primitives unavailable on bare metal. For bare metal K8s, three plugins cover most use cases:
Flannel: Simple VXLAN overlay, easiest to operate, acceptable performance for most workloads. Default in k3s. Lacks network policy enforcement.
Calico: BGP-based networking with full NetworkPolicy support. Recommended for production clusters needing pod-to-pod traffic isolation between namespaces.
Cilium: eBPF-based, lowest overhead of the three, replacing iptables rule chains with eBPF programs that process packets in the kernel. Best performance for high-throughput service meshes. More operationally complex.
For most teams starting with bare metal K8s, Calico provides the right balance: full NetworkPolicy support for security segmentation, stable operation, and good documentation. Cilium is worth evaluating when the cluster serves high-throughput east-west traffic where the iptables overhead in Calico becomes measurable.
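As a sketch of what that segmentation looks like, a standard NetworkPolicy that Calico enforces; the production and ingress-nginx namespace names are assumptions:

```yaml
# Lock down a namespace: pods in "production" accept ingress only from pods in the
# same namespace and from the ingress controller's namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-cross-namespace
  namespace: production                      # hypothetical namespace
spec:
  podSelector: {}                            # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}                    # same-namespace traffic
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumes the ingress namespace
```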
Load Balancing: MetalLB for LoadBalancer Services
Cloud Kubernetes automatically provisions a cloud load balancer when you create a Service with type: LoadBalancer. On bare metal, there is no cloud provider to provision that load balancer, and the Service’s external IP sits in Pending indefinitely.
MetalLB solves this. It runs as a controller in the cluster and assigns IP addresses from a configured pool to LoadBalancer Services. In L2 mode (simpler), MetalLB elects one node to answer ARP requests for each service IP, and traffic enters the cluster through that node. In BGP mode, it advertises routes directly to upstream routers for proper load distribution.
For most InMotion dedicated server deployments running K8s, MetalLB in L2 mode with a small IP pool (even a /30 subnet of additional IPs) is sufficient to expose services externally. Add an Nginx ingress controller on top of MetalLB to handle HTTP/HTTPS routing without burning a dedicated IP per service.
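A hedged sketch of that setup, assuming MetalLB v0.13+ (CRD-based configuration); the address range is a placeholder for additional IPs routed to the servers, and the ingress-nginx Service shown is a hypothetical consumer of the pool:

```yaml
# Pool of external IPs MetalLB may hand out, announced via L2 (ARP).
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public-pool
  namespace: metallb-system
spec:
  addresses:
    - 203.0.113.240-203.0.113.243            # placeholder: a small block of additional IPs
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: public-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - public-pool
---
# Any Service of type LoadBalancer now gets an address from the pool,
# e.g. the ingress controller that fronts HTTP/HTTPS traffic.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller             # hypothetical: exposing the Nginx ingress controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx    # assumes the chart's default labels
  ports:
    - name: https
      port: 443
      targetPort: 443
```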
Bare Metal vs. Managed Kubernetes
Bare metal K8s wins on cost and storage performance for stable workloads. Managed cloud K8s wins on operational simplicity and global distribution. The right choice depends on whether your workloads have geographic distribution requirements and whether your team can manage a control plane.
Docker Swarm as a Simpler Alternative
Not every containerized workload needs Kubernetes. Docker Swarm on a single dedicated server handles dozens of containerized services with a fraction of K8s operational complexity. If your architecture has fewer than 10-15 distinct services and does not require K8s-specific features (Custom Resource Definitions, complex scheduling constraints, Helm ecosystem tooling), Swarm on an InMotion dedicated server deploys in an afternoon.
Docker Swarm’s networking model on a single node is simpler than K8s: overlay networks for service discovery, published ports for external access, Traefik or Nginx for ingress. No CNI plugins. No MetalLB. For teams that find K8s operational overhead exceeds the architectural benefits of their workload, Swarm is a valid production choice.
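For comparison, a minimal sketch of a Swarm stack file under those assumptions, deployed with docker stack deploy; the image, hostname, and container port are placeholders:

```yaml
# docker-stack.yml -- `docker stack deploy -c docker-stack.yml app`
# Traefik terminating HTTP in front of a replicated service on an overlay network.
version: "3.8"
services:
  proxy:
    image: traefik:v2.11
    command:
      - --providers.docker.swarmMode=true    # read labels from Swarm services
      - --entrypoints.web.address=:80
    ports:
      - "80:80"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - web
    deploy:
      placement:
        constraints: [node.role == manager]  # the Docker socket lives on a manager node
  api:
    image: registry.example.com/api:1.0      # placeholder image
    networks:
      - web
    deploy:
      replicas: 3                            # Swarm spreads replicas and restarts failures
      labels:
        - traefik.http.routers.api.rule=Host(`api.example.com`)       # placeholder hostname
        - traefik.http.services.api.loadbalancer.server.port=8080     # placeholder container port
networks:
  web:
    driver: overlay
```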
Getting Started
Order a bare metal or dedicated server, Extreme tier for production K8s worker nodes
Install k3s for single-node or lightweight multi-node clusters; kubeadm for full control over cluster configuration
Configure Calico CNI for NetworkPolicy support from day one
Install MetalLB in L2 mode for LoadBalancer service support
Set up local-path provisioner for development PVCs; Longhorn for production stateful workloads
Add Premier Care for OS-level management of the bare metal host
Teams currently paying $800 or more per month for managed Kubernetes worker nodes typically recover that cost in the first billing cycle after migrating steady-state workloads to bare metal. The operational investment in managing a control plane is real, but it is largely a one-time configuration cost, not an ongoing overhead proportional to your compute spend.
