DevTools & Infra

Kubernetes Platform Foundation

Core tooling for an internal Kubernetes platform team — GitOps delivery, policy enforcement, secret management, cluster autoscaling, and backup.

Platform/infra teams at companies adopting Kubernetes who need a repeatable, secure, and operationally sound cluster foundation.

$0 (all OSS tooling, excluding cloud infrastructure); a typical EKS cluster running these tools costs $200–$800/mo in AWS, depending on node count and storage.

📦 10 tools

The Stack

Argo CD

— GitOps continuous delivery (declarative path) optional

Syncs Kubernetes manifests from Git to clusters. UI-first with strong RBAC and multi-cluster support. Most widely adopted GitOps tool with a large community and plugin ecosystem.

Alternatives: fluxcd
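As a rough illustration of what Argo CD syncs, here is a minimal sketch that builds an Application resource as a Python dict and prints it as JSON, which kubectl apply -f - accepts as readily as YAML. It assumes Argo CD is installed in the argocd namespace and deploys to the same cluster it runs in; the repository URL, path, application name, and target namespace are hypothetical placeholders.

```python
import json


def argocd_application(name: str, repo_url: str, path: str, namespace: str) -> dict:
    """Build a minimal Argo CD Application manifest (sketch, not a full spec)."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumption: Argo CD watches Applications in the "argocd" namespace.
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "targetRevision": "HEAD",  # track the default branch tip
                "path": path,  # directory of manifests inside the repo
            },
            "destination": {
                # Address Argo CD uses for the cluster it is running in.
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            "syncPolicy": {
                # Apply Git changes automatically, delete resources removed
                # from Git, and revert manual edits made in the cluster.
                "automated": {"prune": True, "selfHeal": True},
            },
        },
    }


if __name__ == "__main__":
    app = argocd_application(
        name="payments-api",  # hypothetical app
        repo_url="https://github.com/example-org/platform-config.git",  # hypothetical repo
        path="apps/payments-api",
        namespace="payments",
    )
    # Usage: python app.py | kubectl apply -f -
    print(json.dumps(app, indent=2))
```

With prune and selfHeal enabled, Git becomes the only place changes stick; leaving them off keeps syncs manual from the UI, which some teams prefer while adopting GitOps.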

FluxCD

— GitOps continuous delivery (Kubernetes-native path) optional

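Controller-only (no UI by default) and configured entirely through Kubernetes CRDs. Adheres more strictly to GitOps than Argo CD, since there is no built-in UI through which to change cluster state outside Git. Better suited to teams that want to avoid exposing a UI from the cluster.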

Alternatives: argocd
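Because Flux is driven entirely by CRDs, a basic GitOps setup comes down to two objects: a GitRepository source that Flux polls, and a Kustomization that applies one path from it. A minimal sketch, assuming Flux v2 with the v1 source and kustomize APIs installed in flux-system; the repository URL, object names, and path are hypothetical.

```python
import json


def flux_git_source(name: str, url: str, branch: str = "main") -> dict:
    """GitRepository: tells Flux's source-controller which repo to fetch."""
    return {
        "apiVersion": "source.toolkit.fluxcd.io/v1",
        "kind": "GitRepository",
        "metadata": {"name": name, "namespace": "flux-system"},
        "spec": {
            "interval": "1m",  # how often to check Git for new commits
            "url": url,
            "ref": {"branch": branch},
        },
    }


def flux_kustomization(name: str, source_name: str, path: str) -> dict:
    """Kustomization: applies the manifests under `path` from the source."""
    return {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": name, "namespace": "flux-system"},
        "spec": {
            "interval": "10m",  # how often to re-apply and correct drift
            "path": path,
            "prune": True,  # delete cluster objects that were removed from Git
            "sourceRef": {"kind": "GitRepository", "name": source_name},
        },
    }


if __name__ == "__main__":
    items = [
        flux_git_source("platform-config", "https://github.com/example-org/platform-config"),
        flux_kustomization("payments-api", "platform-config", "./apps/payments-api"),
    ]
    # A v1 List lets kubectl apply both objects in one command.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Splitting "where to fetch" from "what to apply" is the design choice that lets several Kustomizations reuse one repository source.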

Helm

— Kubernetes package manager

De facto standard for packaging and deploying Kubernetes applications; most third-party Kubernetes software is distributed as Helm charts. Pairs with Argo CD or Flux for GitOps delivery.
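Helm 3 is a client-side tool, so under GitOps a chart release is usually declared as a resource rather than run as helm install. A minimal sketch of the Flux form (a HelmRepository plus a HelmRelease), assuming recent Flux releases that serve the source v1 and helm v2 APIs; the chart name, repository URL, version range, and replicaCount value are hypothetical. Argo CD can express the same thing with a Helm source in an Application.

```python
import json


def helm_repository(name: str, url: str) -> dict:
    """HelmRepository: a chart index that Flux refreshes periodically."""
    return {
        "apiVersion": "source.toolkit.fluxcd.io/v1",
        "kind": "HelmRepository",
        "metadata": {"name": name, "namespace": "flux-system"},
        "spec": {"interval": "1h", "url": url},
    }


def helm_release(name: str, target_ns: str, chart: str, version: str,
                 repo_name: str, values: dict) -> dict:
    """HelmRelease: Flux's helm-controller installs and upgrades the chart."""
    return {
        "apiVersion": "helm.toolkit.fluxcd.io/v2",
        "kind": "HelmRelease",
        "metadata": {"name": name, "namespace": "flux-system"},
        "spec": {
            "interval": "10m",  # how often to reconcile the release
            "targetNamespace": target_ns,  # namespace the chart is installed into
            "install": {"createNamespace": True},
            "chart": {
                "spec": {
                    "chart": chart,
                    "version": version,  # exact version or semver range
                    # No namespace given: resolved in the HelmRelease's namespace.
                    "sourceRef": {"kind": "HelmRepository", "name": repo_name},
                }
            },
            # Plays the same role as `helm install -f values.yaml`.
            "values": values,
        },
    }


if __name__ == "__main__":
    repo = helm_repository("internal-charts", "https://charts.example.com")  # hypothetical
    release = helm_release(
        name="orders-api",  # hypothetical chart and release
        target_ns="orders",
        chart="orders-api",
        version="1.4.x",
        repo_name="internal-charts",
        values={"replicaCount": 2},  # chart-specific, hypothetical
    )
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [repo, release]}, indent=2))
```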

Cilium

— eBPF-based CNI networking and security optional

Replaces iptables with eBPF for high-performance pod networking and Kubernetes NetworkPolicy enforcement, and can optionally replace kube-proxy. Provides a built-in service mesh (Cilium Service Mesh), Hubble observability, and L7-aware policy.

Alternatives: calico
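L7 policy is the main thing Cilium adds over a plain NetworkPolicy. A minimal sketch of a CiliumNetworkPolicy that lets only pods labeled app=frontend send HTTP GET requests to app=orders, and only on matching paths; the labels, port, and path regex are hypothetical. Once a pod is selected by an ingress policy, ingress traffic not matched by any rule is dropped.

```python
import json


def cilium_l7_policy(app: str, client: str, port: int, path_regex: str) -> dict:
    """Allow `client` pods to send HTTP GETs to `app` on `port`, restricted
    to paths matching `path_regex` (Cilium treats the path as a regex)."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": f"{app}-allow-{client}-get"},
        "spec": {
            # Pods protected by this policy.
            "endpointSelector": {"matchLabels": {"app": app}},
            "ingress": [
                {
                    # L3: which pods may connect at all.
                    "fromEndpoints": [{"matchLabels": {"app": client}}],
                    "toPorts": [
                        {
                            # L4: Cilium's schema expects the port as a string.
                            "ports": [{"port": str(port), "protocol": "TCP"}],
                            # L7: HTTP rules, enforced by Cilium's Envoy proxy.
                            "rules": {"http": [{"method": "GET", "path": path_regex}]},
                        }
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    policy = cilium_l7_policy("orders", "frontend", 8080, "/api/v1/.*")
    print(json.dumps(policy, indent=2))
```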

cert-manager

— TLS certificate automation

Automates certificate issuance and renewal from Let's Encrypt (via ACME) and internal CAs for Ingress, Gateway API, and pod mTLS. A baseline component for any cluster that terminates HTTPS inside Kubernetes.
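To make the issuance flow concrete, a minimal sketch of two cert-manager resources: a ClusterIssuer that talks to Let's Encrypt over ACME using HTTP01 challenges, and a Certificate that cert-manager issues and keeps renewed in a Secret. It points at the Let's Encrypt staging directory, the usual choice while testing; swap in LE_PRODUCTION once issuance works. The email, hostname, namespace, and nginx ingress class are hypothetical, and ingressClassName on the HTTP01 solver assumes a reasonably recent cert-manager release.

```python
import json

# Let's Encrypt ACME directory endpoints.
LE_STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory"
LE_PRODUCTION = "https://acme-v02.api.letsencrypt.org/directory"


def acme_cluster_issuer(name: str, email: str, server: str, ingress_class: str) -> dict:
    """ClusterIssuer solving ACME HTTP01 challenges through an Ingress."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": server,
                "email": email,  # ACME account contact address
                # Secret where cert-manager stores the ACME account private key.
                "privateKeySecretRef": {"name": f"{name}-account-key"},
                "solvers": [{"http01": {"ingress": {"ingressClassName": ingress_class}}}],
            }
        },
    }


def certificate(name: str, namespace: str, dns_names: list[str], issuer: str) -> dict:
    """Certificate: cert-manager writes the key pair to Secret `<name>-tls`."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "secretName": f"{name}-tls",
            "dnsNames": dns_names,
            "issuerRef": {"name": issuer, "kind": "ClusterIssuer"},
        },
    }


if __name__ == "__main__":
    issuer = acme_cluster_issuer("letsencrypt-staging", "platform@example.com", LE_STAGING, "nginx")
    cert = certificate("shop", "web", ["shop.example.com"], "letsencrypt-staging")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [issuer, cert]}, indent=2))
```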

External Secrets Operator

— Secret sync from external vaults

Syncs secrets from AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, and Azure Key Vault into Kubernetes Secrets. Eliminates storing plaintext secrets in Git.
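A minimal sketch of the resource that drives this sync: an ExternalSecret that copies selected properties of one remote secret into a Kubernetes Secret and re-reads the store every hour. It assumes a ClusterSecretStore (here hypothetically named aws-secrets-manager) already holds the provider credentials, and that the installed ESO release serves the external-secrets.io/v1 API (older releases use v1beta1). The remote key and property names are hypothetical.

```python
import json


def external_secret(name: str, namespace: str, store: str,
                    remote_key: str, keys: list[str]) -> dict:
    """ExternalSecret: sync `keys` from one remote secret into Secret `name`."""
    return {
        "apiVersion": "external-secrets.io/v1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",  # how often ESO polls the external store
            "secretStoreRef": {"name": store, "kind": "ClusterSecretStore"},
            # ESO creates and owns the target Secret.
            "target": {"name": name, "creationPolicy": "Owner"},
            "data": [
                # Secret key <- JSON property of the remote secret.
                {"secretKey": k, "remoteRef": {"key": remote_key, "property": k}}
                for k in keys
            ],
        },
    }


if __name__ == "__main__":
    es = external_secret(
        name="payments-db",
        namespace="payments",
        store="aws-secrets-manager",  # hypothetical ClusterSecretStore
        remote_key="prod/payments/db",  # hypothetical Secrets Manager entry
        keys=["username", "password"],
    )
    print(json.dumps(es, indent=2))
```

Only the reference (store name and remote key) lives in Git; the secret values themselves stay in the external vault.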

Karpenter

— Node autoscaler optional

Selects and launches the right EC2 instance type for pending pods directly, rather than resizing pre-defined node groups as Cluster Autoscaler does, so new capacity arrives faster. Also consolidates underused nodes, which reduces over-provisioning and AWS compute costs.

Alternatives: cluster-autoscaler
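For a sense of how Karpenter is configured, a minimal NodePool sketch: instead of fixed node groups it declares constraints (spot or on-demand capacity, amd64) and lets Karpenter choose instance types, with a total CPU cap and consolidation of underused nodes. It assumes the karpenter.sh/v1 API on EKS and an existing EC2NodeClass named default (hypothetical); the CPU limit is illustrative.

```python
import json


def karpenter_nodepool(name: str, cpu_limit: int) -> dict:
    """NodePool allowing spot and on-demand amd64 nodes, capped at
    `cpu_limit` vCPUs in total, with consolidation enabled."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Constraints, not instance types: Karpenter picks the type.
                    "requirements": [
                        {"key": "karpenter.sh/capacity-type", "operator": "In",
                         "values": ["spot", "on-demand"]},
                        {"key": "kubernetes.io/arch", "operator": "In",
                         "values": ["amd64"]},
                    ],
                    # AWS-specific settings (AMI, subnets, security groups).
                    "nodeClassRef": {"group": "karpenter.k8s.aws",
                                     "kind": "EC2NodeClass", "name": "default"},
                }
            },
            # Upper bound on total CPU this pool may provision.
            "limits": {"cpu": str(cpu_limit)},
            "disruption": {
                # Remove or replace nodes that are empty or underused.
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": "1m",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(karpenter_nodepool("general-purpose", 1000), indent=2))
```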

Kyverno

— Policy engine and admission controller optional

Kubernetes-native policy engine: policies are ordinary Kubernetes resources, with no Rego required. Enforces security baselines, resource limits and quotas, label standards, and image registry restrictions through its admission webhook.

Alternatives: opa-gatekeeper
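As an example of an image registry restriction, a minimal sketch of a Kyverno ClusterPolicy that requires every container image in a Pod to come from one registry. The mode parameter reflects the usual rollout: generate it with "Audit" first, review the policy reports, then regenerate with "Enforce". It assumes Kyverno 1.13 or later, where the action is set per rule with validate.failureAction (older releases use spec.validationFailureAction); the registry name is hypothetical.

```python
import json


def restrict_registries_policy(registry: str, mode: str = "Audit") -> dict:
    """ClusterPolicy requiring container images from `registry`.
    mode: "Audit" reports violations, "Enforce" rejects the request."""
    if mode not in ("Audit", "Enforce"):
        raise ValueError(f"unsupported mode: {mode}")
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": "restrict-image-registries"},
        "spec": {
            "background": True,  # also report on resources that already exist
            "rules": [
                {
                    "name": "allowed-registry",
                    "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                    "validate": {
                        "failureAction": mode,
                        "message": f"Images must come from {registry}.",
                        # A pattern on a list element applies to every container.
                        "pattern": {"spec": {"containers": [{"image": f"{registry}/*"}]}},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(restrict_registries_policy("registry.example.com"), indent=2))
```

By default Kyverno also auto-generates equivalent rules for Pod controllers such as Deployments, so violations surface on the Deployment rather than only on its Pods.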

Crossplane

— Infrastructure-as-code via Kubernetes CRDs optional

Provision and manage cloud resources (RDS, S3, GKE, etc.) via Kubernetes manifests. Enables platform teams to offer self-service infrastructure to developers through the Kubernetes API.

Alternatives: terraform, pulumi
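To show what "cloud resources via Kubernetes manifests" looks like, a minimal sketch of a Crossplane managed resource for an S3 bucket. It assumes the Upbound provider-aws-s3 with its cluster-scoped s3.aws.upbound.io/v1beta1 API and a ProviderConfig named default holding AWS credentials; API groups differ across provider and Crossplane major versions, so treat these as placeholders. The bucket name, region, and team label are hypothetical.

```python
import json


def crossplane_s3_bucket(name: str, region: str, team: str) -> dict:
    """Managed resource for an S3 bucket. Crossplane reconciles it: it
    creates the bucket, then keeps correcting drift from this spec."""
    return {
        "apiVersion": "s3.aws.upbound.io/v1beta1",
        "kind": "Bucket",
        # The bucket name defaults to metadata.name, so it must be globally unique.
        "metadata": {"name": name, "labels": {"team": team}},
        "spec": {
            "forProvider": {"region": region},
            # ProviderConfig with AWS credentials (assumed to exist).
            "providerConfigRef": {"name": "default"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(crossplane_s3_bucket("example-org-orders-artifacts", "eu-west-1", "orders"), indent=2))
```

For self-service, platform teams typically hide resources like this behind a CompositeResourceDefinition and Composition, so developers request an abstract resource instead of writing provider-specific fields.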

Velero

— Cluster backup and disaster recovery

Backs up Kubernetes resource definitions and persistent volume snapshots to S3-compatible storage. Essential for disaster recovery, cluster migration, and namespace restoration.
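A minimal sketch of a recurring backup: a Velero Schedule that backs up selected namespaces on a cron expression, including volume snapshots, and expires each backup after a TTL. It assumes Velero runs in the velero namespace with a default backup storage location configured; the schedule name, namespaces, and retention are hypothetical.

```python
import json


def velero_schedule(name: str, namespaces: list[str], cron: str, ttl_hours: int) -> dict:
    """Schedule: back up `namespaces` on `cron`, with volume snapshots,
    keeping each backup for `ttl_hours`."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,  # standard 5-field cron expression
            "template": {
                "includedNamespaces": namespaces,
                "snapshotVolumes": True,  # snapshot PVs as well as manifests
                "ttl": f"{ttl_hours}h0m0s",  # backup expiry, Go duration format
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(velero_schedule("daily", ["payments", "orders"], "0 2 * * *", 720), indent=2))
```

The CLI equivalent is roughly velero schedule create daily --schedule "0 2 * * *" --include-namespaces payments,orders --ttl 720h0m0s; declaring it as a resource lets the backup policy itself live in Git.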

Gotchas

  • ⚠️ Choosing between Argo CD and Flux is an early architectural decision that is expensive to reverse. Argo CD's UI wins team adoption faster; Flux integrates more naturally into pure GitOps workflows. Many teams run both — avoid this.
  • ⚠️ cert-manager with Let's Encrypt needs outbound HTTPS from the cluster to the ACME server, plus either public inbound access on port 80 (HTTP01 challenges) or a DNS01 solver configuration, and it is subject to Let's Encrypt rate limits. During testing, point issuers at the Let's Encrypt staging CA to avoid hitting production rate limits.
  • ⚠️ External Secrets Operator re-syncs secrets on a polling interval (refreshInterval, default 1 hour), so a rotated value can take up to that long to reach the cluster. Applications must pick up changed secrets without a restart, or pair ESO with Reloader to trigger rolling restarts when a Secret changes.
  • ⚠️ Karpenter is most mature on AWS (EKS). AKS offers Karpenter through its Node Auto Provisioning feature (built on the Azure provider), while GKE users typically rely on GKE's own node auto-provisioning instead.
  • ⚠️ Kyverno policies can accidentally block legitimate workloads (e.g., system DaemonSets). Start in Audit mode, review violations for one sprint, then switch to Enforce mode. Never switch a policy to Enforce in production without an audit baseline.
  • ⚠️ Crossplane manages cloud resources via the Kubernetes reconciliation loop — this is powerful but debugging drift or failed reconciliations requires Kubernetes operator knowledge. Ensure your team has that skill before adopting.
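  • ⚠️ Velero backs up resources through the Kubernetes API, not etcd itself, so it does not capture etcd-level cluster state. On self-managed clusters, take separate etcd snapshots (for example with etcdctl snapshot save); on managed control planes such as EKS, the provider operates etcd and it is not directly accessible, so recovery relies on Velero backups plus the manifests in Git.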

Related Stacks