Kubernetes Platform Foundation
Core tooling for an internal Kubernetes platform team — GitOps delivery, policy enforcement, secret management, cluster autoscaling, and backup.
The Stack
Argo CD
GitOps continuous delivery (declarative path) [optional]
Syncs Kubernetes manifests from Git to clusters. UI-first, with strong RBAC and multi-cluster support. The most widely adopted GitOps tool, with a large community and plugin ecosystem.
Alternatives: fluxcd
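What Argo CD syncs is described by an Application resource. The minimal Python sketch below builds one as a plain dict and prints it as JSON, which kubectl also accepts. The repository URL, path, and app name are hypothetical placeholders. The sketch also assumes Argo CD is installed in the argocd namespace.

```python
import json


def argocd_application(name: str, repo_url: str, path: str,
                       dest_namespace: str, revision: str = "HEAD") -> dict:
    """Build an Argo CD Application manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumption: Argo CD is installed in the "argocd" namespace.
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            # Where the manifests live in Git.
            "source": {"repoURL": repo_url, "targetRevision": revision, "path": path},
            # Where they are applied; this URL is the in-cluster API server.
            "destination": {"server": "https://kubernetes.default.svc",
                            "namespace": dest_namespace},
            "syncPolicy": {
                # prune: delete resources removed from Git; selfHeal: revert manual drift.
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


if __name__ == "__main__":
    app = argocd_application(
        name="guestbook",
        repo_url="https://github.com/example/platform-config.git",  # hypothetical repo
        path="apps/guestbook",
        dest_namespace="guestbook",
    )
    # kubectl accepts JSON, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(app, indent=2))
```

For multi-cluster delivery, you register another cluster with Argo CD and point destination.server at that cluster's API server URL.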
FluxCD
GitOps continuous delivery (Kubernetes-native path) [optional]
Controller-only (no UI by default) and built entirely on Kubernetes CRDs. It sits closer to GitOps orthodoxy than Argo CD, because Flux's own configuration is also managed from Git. A better fit for teams that want to avoid exposing a UI from the cluster.
Alternatives: argocd
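Flux splits the same job across two CRDs. A GitRepository fetches the repo, and a Kustomization applies a path from it. This sketch assumes Flux 2.x API versions and the default flux-system namespace. The names and URL are hypothetical.

```python
import json


def flux_sync_objects(name: str, repo_url: str, path: str,
                      branch: str = "main", interval: str = "1m") -> list:
    """Return a GitRepository and a Kustomization that applies `path` from it."""
    source = {
        "apiVersion": "source.toolkit.fluxcd.io/v1",
        "kind": "GitRepository",
        "metadata": {"name": name, "namespace": "flux-system"},
        # `interval` is how often Flux polls the repository for new commits.
        "spec": {"interval": interval, "url": repo_url, "ref": {"branch": branch}},
    }
    kustomization = {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": name, "namespace": "flux-system"},
        "spec": {
            "interval": "10m",   # how often to re-apply and correct drift
            "path": path,
            "prune": True,       # garbage-collect objects removed from Git
            "sourceRef": {"kind": "GitRepository", "name": name},
        },
    }
    return [source, kustomization]


if __name__ == "__main__":
    objs = flux_sync_objects("platform", "https://github.com/example/platform-config.git",
                             "./clusters/prod")
    # Separate documents with '---' so kubectl reads them as a multi-document stream.
    print("\n---\n".join(json.dumps(o, indent=2) for o in objs))
```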
Helm
Kubernetes package manager
The de facto standard for packaging and deploying Kubernetes applications. Most third-party software for Kubernetes is distributed as a Helm chart. Pairs with Argo CD or Flux for GitOps delivery.
Cilium
eBPF-based CNI networking and security [optional]
Uses eBPF instead of iptables for high-performance pod networking and Kubernetes NetworkPolicy enforcement, and can replace kube-proxy. Also provides built-in service mesh features (Cilium Service Mesh), Hubble observability, and L7 (HTTP-aware) policy.
Alternatives: calico
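Here is what Cilium's L7 policy looks like in practice. This sketch builds a CiliumNetworkPolicy that lets pods labelled app=frontend reach pods labelled app=api on one port, and only with specific HTTP requests. The labels, port, and paths are hypothetical. Cilium treats the path field as a regular expression.

```python
import json


def cilium_l7_policy(app: str, client: str, port: int, allowed: list) -> dict:
    """Allow only the listed (method, path-regex) HTTP calls from `client` pods to `app` pods."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": f"{app}-allow-{client}"},
        "spec": {
            # Pods this policy protects.
            "endpointSelector": {"matchLabels": {"app": app}},
            "ingress": [{
                # Only traffic from pods carrying this label is allowed in.
                "fromEndpoints": [{"matchLabels": {"app": client}}],
                "toPorts": [{
                    "ports": [{"port": str(port), "protocol": "TCP"}],
                    # L7 rules: HTTP requests matching no entry are rejected.
                    "rules": {"http": [{"method": m, "path": p} for m, p in allowed]},
                }],
            }],
        },
    }


if __name__ == "__main__":
    policy = cilium_l7_policy("api", "frontend", 8080, [("GET", "/v1/.*")])
    print(json.dumps(policy, indent=2))
```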
cert-manager
TLS certificate automation
Automates certificate issuance and renewal from Let's Encrypt (and internal CAs) for Ingress, Gateway API, and pod mTLS. Effectively required for any cluster that terminates HTTPS traffic itself.
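A common setup is two ClusterIssuers, one per Let's Encrypt environment. That way, testing never touches production rate limits. The directory URLs below are Let's Encrypt's public ACME endpoints. The issuer names, email address, and nginx ingress class are assumptions for illustration.

```python
import json

# Let's Encrypt's public ACME directory endpoints.
LETSENCRYPT_DIRECTORIES = {
    "staging": "https://acme-staging-v02.api.letsencrypt.org/directory",
    "production": "https://acme-v02.api.letsencrypt.org/directory",
}


def letsencrypt_cluster_issuer(env: str, email: str, ingress_class: str = "nginx") -> dict:
    """Build a cert-manager ClusterIssuer for the given Let's Encrypt environment."""
    if env not in LETSENCRYPT_DIRECTORIES:
        raise ValueError(f"unknown environment: {env!r}")
    name = f"letsencrypt-{env}"
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": LETSENCRYPT_DIRECTORIES[env],
                "email": email,
                # Secret where cert-manager stores the ACME account private key.
                "privateKeySecretRef": {"name": f"{name}-account-key"},
                # HTTP-01 via the ingress controller; Let's Encrypt must reach port 80.
                "solvers": [{"http01": {"ingress": {"class": ingress_class}}}],
            }
        },
    }


if __name__ == "__main__":
    for env in ("staging", "production"):
        print(json.dumps(letsencrypt_cluster_issuer(env, "platform@example.com"), indent=2))
```

Certificates can reference letsencrypt-staging until the setup is proven. Moving them to letsencrypt-production then only requires changing the issuer name.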
External Secrets Operator
Secret sync from external vaults
Syncs secrets from AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, and Azure Key Vault into Kubernetes Secrets, so plaintext secrets never need to be stored in Git.
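This is a sketch of an ExternalSecret that copies selected properties of one vault entry into a Kubernetes Secret. It assumes the caller has already created a ClusterSecretStore with the given name. It uses the external-secrets.io/v1beta1 API; newer ESO releases also serve a v1 API, so check which version your installed CRDs expose. The store name and secret key are hypothetical.

```python
import json


def external_secret(name: str, store: str, remote_key: str,
                    properties: list, refresh: str = "1h") -> dict:
    """Map selected properties of one backend entry into a Kubernetes Secret."""
    return {
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": name},
        "spec": {
            # How often ESO re-reads the backend and updates the Secret.
            "refreshInterval": refresh,
            "secretStoreRef": {"name": store, "kind": "ClusterSecretStore"},
            # The Kubernetes Secret to create; "Owner" ties its lifecycle to this object.
            "target": {"name": name, "creationPolicy": "Owner"},
            "data": [
                {"secretKey": prop, "remoteRef": {"key": remote_key, "property": prop}}
                for prop in properties
            ],
        },
    }


if __name__ == "__main__":
    es = external_secret("orders-db", "aws-secrets-manager", "prod/orders/db",
                         ["username", "password"])
    print(json.dumps(es, indent=2))
```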
Karpenter
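Node autoscaler [optional]
Instead of scaling predefined node groups, Karpenter watches for unschedulable pods and launches EC2 instance types that fit their resource requests. It reacts within seconds, much faster than Cluster Autoscaler. This can substantially reduce over-provisioning and AWS compute costs.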
Alternatives: cluster-autoscaler
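The core idea behind right-sizing can be shown with a toy selector. It picks the cheapest instance type that can hold a batch of pending pods. This is a deliberately simplified illustration, not Karpenter's algorithm. It ignores per-node overhead (kubelet reservations, DaemonSets), multi-node bin-packing, Spot capacity, and availability zones. The catalog is hypothetical and the hourly prices are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: float
    memory_gib: float
    hourly_usd: float  # illustrative price, not a live AWS quote


# Hypothetical catalog; a real provisioner considers many more types.
CATALOG = [
    InstanceType("m5.large", 2, 8, 0.096),
    InstanceType("r5.large", 2, 16, 0.126),
    InstanceType("c5.xlarge", 4, 8, 0.17),
    InstanceType("m5.xlarge", 4, 16, 0.192),
]


def cheapest_fit(pending: List[Tuple[float, float]],
                 catalog: List[InstanceType] = CATALOG) -> Optional[InstanceType]:
    """Pick the cheapest single instance whose capacity covers all pending requests.

    `pending` holds (cpu_cores, memory_gib) requests, one tuple per pod.
    """
    cpu = sum(c for c, _ in pending)
    mem = sum(m for _, m in pending)
    fits = [it for it in catalog if it.vcpu >= cpu and it.memory_gib >= mem]
    # None means no single type fits; a real provisioner would split across nodes.
    return min(fits, key=lambda it: it.hourly_usd, default=None)


if __name__ == "__main__":
    print(cheapest_fit([(1.5, 2), (1.0, 4)]).name)  # CPU-heavy batch -> c5.xlarge
    print(cheapest_fit([(0.5, 10)]).name)           # memory-heavy pod -> r5.large
```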
Kyverno
Policy engine and admission controller [optional]
Kubernetes-native policy engine: policies are written as Kubernetes resources in YAML, with no Rego required. Enforces security baselines, resource quotas, label standards, and image registry restrictions as admission webhook policies.
Alternatives: opa-gatekeeper
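Here is a sketch of the image registry restriction. The function builds a Kyverno ClusterPolicy whose mode flips from Audit to Enforce, matching the rollout advice in the Gotchas. The registry and policy names are hypothetical. Excluding kube-system is just one example of sparing system workloads. The sketch uses the spec-level validationFailureAction field. Newer Kyverno releases are moving this setting to a per-rule failureAction field, so check the documentation for your version.

```python
import json


def registry_policy(registry: str, enforce: bool = False) -> dict:
    """Kyverno ClusterPolicy requiring Pod images from `registry`; Audit unless enforce=True."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": "restrict-image-registries"},
        "spec": {
            # Audit records violations in policy reports; Enforce rejects the request.
            "validationFailureAction": "Enforce" if enforce else "Audit",
            "background": True,  # also evaluate resources that already exist
            "rules": [{
                "name": "allowed-registry",
                # Kyverno auto-generates matching rules for Deployments, DaemonSets, etc.
                "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                # Example carve-out for system workloads.
                "exclude": {"any": [{"resources": {"namespaces": ["kube-system"]}}]},
                "validate": {
                    "message": f"Images must be pulled from {registry}.",
                    # '*' is a Kyverno wildcard; every container image must match.
                    "pattern": {"spec": {"containers": [{"image": f"{registry}/*"}]}},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(registry_policy("registry.example.com"), indent=2))
```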
Crossplane
Infrastructure-as-code via Kubernetes CRDs [optional]
Provisions and manages cloud resources (RDS, S3, GKE, etc.) through Kubernetes manifests, letting platform teams offer developers self-service infrastructure through the Kubernetes API.
Alternatives: terraform, pulumi
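Crossplane keeps cloud resources in line with their manifests using the reconcile pattern that Kubernetes controllers share. The controller compares desired state with observed state and acts on the difference. The generic sketch below shows that loop over plain dicts. It is not Crossplane's code, and the resource names are hypothetical. Real controllers rerun the comparison continuously. A persistent error therefore shows up as a resource that never converges, not as a one-off failure.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, dict], observed: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return the (action, name) steps that move observed state toward desired state."""
    actions = []
    for name in sorted(desired):
        if name not in observed:
            actions.append(("create", name))  # declared but missing
        elif observed[name] != desired[name]:
            actions.append(("update", name))  # exists but has drifted
    for name in sorted(observed):
        if name not in desired:
            actions.append(("delete", name))  # managed but no longer declared
    return actions


def apply(actions, desired, observed):
    """Simulate executing the actions against a mutable 'cloud' state."""
    for action, name in actions:
        if action == "delete":
            del observed[name]
        else:
            observed[name] = dict(desired[name])


if __name__ == "__main__":
    desired = {"bucket-logs": {"region": "us-east-1"}, "db-orders": {"size": "small"}}
    observed = {"bucket-logs": {"region": "us-west-2"}, "bucket-old": {"region": "us-east-1"}}
    steps = reconcile(desired, observed)
    print(steps)  # [('update', 'bucket-logs'), ('create', 'db-orders'), ('delete', 'bucket-old')]
    apply(steps, desired, observed)
    print(reconcile(desired, observed))  # [] once converged
```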
Velero
Cluster backup and disaster recovery
Backs up Kubernetes resource definitions to object storage (S3-compatible and others via plugins). Persistent volumes are backed up through cloud volume snapshots or file-system backup. Essential for disaster recovery, cluster migration, and namespace restoration.
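This sketch defines a Velero Schedule that takes nightly backups of selected namespaces, including volume snapshots, and keeps each backup for 30 days. It assumes Velero runs in its default velero namespace. The schedule name, cron expression, and namespaces are hypothetical.

```python
import json


def velero_schedule(name: str, cron: str, namespaces: list,
                    ttl: str = "720h0m0s") -> dict:
    """Velero Schedule that backs up `namespaces` on a cron schedule."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Assumption: Velero runs in its default "velero" namespace.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,  # standard five-field cron expression
            "template": {      # backup spec applied to every scheduled run
                "includedNamespaces": namespaces,
                "snapshotVolumes": True,  # snapshot persistent volumes as well
                "ttl": ttl,               # 720h keeps each backup for 30 days
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(velero_schedule("nightly-apps", "0 2 * * *", ["orders", "payments"]), indent=2))
```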
Gotchas
- ⚠️ Choosing between Argo CD and Flux is an early architectural decision that is expensive to reverse. Argo CD's UI wins team adoption faster; Flux integrates more naturally into pure GitOps workflows. Some teams end up running both. Avoid this, because it means operating two delivery systems with overlapping responsibilities.
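- ⚠️ cert-manager with Let's Encrypt needs outbound HTTPS from the cluster to the ACME server. HTTP-01 challenges also need inbound reachability on port 80; clusters that are not publicly reachable must use DNS-01 challenges instead. Watch the rate limits: while testing, point your issuer at the Let's Encrypt staging CA so you don't exhaust production limits.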
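- ⚠️ External Secrets Operator refreshes synced secrets on a polling interval (refreshInterval, default 1 hour); it does not rotate them itself. Secrets consumed as environment variables only change when the pod restarts. Volume-mounted secrets are updated in place after a kubelet sync delay, so applications must re-read them. Alternatively, pair ESO with Reloader to trigger rolling deployments on secret changes.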
- ⚠️ Karpenter is most mature on AWS (EKS). An Azure provider exists and underpins AKS Node Auto Provisioning, but it is younger. On GKE, rely on GKE's built-in node auto-provisioning instead.
- ⚠️ Kyverno policies can accidentally block legitimate workloads (e.g., system DaemonSets). Start in Audit mode, review the violations recorded in policy reports for one sprint, then switch to Enforce mode. Never deploy Enforce mode to production without an audit baseline.
- ⚠️ Crossplane manages cloud resources via the Kubernetes reconciliation loop — this is powerful but debugging drift or failed reconciliations requires Kubernetes operator knowledge. Ensure your team has that skill before adopting.
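- ⚠️ Velero works through the Kubernetes API rather than etcd, so its backups are not etcd snapshots. On self-managed clusters, also take etcd snapshots (for example with etcdctl snapshot save) for full control-plane recovery. On managed services such as EKS, the provider operates etcd and does not expose snapshots to you, so Velero backups are your recovery path.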
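The External Secrets Operator gotcha has an in-app fix. An application reading a volume-mounted secret can pick up ESO refreshes without a restart by re-reading the file when it changes. This Python sketch checks the file's modification time at most every few seconds. The class name is hypothetical. It relies on kubelet updating mounted Secret volumes in place, which does not happen for subPath mounts or for secrets exposed as environment variables.

```python
import os
import time
from typing import Optional


class ReloadingSecret:
    """Return a secret file's contents, re-reading it when its mtime changes."""

    def __init__(self, path: str, min_check_interval: float = 5.0):
        self._path = path
        self._interval = min_check_interval  # seconds between stat() calls
        self._mtime_ns: Optional[int] = None
        self._value: Optional[str] = None
        self._last_check = float("-inf")     # forces a read on the first get()

    def get(self) -> str:
        now = time.monotonic()
        if now - self._last_check >= self._interval:
            self._last_check = now
            # os.stat follows symlinks, so it sees the new file after kubelet
            # atomically swaps the mounted Secret's data directory.
            mtime_ns = os.stat(self._path).st_mtime_ns
            if mtime_ns != self._mtime_ns:
                with open(self._path, encoding="utf-8") as f:
                    self._value = f.read().strip()
                self._mtime_ns = mtime_ns
        return self._value
```

Call get() at each point of use, for example when opening a new database connection, rather than caching the value at startup.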
Related Stacks
Modern Full-Stack Observability
Cover logs, metrics, traces, and RUM for production engineering teams — with both a cost-efficient OSS path and a premium managed path.
Modern CI/CD Starter Kit
Fast, maintainable CI/CD for a new codebase — GitHub Actions as the orchestrator, turbo/Nx for build caching, and drop-in fast runners to cut pipeline minutes.
Self-Hosted OSS Replacements for Common SaaS
Replace Zapier, Google Analytics, Mixpanel, Auth0, Firebase, and Slack with open-source self-hosted alternatives — for privacy compliance or cost control.