Most platform teams are expensive YAML wranglers. I build platforms that make developers faster and ops teams redundant.
15 years. 40+ AWS accounts. Millions of requests. Zero tolerance for manual processes.
If it's not in Git, it doesn't exist. If it's not automated, it's a future incident waiting to happen.
I don't follow platform best practices. I write them.
| Edge & CDN | Global CDN, WAF, DNS: designed, operated, and owned. Edge performance and security at org scale. |
| Platform Engineering | EKS, OpenShift, Linkerd, KEDA, FluxCD, ArgoCD. The platform hundreds of engineers ship on every day. |
| Policy as Code | Bad config doesn't reach prod. It gets rejected at the door. OPA, Kyverno, Gatekeeper: guardrails with teeth. |
| Cloud Landing Zones | Multi-account AWS and Azure foundations at org scale: networking, identity, guardrails, account vending, all Terraformed. |
| Observability | Datadog and Dynatrace in production, not dashboards for dashboards' sake. Anomaly detection, SLOs, distributed tracing, and alerts that page on signal, not noise. |
| AI × Platform | I'm not waiting for AI to mature. I'm already shipping Claude Code skills, LLM workflows, and AI-assisted ops in production. |
Most AI assistants give platform advice that would get you paged at 3am. I built platform-skills because generic answers kill production systems. It's a Claude Code skill that actually knows Kubernetes, Terraform, GitOps, KEDA, Linkerd, OPA, Kyverno, and AWS, built on patterns from real incidents, not documentation summaries. Use it or stay slow.
A landing zone isn't a project. It's the foundation every team in the org builds on. Get it wrong and you're paying the interest forever.
I've designed and operated multi-cloud landing zones at org scale, not as a consulting exercise but as a day job.
| AWS | 40+ account org with Control Tower, SCPs, and AWS Organizations. Centralised networking via Transit Gateway. Security Hub, GuardDuty, Config Rules enforced org-wide. Account vending via Terraform: new accounts in minutes, not tickets. |
| Azure | Management Group hierarchy, Policy initiatives at scale, Azure Landing Zone accelerator patterns. Hub-spoke networking with Azure Firewall. Entra ID integration and PIM for just-in-time access. |
| Identity | OIDC everywhere: GitHub Actions, EKS, Azure Workload Identity. No static credentials. No exceptions. IAM roles scoped per workload, not per team. |
| Networking | VPC design that doesn't paint you into a corner. IPAM before you run out of /16s. PrivateLink over public endpoints. DNS that doesn't lie to you. |
| Guardrails | SCPs and Azure Policy that say no before a developer can say yes. Preventive > detective > reactive. |
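
To make "IPAM before you run out of /16s" concrete, here is a minimal Python sketch of planned, non-overlapping VPC allocation. The `10.0.0.0/8` pool, the account names, and the helper functions are all hypothetical; in a real landing zone this job would belong to a managed IPAM service or a Terraform module rather than a script.

```python
import ipaddress

# Hypothetical org-wide pool; the real ranges are not stated in this README.
ORG_POOL = ipaddress.ip_network("10.0.0.0/8")


def allocate_vpc_cidrs(accounts, prefix=16, pool=ORG_POOL):
    """Give each account the next free, non-overlapping block from the pool."""
    blocks = pool.subnets(new_prefix=prefix)  # lazy generator of /16 blocks
    plan = {}
    for account in accounts:
        try:
            plan[account] = next(blocks)
        except StopIteration:
            raise RuntimeError(f"pool {pool} exhausted at {account!r}") from None
    return plan


def remaining_blocks(plan, prefix=16, pool=ORG_POOL):
    """Count the blocks of this size that are still unallocated."""
    total = 2 ** (prefix - pool.prefixlen)
    return total - len(plan)


# Illustrative account names only.
plan = allocate_vpc_cidrs(["network", "security", "workload-dev", "workload-prod"])
for account, cidr in plan.items():
    print(f"{account}: {cidr}")
print("free /16 blocks:", remaining_blocks(plan))
```

The point of the sketch is that capacity becomes a number you can watch: a /8 holds 256 /16 blocks, so the remaining count tells you how close the org is to painting itself into a corner.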
The cloud is not a place you click around in. It's a codebase.
Every resource is Terraform. Every cluster state is Git. Every secret is in a vault. Every policy is enforced at admission, not discovered in a retro.
| Terraform | Reusable modules across every account. Remote state, DynamoDB locking, OIDC auth in CI. If you have an AWS_SECRET_ACCESS_KEY in your .env, you're doing it wrong. |
| GitOps | FluxCD is the operator. Git is the source of truth. Humans don't kubectl apply in prod. That's what the reconciler is for. |
| Helm | Schema-validated values. helm unittest in CI. If the chart doesn't pass, it doesn't go near a cluster. |
| Secrets | External Secrets Operator + AWS Secrets Manager / Azure Key Vault. Plaintext in Git is a P0 incident. No exceptions. |
| State discipline | terraform plan is mandatory and reviewed. Blast radius is documented. State files are scoped per service boundary. One bad apply doesn't cascade. |
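
As a rough illustration of the "plaintext in Git is a P0" rule, the Python sketch below flags static AWS credentials in file content before it is committed. The patterns are illustrative and far from exhaustive, `scan_text` is a hypothetical helper, and the sample values are the placeholder keys AWS uses in its own documentation. A dedicated secret scanner is the realistic choice in CI.

```python
import re

# Illustrative patterns only. "AKIA" plus 16 characters is the familiar shape
# of a long-lived AWS access key ID; the second pattern catches the variable
# assignment itself, whatever the value is.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "aws_secret_assignment": re.compile(r"AWS_SECRET_ACCESS_KEY\s*[=:]\s*\S+"),
}


def scan_text(text, source="<stdin>"):
    """Return (source, line number, pattern name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((source, lineno, name))
    return findings


# Placeholder credentials taken from AWS documentation examples.
sample = "\n".join([
    "DATABASE_HOST=db.internal",
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
    "AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
])

findings = scan_text(sample, source=".env")
for source, lineno, name in findings:
    print(f"{source}:{lineno}: {name}")
```

Wired into a pre-commit hook or CI step, a non-empty findings list would fail the build, which keeps the rule preventive rather than something found in a retro.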
If you're waiting for a user to report an outage, your observability is decoration.
| Datadog | APM, infrastructure metrics, log management, synthetics, and custom dashboards. Monitors with noise-suppressed alerts: pages mean something broke, not that a metric spiked for 30 seconds. |
| Dynatrace | Davis AI for anomaly detection across full-stack topology. Automatic dependency mapping. Code-level traces without manual instrumentation. OneAgent on every EKS node. |
| SLOs | Error budgets defined, tracked, and burned down visibly. When the budget is down to 20%, the team already knows. They don't find out in an incident review. |
| Distributed Tracing | Traces from edge to service to database. If something is slow, I know exactly where and why before the ticket is raised. |
| Alerting philosophy | Alert on symptoms, not causes. Page on customer impact. Everything else goes to a channel, not a phone. |
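
The error-budget arithmetic behind the SLO row can be sketched in a few lines of Python. The 99.9% target, the request counts, and the function names are assumptions for illustration, not figures from a real service.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent (1.0 untouched, 0.0 gone)."""
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0
    return max(0.0, 1 - failed_requests / allowed_failures)


def burn_rate(slo_target, window_requests, window_failures):
    """Observed error rate relative to the budgeted rate; 1.0 is on budget."""
    if window_requests == 0:
        return 0.0
    return (window_failures / window_requests) / (1 - slo_target)


# Hypothetical month: 99.9% SLO, 10M requests, 8,000 failures so far.
remaining = error_budget_remaining(0.999, 10_000_000, 8_000)
print(f"budget remaining: {remaining:.0%}")

# Hypothetical last hour: 100,000 requests, 500 failures.
rate = burn_rate(0.999, 100_000, 500)
print(f"burn rate: {rate:.1f}x")
```

With these made-up numbers the budget allows 10,000 failures, so 8,000 failures leaves about 20% of it. A burn rate well above 1.0 sustained over a window is a symptom worth paging on; a 30-second spike that barely dents the budget is not.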
Building a platform? Drowning in YAML? Made a Terraform mistake you can't undo? I've seen worse. Let's talk.




