Nitin Jain nitinjain999

Most platform teams are expensive YAML wranglers. I build platforms that make developers faster and ops teams redundant.
15 years. 40+ AWS accounts. Millions of requests. Zero tolerance for manual processes.
If it's not in Git, it doesn't exist. If it's not automated, it's a future incident waiting to happen.
I don't follow platform best practices — I write them.

⚡ What I'm Building


☁️ Edge & CDN	Global CDN, WAF, DNS — designed, operated, and owned. Edge performance and security at org scale.
⚙️ Platform Engineering	EKS, OpenShift, Linkerd, KEDA, FluxCD, ArgoCD. The platform hundreds of engineers ship on every day.
🛡️ Policy as Code	Bad config doesn't reach prod. It gets rejected at the door. OPA, Kyverno, Gatekeeper — guardrails with teeth.
🏢 Cloud Landing Zones	Multi-account AWS and Azure foundations at org scale — networking, identity, guardrails, account vending, all Terraformed.
📡 Observability	Datadog and Dynatrace in production, not dashboards for dashboards' sake. Anomaly detection, SLOs, distributed tracing, and alerts that page on signal, not noise.
🤖 AI × Platform	I'm not waiting for AI to mature — I'm already shipping Claude Code skills, LLM workflows, and AI-assisted ops in production.

🚀 Open Source

platform-skills

Most AI assistants give platform advice that would get you paged at 3am. I built platform-skills because generic answers kill production systems. It's a Claude Code skill that actually knows Kubernetes, Terraform, GitOps, KEDA, Linkerd, OPA, Kyverno, AWS — patterns from real incidents, not documentation summaries. Use it or stay slow.

🧰 Tech Stack

🌍 AWS & Azure Landing Zones

A landing zone isn't a project. It's the foundation every team in the org builds on — get it wrong and you're paying the interest forever.

I've designed and operated multi-cloud landing zones at org scale as a day job, not as a consulting exercise.


AWS	40+ account org with Control Tower, SCPs, and AWS Organizations. Centralised networking via Transit Gateway. Security Hub, GuardDuty, Config Rules enforced org-wide. Account vending via Terraform — new accounts in minutes, not tickets.
Azure	Management Group hierarchy, Policy initiatives at scale, Azure Landing Zone accelerator patterns. Hub-spoke networking with Azure Firewall. Entra ID integration and PIM for just-in-time access.
Identity	OIDC everywhere — GitHub Actions, EKS, Azure Workload Identity. No static credentials. No exceptions. IAM roles scoped per workload, not per team.
Networking	VPC design that doesn't paint you into a corner. IPAM before you run out of `/16`s. PrivateLink over public endpoints. DNS that doesn't lie to you.
Guardrails	SCPs and Azure Policy that say no before a developer can say yes. Preventive > detective > reactive.
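
A minimal sketch of the IPAM point above: hand out one non-overlapping block per account from a shared pool, and fail loudly when the pool is exhausted rather than when someone peers two VPCs. The pool `10.0.0.0/16`, the `/20` block size, the account names, and the function name `carve_subnets` are assumptions for illustration, not details of any real setup.

```python
import ipaddress


def carve_subnets(supernet: str, new_prefix: int, accounts: list[str]) -> dict[str, ipaddress.IPv4Network]:
    """Assign one non-overlapping block per account from a shared supernet."""
    pool = ipaddress.ip_network(supernet)
    # subnets() yields blocks of the requested size in address order.
    blocks = pool.subnets(new_prefix=new_prefix)
    plan = {}
    for account in accounts:
        try:
            plan[account] = next(blocks)
        except StopIteration:
            # Hypothetical policy: refuse to allocate rather than overlap.
            raise ValueError(f"{supernet} exhausted after {len(plan)} accounts") from None
    return plan


if __name__ == "__main__":
    for name, block in carve_subnets("10.0.0.0/16", 20, ["dev", "staging", "prod"]).items():
        print(name, block)
```

A `/16` only holds sixteen `/20`s, which is why the allocation plan needs to exist before the accounts do.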

🏗️ Infrastructure as Code

The cloud is not a place you click around in. It's a codebase.

Every resource is Terraform. Every cluster's state lives in Git. Every secret is in a vault. Every policy is enforced at admission, not discovered in a retro.


Terraform	Reusable modules across every account. Remote state, DynamoDB locking, OIDC auth in CI. If you have an `AWS_SECRET_ACCESS_KEY` in your `.env`, you're doing it wrong.
GitOps	FluxCD is the operator. Git is the source of truth. Humans don't `kubectl apply` in prod — that's what the reconciler is for.
Helm	Schema-validated values. `helm unittest` in CI. If the chart doesn't pass, it doesn't go near a cluster.
Secrets	External Secrets Operator + AWS Secrets Manager / Azure Key Vault. Plaintext in Git is a P0 incident. No exceptions.
State discipline	`terraform plan` is mandatory and reviewed. Blast radius is documented. State files are scoped per service boundary. One bad apply doesn't cascade.
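
One way to make "plan is mandatory and reviewed" enforceable in CI is to inspect the machine-readable plan. The sketch below assumes the plan has been exported with `terraform show -json` and reads its `resource_changes` list; the function names, the sample resources, and the zero-deletes threshold are hypothetical choices, not a description of any actual pipeline.

```python
import json


def destructive_changes(plan_json: str) -> list[str]:
    """Return addresses of resources the plan would delete or replace."""
    plan = json.loads(plan_json)
    flagged = []
    for rc in plan.get("resource_changes", []):
        # A replace shows up as both "delete" and "create" in the actions list.
        if "delete" in rc["change"]["actions"]:
            flagged.append(rc["address"])
    return flagged


def plan_is_safe(plan_json: str, max_destructive: int = 0) -> bool:
    """Hypothetical gate: pass only if destructive changes stay within the limit."""
    return len(destructive_changes(plan_json)) <= max_destructive


# Illustrative plan data, not output from a real run.
SAMPLE_PLAN = json.dumps({
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
        {"address": "aws_iam_role.ci", "change": {"actions": ["update"]}},
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_vpc.core", "change": {"actions": ["no-op"]}},
    ]
})

if __name__ == "__main__":
    print(destructive_changes(SAMPLE_PLAN))
    print(plan_is_safe(SAMPLE_PLAN))
```

A gate like this does not replace human review; it only guarantees that a delete or replace cannot slip through unnoticed.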

👁️ Observability

If you're waiting for a user to report an outage, your observability is decoration.


Datadog	APM, infrastructure metrics, log management, synthetics, and custom dashboards. Monitors with noise-suppressed alerts — pages mean something broke, not that a metric spiked for 30 seconds.
Dynatrace	Davis AI for anomaly detection across full-stack topology. Automatic dependency mapping. Code-level traces without manual instrumentation. OneAgent on every EKS node.
SLOs	Error budgets defined, tracked, and burned down visibly. When the budget is down to 20%, the team knows right then, not after an incident review.
Distributed Tracing	Traces from edge to service to database. If something is slow, I know exactly where and why before the ticket is raised.
Alerting philosophy	Alert on symptoms, not causes. Page on customer impact. Everything else goes to a channel, not a phone.
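
The error-budget arithmetic behind the SLO row is short enough to sketch. This assumes a request-based SLO and reads "20%" as the fraction of budget still remaining; the function names, the 99.9% target, and the request counts are illustrative assumptions.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget left (1.0 = untouched, below 0 = overspent)."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The budget is the number of failures the SLO tolerates over the window.
    allowed_failures = (1 - slo_target) * total
    if allowed_failures == 0:
        raise ValueError("no traffic in the window, budget is undefined")
    return 1 - failed / allowed_failures


def budget_low(remaining: float, threshold: float = 0.20) -> bool:
    """Hypothetical rule: flag the team once remaining budget drops to the threshold."""
    return remaining <= threshold


if __name__ == "__main__":
    # 99.9% target over 1,000,000 requests tolerates roughly 1,000 failures.
    remaining = error_budget_remaining(0.999, total=1_000_000, failed=850)
    print(f"{remaining:.0%} of budget left, low={budget_low(remaining)}")
```

Whether a low budget pages someone or posts to a channel is a separate policy decision; the calculation only makes the burn visible.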

🤝 Let's Connect

Building a platform? Drowning in YAML? Made a Terraform mistake you can't undo? I've seen worse. Let's talk.
