Passionate Site Reliability Engineer with 6 years of experience developing, deploying, and managing highly available, scalable cloud systems. Specialized in Cloud FinOps with proven expertise in cost optimization, automation, and building resilient ML infrastructure.
Site Reliability Engineer with 6 years of experience developing, deploying, and managing highly available, scalable, and resilient cloud-based systems and ML infrastructure. Proven expertise in automation, scripting, DevOps, SRE, and MLOps. Specialized in cloud FinOps, with deep experience in cost optimization, budget allocation, and building cost-aware systems that don't compromise on reliability.
Joined MoEngage with a mission: Make cloud costs go down and reliability go up. Wearing the FinOps + SRE hat, I now help manage a $12M+ AWS cloud portfolio—and yes, I meet the CTO weekly to talk dollars, graphs, and how to keep both happy.
💰 Achieved $100K+ monthly cost savings (and counting) by identifying and eliminating waste across compute, storage, and networking.
💡 Designed and enforced tagging policies that give every team granular cost visibility.
🤖 Built automation with AWS Lambda (Python + Boto3) to clean up unattached EBS volumes, zombie instances, S3 leftovers, and more.
🚪 Replaced managed NAT Gateways with a self-hosted alternative, slashing 40% of NAT costs.
🧠 Ran Karpenter-based optimizations to right-size EKS nodes, saving ~$30K/month on compute.
📦 Rolled out VPA-based container resizing to ensure 70%+ resource utilization across pods.
📊 Developed an internal cost analytics platform using Django, integrating CloudWatch, Prometheus, and CUR data—complete with Jira ticket automation for cost actions.
🧾 Implemented per-customer cost reporting using AWS CUR + business logic (because FinOps ≠ finance spreadsheets).
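Per-customer reporting mostly comes down to deciding what happens to costs that carry no customer tag. The sketch below shows one common policy, not the actual business logic referenced above: tagged line items are attributed directly, and untagged shared cost is split in proportion to each customer's direct spend. The `customer` and `cost` keys are a hypothetical simplified schema; in a CUR export they would map to a user cost-allocation tag column and a cost column such as `lineItem/UnblendedCost`.

```python
from collections import defaultdict


def allocate_costs(line_items, tag_key="customer"):
    """Return {customer: cost}, spreading untagged (shared) cost
    proportionally to each customer's directly tagged cost.

    Each line item is a dict with a numeric "cost" and an optional
    tag_key entry (hypothetical schema, not the real CUR layout).
    """
    direct = defaultdict(float)
    shared = 0.0
    for item in line_items:
        owner = item.get(tag_key)
        if owner:
            direct[owner] += item["cost"]
        else:
            shared += item["cost"]

    total_direct = sum(direct.values())
    if total_direct == 0:
        # Nothing is tagged, so there is no basis for a proportional split
        return {}
    return {
        owner: cost + shared * (cost / total_direct)
        for owner, cost in direct.items()
    }


items = [
    {"customer": "acme", "cost": 300.0},
    {"customer": "globex", "cost": 100.0},
    {"cost": 40.0},  # untagged, e.g. a shared load balancer
]
print(allocate_costs(items))  # {'acme': 330.0, 'globex': 110.0}
```

A proportional split is only one choice; an even split, or a usage driver such as request counts or storage bytes, are equally valid policies depending on what the business wants the report to say.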
🔍 On the SRE front, I also:
🚀 Owned CI/CD workflows using Woodpecker, Helm, and ArgoCD
📊 Built out the observability stack with VictoriaMetrics, Grafana, and Vector for logs
🔧 Standardized infra practices across environments with Terraform and custom tooling
🧑‍💼 Bonus: I don't just talk infra; I partner with the CTO and finance team directly, presenting forecasts, anomalies, and optimization plans like a FinOps nerd on a mission.
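The Lambda cleanup automation listed above isn't published here, so what follows is a minimal sketch of the unattached-EBS case only. The 7-day age threshold and the `dry_run` event flag are assumptions for illustration. The Boto3 calls are real EC2 client operations (`describe_volumes` through a paginator, and `delete_volume`), and the filtering is kept in a pure function so it can be tested without AWS.

```python
from datetime import datetime, timedelta, timezone


def find_stale_volumes(volumes, min_age_days=7, now=None):
    """Return IDs of EBS volumes that are unattached ("available")
    and older than min_age_days (threshold is a hypothetical default)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available" and v["CreateTime"] < cutoff
    ]


def lambda_handler(event, context):
    # Imported here so find_stale_volumes can be tested without boto3 installed
    import boto3

    ec2 = boto3.client("ec2")
    volumes = []
    paginator = ec2.get_paginator("describe_volumes")
    for page in paginator.paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    ):
        volumes.extend(page["Volumes"])

    stale = find_stale_volumes(volumes)
    # Hypothetical event flag: default to reporting, not deleting
    dry_run = event.get("dry_run", True)
    if not dry_run:
        for volume_id in stale:
            ec2.delete_volume(VolumeId=volume_id)
    return {"dry_run": dry_run, "volumes": stale}
```

Defaulting to a dry run, and snapshotting volumes before deleting them, are cheap safeguards for any job that removes storage automatically.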
Promoted! 🎉 With a shiny new title came real production firepower—and a massive product: Freshdesk from the CX Unit, handling 450K+ RPM. I officially graduated from “incident responder” to “incident preventer.”
🤝 Worked closely with product architects to beef up Freshdesk’s reliability using real-world tech stacks—not just theoretical ones.
🛡️ Led DDoS defense missions using HAProxy, connection throttling, and IP filters (because firewalls can be fun).
🚀 Ran 40+ EKS cluster upgrades, juggling version bumps for HAProxy, KEDA, autoscalers, metrics exporters, and logging agents: a production circus act, minus the downtime.
🔍 Built out our observability stack:
📊 Metrics – Prometheus, Grafana, VictoriaMetrics
📦 Logs – SumoLogic, ELK Stack
🧵 Traces – OpenTelemetry, connecting the dots like a log detective
🔁 Ran mock DR drills and actually acted on the RTO/RPO numbers they produced.
🧠 Helped developers debug faster with smarter alerts, faster dashboards, and a “less yelling, more solving” approach.
📟 Reduced MTTD & MTTR so we weren’t solving incidents after they solved themselves.
🔐 Partnered with security teams to prep for PCI and SOC 2 audits, aka compliance bootcamp.
⚙️ Set up robust CI/CD flows with Jenkins, Helm, and Kustomize, making deployments faster, safer, and less Friday-ish.
It was the phase where YAML, alerts, and CI jobs all reported to me—and thankfully, most of them behaved.
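MTTD and MTTR are simple averages once incident timestamps are recorded consistently. A minimal sketch, assuming each incident record carries hypothetical `started`, `detected`, and `resolved` datetimes. Here MTTR is measured from incident start; some teams measure it from detection instead, so the definition is worth agreeing on before comparing numbers.

```python
from datetime import datetime
from statistics import mean


def mttd_mttr(incidents):
    """Return (MTTD, MTTR) in minutes for a list of incident dicts."""
    def minutes(delta):
        return delta.total_seconds() / 60

    mttd = mean(minutes(i["detected"] - i["started"]) for i in incidents)
    mttr = mean(minutes(i["resolved"] - i["started"]) for i in incidents)
    return mttd, mttr


incidents = [
    {"started": datetime(2024, 5, 1, 10, 0),
     "detected": datetime(2024, 5, 1, 10, 5),
     "resolved": datetime(2024, 5, 1, 10, 45)},
    {"started": datetime(2024, 5, 3, 12, 0),
     "detected": datetime(2024, 5, 3, 12, 15),
     "resolved": datetime(2024, 5, 3, 13, 15)},
]
print(mttd_mttr(incidents))  # (10.0, 60.0)
```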
From “intern with potential” to “full-time SRE with PagerDuty access” — I returned after college with a full-time badge, full-time salary, and full-blown responsibility.
✅ Took charge of automating mock-production environments using Terraform + Ruby (yes, I automated my own nightmares).
🚢 Owned the production deployment pipeline—smoothed it, optimized it, and made it a lot less scary.
📊 Obsessed over metrics—set up alerting with Prometheus, Grafana, and SumoLogic, cutting down MTTD like a champ.
🧠 Gained deep knowledge of AWS, Kubernetes, Docker... and of course, the art of not breaking things during releases.
⚙️ Supported 99.99% uptime goals like a pro while learning to troubleshoot on the fly and talk to logs like they're old friends.
Also realized: “It works on my machine” doesn’t help in production. At all.
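A 99.99% uptime goal becomes concrete once it is converted into an error budget: the downtime the target leaves room for over a window. A small helper for that arithmetic (the function name and 30-day default window are illustrative choices):

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of downtime an availability SLO allows over window_days."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction such as 0.9999")
    return (1 - slo) * window_days * 24 * 60


for slo in (0.999, 0.9999):
    monthly = error_budget_minutes(slo)
    yearly = error_budget_minutes(slo, window_days=365)
    print(f"{slo:.2%}: {monthly:.2f} min/30 days, {yearly:.1f} min/year")
```

At 99.99%, the budget is about 4.32 minutes per 30 days (roughly 52.6 minutes per year), so a single slow incident response can consume the whole month's allowance.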
My first foray into the wild world of SRE. I learned that servers never sleep—and neither does PagerDuty.
✅ Got my hands dirty (and then clean, with automation) on Kubernetes, AWS, and Docker.
🔍 Learned to monitor like a hawk—Grafana, Prometheus, and SumoLogic became my new best friends.
🛠️ Helped automate tasks that nobody wanted to do manually (especially me).
📉 Learned that 99.99% uptime actually means fixing things at 3 AM.
It was here I realized: I love building reliable systems and reducing fire drills.
Building highly available systems with 99.99% uptime and robust disaster recovery strategies
Driving significant cost savings through strategic FinOps practices and automated optimization
Implementing intelligent automation to reduce manual overhead and improve operational efficiency
Creating comprehensive monitoring and alerting systems for proactive issue resolution
I can debug production issues faster after my second cup of coffee
I treat infrastructure optimization like a video game - always trying to beat my high score
I've never met a manual process I didn't want to automate (my colleagues love/hate this)
I wake up in cold sweats from nightmares about 404 errors and memory leaks
I speak fluent YAML, JSON, and can translate between engineer and business speak
My "Eureka!" moments usually happen in the shower or at 3 AM
I'm fluent in multi-cloud FinOps: translating complex AWS, GCP, and Azure billing into CFO language and uncovering hidden cost burns
I'm passionate about using optimized cloud spend to save engineers from layoffs and keep companies out of bankruptcy
A comprehensive command-line tool for managing Kubernetes deployments and installations. Features automated deployment creation, KEDA integration for event-driven autoscaling, HPA support, and resource management with support for multiple scaling triggers including CPU, memory, Prometheus, Kafka, and Redis.
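The tool's source isn't included here, so as a rough illustration of what KEDA integration produces: a KEDA `ScaledObject` manifest that points at a Deployment and lists its triggers. The helper names, the Prometheus address, and the query are hypothetical; the manifest fields (`scaleTargetRef`, `minReplicaCount`, `maxReplicaCount`, `triggers`) follow KEDA's `keda.sh/v1alpha1` API, where trigger metadata values are strings. KEDA then manages an HPA for the target on your behalf.

```python
import json


def cpu_trigger(target_percent):
    # Scale on average CPU utilization across the Deployment's pods
    return {"type": "cpu", "metricType": "Utilization",
            "metadata": {"value": str(target_percent)}}


def prometheus_trigger(server_address, query, threshold):
    # Scale on the result of a PromQL query
    return {"type": "prometheus",
            "metadata": {"serverAddress": server_address,
                         "query": query,
                         "threshold": str(threshold)}}


def scaled_object(name, deployment, triggers, min_replicas=1, max_replicas=10):
    """Build a KEDA ScaledObject manifest as a plain dict."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas cannot exceed max_replicas")
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "triggers": triggers,
        },
    }


manifest = scaled_object(
    "web-scaler", "web",
    triggers=[
        cpu_trigger(70),
        prometheus_trigger("http://prometheus.monitoring:9090",
                           'sum(rate(http_requests_total{app="web"}[2m]))', 100),
    ],
    min_replicas=2, max_replicas=20,
)
print(json.dumps(manifest, indent=2))  # e.g. pipe into `kubectl apply -f -`
```

Building manifests as plain dicts keeps validation (like the replica bounds check) in ordinary code before anything reaches the cluster; note that KEDA's CPU scaler relies on the target containers having CPU requests set.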
A Python utility for merging multiple JSON files with size-based output control. It processes JSON files from a directory, merges them by filename prefix pattern, and splits the output into files that stay within a configurable size limit. Built with Python's standard-library json and os modules.
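A minimal sketch of the prefix-and-size approach described above, not the utility's actual code: it collects `<prefix>*.json` files in name order, flattens top-level arrays into records, and starts a new output file whenever the next record would push the current one past `max_bytes`. The function names, the output naming scheme, and the 1 MB default are assumptions.

```python
import json
import os


def chunk_records(records, max_bytes):
    """Group records into lists whose JSON-array encoding stays within
    max_bytes. A single record larger than max_bytes gets its own chunk."""
    chunks, current, size = [], [], 2  # 2 bytes for the enclosing "[]"
    for record in records:
        # Encoded record plus the ", " separator json.dump inserts
        # (slightly conservative: the last item has no separator)
        record_size = len(json.dumps(record).encode("utf-8")) + 2
        if current and size + record_size > max_bytes:
            chunks.append(current)
            current, size = [], 2
        current.append(record)
        size += record_size
    if current:
        chunks.append(current)
    return chunks


def merge_json_files(src_dir, prefix, out_dir, max_bytes=1_000_000):
    """Merge src_dir/<prefix>*.json into size-capped files in out_dir."""
    records = []
    for name in sorted(os.listdir(src_dir)):
        if name.startswith(prefix) and name.endswith(".json"):
            with open(os.path.join(src_dir, name), encoding="utf-8") as f:
                data = json.load(f)
            # A top-level array contributes each element; anything else is one record
            records.extend(data if isinstance(data, list) else [data])

    os.makedirs(out_dir, exist_ok=True)
    written = []
    for i, chunk in enumerate(chunk_records(records, max_bytes)):
        path = os.path.join(out_dir, f"{prefix}merged_{i}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(chunk, f)
        written.append(path)
    return written
```

For example, `merge_json_files("exports", "orders_", "merged", max_bytes=5_000_000)` would write `merged/orders_merged_0.json`, `merged/orders_merged_1.json`, and so on. Writing to a separate output directory keeps a rerun from picking up earlier merged files.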
A comprehensive desktop application for bookstore management built with Python Tkinter. Features include adding new book entries, viewing existing inventory, updating book information, searching by author/year/ISBN, and deleting records. Uses SQLite3 database for persistent data storage.
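The data layer behind an app like this is small. A sketch of the SQLite3 side, assuming a hypothetical `book` table and function names (the Tkinter widgets would call these): queries use `?` placeholders so user input is never spliced into SQL, and `with conn:` commits each write or rolls it back on error.

```python
import sqlite3


def connect(path="books.db"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS book ("
        "id INTEGER PRIMARY KEY, title TEXT, author TEXT, "
        "year INTEGER, isbn TEXT)"
    )
    return conn


def add_book(conn, title, author, year, isbn):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO book (title, author, year, isbn) VALUES (?, ?, ?, ?)",
            (title, author, year, isbn),
        )


def search_books(conn, author=None, year=None, isbn=None):
    # Filter only on the fields the user filled in; column names come from
    # this fixed dict, and values are always passed as parameters
    filters = {"author": author, "year": year, "isbn": isbn}
    given = {col: val for col, val in filters.items() if val is not None}
    sql = "SELECT id, title, author, year, isbn FROM book"
    if given:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in given)
    return conn.execute(sql, tuple(given.values())).fetchall()


def update_book(conn, book_id, title, author, year, isbn):
    with conn:
        conn.execute(
            "UPDATE book SET title = ?, author = ?, year = ?, isbn = ? WHERE id = ?",
            (title, author, year, isbn, book_id),
        )


def delete_book(conn, book_id):
    with conn:
        conn.execute("DELETE FROM book WHERE id = ?", (book_id,))
```

Passing `":memory:"` to `connect` gives a throwaway database, which is handy for trying the functions out before wiring them to the GUI.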
I believe in the power of mentorship to accelerate learning and career growth. Whether you're just starting your journey in tech or looking to advance your skills, I'm here to help.
I believe in personalized mentorship that adapts to your learning style and goals. Our sessions will be practical, focused, and designed to provide you with actionable insights and clear next steps.
Let's schedule a call to discuss your goals and how I can help you achieve them.
Or reach out directly:
Send Email