Rishikumar S - Site Reliability Engineer & Cloud FinOps Specialist

About Me

Building Resilient Cloud Infrastructure with FinOps Excellence

Site Reliability Engineer with 6 years of experience in developing, deploying, and managing highly available, scalable, and resilient cloud-based systems and ML infrastructure. Proven expertise in automation, scripting, DevOps, SRE, and MLOps. Specialized in cloud FinOps with deep experience in cost optimization, budget allocation, and building cost-aware systems that don't compromise on reliability.

6 Years Experience

99.99% Uptime Achieved

K+ Monthly Savings

                                    
infrastructure.yaml

apiVersion: sre/v1
kind: Engineer
metadata:
  name: "rishikumar-s"
  role: "senior-sre-finops"
spec:
  uptime: "99.99%"
  costOptimization: "$30K/month"
  automation: "if manual > 2x; then script()"

Professional Journey

Senior SRE - Cost Engineering

Oct 2023 - Present • MoEngage

Joined MoEngage with a mission: Make cloud costs go down and reliability go up. Wearing the FinOps + SRE hat, I now help manage a $12M+ AWS cloud portfolio—and yes, I meet the CTO weekly to talk dollars, graphs, and how to keep both happy.

💰 Achieved $100K+ monthly cost savings (and counting) by identifying and eliminating waste across compute, storage, and networking.

💡 Designed and enforced tagging policies to bring high-cardinality cost visibility across teams.

🤖 Built automation with AWS Lambda (Python + Boto3) to clean up unattached EBS volumes, zombie instances, S3 leftovers, and more.

🚪 Replaced managed NAT Gateways with a self-hosted alternative, cutting NAT costs by 40%.

🧠 Ran Karpenter-based optimizations to right-size EKS nodes, saving ~$30K/month on compute.

📦 Rolled out VPA-based container resizing to ensure 70%+ resource utilization across pods.

📊 Developed an internal cost analytics platform using Django, integrating CloudWatch, Prometheus, and CUR data—complete with Jira ticket automation for cost actions.

🧾 Implemented per-customer cost reporting using AWS CUR + business logic (because FinOps ≠ finance spreadsheets).
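
To make the cleanup automation above a bit more concrete, here is a minimal sketch of how a Lambda might flag unattached EBS volumes. It is an illustration, not the production code: the 14-day age threshold and the function names are assumptions, and it only reports candidates rather than deleting anything.

```python
from datetime import datetime, timedelta, timezone


def find_stale_volumes(volumes, min_age_days=14, now=None):
    """Return IDs of volumes that are unattached and older than min_age_days.

    `volumes` follows the shape of the "Volumes" list returned by the
    EC2 describe_volumes call: dicts with VolumeId, State, and CreateTime.
    The 14-day default is an assumed threshold, not a documented policy.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    return [
        v["VolumeId"]
        for v in volumes
        # "available" is the EC2 state of a volume with no attachment
        if v["State"] == "available" and v["CreateTime"] < cutoff
    ]


def lambda_handler(event, context):
    # boto3 is imported here so the pure filter above can be tested offline
    import boto3

    ec2 = boto3.client("ec2")
    stale = []
    paginator = ec2.get_paginator("describe_volumes")
    for page in paginator.paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    ):
        stale.extend(find_stale_volumes(page["Volumes"]))
    # Report only; calling ec2.delete_volume is left as a deliberate opt-in
    return {"stale_volume_ids": stale}
```

Keeping the filter separate from the AWS call means the risky part (deciding what counts as waste) can be unit-tested without touching a real account.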

🔍 On the SRE front, I also:

🚀 Owned CI/CD workflows using Woodpecker, Helm, and ArgoCD

📊 Built out observability stack with VictoriaMetrics, Grafana, and Vector for logs

🔧 Standardized infra practices across environments with Terraform and custom tooling

🧑‍💼 Bonus: I don't just talk infra—I partner with the CTO and finance team directly, presenting forecasts, anomalies, and optimization plans like a FinOps nerd on a mission.
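
For the per-customer cost reporting mentioned above, the core idea can be sketched as grouping CUR line items by a customer tag. This is a simplified illustration under stated assumptions: the tag key `customer` (and therefore the `resource_tags_user_customer` column) is hypothetical, the sample rows are made up, and the real business logic for shared or untagged spend is not shown.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

COST_COL = "line_item_unblended_cost"
TAG_COL = "resource_tags_user_customer"  # assumes a hypothetical "customer" tag


def cost_per_customer(rows):
    """Sum unblended cost per customer tag; untagged spend gets its own bucket."""
    totals = defaultdict(Decimal)
    for row in rows:
        customer = row.get(TAG_COL) or "untagged"
        # Decimal avoids float rounding drift when adding many small charges
        totals[customer] += Decimal(row[COST_COL])
    return dict(totals)


# Made-up sample data standing in for a CUR export
sample = io.StringIO(
    "line_item_unblended_cost,resource_tags_user_customer\n"
    "12.50,acme\n"
    "7.25,globex\n"
    "3.00,\n"
    "2.50,acme\n"
)
report = cost_per_customer(csv.DictReader(sample))
for customer, total in sorted(report.items()):
    print(f"{customer}: ${total}")
```

The "untagged" bucket is the number worth watching: it shows how much spend the tagging policy has not reached yet.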

Senior Software Engineer - SRE

Apr 2022 - Sep 2023 • Freshworks

Promoted! 🎉 With a shiny new title came real production firepower—and a massive product: Freshdesk from the CX Unit, handling 450K+ RPM. I officially graduated from “incident responder” to “incident preventer.”

🤝 Worked closely with product architects to beef up Freshdesk’s reliability using real-world tech stacks—not just theoretical ones.

🛡️ Led DDoS defense missions using HAProxy, connection throttling, and IP filters (because firewalls can be fun).

🚀 Ran 40+ EKS cluster upgrades while juggling HAProxy, KEDA, autoscalers, metrics exporters, and logging agents along the way: a production circus act, but without downtime.

🔍 Built out our observability stack:

📊 Metrics – Prometheus, Grafana, VictoriaMetrics

📦 Logs – Sumo Logic, ELK Stack

🧵 Traces – OpenTelemetry, connecting the dots like a log detective

🔁 Ran mock DR drills and actually did something with the RTO/RPO numbers we got.

🧠 Helped developers debug faster with smarter alerts, faster dashboards, and a “less yelling, more solving” approach.

📟 Reduced MTTD & MTTR so we weren’t solving incidents after they solved themselves.

🔐 Partnered with security teams to prep for PCI and SOC-2 audits—aka compliance bootcamp.

⚙️ Set up robust CI/CD flows with Jenkins, Helm, and Kustomize, making deployments faster, safer, and less Friday-ish.

It was the phase where YAML, alerts, and CI jobs all reported to me—and thankfully, most of them behaved.
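
For readers less familiar with MTTD and MTTR, here is a small sketch of how the two numbers can be derived from incident timestamps. The `Incident` fields are hypothetical, and the sketch assumes MTTR is measured from fault start to recovery; some teams measure it from detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime   # when the fault began
    detected: datetime  # when an alert or a human noticed it
    resolved: datetime  # when service was restored


def mean_delta(deltas):
    """Average a sequence of timedeltas."""
    deltas = list(deltas)
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean time to detect: fault start to detection."""
    return mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents):
    """Mean time to resolve: fault start to recovery (assumed definition)."""
    return mean_delta(i.resolved - i.started for i in incidents)
```

Tracking both separately shows whether an improvement came from better alerting (MTTD) or from faster fixes (the gap between the two).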

Software Engineer - SRE

Oct 2020 - Mar 2022 • Freshworks

From “intern with potential” to “full-time SRE with PagerDuty access” — I returned after college with a full-time badge, full-time salary, and full-blown responsibility.

✅ Took charge of automating mock-production environments using Terraform + Ruby (yes, I automated my own nightmares).

🚢 Owned the production deployment pipeline—smoothed it, optimized it, and made it a lot less scary.

📊 Obsessed over metrics: set up alerting with Prometheus, Grafana, and Sumo Logic, cutting down MTTD like a champ.

🧠 Gained deep knowledge of AWS, Kubernetes, Docker... and of course, the art of not breaking things during releases.

⚙️ Supported 99.99% uptime goals like a pro while learning to troubleshoot on the fly and talk to logs like they're old friends.

Also realized: “It works on my machine” doesn’t help in production. At all.
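
To put the 99.99% uptime goal in perspective, the allowed downtime is simple arithmetic: the unavailable fraction times the measurement window. The helper below is an illustrative sketch (the function name is made up); over a 30-day window it works out to roughly 4.3 minutes.

```python
from datetime import timedelta


def downtime_budget(slo_percent, window):
    """Return the downtime allowed by an availability target over a window."""
    return window * (1 - slo_percent / 100)


monthly = downtime_budget(99.99, timedelta(days=30))
yearly = downtime_budget(99.99, timedelta(days=365))
print(f"30-day budget: {monthly.total_seconds() / 60:.2f} minutes")
print(f"365-day budget: {yearly.total_seconds() / 60:.2f} minutes")
```

That budget is small enough that a single slow rollback can spend a whole month of it.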

Software Engineering Intern

Jan 2020 - Sep 2020 • Freshworks

My first foray into the wild world of SRE. I learned that servers never sleep—and neither does PagerDuty.

✅ Got my hands dirty (and then clean, with automation) on Kubernetes, AWS, and Docker.

🔍 Learned to monitor like a hawk: Grafana, Prometheus, and Sumo Logic became my new best friends.

🛠️ Helped automate tasks that nobody wanted to do manually (especially me).

📉 Learned that 99.99% uptime actually means fixing things at 3 AM.

It was here I realized: I love building reliable systems and reducing fire drills.

Technical Expertise

Programming & Scripting

Python Ruby Shell Script C

Cloud & Infrastructure

Linux Networking AWS Kubernetes Docker Terraform

CI & CD

Jenkins Woodpecker Git GitHub Actions ArgoCD

FinOps & Monitoring

Cost Explorer Budgets CUR Kubecost Prometheus Grafana QuickSight

Network Proxies

HAProxy Nginx Istio Squid Proxy

K8s Tools

HPA VPA Karpenter KEDA Gatekeeper Velero

ML tools & Frameworks

MLflow NVIDIA Device Plugin TensorFlow PyTorch Keras

Databases & Frameworks

MySQL Redis Ruby on Rails Django

What Drives Me

⚡

Reliability

Building highly available systems with 99.99% uptime and robust disaster recovery strategies

💰

Cost Optimization

Driving significant cost savings through strategic FinOps practices and automated optimization

🤖

Automation

Implementing intelligent automation to reduce manual overhead and improve operational efficiency

📊

Observability

Creating comprehensive monitoring and alerting systems for proactive issue resolution

Fun Facts About Me

☕

I can debug production issues faster after my second cup of coffee

🎮

I treat infrastructure optimization like a video game - always trying to beat my high score

🔥

I've never met a manual process I didn't want to automate (my colleagues love/hate this)

🚨

I wake up in cold sweats from nightmares about 404 errors and memory leaks

🧙‍♂️

I speak fluent YAML, JSON, and can translate between engineer and business speak

💡

My "Eureka!" moments usually happen in the shower or at 3 AM

🌐

I'm fluent in multi-cloud FinOps: translating complex AWS, GCP, and Azure billing into CFO language and uncovering hidden cost burns

🚀

I'm passionate about saving engineers from layoffs and companies from bankruptcy with optimized cloud spend

My Projects

k8s-tool - Kubernetes Automation Tool

JSON Merging Tool

Bookstore Management GUI

Mentorship

Passionate About Helping Others Grow

Areas I Can Help With:

My Approach:

Ready to Start Your Growth Journey?

💡 Strategy Session (30 min)