Hi, I'm Rishikumar S

Site Reliability Engineer & Cloud FinOps Specialist

Passionate Site Reliability Engineer with 6 years of experience developing, deploying, and managing highly available, scalable cloud systems. Specialized in Cloud FinOps with proven expertise in cost optimization, automation, and building resilient ML-based infrastructures.

About Me

Building Resilient Cloud Infrastructure with FinOps Excellence

Site Reliability Engineer with 6 years of experience in developing, deploying, and managing highly available, scalable, and resilient cloud-based systems and ML infrastructures. Proven expertise in automation, scripting, DevOps, SRE, and MLOps domains. Specialized in cloud FinOps with deep experience in cost optimization, budget allocation, and building cost-aware systems that don't compromise on reliability.

infrastructure.yaml
apiVersion: sre/v1
kind: Engineer
metadata:
  name: "rishikumar-s"
  role: "senior-sre-finops"
spec:
  uptime: "99.99%"
  costOptimization: "$30K/month"
  automation: "if manual > 2x; then script()"

Professional Journey

Senior SRE - Cost Engineering

Oct 2023 - Present • MoEngage

Joined MoEngage with a mission: Make cloud costs go down and reliability go up. Wearing the FinOps + SRE hat, I now help manage a $12M+ AWS cloud portfolio—and yes, I meet the CTO weekly to talk dollars, graphs, and how to keep both happy.

💰 Achieved $100K+ monthly cost savings (and counting) by identifying and eliminating waste across compute, storage, and networking.

💡 Designed and enforced tagging policies to bring high-cardinality cost visibility across teams.

🤖 Built automation with AWS Lambda (Python + Boto3) to clean up unattached EBS volumes, zombie instances, S3 leftovers, and more.

🚪 Replaced managed NAT Gateways with a self-hosted alternative, cutting NAT costs by 40%.

🧠 Ran Karpenter-based optimizations to right-size EKS nodes, saving ~$30K/month on compute.

📦 Rolled out VPA-based container resizing to ensure 70%+ resource utilization across pods.

📊 Developed an internal cost analytics platform using Django, integrating CloudWatch, Prometheus, and CUR data—complete with Jira ticket automation for cost actions.

🧾 Implemented per-customer cost reporting using AWS CUR + business logic (because FinOps ≠ finance spreadsheets).

🔍 On the SRE front, I also:

🚀 Owned CI/CD workflows using Woodpecker, Helm, and ArgoCD

📊 Built out the observability stack with VictoriaMetrics, Grafana, and Vector for logs

🔧 Standardized infra practices across environments with Terraform and custom tooling

🧑‍💼 Bonus: I don't just talk infra—I partner with the CTO and finance team directly, presenting forecasts, anomalies, and optimization plans like a FinOps nerd on a mission.
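To illustrate the cleanup automation mentioned above, here is a minimal sketch of how a Lambda function might find unattached EBS volumes with Boto3. It is not the production code: the function names, the 14-day age threshold, and the `dry_run` event key are assumptions made for this example. Only `describe_volumes`, its `status` filter, and `delete_volume` are real EC2 client calls.

```python
from datetime import datetime, timedelta, timezone


def find_stale_volumes(volumes, min_age_days=14, now=None):
    """Return IDs of unattached volumes older than min_age_days.

    `volumes` is a list of dicts shaped like the "Volumes" entries of a
    Boto3 `describe_volumes` response. The age threshold is an assumption.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    stale = []
    for vol in volumes:
        # An EBS volume in the "available" state is not attached to anything.
        unattached = vol["State"] == "available" and not vol.get("Attachments")
        if unattached and vol["CreateTime"] <= cutoff:
            stale.append(vol["VolumeId"])
    return stale


def lambda_handler(event, context):
    # Imported here so the filter above can be exercised without AWS access.
    import boto3

    ec2 = boto3.client("ec2")
    paginator = ec2.get_paginator("describe_volumes")
    pages = paginator.paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )
    volumes = [vol for page in pages for vol in page["Volumes"]]
    stale = find_stale_volumes(volumes)

    # Hypothetical safety switch: only delete when explicitly asked to.
    dry_run = event.get("dry_run", True)
    if not dry_run:
        for volume_id in stale:
            ec2.delete_volume(VolumeId=volume_id)
    return {"stale_volume_ids": stale, "deleted": not dry_run}
```

Keeping the filter separate from the AWS calls makes the deletion rule easy to test, and defaulting to a dry run means a misconfigured schedule reports candidates instead of deleting them.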

Senior Software Engineer - SRE

Apr 2022 - Sep 2023 • Freshworks

Promoted! 🎉 With a shiny new title came real production firepower—and a massive product: Freshdesk from the CX Unit, handling 450K+ RPM. I officially graduated from “incident responder” to “incident preventer.”

🤝 Worked closely with product architects to beef up Freshdesk’s reliability using real-world tech stacks—not just theoretical ones.

🛡️ Led DDoS defense missions using HAProxy, connection throttling, and IP filters (because firewalls can be fun).

🚀 Ran 40+ EKS cluster upgrades, juggling upgrades of HAProxy, KEDA, autoscalers, metrics exporters, and logging agents, like a production circus act but without downtime.

🔍 Built out our observability stack:

📊 Metrics – Prometheus, Grafana, VictoriaMetrics

📦 Logs – SumoLogic, ELK Stack

🧵 Traces – OpenTelemetry, connecting the dots like a log detective

🔁 Ran mock DR drills and actually did something with the RTO/RPO numbers we got.

🧠 Helped developers debug faster with smarter alerts, faster dashboards, and a “less yelling, more solving” approach.

📟 Reduced MTTD & MTTR so we weren’t solving incidents after they solved themselves.

🔐 Partnered with security teams to prep for PCI and SOC-2 audits—aka compliance bootcamp.

⚙️ Set up robust CI/CD flows with Jenkins, Helm, and Kustomize, making deployments faster, safer, and less Friday-ish.

It was the phase where YAML, alerts, and CI jobs all reported to me—and thankfully, most of them behaved.
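For readers less familiar with the MTTD and MTTR metrics mentioned above, the sketch below shows one common way to compute them from incident timestamps. The `Incident` record and its field names are hypothetical, and both metrics are measured from the moment the incident started, which is an assumption; some teams measure time to resolve from detection instead.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    started: datetime   # when the fault actually began
    detected: datetime  # when an alert fired or a human noticed
    resolved: datetime  # when service was restored


def mttd_minutes(incidents):
    """Mean time to detect: average of (detected - started), in minutes."""
    return mean(
        (i.detected - i.started).total_seconds() / 60 for i in incidents
    )


def mttr_minutes(incidents):
    """Mean time to resolve: average of (resolved - started), in minutes."""
    return mean(
        (i.resolved - i.started).total_seconds() / 60 for i in incidents
    )
```

Tracking the two numbers separately shows whether an improvement came from better alerting (lower MTTD) or from faster remediation (a smaller gap between MTTD and MTTR).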

Software Engineer - SRE

Oct 2020 - Mar 2022 • Freshworks

From “intern with potential” to “full-time SRE with PagerDuty access” — I returned after college with a full-time badge, full-time salary, and full-blown responsibility.

Took charge of automating mock-production environments using Terraform + Ruby (yes, I automated my own nightmares).

🚢 Owned the production deployment pipeline—smoothed it, optimized it, and made it a lot less scary.

📊 Obsessed over metrics—set up alerting with Prometheus, Grafana, and SumoLogic, cutting down MTTD like a champ.

🧠 Gained deep knowledge of AWS, Kubernetes, Docker... and of course, the art of not breaking things during releases.

⚙️ Supported 99.99% uptime goals like a pro while learning to troubleshoot on the fly and talk to logs like they're old friends.

Also realized: “It works on my machine” doesn’t help in production. At all.

Software Engineering Intern

Jan 2020 - Sep 2020 • Freshworks

My first foray into the wild world of SRE. I learned that servers never sleep—and neither does PagerDuty.

Got my hands dirty (and then clean, with automation) on Kubernetes, AWS, and Docker.

🔍 Learned to monitor like a hawk—Grafana, Prometheus, and SumoLogic became my new best friends.

🛠️ Helped automate tasks that nobody wanted to do manually (especially me).

📉 Learned that 99.99% uptime actually means fixing things at 3 AM.

It was here I realized: I love building reliable systems and reducing fire drills.
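To put the 99.99% figure in perspective, the short sketch below converts an availability target into a downtime budget. The function name is made up for this example, and it assumes availability is measured over wall-clock minutes in the period.

```python
def downtime_budget_minutes(availability_pct, period_days):
    """Minutes of downtime allowed over the period at a given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# A 99.99% target leaves roughly 4.3 minutes per 30-day month
# and roughly 52.6 minutes per 365-day year.
monthly = round(downtime_budget_minutes(99.99, 30), 2)
yearly = round(downtime_budget_minutes(99.99, 365), 2)
```

With only a few minutes of budget per month, a single slow manual recovery can consume the whole allowance, which is why detection speed and automation matter so much at this target.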

Technical Expertise

Programming & Scripting

Python Ruby Shell Script C

Cloud & Infrastructure

Linux Networking AWS Kubernetes Docker Terraform

CI & CD

Jenkins Woodpecker Git GitHub Actions ArgoCD

FinOps & Monitoring

Cost Explorer Budgets CUR Kubecost Prometheus Grafana QuickSight

Network Proxies

HAProxy Nginx Istio Squid Proxy

K8s Tools

HPA VPA Karpenter KEDA Gatekeeper Velero

ML tools & Frameworks

MLflow NVIDIA Device Plugin TensorFlow PyTorch Keras

Databases & Frameworks

MySQL Redis Ruby on Rails Django

What Drives Me

Reliability

Building highly available systems with 99.99% uptime and robust disaster recovery strategies

💰

Cost Optimization

Driving significant cost savings through strategic FinOps practices and automated optimization

🤖

Automation

Implementing intelligent automation to reduce manual overhead and improve operational efficiency

📊

Observability

Creating comprehensive monitoring and alerting systems for proactive issue resolution

Fun Facts About Me

I can debug production issues faster after my second cup of coffee

🎮

I treat infrastructure optimization like a video game - always trying to beat my high score

🔥

I've never met a manual process I didn't want to automate (my colleagues love/hate this)

🚨

I wake up in cold sweats from nightmares about 404 errors and memory leaks

🧙‍♂️

I speak fluent YAML, JSON, and can translate between engineer and business speak

💡

My "Eureka!" moments usually happen in the shower or at 3 AM

🌐

I'm fluent in multi-cloud FinOps - translating complex AWS, GCP, and Azure billing into CFO language and uncovering hidden cost burns

🚀

I'm passionate about saving engineers from layoffs and companies from bankruptcy through optimized cloud spend

My Projects

k8s-tool - Kubernetes Automation Tool

A comprehensive command-line tool for managing Kubernetes deployments and installations. Features automated deployment creation, KEDA integration for event-driven autoscaling, HPA support, and resource management with support for multiple scaling triggers including CPU, Memory, Prometheus, Kafka, and Redis.

Python Kubernetes KEDA CLI DevOps
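As a rough illustration of the KEDA integration described above, the sketch below builds a ScaledObject manifest with CPU and Prometheus triggers. It is not taken from k8s-tool itself: the helper names, the `-scaler` naming suffix, and the example deployment, server address, and query are all assumptions. The manifest fields follow the KEDA `keda.sh/v1alpha1` ScaledObject format.

```python
import json


def cpu_trigger(target_utilization_pct):
    """KEDA CPU trigger that scales on average CPU utilization."""
    return {
        "type": "cpu",
        "metricType": "Utilization",
        "metadata": {"value": str(target_utilization_pct)},
    }


def prometheus_trigger(server_address, query, threshold):
    """KEDA Prometheus trigger that scales on the result of a query."""
    return {
        "type": "prometheus",
        "metadata": {
            "serverAddress": server_address,
            "query": query,
            "threshold": str(threshold),
        },
    }


def build_scaled_object(deployment, min_replicas, max_replicas, triggers,
                        namespace="default"):
    """Return a ScaledObject manifest (as a dict) targeting a Deployment."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": f"{deployment}-scaler", "namespace": namespace},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "triggers": triggers,
        },
    }


# Hypothetical usage: the deployment name, address, and query are placeholders.
manifest = build_scaled_object(
    "checkout", 2, 10,
    [
        cpu_trigger(60),
        prometheus_trigger(
            "http://prometheus.monitoring:9090",
            'sum(rate(http_requests_total{app="checkout"}[2m]))',
            100,
        ),
    ],
)
manifest_json = json.dumps(manifest, indent=2)  # JSON is accepted by kubectl apply -f
```

Building the manifest as a plain dict keeps each trigger type a small, independent function, so adding Kafka or Redis triggers would only mean adding another helper.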

JSON Merging Tool

A Python utility for merging multiple JSON files with size-based output control. Efficiently processes JSON files from a directory, merges them based on prefix patterns, and manages output file sizes with configurable limits. Built using Python's native JSON and OS libraries.

Python JSON Data Processing Automation
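The merging approach described above could look roughly like the sketch below. It is a simplified illustration rather than the tool's actual code, and it makes several assumptions: each input file holds a JSON array, the prefix is the part of the file name before the first underscore, and output files are named `<prefix>_merged_<n>.json`.

```python
import json
import os


def _write_chunk(output_dir, prefix, part, records):
    """Write one output file and return its path."""
    path = os.path.join(output_dir, f"{prefix}_merged_{part}.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh)
    return path


def merge_json_by_prefix(input_dir, output_dir, max_bytes=1_000_000):
    """Merge JSON array files that share a prefix, splitting by size."""
    groups = {}
    for name in sorted(os.listdir(input_dir)):
        if not name.endswith(".json"):
            continue
        # Assumed convention: "orders_2024.json" belongs to prefix "orders".
        prefix = os.path.splitext(name)[0].split("_", 1)[0]
        with open(os.path.join(input_dir, name), encoding="utf-8") as fh:
            groups.setdefault(prefix, []).extend(json.load(fh))

    os.makedirs(output_dir, exist_ok=True)
    written = []
    for prefix, records in groups.items():
        chunk, chunk_size, part = [], 2, 1  # 2 bytes for the enclosing "[]"
        for record in records:
            # Serialized size plus the ", " separator (a slight overestimate).
            size = len(json.dumps(record)) + 2
            if chunk and chunk_size + size > max_bytes:
                written.append(_write_chunk(output_dir, prefix, part, chunk))
                chunk, chunk_size, part = [], 2, part + 1
            chunk.append(record)
            chunk_size += size
        if chunk:
            written.append(_write_chunk(output_dir, prefix, part, chunk))
    return written
```

A single record larger than `max_bytes` still gets written, in a file of its own, since the sketch never splits an individual record.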

Bookstore Management GUI

A comprehensive desktop application for bookstore management built with Python Tkinter. Features include adding new book entries, viewing existing inventory, updating book information, searching by author/year/ISBN, and deleting records. Uses SQLite3 database for persistent data storage.

Python Tkinter SQLite3 GUI CRUD
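The storage layer behind a GUI like this can be quite small. The sketch below shows the add, search, update, and delete operations against SQLite3 with the Tkinter front end left out. The `BookStore` class, the table schema, and the column names are assumptions for illustration, not the project's actual code.

```python
import sqlite3


class BookStore:
    """Minimal CRUD wrapper around a SQLite books table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS books ("
            "id INTEGER PRIMARY KEY, title TEXT, author TEXT, "
            "year INTEGER, isbn TEXT)"
        )
        self.conn.commit()

    def add(self, title, author, year, isbn):
        cur = self.conn.execute(
            "INSERT INTO books (title, author, year, isbn) VALUES (?, ?, ?, ?)",
            (title, author, year, isbn),
        )
        self.conn.commit()
        return cur.lastrowid

    def search(self, author=None, year=None, isbn=None):
        # Column names are fixed here; only values come from the caller,
        # and those are always passed as bound parameters.
        clauses, params = [], []
        for column, value in (("author", author), ("year", year), ("isbn", isbn)):
            if value is not None:
                clauses.append(f"{column} = ?")
                params.append(value)
        where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
        query = "SELECT id, title, author, year, isbn FROM books" + where + " ORDER BY id"
        return self.conn.execute(query, params).fetchall()

    def update(self, book_id, title, author, year, isbn):
        self.conn.execute(
            "UPDATE books SET title = ?, author = ?, year = ?, isbn = ? WHERE id = ?",
            (title, author, year, isbn, book_id),
        )
        self.conn.commit()

    def delete(self, book_id):
        self.conn.execute("DELETE FROM books WHERE id = ?", (book_id,))
        self.conn.commit()
```

Calling `search()` with no arguments returns the full inventory, which covers the "view existing inventory" feature without a separate method.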

Mentorship

Passionate About Helping Others Grow

I believe in the power of mentorship to accelerate learning and career growth. Whether you're just starting your journey in tech or looking to advance your skills, I'm here to help.

Areas I Can Help With:

  • Site Reliability Engineering: Building highly available and scalable systems
  • Cloud FinOps: Cost optimization strategies and budget management
  • Infrastructure Automation: Terraform, Kubernetes, and CI/CD best practices
  • Monitoring & Observability: Prometheus, Grafana, and alerting strategies
  • Career Growth: Transitioning into SRE and advancing in cloud technologies
  • Career Planning: Helping college students navigate from academics to tech careers with strategic goal-setting and actionable roadmaps
  • Basic Financial Planning: Personal finance fundamentals for freshers starting their IT journey - budgeting, savings, and money management basics. Note: I'm not a financial expert or CA; I'm sharing hard-learned experiences to help others avoid the common financial mistakes I made early in my career.

My Approach:

I believe in personalized mentorship that adapts to your learning style and goals. Our sessions will be practical, focused, and designed to provide you with actionable insights and clear next steps.

Ready to Start Your Growth Journey?

Let's schedule a call to discuss your goals and how I can help you achieve them.

💡 Strategy Session (30 min)

Deep dive into your career goals

Schedule Strategy Session

Or reach out directly:

Send Email