EXPERIENCE
Role | Tenure | Based In
---|---|---
Vice President - Lead Site Reliability Engineer | 13 years |
Vice President - Lead SRE
JPMorganChase
31st July, 2023 - Present
CIB Markets Sales Research & Data Technologies (SRDT)
SRDT's SRE Team
Governed and modernized AWS cloud and on-premises infrastructure for multiple application development teams within the CIB Markets LOB at JPMorgan Chase, ensuring high availability and compliance with enterprise standards.
Engineered and automated cloud infrastructure as code (IaC) using Terraform and Python, and established robust CI/CD pipelines with Jenkins and Spinnaker to accelerate development cycles.
Designed and implemented a comprehensive observability strategy utilizing Datadog, Grafana, Prometheus, and CloudWatch/Splunk to provide deep insights into system performance and enable proactive issue resolution.
Spearheaded FinOps initiatives to optimize cloud spend, eliminate resource waste, and rightsize assets, resulting in significant cost savings and improved operational efficiency.
Championed system resilience by instituting chaos engineering practices using AWS FIS, Gremlin, and manual failover tests to validate and improve product reliability under failure conditions.
Senior Technology Engineer - SRE
McKinsey & Company
28th November, 2016 - 28th July, 2023
Technology & Digital (T&D)
DevOps Implementation Team (DIT) - SRE
The DIT team was responsible for establishing a DevOps culture, building CI/CD tooling, and modernizing infrastructure components across T&D.
- Recruited as a core member of the team.
- Set up Kubernetes, DC/OS (Mesosphere), Grafana, Prometheus, and Alertmanager, along with persistent storage for K8s using Portworx.
- Migrated from VMware vSphere to AWS to modernize the application products.
- Implemented CI/CD using GoCD/Jenkins, Ansible, Docker, and Linux shell scripts.
AI Studio Team - SRE Cloud (MLOps Engineer)
Part of the Technology & Digital AI capability, driving innovation at McKinsey.
- Worked as an MLOps Engineer, provisioning AWS infrastructure through CI/CD pipelines with Terraform and Python.
- Supported data scientists and ML engineers with Jupyter notebooks, spaCy, RStudio, and Kedro.
- Set up infrastructure for ML requirements, including GPUs and data pipelines, with Terraform and Python.
- Set up CI/CD for ML workflows on Jenkins, and collected and evaluated ML metrics using MLflow.
Data Science Innovation (DSI) Team - SRE Cloud (MLOps Engineer)
The Data Science Innovation team supported data science teams in training and running ML workloads.
- Set up and managed Kubernetes on two large GPU-based Linux servers using Rancher, with Longhorn for persistent storage and Prometheus and Grafana for monitoring; also set up Argo CD and Argo Workflows.
- Helm charts for K8s deployments with Argo CD.
- MinIO for object storage and K8s backups.
- Maintained the team's GitLab instance.
- Set up Keycloak for IAM and integrated it as the authentication layer for K8s (Rancher), GitLab, and Argo CD.
- Managed certificate lifecycles using cert-manager on K8s and for the Rancher cluster.
- Collaborated with other McKinsey verticals, such as QuantumBlack and LeapX, on knowledge sharing and on implementing infrastructure solutions for K8s and data science workflows.
- Worked with Kubernetes on AWS and on-premises with GPU nodes.