Senior Site Reliability Engineer

Engineering

3 months ago

₹ 2M - 3M

Job Description

Job Title: Senior Site Reliability Engineer (SRE)
Location: Bengaluru (Hybrid)
Experience: 6 – 10 Years
Client: SONY
Salary: 30 LPA (negotiable for the right candidates)

Important Note:
• We are considering only local candidates for this requirement.
• Candidates must be available for face-to-face interviews on short notice.

Job Overview
We are looking for a Senior Site Reliability Engineer (SRE) with strong expertise in observability, cloud-native platforms, and Kubernetes-based systems. This is a hands-on role focused on building, operating, and improving reliable, scalable, and observable platforms in GCP (preferred) and AWS environments.

Key Responsibilities
Reliability & Operations
• Design and maintain highly available, resilient systems on Kubernetes
• Define and manage SLOs, SLIs, and error budgets
• Lead incident response, perform RCA, and drive blameless postmortems
• Improve platform reliability through automation and tooling

Observability (Core Focus)
• Build and operate centralized observability platforms (metrics, logs, traces, alerts)
• Hands-on with Prometheus, Alertmanager, Grafana
• Logging & tracing using ELK / OpenSearch, Loki, OpenTelemetry
• Implement cloud-native monitoring (GCP Cloud Monitoring & Logging preferred)
• Define actionable and noise-free alerting standards

Cloud & Platform Engineering
• Build and manage infrastructure on GCP (preferred) or AWS
• Operate Kubernetes clusters (GKE preferred, EKS acceptable)
• Deploy services using Helm
• Manage containerized workloads with Docker
• Use Terraform / Ansible / Packer for infrastructure automation

Automation & Tooling
• Strong Python skills for automation and reliability tooling
• Build internal tools for observability, SLO tracking, and incident workflows
• Integrate CI/CD pipelines (Jenkins) with reliability and observability checks

Collaboration & Leadership
• Mentor junior engineers
• Influence architecture and reliability best practices
• Collaborate closely with platform, application, and cloud teams

Mandatory Skills
• Site Reliability Engineering (SRE)
• Python (coding, not just scripting)
• ELK stack
• Kubernetes
• AWS and/or GCP
• Prometheus, Grafana
• Docker, Helm
• Terraform
• Linux
• CI/CD (Jenkins)

Nice to Have
• Splunk, Datadog, Cribl, Vector
• OpenTelemetry
• Multi-cloud experience
• Platform security exposure

Project Highlights
• Build and operate a centralized observability platform
• Drive SLOs and error budgets to reduce MTTR
• Lead production incident response
• Optimize scalability, performance, and cloud costs
• Act as a technical leader for SRE & observability initiatives

Skill & Experience

Google Cloud Platform (GCP)
Python
Kubernetes
AWS
Prometheus
Grafana
Docker
Terraform
Linux Administration
CI/CD

Job Overview

Date Posted:

26th Mar, 2026

Expiration date:

31st Aug, 2026

Location:

Bengaluru, Karnataka, India

Job Type:

Engineer

Job Shift:

Fixed Shift

Functional Areas:

IT Support

Positions:

Job Experience:

10 Years

Salary Period:

Monthly Pay Period
