Senior Site Reliability Engineer

Engineering

Job Description

Job Title: Senior Site Reliability Engineer (SRE)
Location: Bengaluru (Hybrid)
Experience: 6–10 years
Client: Sony
Salary: 30 LPA (negotiable for the right candidates)

Important Note:
• We are considering only local candidates for this requirement.
• Candidates must be available for face-to-face interviews on short notice.

🔎 Job Overview
We are looking for a Senior Site Reliability Engineer (SRE) with strong expertise in observability, cloud-native platforms, and Kubernetes-based systems. This is a hands-on role focused on building, operating, and improving reliable, scalable, and observable platforms in GCP (preferred) and AWS environments.

Key Responsibilities
🔹 Reliability & Operations
• Design and maintain highly available, resilient systems on Kubernetes
• Define and manage SLOs, SLIs, and error budgets
• Lead incident response, perform RCA, and drive blameless postmortems
• Improve platform reliability through automation and tooling

🔹 Observability (Core Focus)
• Build and operate centralized observability platforms (metrics, logs, traces, alerts)
• Hands-on with Prometheus, Alertmanager, Grafana
• Logging & tracing using ELK / OpenSearch, Loki, OpenTelemetry
• Implement cloud-native monitoring (GCP Cloud Monitoring & Logging preferred)
• Define actionable and noise-free alerting standards

🔹 Cloud & Platform Engineering
• Build and manage infrastructure on GCP (preferred) or AWS
• Operate Kubernetes clusters (GKE preferred, EKS acceptable)
• Deploy services using Helm
• Manage containerized workloads with Docker
• Use Terraform / Ansible / Packer for infrastructure automation

🔹 Automation & Tooling
• Strong Python skills for automation and reliability tooling
• Build internal tools for observability, SLO tracking, and incident workflows
• Integrate CI/CD pipelines (Jenkins) with reliability and observability checks

🔹 Collaboration & Leadership
• Mentor junior engineers
• Influence architecture and reliability best practices
• Collaborate closely with platform, application, and cloud teams

✅ Mandatory Skills
• Site Reliability Engineering (SRE)
• Python (coding, not just scripting)
• ELK stack
• Kubernetes
• AWS and/or GCP
• Prometheus, Grafana
• Docker, Helm
• Terraform
• Linux
• CI/CD (Jenkins)

⭐ Nice to Have
• Splunk, Datadog, Cribl, Vector
• OpenTelemetry
• Multi-cloud experience
• Platform security exposure

📅 Project Highlights
• Build and operate a centralized observability platform
• Drive SLOs and error budgets to reduce MTTR
• Lead production incident response
• Optimize scalability, performance, and cloud costs
• Act as a technical leader for SRE & observability initiatives

Skills & Experience
• Google Cloud Platform (GCP)
• Python
• Kubernetes
• AWS
• Prometheus
• Grafana
• Docker
• Terraform
• Linux Administration
• CI/CD

Job Overview

Date Posted: 26th Mar, 2026
Expiration Date: 31st Aug, 2026
Location: Bengaluru, Karnataka, India
Job Type: Engineer
Job Shift: Fixed Shift
Functional Areas: IT Support
Positions: 0
Job Experience: 10 Years
Salary Period: Monthly Pay Period