Job Title: Senior Site Reliability Engineer (SRE)
Location: Bengaluru (Hybrid)
Experience: 6 – 10 Years
Client: SONY
Salary: 30 LPA (negotiable for the right candidates)
Important Note:
• We are considering only local candidates for this requirement.
• Candidates must be available for face-to-face interviews on short notice.
🔎 Job Overview
We are looking for a Senior Site Reliability Engineer (SRE) with strong expertise in observability, cloud-native platforms, and Kubernetes-based systems. This is a hands-on role focused on building, operating, and improving reliable, scalable, and observable platforms in GCP (preferred) and AWS environments.
Key Responsibilities
🔹 Reliability & Operations
• Design and maintain highly available, resilient systems on Kubernetes
• Define and manage SLOs, SLIs, and error budgets
• Lead incident response, perform RCA, and drive blameless postmortems
• Improve platform reliability through automation and tooling
🔹 Observability (Core Focus)
• Build and operate centralized observability platforms (metrics, logs, traces, alerts)
• Work hands-on with Prometheus, Alertmanager, and Grafana
• Implement logging and tracing using ELK / OpenSearch, Loki, and OpenTelemetry
• Implement cloud-native monitoring (GCP Cloud Monitoring & Logging preferred)
• Define actionable and noise-free alerting standards
🔹 Cloud & Platform Engineering
• Build and manage infrastructure on GCP (preferred) or AWS
• Operate Kubernetes clusters (GKE preferred, EKS acceptable)
• Deploy services using Helm
• Manage containerized workloads with Docker
• Use Terraform / Ansible / Packer for infrastructure automation
🔹 Automation & Tooling
• Apply strong Python skills to automation and reliability tooling
• Build internal tools for observability, SLO tracking, and incident workflows
• Integrate CI/CD pipelines (Jenkins) with reliability and observability checks
🔹 Collaboration & Leadership
• Mentor junior engineers
• Influence architecture and reliability best practices
• Collaborate closely with platform, application, and cloud teams
✅ Mandatory Skills
• Site Reliability Engineering (SRE)
• Python (coding, not just scripting)
• ELK stack
• Kubernetes
• AWS and/or GCP
• Prometheus, Grafana
• Docker, Helm
• Terraform
• Linux
• CI/CD (Jenkins)
⭐ Nice to Have
• Splunk, Datadog, Cribl, Vector
• OpenTelemetry
• Multi-cloud experience
• Platform security exposure
📅 Project Highlights
• Build and operate a centralized observability platform
• Drive SLOs and error budgets to reduce MTTR
• Lead production incident response
• Optimize scalability, performance, and cloud costs
• Act as a technical leader for SRE & observability initiatives
Date Posted: 26th Mar, 2026
Expiration Date: 31st Aug, 2026
Location: Bengaluru, Karnataka, India
Job Type: Engineer
Job Shift: Fixed Shift
Functional Areas: IT Support
Positions: 0
Job Experience: 10 Years
Salary Period: Monthly Pay Period