Observability Software Engineer - Prometheus | Grafana | Splunk | Datadog | Kubernetes | EKS
Job Description:
We are seeking an experienced Observability Software Engineer to design, implement, and optimize scalable monitoring and observability solutions for cloud-native environments. The ideal candidate will have expertise in cloud platforms, Kubernetes, distributed systems, DevOps automation, and modern observability tools.
Responsibilities:
- Design and maintain observability solutions across AWS, Azure, and GCP.
- Support Kubernetes/EKS infrastructure and containerized workloads.
- Implement monitoring, logging, and tracing platforms using Prometheus, Grafana, OpenTelemetry, Splunk, Datadog, Dynatrace, ELK, CloudWatch, and New Relic.
- Develop dashboards, alerts, and operational reporting.
- Build automation tools and scripts using Golang, Python, Bash, or Ruby.
- Support CI/CD and Infrastructure as Code using Jenkins, Terraform, ArgoCD, CloudFormation, Ansible, and Puppet.
- Troubleshoot distributed systems and improve system reliability and performance.
- Leverage AI-driven operational workflows and observability automation.
- Collaborate with engineering and operations teams in Agile environments.
Technologies:
Cloud: AWS, Azure, GCP
Monitoring: Prometheus, Grafana, OpenTelemetry, Splunk, Datadog, Dynatrace, ELK, CloudWatch, New Relic
DevOps: Jenkins, Terraform, ArgoCD, Ansible, Puppet, CloudFormation
Languages: Golang, Python, Ruby, PHP (optional)
Platforms: Kubernetes, Linux, Distributed Systems
Required Skills & Qualifications:
- 5+ years of software engineering or DevOps experience.
- Strong experience with AWS, Kubernetes, and cloud-native technologies.
- Hands-on experience with observability and monitoring platforms.
- Experience with Linux/Unix administration and scripting.
- Knowledge of distributed systems, SRE practices, and CI/CD pipelines.
- Expertise in AWS and Amazon EKS.
- Experience with OpenTelemetry and AI-assisted operational tooling.
- Familiarity with GitOps workflows and large-scale production environments.
- Strong problem-solving and communication skills.

