This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer in the United States.
As a Site Reliability Engineer, you will play a critical role in designing and maintaining reliable, cutting-edge distributed systems for global network orchestration platforms. You will work in a fast-paced, high-growth startup environment, contributing to mission-critical systems that support satellite constellations and aerospace fleets. This role demands a combination of hands-on technical expertise, strategic thinking, and leadership in maturing the platform. You will define and monitor service reliability, automate infrastructure, optimize performance, and collaborate closely with engineering teams to ensure high availability and scalability. Your work will directly impact large-scale aerospace operations while offering rapid professional growth and technical ownership.
Responsibilities
· Design, implement, and maintain observability platforms built on Prometheus, OpenTelemetry, Grafana, Loki, and distributed tracing systems.
· Define, monitor, and manage service level objectives (SLOs), service level indicators (SLIs), and error budgets to ensure the reliability of mission-critical systems.
· Lead the automation and orchestration of infrastructure using Terraform, Kubernetes, and GitOps tools such as ArgoCD.
· Develop and optimize platform tooling using Go and/or Python.
· Collaborate with engineering teams to instrument applications, troubleshoot system issues, and enhance performance.
· Participate in architecture reviews and drive platform scalability, resilience, and operational best practices.
· Implement security best practices and maintain compliance for mission-critical aerospace environments.
Requirements
· Proven experience in site reliability or systems engineering for high-availability distributed systems.
· Expert-level proficiency with observability tools: Prometheus, Grafana, Loki, OpenTelemetry, distributed tracing.
· Strong cloud and orchestration experience: Kubernetes, GCP; multi-cloud (AWS) preferred.
· Proficiency in systems programming with Go and/or Python; experience with C++ a plus.
· Skilled in Infrastructure as Code (Terraform) and GitOps workflows (ArgoCD).
· Experience with service mesh technologies (Istio/Linkerd) and high-performance computing (HPC) workloads is a plus.
· US Citizenship required; ability to obtain a security clearance (active clearance strongly preferred).
· Strong problem-solving, communication, and collaboration skills for cross-functional teams.
Benefits
· Competitive salary with bonus and equity potential.
· Health, dental, and vision insurance for employees and dependents.
· Flexible remote work and hybrid options.
· Professional development and training opportunities.
· Paid time off and company holidays.
· Opportunity to work on mission-critical aerospace and satellite systems in a high-growth startup environment.
Jobgether is a talent matching platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching. When you apply, your profile goes through our AI-powered screening process, which is designed to identify top talent efficiently and fairly.
🔍 Our AI evaluates your CV and LinkedIn profile thoroughly, analyzing your skills, experience, and achievements.
📊 It compares your profile to the job’s core requirements and past success factors to determine your match score.
🎯 Based on this analysis, we automatically shortlist the 3 candidates with the highest match to the role.
🧠 When necessary, our human team may perform an additional manual review to ensure no strong profile is missed.
The process is transparent, skills-based, and free of bias, focusing solely on your fit for the role. Once the shortlist is complete, it is shared directly with the company. The final decision and next steps (such as interviews or additional assessments) are made by the company's internal hiring team.
Thank you for your interest!