Description
This is a hands-on engineering role that requires deep technical knowledge, strong autonomy, and a proactive mindset. You will be expected to handle complex technical troubleshooting, work closely with cross-functional teams, and continuously improve the reliability and efficiency of the cloud infrastructure.
Responsibilities
- Provision, configure, and maintain complex AWS environments to support application deployments across multiple regions and availability zones.
- Support and optimize infrastructure and application deployments, ensuring consistency, security, and high availability across all environments.
- Investigate and resolve complex incidents, performing deep root cause analysis and implementing preventive measures.
- Drive continuous improvement through environment tuning, monitoring enhancements, automation, and infrastructure optimization.
- Support and enhance business-as-usual (BAU) operations, including OS and application patching, upgrades, certificate management, and routine maintenance tasks.
- Work closely with application, security, and infrastructure teams to ensure environment readiness and performance alignment with business needs.
- Participate in Disaster Recovery planning and testing, and support clients during DR events.
- Maintain and improve technical documentation, standard operating procedures, and environment runbooks.
- Participate in the on-call rotation and act as an escalation point for advanced technical issues.
- Contribute to automation initiatives using PowerShell or infrastructure-as-code tools.
Qualifications
- 5+ years of experience in Systems Engineering or Cloud Operations, preferably in enterprise or multi-account environments.
- Expertise in AWS core services, including but not limited to: EC2, S3, IAM, VPC, CloudWatch, ELB/ALB, Lambda, and CloudTrail.
- Solid experience with cloud-based deployments and supporting high availability, multi-AZ/multi-region architectures.
- Strong hands-on skills in Windows Server administration and troubleshooting, including Event Logs, Services, Registry, and performance analysis.
- Good understanding of Active Directory and SSO integration (e.g., Azure AD).
- Experience managing and troubleshooting Microsoft SQL Server in production environments (performance, connectivity, backups).
- Familiarity with SQL Server Always On Availability Groups (AAG) and the use of listeners for high availability.
- Strong knowledge of networking fundamentals: DNS, routing, NAT, VPNs, firewalls, TLS/SSL, and load balancing.
- Proficiency in PowerShell and/or other scripting languages for automation and diagnostics.
- Experience with CI/CD pipelines and Infrastructure as Code tools (e.g., CloudFormation, Terraform); Azure DevOps experience is a plus.
- Exposure to Amazon AppStream 2.0, Citrix, or similar desktop streaming technologies is a strong advantage.
- Familiarity with DR strategies and real-world experience in disaster recovery testing.
- Understanding of ITIL principles and experience working within ITSM frameworks (incident, change, problem management).
- Analytical mindset with a structured and proactive approach to problem-solving.