Sr. Site Reliability Engineer (SRE)

  • Full Time
  • Dubai
  • Posted 2 months ago

GETTR

Job Description

Challenges you will solve:

  • Participate in all stages of infrastructure provisioning, primarily providing staging and production support.
  • Assist in implementing security best practices and initiatives at all levels of the systems infrastructure.
  • Adhere to SRE (Site Reliability Engineering) principles for incident management and service level objectives.
  • Work closely with DevOps engineers to apply and improve the automation scripts and system designs they share, increasing system efficiency in the production environment.
  • Ensure maximum uptime and stability of cloud and on-premises environments, especially staging and production.
  • Apply the latest OS and security patches while ensuring compatibility with the applications running on top of them.
  • Lead routine disaster recovery/business continuity (DRBC) exercises.
  • Handle help desk & JIRA tickets and mitigate any production issues.
  • Keep knowledge base documentation accurate and up to date.

Requirements:

  • Strong knowledge of secure web app deployments in AWS (4+ years).
  • Advanced experience as a Linux or Windows server administrator.
  • The ability to work with little supervision; must be self-driven and motivated.
  • Experience with continuous integration/continuous delivery (CI/CD) using Jenkins and Git.
  • Experience with containerized microservices delivered with Docker, Kubernetes (Kops, AWS EKS), or OpenShift 4.x.
  • Ability to manage and optimize a unified logging system and APM (Application Performance Management) monitoring tools to continuously reduce MTTR (Mean Time to Recovery).
  • Strong experience with hybrid infrastructure systems monitoring and proactive incident management.
  • Strong scripting skills in Shell and Python; Go is a plus.
  • Some knowledge of web application programming languages (such as JavaScript, NodeJS, Java, etc.).
  • Ability to proactively triage and troubleshoot urgent production issues with precision under high time pressure.
  • Experience working collaboratively with application development teams across the organization to resolve mission-critical problems.
  • Excellent written and verbal communication skills, including the ability to produce and work with technical documents.
  • Excellent problem-solving and analytical skills and the ability to translate business requirements into information systems solutions.
  • Experience with IT security.
  • A team player.
  • Familiarity/experience with the DevOps process.
  • Professional IT certifications such as Red Hat Certified Engineer, Windows Server, or AWS certifications (a huge plus).
  • Relevant work experience (8+ years), either in software development or IT infrastructure.
  • Master's degree in a technology-related field, engineering, or computer science (a plus).
  • Participate in an on-call rotation (one week on call roughly every 3-4 weeks) as needed.
  • Provide mission-critical production support during outages outside business hours if necessary.

To apply for this job, please visit www.careerjet.ae.
