SRE (Infrastructure)

Site Reliability Engineer

UK - Remote RH25018

Every developer has a tab open on Stack Overflow.  

We are one of the most popular websites in the world - a community-based space focused on increasing productivity, decreasing cycle times, accelerating time to market, and protecting institutional knowledge. 

Innovation is at the heart of everything we do. We embrace collaboration and transparency, and we believe in leading with empathy, creating an environment where every Stacker knows they belong. We recognize that the unique contributions and points of view of all Stackers contribute to our success.

We are a Best Company to Work For, in addition to being recognized for Best Company Leadership, Best Company Happiness, Best Company Perks and Benefits, Best Company Work-Life Balance, Best Company Compensation, and Best Company Outlook.

We are a remote-first company with Hiring HUBs based in the US, Canada, UK, and Germany.

We're targeting candidates in the UK and Germany for this role.

Stack Overflow is growing fast, and our infrastructure needs keep growing as our products scale. We’re looking for a Site Reliability Engineer to join our existing team of SREs and developers and help us grow our cloud infrastructure as we transition away from our on-premises footprint. As an SRE, you will collaborate with application development teams to identify gaps and opportunities to improve reliability across our products, always looking for ways to automate manual work and create repeatable, scalable systems and processes. We want you to suggest solutions and build tools to measure and monitor the reliability of our products.

We’re looking for someone with experience in a .NET ecosystem in an Azure environment, but we are also open to folks who have non-Azure cloud experience and are willing to learn Azure. We don’t expect you to know every other part of our stack coming in, so we’ll pair you with other members of the team to learn and develop your skills across our entire infrastructure (including our non-cloud infrastructure). We operate in a mixed Windows and Linux environment, and expect someone in this role to have experience with one environment and a working understanding of the other. Experience with networking/VPN, Elasticsearch, Redis, Azure automation, or Terraform is a plus, but we’re happy to train you.

What you’ll work on:

  • Help scale our hosted Stack Overflow Teams offering.
  • Manage a high-quality production platform and promotion pipeline that ensures capacity for our Teams (Free/Basic/Business/Enterprise) customers.
  • Reduce toil by building software solutions and by removing or automating manual tasks, steps, and workflows as we further streamline deployments and upgrades.
  • Improve the observability of our systems to help identify issues or bottlenecks by iterating on our monitoring and alerting strategies.
  • Improve our security lifecycle and compliance strategy for cloud solutions.
  • Participate in our on-call rotation (typically one fortnight in every two months).
  • Partner closely with your peers to accomplish goals within an agile software development lifecycle.

Our current ecosystem includes:

  • Microsoft Azure
  • Terraform, PowerShell, Go
  • Windows Server, IIS and .NET Core
  • Linux
  • Our toolchain includes: GitHub, Octopus Deploy, Elasticsearch, Redis, ArgoCD, ArgoWF, Cloudsmith
  • Containerization with Kubernetes

Skills & Requirements

If you don't meet all of these exact qualifications, we encourage you to apply anyway!

We’re looking for:

  • Experience writing software solutions in a high-level programming language (for example, but not limited to, Python, Golang, C#).
  • An understanding of software development lifecycle phases, from planning and development through production deployment and monitoring.
  • Willingness to learn new technologies and adapt to changing priorities.
  • Eagerness and ability to work with different types of functional groups, share knowledge, collaborate, and contribute. This is particularly important given our remote-first environment.
  • Demonstrated understanding of basic concepts in a cloud environment.

We like to see:

  • Experience with scripting languages (Bash, PowerShell).
  • Experience with Azure or an equivalent cloud platform (AWS, Google Cloud, etc.).
  • Experience with automating repetitive tasks.
  • SQL experience (Microsoft SQL Server or Azure SQL a plus).
  • Experience with Terraform or similar IaC tools.
  • Experience with containerization technologies (Docker, Swarm, Kubernetes).
  • An understanding of service level indicators and service level objectives.

What you’ll get in return:

  • Competitive base salary
  • Generous paid vacation
  • Generous parental leave (16 weeks at 100% pay), family care leave, and unlimited sick days
  • Equity (RSUs) for all employees at all levels
  • Industry-leading health benefits that are applicable per country of residence for all our full-time employees
  • Company-paid life insurance
  • Home internet stipend
  • Professional allocation for your growth and development
  • One-time allowance to assist with your home office setup
  • Company-paid access to Calm, Bravely, LinkedIn Learning, MyAcademy and Overdrive

Stack Overflow is proud to be an equal opportunity workplace. We value diversity, inclusion, equity, and belonging, and these pillars are at the heart of how we work together here at Stack. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.

For individuals based in California, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.

Read our Applicant and Candidate Privacy Notice