Microsoft has been a leading company in computing for decades. We are a global company, relied on by companies, governments, utilities, stores, schools, universities and co-operatives to deliver the things they need to work, every day.
In order to make this work, we need to make it reliable. In order to make it reliable, we need you — someone who already is, or is interested in becoming, a **Site Reliability Engineer** (also known as SRE).
Site Reliability Engineering is a hybrid role, comparatively rare in industry but crucially important to how things work behind the scenes today.
SREs are people who take engineering-based approaches to solving operations problems; we like infrastructure, we like seeing how the big complicated thing works, and most importantly, we gain great satisfaction from making it better. We have backgrounds in lots of things — yes of course, Computer Science, System Administration, Networking, Mathematics, and Engineering generally, but you can also find folks who’ve worked in Physics, Chemistry, Computational Biology, Statistics, and even English.
Site Reliability Engineers build, monitor, and maintain the systems and infrastructure that ensure our customers can quickly access their data and run workloads whenever they need to. We identify service problems and areas for improvement, and we help implement solutions. Our work is key to the success of many of the Microsoft services you’ll have heard of, and a number you haven’t. There are very few bits of Microsoft which aren’t touched by SREs in some way or other.
As an SRE in the Developer Services group you will:
+ Build solutions that boost the reliability, performance and security of Microsoft’s developer services, particularly those related to CI/CD pipelines, and that automate and simplify how developers work.
+ Collaborate with other engineers to design and deliver solutions for automatic mitigation, telemetry & monitoring, disaster recovery, capacity management and platform automation in general.
+ Perform deep investigations that stretch your skills as you traverse rich telemetry streams to isolate and solve complex performance and reliability issues for online services.
+ Collaborate closely with Azure and other internal Microsoft groups to design, operate and optimize large-scale online services used by teams and businesses across the globe.
+ Continually innovate and push technology to the limit both on scale and design, within our developer services scope.
Our SREs are focused on our customers and the service design that enables them to trust us. As we drive the maturity of our services, we regularly influence and contribute improvements to both our own services and the Azure platform.
What we are looking for:
+ Minimum of 2 years of software development and automation experience.
+ Troubleshooting skills across network, application, caching, queuing, load-balancing, storage and distributed services layers.
+ Ability to conceptualize a distributed service, its dependencies and the transactional flow when troubleshooting.
+ Practical experience running large-scale online systems built on Azure or similar cloud providers.
+ At least 2 years of experience designing and implementing solutions for platform and application layer telemetry and monitoring.
+ Experience coordinating resources across diverse teams to restore service and maintain SLAs.
+ Strong communication skills, as audiences for this role include customers, peers and, at times, executive leadership.
At Microsoft’s Developer Division (DevDiv) we envision, create, and run a broad array of online services used by developers and teams around the world. Our services run at a massive scale and continue to grow. They are built over both public and private clouds and have been architected with common capabilities and patterns that help us to operate consistently and efficiently.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.