Type of engagement
This is an external staff position. You will hold a temporary employment contract with E-Search and work in service of Microsoft.
E-Search helps Microsoft find the best externally employed candidates (Serbian citizens). We are looking for a skilled Site Reliability Engineer to join the Microsoft Office team and design the next generation of productivity experiences.
Our client, Microsoft, has been a leading company in computing for decades. It is a global operation, relied on by governments, utilities, schools, and co-operatives to deliver the things they need to work, every day. Making this work for customers takes continual effort to keep that delivery reliable. To drive reliability, Microsoft needs you -- someone who already is, or is interested in becoming, a Site Reliability Engineer (SRE).
SREs are people who take engineering-based approaches to solving operations problems: they like infrastructure, they like seeing how big, complicated things work, and most importantly, they gain great satisfaction from making those things better. SREs build, monitor, and maintain the systems and infrastructure that ensure our customers can quickly access their data and run workloads whenever and wherever they need to. SREs identify service problems and areas for improvement, and they follow up by fixing those problems.
Do you love to be in the operational thick of things? Do you have experience with DevOps and Live Site, a keen eye for detail, and a drive to deliver 99.999% availability? The Azure Data SQL team is looking for a Site Reliability Engineer to create and administer live site infrastructure. This role will work on our monitoring and alerting infrastructure and live site tools to support an excellent live site practice.
We would like to talk to you if you:
- Are interested in distributed systems and working with high-scale services.
- Like to work in a fast-moving environment and aren't afraid to change things to make them better.
- Enjoy new technological challenges and solving hard problems.
Your responsibilities will include some or all of the below:
- Troubleshoot problems or flaws affecting the availability, reliability, performance, and efficiency of components and features.
- Follow cross-service call chains across arbitrary network hops when troubleshooting, drawing on an understanding of monitoring in distributed systems.
- Respond to incidents during on-call rotations by identifying the level of impact, troubleshooting issues, and deploying appropriate fixes to resolve root causes.
- Suggest potential solutions to resolve and prevent recurring issues and bring them to the attention of other engineers.
- Implement and manage monitoring solutions that quickly pinpoint areas needing attention.
- Develop and implement operations and live site practice dashboards to discover trends and help prioritize investments.
- Contribute technical troubleshooting guides (TSGs) to our Live Site playbook.
- Drive efficiencies in tooling, visibility, monitoring, and root cause analysis.
Qualifications:
- Familiarity with scripting languages.
- Working knowledge of one or more query languages, including but not limited to SQL and KQL.
- Foundational understanding of distributed systems design, interactions between cloud technology layers and components, dependencies at scale, and the code that defines infrastructures.
- Demonstrated troubleshooting skills across multiple layers.
- Strong verbal and written communication skills, with excellent interpersonal and collaboration skills.
- Approaches problems and solutions with a growth mindset.
- Demonstrated commitment to the success of others.
- BA/BS in Computer Science, System Administration, Networking, Mathematics, or Engineering generally, or, in place of a 4-year degree, an equivalent industry internship or industry engineering experience.
- 2+ years of technical experience in software engineering, network engineering, or systems administration, or
- 2+ years of experience in relevant site reliability engineering, cloud operations, or microservice architecture.
- Deep knowledge of industry trends as well as advances in large-scale distributed systems and cloud technologies.
- Enjoy learning and ramping up on new technologies quickly.
- Passion for data-driven decision making.
- Experience in a cloud stack and leveraging cloud architecture, applying site reliability principles and/or demonstrating sensitivity to operational concerns.
- Demonstrated ability to debug, fix, and optimize code.
- Excellent written and verbal communication skills are a plus.