Senior Software Engineer
Redmond, WA 
Posted Today
Job Description
Overview
The Azure Compute team builds a fault-tolerant, distributed system on top of commodity datacenter hardware to deliver infrastructure for hosting cloud applications in Virtual Machines (VMs), Containers or Bare Metal. Work by this team has helped power the latest advances in Generative AI technologies and continues to push the boundaries of Compute by providing resources that are perceived to be limitless, infinitely elastic, and always available.
This role is in the Availability Platform team within Azure Compute, which focuses primarily on making sure every application running on Azure is available with an SLA (Service Level Agreement) of 99.99+% tailored to each application's expectations. Getting to that target and beyond requires out-of-the-box thinking, backed by sound data-driven decisions. The team owns microservices that detect, diagnose, repair, attribute and report the health of millions of Azure machines, VMs and containers within seconds. The team also collaborates closely with data scientists to build predictive failure models that allow customer applications to be live-migrated off machines before a failure occurs, and to invoke AI-powered recovery strategies if the application is already down. Availability is one of the top KPIs (key performance indicators) for Azure and Microsoft - come be part of the team driving the platform forward on this front.
As a Senior Software Engineer, you will be joining a talented team that invests in our people and technology for the long term. We emphasize comprehensive designs, incremental development with high quality, shipping frequently, demonstrating impact to Azure's leadership team and adapting quickly to feedback. Members of this team have been internally recognized through multiple Quality, Innovation and Security awards owing to their impact on the business and culture of the organization.
Join us in pushing the boundaries of scale, reliability, availability, observability, and efficiency.
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Relocation assistance is unavailable for this role.
Responsibilities
Partners with appropriate stakeholders across teams and orgs to determine project requirements and existing system behavior.
Leads the design and architecture of change management features and services in Azure Compute through data-driven decisions.
Identifies dependencies, authors design documents for features and services, and communicates these solutions effectively to all stakeholders.
Leverages expertise with appropriate stakeholders to develop project plans, release plans, and work items.
Develops high-quality, extensible, maintainable code with telemetry and configuration control, and coaches others to do the same.
Supports and improves livesite as a Designated Responsible Individual (DRI), mentoring engineers across products/solutions and working on-call to monitor the system/product/service for degradation, downtime, or interruptions.
Proactively seeks new knowledge and adapts to new trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products, while also driving consistency in monitoring and operations at scale and sharing knowledge with other engineers.
Other
Embody our Culture and Values

 

Job Summary
Company
Start Date
As soon as possible
Employment Term and Type
Regular, Full Time
Required Experience
Open