Meta AR/VR Job | SiteOps Global Production Operations Lead Engineer
Job: SiteOps Global Production Operations Lead Engineer
Type: Data Center Operations
City: Los Lunas, NM
Date: 2023-5-27
Summary
Meta is seeking a technical leader to collaborate and guide Production Operations functions in our Data Center Site Operations team. The Production Operations team plays a key role at each of our data centers, assuring high reliability and availability of the server infrastructure required to meet the needs of more than 2 billion people actively engaged with Meta and our suite of applications. We partner closely with vendors and others at Meta, including infrastructure tooling & software development teams, product engineers & service owners, hardware design & manufacture, logistics & supply chain operations, quality & data analytics, project management, production & operations incident management, and maintenance management.
The Production Operations Lead Engineer will assure exceptional availability and reliability of our hyper-scale fleet of servers. We seek a Subject Matter Expert who can continue to drive innovation in this space, spanning people, processes, infrastructure, reliability, tooling, automation, cost and quality. Ensuring high availability of our servers requires effective spare parts management and the identification of improvements that ensure quality parts are on hand. We seek someone who can quickly understand and respond to the technical needs of subject matter experts, local Site leadership, and our Production Operations teams in a rapidly evolving technical environment. The scope includes spare parts management and planning. The successful candidate will gain alignment across these globally distributed teams and partner organizations, driving initiatives that deliver the most impact by prioritizing resources and focus areas.
Qualifications
Decision-making and problem-solving skills
Interpersonal, partnership and communications skills
Proven experience as Engineering or Operations Director, or relevant Senior Technical, Operations or Engineering Lead role
Data Center Logistics experience
Organizational, technical, and leadership skills
BSc/BA in technical field or commensurate experience
Working knowledge of IT/Operations Infrastructure
Prioritization skills and proven experience leading tooling, systems, automation and process
Description
Responsible for exceptional uptime, quality, and reliability of Facebook’s global fleet of data center servers, assuring the Production Operations team meets or exceeds all operational targets
Organize and drive the needs and priorities of the Production Operations team in internal and partner forums, as the technical expert in this space
Build trusted relationships within the team, to understand the biggest challenges and opportunities, and to advocate effectively for the right initiatives
With partner organizations, collaboratively drive a roadmap that scales Site Operations, delivering high-impact advances in tooling, hardware, and workflow
Drive a singular operations strategy, goals, and priorities for the global Production Operations function within Site Operations
Measure and benchmark the effectiveness of operational processes both internally and externally, setting performance targets and driving improvements as needed
Develop scaling strategies and plans; be forward-thinking by understanding infrastructure growth, identifying scaling issues before they occur, and contributing to solutions
Liaise with site teams and logistics partners to identify improvement areas and drive changes and innovation that increase parts availability
Work with site teams to address material shortages and develop best practices and alternative solutions for managing the associated repairs
Establish practices to collect failure data on parts and then use that data to drive improvements in repair flows
Understand the cost trade-offs for various parts and repair flows and make recommendations that ensure high availability while managing total cost of ownership
Present a single message, representing SiteOps, to our logistics and procurement partners on spare parts strategy and improvement areas
Ensure robust, timely communications across a globally distributed team, and give the team clear visibility into progress and strategy
Develop close partnerships with Program Management, Tooling, Hardware Design, Data Analytics, Manufacturing, Sourcing, Logistics and other teams to deliver superb operational results and manage the performance of external vendors
30% - 40% travel required