Meta AR/VR Job | Software Engineer - Compute Infrastructure

Job: Software Engineer - Compute Infrastructure

Type: Engineering

City: Pittsburgh, PA

Date posted: 2022-2-22

Summary

Meta Reality Labs Research is looking for a Software Engineer Technical Lead to drive the requirements, software tooling, and adoption of an industry-leading, machine learning super cluster that will be used to process data for avatars in the Metaverse. The ideal candidate will be an expert in developing workflows on large compute clusters, as well as building tools and libraries to ensure researchers are highly productive at developing their own workflows. The role will require a high level of cross-functional collaboration with researchers, data center operations teams, data privacy and security teams, and other software engineering teams.

Qualifications

Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta.

5+ years of experience developing workflows for large-scale AI training

Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering or related field

Communication skills, including experience driving decision making

Experience working with cross-functional teams, including hardware, software, network, legal, privacy, and security

Python experience

Linux/shell scripting development experience

Experience developing and supporting reliable multi-stage data pipelines

Experience with containers (Docker or similar)

Quantitative reasoning skills and experience analyzing trade-offs between different hardware and software solutions

Responsibilities

Define requirements of cluster hardware (storage system speed and size, number and type of CPUs and GPUs required, etc.)

Automation of data ingress into cluster

Responsible for compute allocation policies and containerization technologies for the cluster

Development of data access layer using proprietary framework

Define and communicate cluster software requirements, based on research needs

Enabling adoption of the cluster by additional research cases

Definition, design and implementation of automated testing

Point of contact for hardware & software questions regarding cluster capabilities

Work with privacy and legal teams to develop data handling policies

Reporting on progress, presenting technical risks, challenges and status to executive management

Additional Requirements

8+ years of experience developing workflows for large-scale AI training

Understanding of deep neural network training

Experience with securing sensitive data (encryption, access control, audit logging)

Experience with HPC (High Performance Computing)

Experience with scheduling systems such as Slurm or Kubernetes

Experience with large-scale object storage services (S3 or similar)

Experience in research or converting research to products

Experience with git

Experience with Conda

Experience with SQL databases

Experience with C++
