Meta AR/VR Job | Codec Avatars Large Scale Experimentation Lead | Quest

Job: Codec Avatars Large Scale Experimentation Lead | Quest

Type: 3D Software Engineering

City: Pittsburgh, PA

Date posted: 2023-05-27

Summary

Meta Reality Labs’ Codec Avatar Research team is building technology to enable immersive, photorealistic social presence. Codec Avatars are real-time, live-drivable representations that match the appearance of their users. As part of the Labs’ Instant Codec Avatar group, you’ll work to scale up Codec Avatar technology by modeling the diversity of human appearance and applying that model to rapidly generate new avatars.

This role focuses on our Large Scale Experimentation efforts, which both support our new Research SuperCluster compute resource and use that resource to run large-scale machine learning experiments that advance the state of the art in Codec Avatar technology. In this role, you will lead a team of software engineers, research engineers, and research scientists to plan and deliver the software systems needed to support large-scale model training across thousands of GPUs. These systems ingest, store, and serve some of the largest ML training datasets in the world, and coordinate complex workflows that combine traditional graphics and ML algorithms. You’ll also plan, design, and execute research experiments using those workflows to advance our understanding of how appearance modeling scales across large populations.

Qualifications

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.

Experience providing technical leadership for teams of 5 or more engineers

Experience with multi-node ML training workflows and frameworks

Experience developing and debugging distributed systems

Experience operating in a self-directed environment with multiple stakeholders across multiple teams

Proven communication skills, including experience driving decision-making

Experience working with cross-functional teams, including hardware, software, network, legal, privacy, and security

Proven Python experience

Proven Linux/shell scripting development experience

Experience developing and supporting reliable multi-stage data pipelines

Proven quantitative reasoning skills, including analyzing the trade-offs of different hardware and software solutions

Responsibilities

Develop and debug machine learning workflows on a large multi-node cluster

Automate data ingress into the cluster

Implement compute allocation policy for the cluster

Define and implement strategy for compute environment management and deployment

Develop the data read/access layer using a proprietary framework

Define and communicate cluster software requirements based on research needs

Enable adoption of the cluster for additional research use cases

Define, design, and implement automated testing

Serve as the point of contact for hardware and software questions regarding cluster capabilities

Report on progress and present technical risks, challenges, and status to executive management

Partner with Data Collection and Asset Generation teams to specify and ingest assets required for large scale training

Partner with Codec Avatars Universal Avatar Research team to support large scale experimentation based on Python workflows

Partner with Research SuperCluster production engineering team to support reliable operation

Partner with Research SuperCluster storage engineering team to support development of features required for Codec Avatars datasets

Partner with security, privacy, and policy teams to ensure workflow compliance with company policy

Additional Requirements

Experience providing technical leadership for teams of 12 or more engineers

Master's or higher degree in Computer Science or a related technical field, or equivalent experience

8+ years of experience in ML or distributed systems

Experience developing or applying computer graphics algorithms

Experience developing or applying computer vision algorithms

5+ years of experience developing workflows for large scale AI training

Understanding of deep neural network training

Experience with securing sensitive data (encryption, access control, audit logging)

Experience with HPC (High Performance Computing)

Experience with scheduling systems such as Slurm or Kubernetes

Experience with large scale object storage services (S3 or similar)

Experience in research or converting research to products

Experience using Git

Experience using Conda

Experience with SQL databases

Modern C++ development experience
