Meta AR/VR Job | Research Engineer - Codec Avatar ML Compute Team

Job: Research Engineer - Codec Avatar ML Compute Team

Type: Data Engineering

City: Pittsburgh, PA

Date posted: 2024-4-9

Summary

Reality Labs Research (RL-R) brings together a diverse and highly interdisciplinary team of researchers and engineers to create the future of augmented and virtual reality. As a member of the Codec Avatars ML Compute Infrastructure team, you'll have the exciting opportunity to contribute to the advancement of our Codec Avatar technology. Your role will involve delivering data, tools, and libraries within our super clusters, playing a crucial part in our technological progress.

Our team cultivates an honest and considerate environment where self-motivated individuals thrive. We encourage a strong sense of ownership and embrace the ambiguity that comes with working on the frontiers of research. In this research engineer role on the Codec Avatar ML Compute team, you will serve as the point of contact for Meta's research GPU super clusters, parallelizing massive ML models and data, and optimizing other compute resources to enable groundbreaking research in relightable avatars, full-body avatars, and generative AI for codec avatars.

Qualifications

Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta.

3+ years of experience coding in C++ and Python

Experience in large scale ML system performance measurement, logging, and optimization

Experience in writing system level infrastructure, libraries, and applications

Prior experience in ML libraries such as PyTorch, TensorFlow, or cuDNN

Experience with software development practices such as source control, unit testing, debugging and profiling

Experience in developing performant software and systems

Description

Build efficient and scalable machine learning tooling for the GPU clusters within Meta research labs, a heterogeneous environment containing diverse system architectures and research workloads

Build efficient and scalable data tooling for massive ML training data preprocessing and postprocessing using thousands of CPU/GPU nodes

Provide on-call support and lead incident root cause analysis through multiple infrastructure layers (compute, storage, network) for GPU clusters and act as a final escalation point

Work side by side with research scientists and engineers to take full advantage of modern GPUs for large scale multi-GPU training jobs

Collaborate in a diverse team environment across multiple scientific and engineering disciplines, making the architectural tradeoffs required to rapidly train large scale ML models

Provide guidance to other engineers on best practices to build mature tools which are highly reliable, secure, and scalable

Influence outcomes within your immediate team, peer engineering teams, and with cross-functional stakeholders

Ability to work independently, handle large projects simultaneously, and prioritize team roadmap and deliverables by balancing required effort with resulting impact

Additional Requirements

Prior experience in large scale machine learning model training, including model parallelization strategies

Prior experience with machine learning model compilers

Prior experience in cluster coordination and strategy planning, including collecting/understanding needs of researchers, developing tools to improve research experience, providing guidance on best practices, coordinating distribution of compute/storage resources, forecasting compute/storage needs, and developing long-term user experience/compute/storage strategies

Prior experience building tooling for monitoring and telemetry for large scale supercomputers

Prior experience in debugging performance issues for large scale ML training tasks

Prior experience in GPGPU development with CUDA, OpenCL or DirectCompute

Familiarity with Linux observability tools, such as eBPF
