Research Scientist, Audio-Visual Learning (PhD)

Job: Research Scientist, Audio-Visual Learning (PhD)

Type: Computer Vision | Research

City: Pittsburgh, PA

Date: 2022-01-31


Reality Labs brings together a world-class team of researchers, developers, and engineers to create the future of virtual and augmented reality, which together will become as universal and essential as smartphones and personal computers are today. Just as personal computers have done over the past 45 years, AR and VR will ultimately change everything about how we work, play, and connect. We are developing all the technologies needed to enable breakthrough AR glasses and VR headsets, including optics and displays, computer vision, audio, graphics, brain-computer interface, haptic interaction, eye/hand/face/body tracking, perception science, and true telepresence. Some of those will advance much faster than others, but they all need to happen to enable AR and VR that are so compelling that they become an integral part of our lives.

We are looking for an exceptional researcher with a proven track record of using machine learning to solve computer vision and audio problems (e.g., generative models for images and videos, deep acoustic models, multimodal fusion) who is also an outstanding software engineer able to prototype newly invented algorithms. This person will help usher in the next era of human-computer interaction by solving open and exciting computer vision and audio problems.

As a Research Scientist at Reality Labs, you will pursue research and work with other Computer Vision and Machine Learning Researchers and Engineers to solve challenges at the forefront of computer vision that transform virtual reality from dream to reality.

Requirements

Currently has, or is in the process of obtaining, a Bachelor’s degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta.

Currently has, or is in the process of obtaining, a PhD and/or postdoctoral assignment in computer vision, speech recognition, machine learning, a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta.

Proven track record in using machine learning for solving computer vision and speech processing problems.

Experience with speech generation, audio-visual learning, multimodal fusion, or audio-driven face animation.

Experience with prototyping algorithms in Python or other scripting languages.

Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.

Responsibilities

Participating in cutting-edge research in computer vision and speech processing

Developing efficient deep neural network models for audio-visual content generation

Contributing research that can be applied to Oculus product development

Publishing research results in top-tier journals and at leading international conferences

Additional Requirements

Proven track record of achieving significant results, as demonstrated by grants, fellowships, and patents, as well as first-authored publications at leading conferences (e.g., NeurIPS, ICLR, ICML, SIGGRAPH, CVPR, ECCV, and ICCV) or journals (e.g., PAMI, IJCV, JMLR, ToG).

3+ years of experience designing and developing computer vision or machine learning algorithms.

Experience with generative models such as GANs or VAEs for image and video generation.

Experience in speech processing and deep learning models for this domain.

Experience with multimodal fusion or deep learning with audio-visual sensor data.

Demonstrated software engineering experience via an internship, work experience, coding competitions, or widely used contributions to open source repositories (e.g., GitHub).

Experience solving complex problems and comparing alternative solutions, tradeoffs, and diverse points of view to determine a path forward.

Experience working and communicating cross-functionally in a team environment.