Apple AR/VR Job | AIML - Senior Software Engineer, Foundation Models, On-Device Machine Learning
Job: AIML - Senior Software Engineer, Foundation Models, On-Device Machine Learning
City: Santa Clara Valley (Cupertino), California, United States
Date posted: 2023-06-14
Summary
At Apple, the AIML - On-Device Machine Learning group is responsible for accelerating the creation of amazing on-device ML experiences, and we are looking for a senior software engineer to help define and implement features that accelerate and compress large foundation models in our on-device inference stack. We are a dedicated team working on groundbreaking technology in the fields of natural language processing, computer vision, and artificial intelligence. We design, develop, and optimize the large-scale language, vision, and multi-modal models that power on-device inference capabilities across various Apple products and services. This is a unique opportunity to work on powerful new technologies and contribute to Apple's ecosystem, with a commitment to privacy and a user experience that impacts millions of users worldwide.
Are you someone who can write high-quality, well-tested code and collaborate cross-functionally with partner HW, SW and ML teams across the company? If so, come join us and be part of the team that is helping Machine Learning developers innovate and ship enriching experiences on Apple devices!
Qualifications
Proven programming skills with standard ML tools such as C/C++, Python, PyTorch, TensorFlow, and CUDA/Metal.
Solid understanding of state-of-the-art DNN optimization techniques and how they map to hardware acceleration architectures, and a general ability to reason about system performance (compute/memory) trade-offs.
Hands-on experience training, fine-tuning, optimizing, and deploying large foundation models (e.g., LLMs).
Hands-on experience applying common machine learning optimization techniques, such as quantization and sparsity induction, to reduce resource consumption and/or latency.
Experience building APIs and/or core components of ML frameworks
Capacity to iterate on ideas and work with a variety of partners from all parts of the stack, from apps to compilation, HW architecture, and power/performance analysis.
Proven track record of analyzing sophisticated and ambiguous problems.
Disciplined programming abilities with a strong attention to detail
Strong applied experience with compiler technology targeting CPUs, GPUs, and ML accelerators.
Excellent problem-solving (e.g. via building forward-looking prototype systems), critical thinking, strong communication, and collaboration skills
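As a rough illustration of the quantization techniques the qualifications above refer to, here is a minimal sketch of symmetric per-tensor int8 weight quantization in NumPy (an assumption-laden toy example, not code from Apple's on-device stack):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float32 weights to int8 values plus one float scale factor."""
    scale = np.abs(w).max() / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Hypothetical weight matrix standing in for one layer of a larger model.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32; the rounding error per weight
# is bounded by half the scale.
print(q.nbytes / w.nbytes)  # 0.25
```

Production stacks typically refine this idea with per-channel or per-block scales and quantization-aware fine-tuning to preserve accuracy, which is the "accuracy-preserving" constraint this role emphasizes.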
Description
As a member of this team, the successful candidate will:
- Build features for our on-device inference stack to support the most relevant accuracy-preserving, general-purpose techniques that empower model developers to compress and accelerate foundation models in apps
- Convert models from a high-level ML framework to a target device (CPU, GPU, Neural Engine) for optimal functional accuracy and performance
- Write unit and system integration tests to ensure functional correctness and avoid performance regressions
- Diagnose performance bottlenecks and work with HW Arch teams to co-design solutions that further improve latency, power, and memory footprint of neural network workloads
- Analyze the impact of model optimization (compression, quantization, etc.) on model quality by partnering with foundation modeling and adaptation teams across diverse product use cases
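The testing responsibility above can be sketched as a functional-correctness check: an optimized kernel is compared against a reference implementation within a tolerance. The max-subtracted softmax below is a generic stand-in for an optimized kernel (an illustrative assumption, not code from Apple's stack):

```python
import numpy as np

def softmax_reference(x: np.ndarray) -> np.ndarray:
    """Textbook softmax, used as the golden reference."""
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_optimized(x: np.ndarray) -> np.ndarray:
    """Max-subtracted softmax: a common rewrite that avoids overflow on
    large logits, standing in for any optimized kernel under test."""
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Unit-test style check: the optimized path must track the reference
# within a tolerance on inputs where both are numerically stable.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)
np.testing.assert_allclose(softmax_optimized(x), softmax_reference(x),
                           rtol=1e-5, atol=1e-6)
```

Regression suites built this way pin both numerical behavior and, with timing harnesses, latency, so that compiler or kernel changes cannot silently degrade either.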