Join a team at the forefront of ML infrastructure and generative AI, where data and model workflows come together to enable the next generation of intelligent experiences on Apple products and services. We build robust systems that connect scalable data pipelines with advanced ML workflows, accelerating the development of real-world AI applications. Our work spans the full ML lifecycle, from experimentation to deployment, and you’ll play a key role in shaping how AI models are built, optimized, and scaled. We develop a platform for ML data and features that powers advanced GenAI applications. This includes embeddings (generation, evaluation, ANN search, multimodal support), AI Ops, efficient inference, and a modern feature platform designed to streamline experimentation and drive innovation. We’re looking for engineers and researchers passionate about generative models, data-centric ML, and intelligent systems across diverse real-world use cases. With the autonomy to experiment, the scale to make an impact, and the support to take ideas from prototype to production, you’ll work alongside a world-class team to build intelligent, flexible systems that make ML development faster, more reliable, and more creative.
The Apple Cloud AI Platform team enables Apple's next generation of intelligent products by giving Apple's ML engineers and researchers the data systems and large-scale compute they need to build and ship models at Apple's bar for quality and privacy.
As a member of the Apple Cloud AI Platform team, your responsibilities will include:
Design and build the platform behind Apple's largest model builds: ingestion, immutable versioning, lineage, and governance across structured, unstructured, and multimodal data at petabyte scale, so every model run is reproducible from a versioned dataset
Develop and evolve Python SDKs and core data libraries that ML engineers depend on to access, transform, and load model-ready datasets across every stage of model development
Build high-throughput data access and loading primitives that feed Apple's largest GPU fleets, keeping workloads compute-bound rather than I/O-bound
Build and operate distributed data pipelines spanning Spark, Daft, and Rust-based systems for ingestion, transformation, and large-scale data preparation
Optimize platform components for tight integration with leading ML frameworks (PyTorch, JAX, and TensorFlow) so dataset access is a first-class concern in the model development loop
Partner with research and product teams to onboard new data sources and enable rapid iteration on datasets powering GenAI workloads
Ensure governance is a first-class platform capability: Legal Terms of Use enforcement, privacy controls, and end-to-end data lineage on every dataset version
Drive efficiency, reliability, and automation across the data plane and control plane that power Apple's ML fleet
Continuously evolve platform capabilities to support next-generation workloads, including foundation models, multimodal data, and retrieval-augmented systems
Diagnose, fix, and automate away complex issues across the stack, from ingestion pipelines to dataset APIs to ML framework integrations, to maximize uptime and throughput
Strong foundation in machine learning, with hands-on experience across the end-to-end ML workflow, including data preparation, pipeline development, experimentation, evaluation, and deployment
Expertise in building and running large-scale distributed systems
Familiarity with modern generative techniques (e.g., transformers, diffusion, retrieval-augmented generation)
Proven experience building and delivering data and machine learning infrastructure in real-world production environments
Familiarity with fine-tuning workflows, model optimization, and preparing models for scalable inference
Familiarity with generative AI and its applications in accelerating and enhancing machine learning workflows
Experience configuring, deploying, and troubleshooting large-scale production environments
Experience designing, building, and maintaining scalable, highly available systems that prioritize ease of use
Extensive programming experience in Java, Python, or Go
Strong collaboration and communication (verbal and written) skills
Comfortable navigating ambiguity and evolving technical landscapes, especially in fast-moving areas
B.S., M.S., or Ph.D. in Computer Science, Computer Engineering, or equivalent practical experience
Experience in any of the below is preferred:
Proficiency with one or more modern ML frameworks (PyTorch, JAX, or TensorFlow), particularly the data loading and dataset access layer
Columnar and lakehouse formats: Parquet, Iceberg, Delta, or Lance
Distributed data loading frameworks for ML: Ray Data, NVIDIA DALI, WebDataset, or Mosaic StreamingDataset
Performance engineering for I/O-bound workloads: Arrow, zero-copy, memory mapping, async I/O
High-throughput object storage access patterns at GPU scale
Data lineage and governance systems (DataHub, OpenLineage, Unity Catalog, or equivalent)
Contributions to or operational experience with Spark, Daft, Polars, or DuckDB internals
Containerization and orchestration technologies (Docker, Kubernetes)