We're seeking a solution-oriented machine learning engineer with strong software development skills to architect, build, and maintain innovative evaluation solutions and tools for large-scale statistical assessment of GenAI-powered products, models, and AI agents. As a key member of our team, you'll deliver evaluation-as-a-service solutions that empower product and modeling teams across Apple to run comprehensive statistical evaluations, generate actionable metrics and insights, and make informed shipping decisions.
What You'll Do:
Partner with cross-functional teams to translate evaluation needs into robust technical solutions for conversational AI, language models, and AI agent capabilities
Own end-to-end requirements gathering and proof-of-concept development, and co-drive the development roadmap for ML system evaluation platforms
Design and implement solutions that enable statistical analysis of product experiences, model performance, and AI agent behavior at scale
Drive system integration efforts and influence how evaluation software is incorporated into ML model and AI agent CI/CD pipelines
Develop monitoring and observability solutions to provide deep insights into platform performance, evaluation quality, and AI agent reliability
Build specialized evaluation frameworks for AI agents, including multi-step reasoning assessment, tool usage validation, and agent interaction quality measurement
Iterate rapidly based on stakeholder feedback while maintaining platform reliability and performance across diverse AI workloads
The ideal candidate thrives in fast-paced environments, combines strategic thinking with hands-on problem-solving, and is passionate about enabling data-driven decisions that enhance Apple product experiences for millions of users. You'll be instrumental in building the next generation of evaluation infrastructure that supports Apple's expanding AI agent capabilities.
10+ years of professional software development experience with demonstrated expertise in designing, implementing, and optimizing large-scale, data- and compute-intensive frameworks, APIs, and tools
Strong software engineering capabilities including system design, backend development, testing, debugging, release management, and production maintenance
Expert-level proficiency in Python (required) and at least one additional object-oriented programming language (e.g., Swift, Java, Go)
Solid experience with service-oriented architecture and distributed systems design patterns
Backend development expertise with experience building scalable APIs, microservices, and platform infrastructure
ML lifecycle familiarity including exposure to data preprocessing, model training, evaluation methodologies, deployment strategies, monitoring approaches, and AI agent development workflows
Statistical evaluation methodology knowledge including experience with ML training pipelines, model accuracy assessment, performance optimization techniques, and AI agent evaluation frameworks
Platform and infrastructure mindset with ability to develop long-term strategic visions and execute scalable solutions in agile, fast-paced work environments
Cross-functional collaboration skills with strong organizational abilities and experience working effectively with multiple stakeholders across product, engineering, and research teams
Communication excellence with demonstrated ability to document complex technical concepts and present solutions to diverse audiences
AI agent evaluation experience including familiarity with agentic workflows, multi-step reasoning assessment, tool usage validation, and autonomous system reliability measurement
BS or MS in Computer Science, Software Engineering, or a related technical field preferred