Principal Software Engineer, ML Platform - Game Tech Group
at Riot Games
Los Angeles, United States
Riot Engineers bring deep knowledge of specific technical areas but also value the chance to work in many broader domains. As a Software Engineer, you’ll also dive into projects that focus on team cohesiveness and cross-team goals. You’ll lead without authority and provide other engineers with a clear illustration of extraordinary engineering.
As the Principal Software Engineer on the ML Platform team, you’ll embrace MLOps concepts: architecting, building, and leading core infrastructure for ML model deployment, observability, and lifecycle management. This role is foundational in bringing modern ML inference practices to our game and platform teams: reducing time-to-deploy, reducing operational effort, avoiding a cycle of reinventing the wheel, scaling infrastructure intelligently, and enabling data scientists, ML engineers, and product teams to move faster and use production-grade ML with confidence. You’ll design for both GPU and CPU workloads, live testing (A/B and shadow modes), and model versioning, creating a platform that is robust, cost-efficient, and extensible for Riot’s present and future needs. Your work will directly enable high-impact ML use cases across Riot’s games and internal teams. This engineer will help found a new ML Platform team and serve as its first technical leader, writing initial critical systems code and guiding long-term strategy and direction as the team grows. You’ll report directly to the Director of Data Science.
Responsibilities:
- Architect and implement Riot’s core ML inference infrastructure, covering both live inference and nearline batch inference, with a focus on scalable model serving, CPU- and GPU-aware orchestration, and automated deployment pipelines.
- Partner with researchers, game teams, and platform engineers to understand product needs and deliver generalizable, reusable solutions.
- Define and build CI/CD workflows for ML artifacts that follow MLOps practices, supporting rapid iteration and safe promotion from dev to production.
- Own the tooling and strategies for environment and dependency management (e.g., Conda/Poetry lock files, secure image builds) for ML runtimes.
- Instrument and emit platform metrics for observability, model monitoring, drift detection, CPU/GPU utilization, and latency SLAs.
- Establish patterns and tooling for multi-version model support, blue/green and shadow deployments, and rollback.
- Be thoughtful about developer UX and take an iterative approach to improving it.
- Serve as the technical founding voice for a new platform—defining long-term architecture, mentoring incoming engineers, and collaborating on hiring.
- Contribute upstream to shared infra initiatives and build feedback loops and collaboration models with other Riot platform teams.
Required Qualifications:
- 10+ years of experience in software engineering, with substantial time spent in platform or infrastructure teams
- Proven technical leadership in building large-scale distributed systems, production ML systems, or model serving infrastructure
- Deep experience with cloud-native systems (e.g., Kubernetes, containerization, autoscaling, observability stacks)
- Experience with one or more inference serving frameworks (e.g., NVIDIA Triton, KServe, TorchServe, BentoML, Seldon Core, etc.)
- Familiarity with GPU orchestration, performance tuning, and cost-aware scheduling
- Strong background in CI/CD automation, IaC tools (e.g., Terraform), and artifact management
- Hands-on experience with Python ML ecosystems, package management (Poetry, Conda, etc.), and vulnerability scanning
- Ability to mentor engineers, write clear documentation, and influence cross-functional stakeholders
Desired Qualifications:
- Experience building ML infrastructure within a real-time or latency-sensitive environment
- Familiarity with ML workflow tools (MLflow, DVC, lakeFS, etc.) and drift monitoring strategies
- Exposure to A/B testing and experimentation frameworks, especially in online model evaluation
- Prior success in founding or greenfield platform work, especially building toward multi-tenancy or self-service capabilities
- Passion for player experience, game systems, or creative technology development
- Familiarity or experience with technical deployments in China, particularly with Tencent
For this role, you'll find success through craft expertise, a collaborative spirit, and decision-making that prioritizes the delight of players. We will be looking at your past studies, experience, and your personal relationship with games. If you embody player empathy and care about players' experiences, this could be your role!
Our Perks:
Riot focuses on work/life balance, shown by our open paid time off policy and other perks such as flexible work schedules. We offer medical, dental, and life insurance, parental leave for you, your spouse/domestic partner, and children, and a 401k with company match. Check out our benefits pages for more information.
At Riot Games, we put players first. That mission drives every decision in our quest to create games and experiences that make it better to be a player. Whether you’re working directly on a new player-facing experience or you’re supporting the company as a whole, everyone at Riot is part of our mission. And just like in our games, we’re better when we work together. Our goal is to create collaborative teams where you are empowered to bring your unique perspective every day. If that sounds like the kind of place you want to work, we’re looking forward to your application.
It’s our policy to provide equal employment opportunity for all applicants and members of Riot Games, Inc. Riot Games makes reasonable accommodations for handicapped and disabled Rioters and does not unlawfully discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity or expression, national origin, age, handicap, veteran status, marital status, criminal history, or any other category protected by applicable federal and state law. We consider for employment all qualified applicants, including those with criminal histories, in a manner consistent with applicable federal, state and local law, including the California Fair Chance Act, the City of Los Angeles Fair Chance Initiative for Hiring Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, the San Francisco Fair Chance Ordinance, and the Washington Fair Chance Act.
Per the Los Angeles County Fair Chance Ordinance, the following core duties may create a basis for disqualifying candidates with relevant criminal histories:
- Safeguarding confidential and sensitive Company data
- Communication with others, including Rioters and third parties such as vendors, and/or players, including minors
- Accessing Company assets, secure digital systems, and networks
- Ensuring a safe interactive environment for players and other Rioters
These duties are directly related to essential operations, safety, trust, and compliance obligations within our organization. Please note that job duties may evolve based on business needs and additional responsibilities may be assigned as necessary to maintain operational efficiency and security.