Oddin.gg

ML Engineer for Video Generation

Published: 4/8/2025

Independent contractor | On-site/Hybrid

About Valka

Valka, a visionary spin-off from the Realms Group (the parent company of Oddin.gg), is on a mission to revolutionize the way people create and experience digital content. Our team believes that content shouldn’t just be consumed; it should be co-created in real time, blurring the lines between imagination and reality. By harnessing the power of cutting-edge AI, we aim to build an interactive human-digital platform where virtual characters respond dynamically to each user’s voice, text, gestures, and more.

This is your chance to join a diverse group of innovators who are driven to redefine what’s possible in generative content. Together, we’re changing the paradigm from passive viewing to active participation, unlocking new creative frontiers across gaming, entertainment, education, and beyond.

Position Intro:

We’re looking for an experienced ML Engineer for Video Generation to take on a foundational role in our new team.

You’ll develop text-to-speech and voice cloning models to create synthetic voices for our avatars that sound like public figures.

We expect you to work with state-of-the-art models and push the limits of what voice cloning and TTS can do. This role requires a solid understanding of speech synthesis, NLP, and deep learning. Experience working with large text and speech datasets is highly desirable.

You’ll build efficient training and deployment pipelines for voice models. Part of your job will be designing validation strategies that compare synthetic speech to real recordings, and creating custom metrics to measure quality.

You’ll also help set up the infrastructure for tracking experiments, making results reproducible, and serving models in production. From training on distributed systems to monitoring deployed models, you’ll be involved in the full machine learning workflow.

What you will work on:

  • Design, develop, and optimize video generation models using diffusion techniques, with a focus on maintaining consistency, realism, and style.
  • Work closely with other teams on large-scale video datasets, including human motion and gestures, facial expressions, and scene context.
  • Experiment with cutting-edge diffusion architectures for controllable and high-quality video synthesis.
  • Define robust validation strategies and implement custom evaluation metrics comparing synthetic vs. real gameplay.
  • Contribute to foundational MLOps practices and infrastructure, including experiment tracking, CI/CD, deployment, monitoring, and versioning.

Skills you need:

  • Solid experience with deep learning frameworks (PyTorch, TensorFlow, or JAX).
  • Understanding of video processing.
  • Experience with training diffusion models for image and video generation.
  • Experience with generative models such as VAEs, GANs, and diffusion models.
  • Ability to read and implement ideas from research papers.
  • Strong programming skills and proficiency in Python, including work with large datasets and GPU environments.
  • Understanding of core machine learning and deep learning concepts.

Join us at Valka to lead a new wave of interactive video content—one where your creativity and technical prowess will help transform entire industries and reimagine how digital content is created, shared, and experienced.

This role offers a unique opportunity to shape the future of interactive video content, where digital humans can engage in meaningful and dynamic interactions with users. If you're a passionate AI/ML expert with a drive to innovate and create immersive experiences, we encourage you to apply.