ML Engineer for Speech Synthesis
Published: 4/8/2025
Independent contractor, On-site/Hybrid
About Valka
Valka, a visionary spin-off from the Realms Group (the parent company of Oddin.gg), is on a mission to revolutionize the way people create and experience digital content. Our team believes that content shouldn’t just be consumed; it should be co-created in real time, blurring the lines between imagination and reality. By harnessing the power of cutting-edge AI, we aim to build an interactive human-digital platform where virtual characters respond dynamically to each user’s voice, text, gestures, and more.
Position Intro:
We’re looking for an experienced ML Engineer for Speech Synthesis to join our new team in a foundational role.
You’ll develop text-to-speech and voice cloning models to create synthetic voices for our avatars that sound like public figures.
We expect you to work with state-of-the-art models and push the limits of what voice cloning and TTS can do. This role requires a solid understanding of speech synthesis, NLP, and deep learning. Experience working with large text and speech datasets is highly desirable.
You’ll build efficient training and deployment pipelines for voice models. Part of your job will be designing validation strategies that compare synthetic speech to real recordings, and creating custom metrics to measure quality.
You’ll also help set up the infrastructure for tracking experiments, making results reproducible, and serving models in production. From training on distributed systems to monitoring deployed models, you’ll be involved in the full machine learning workflow.
What you will work on:
- Design, develop, and optimize text-to-speech models with a focus on maintaining the style and authenticity of the original voice actors.
- Collaborate with teams to work on audio datasets that include voice recordings and multilingual transcriptions.
- Experiment with state-of-the-art architectures for speech synthesis, including neural TTS and voice cloning models.
- Define robust validation strategies and implement custom evaluation metrics comparing synthetic speech against real recordings.
- Contribute to foundational MLOps practices and infrastructure, from experiment tracking and CI/CD to deployment, monitoring, and versioning.
Skills you need:
- Solid experience with deep learning frameworks (PyTorch, TensorFlow, or JAX)
- Understanding of audio processing (sampling, spectrograms, vocoders)
- Experience training text-to-speech and voice cloning models
- Familiarity with speech synthesis models such as WaveNet, Tacotron, and VITS
- Experience working with voice cloning models (e.g., XTTS, YourTTS, or similar)
- Experience with transformers and diffusion models
- Ability to implement ideas from research papers
- Solid understanding of machine learning and deep learning fundamentals
- Strong programming skills and experience working with Python
Join us at Valka to lead a new wave of interactive audio content, one where your creativity and technical prowess will help transform entire industries and reimagine how digital content is created, shared, and experienced.
Based in Prague, the heart of Europe, we offer flexible work arrangements. We can support relocation or facilitate regular on-site visits for those unable to relocate permanently.
This role offers a unique opportunity to shape the future of interactive video content, where digital humans can engage in meaningful and dynamic interactions with users. If you're a passionate expert with a drive to innovate and create immersive experiences, we encourage you to apply.