Welcome! I am a postdoctoral researcher at the Media Technology Center, ETH Zürich. I lead a project to develop the first voice assistant that can speak different Swiss German dialects, building low-resource neural machine translation and text-to-speech synthesis models (project video).
Previously, I was a joint doctoral student in Computer Science at Disney Research Studios and the Computer Graphics Laboratory at ETH Zürich, where I was advised by Markus Gross. I completed my master's studies at the Department of Electrical Engineering at EPFL, and spent time at EMPA, Disney Research Zürich, and Disney Research Pittsburgh as an intern, and at the University of British Columbia as a visitor.
My general research interests lie in image processing, video processing, natural language processing, speech synthesis, visual-textual data alignment, and computer vision. More specifically, my research mostly explores the correspondences between visual, textual, and audio elements.
SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German arXiv preprint
Enriching Video Captions With Contextual Text International Conference on Pattern Recognition 2020 (ICPR)
Neural Sequential Phrase Grounding (SeqGROUND) Conference on Computer Vision and Pattern Recognition 2019 (CVPR)
Controlling Motion Blur in Synthetic Long Time Exposures Eurographics 2019
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH) Conference on Computer Vision and Pattern Recognition 2018 (CVPR) (Spotlight)
Label-Based Automatic Alignment of Video with Narrative Sentences European Conference on Computer Vision 2016, Workshop on Web-scale Vision and Social Media
A Simple, Fast and Low-cost Method for in Situ Monitoring of Topographical Changes and Wear Rate of a Complex Tribo-system under Mixed Lubrication Wear 364 (2016)
Key-frame Based Spatiotemporal Scribble Propagation Proceedings of the Eurographics Workshop on Intelligent Cinematography and Editing