About
Welcome! I am a postdoctoral researcher at the Media Technology Center, ETH Zürich. I lead a project to develop the first voice assistant that can speak different Swiss German dialects, by building low-resource neural machine translation and text-to-speech synthesis models (project video).
Previously, I was a joint doctoral student in Computer Science at Disney Research Studios and the Computer Graphics Laboratory at ETH Zürich, where I was advised by Markus Gross. I completed my master's studies at the Department of Electrical Engineering at EPFL. I also spent time as an intern at EMPA, Disney Research Zürich, and Disney Research Pittsburgh, and as a visitor at the University of British Columbia.
Research
My general research interests lie in image processing, video processing, natural language processing, speech synthesis, visual-textual data alignment, and computer vision. More specifically, my research mostly explores the correspondences between visual, textual, and audio elements.
Publications
2021
- SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German. arXiv preprint
PDF / Bibtex / Relevant Video / Audio Samples / Data Access
2020
- Enriching Video Captions With Contextual Text. International Conference on Pattern Recognition 2020 (ICPR)
2019
- Neural Sequential Phrase Grounding (SeqGROUND). Conference on Computer Vision and Pattern Recognition 2019 (CVPR)
PDF / PDF (supp.) / Bibtex
- Controlling Motion Blur in Synthetic Long Time Exposures. Eurographics 2019
PDF / PDF (supp.) / Video / Bibtex
2018
- A Neural Multi-sequence Alignment TeCHnique (NeuMATCH). Conference on Computer Vision and Pattern Recognition 2018 (CVPR) (Spotlight)
PDF / PDF (supp.) / Bibtex
2016
- Label-Based Automatic Alignment of Video with Narrative Sentences. European Conference on Computer Vision 2016, Workshop on Web-scale Vision and Social Media
- A Simple, Fast and Low-cost Method for in Situ Monitoring of Topographical Changes and Wear Rate of a Complex Tribo-system under Mixed Lubrication. Wear, vol. 364 (2016)
2015
- Key-frame Based Spatiotemporal Scribble Propagation. Proceedings of the Eurographics Workshop on Intelligent Cinematography and Editing, 2015
Website / PDF / PDF (supp.) / Video / Bibtex
Patents
- Alignment of video and textual sequences for metadata analysis. US Patent No. US10558761B2, 2020
- Techniques for performing contextual phrase grounding. US Patent Application No. US20200272695A1, 2020