David Harwath
Title
Cited by
Year
Unsupervised learning of spoken language with visual context
D Harwath, A Torralba, J Glass
Advances in Neural Information Processing Systems 29, 2016
Cited by: 220, Year: 2016
Jointly discovering visual objects and spoken words from raw sensory input
D Harwath, A Recasens, D Surís, G Chuang, A Torralba, J Glass
Proceedings of the European conference on computer vision (ECCV), 649-665, 2018
Cited by: 161, Year: 2018
Deep multimodal semantic embeddings for speech and images
D Harwath, J Glass
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU…, 2015
Cited by: 129, Year: 2015
A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition
A Jansen, E Dupoux, S Goldwater, M Johnson, S Khudanpur, K Church, ...
2013 IEEE International Conference on Acoustics, Speech and Signal…, 2013
Cited by: 105, Year: 2013
Learning word-like units from joint audio-visual analysis
D Harwath, JR Glass
arXiv preprint arXiv:1701.07481, 2017
Cited by: 102, Year: 2017
Learning hierarchical discrete linguistic units from visually-grounded speech
D Harwath, WN Hsu, J Glass
arXiv preprint arXiv:1911.09602, 2019
Cited by: 62, Year: 2019
Avlnet: Learning audio-visual language representations from instructional videos
A Rouditchenko, A Boggust, D Harwath, B Chen, D Joshi, S Thomas, ...
arXiv preprint arXiv:2006.09199, 2020
Cited by: 44, Year: 2020
Vision as an interlingua: Learning multilingual semantic embeddings of untranscribed speech
D Harwath, G Chuang, J Glass
2018 IEEE International Conference on Acoustics, Speech and Signal…, 2018
Cited by: 44, Year: 2018
Towards visually grounded sub-word speech unit discovery
D Harwath, J Glass
ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and…, 2019
Cited by: 33, Year: 2019
Zero resource spoken audio corpus analysis
DF Harwath, TJ Hazen, JR Glass
2013 IEEE International Conference on Acoustics, Speech and Signal…, 2013
Cited by: 29, Year: 2013
Look, Listen, and Decode: Multimodal Speech Recognition with Images
F Sun, D Harwath, J Glass
IEEE Workshop on Spoken Language Technology, 2016
Cited by: 27, Year: 2016
Learning modality-invariant representations for speech and images
K Leidal, D Harwath, J Glass
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU…, 2017
Cited by: 26, Year: 2017
Text-free image-to-speech synthesis using learned segmental units
WN Hsu, D Harwath, C Song, J Glass
arXiv preprint arXiv:2012.15454, 2020
Cited by: 21, Year: 2020
Topic identification based extrinsic evaluation of summarization techniques applied to conversational speech
D Harwath, TJ Hazen
2012 IEEE International Conference on Acoustics, Speech and Signal…, 2012
Cited by: 20, Year: 2012
Transfer learning from audio-visual grounding to speech recognition
WN Hsu, D Harwath, J Glass
arXiv preprint arXiv:1907.04355, 2019
Cited by: 19, Year: 2019
Grounding Spoken Words in Unlabeled Video
AW Boggust, K Audhkhasi, D Joshi, D Harwath, S Thomas, RS Feris, ...
CVPR Workshops 2, 2019
Cited by: 16, Year: 2019
Trilingual semantic embeddings of visually grounded speech with self-attention mechanisms
Y Ohishi, A Kimura, T Kawanishi, K Kashino, D Harwath, J Glass
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and…, 2020
Cited by: 14, Year: 2020
Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio
E Azuh, D Harwath, JR Glass
INTERSPEECH, 276-280, 2019
Cited by: 14, Year: 2019
Speech recognition without a lexicon—bridging the gap between graphemic and phonetic systems
D Harwath, JR Glass
Fifteenth Annual Conference of the International Speech Communication…, 2014
Cited by: 14, Year: 2014
Multimodal clustering networks for self-supervised learning from unlabeled videos
B Chen, A Rouditchenko, K Duarte, H Kuehne, S Thomas, A Boggust, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision…, 2021
Cited by: 13, Year: 2021