Speech Recognition Intern

Sony Research India

Updated on: 21 May 2025

Additional Details

Website

www.sony.com

Work Location

Work-from-Home

Job Type

Internship + FTE

Batch

2025 | 2024

Stream Required

Master's (Research) or Ph.D. in deep learning/machine learning

Salary

40K/Month (Expected) [Stipend]

Job Description

Sony Research India is seeking a dynamic and motivated Speech Recognition Intern to join our innovative research team. As an intern, you will work on real-world problems in automatic speech recognition (ASR), focusing on improving noise robustness and reducing hallucinations in transcription outputs. You'll gain hands-on experience with state-of-the-art tools and datasets, and contribute to impactful projects alongside experienced researchers and engineers.

Key Responsibilities:

  • Explore and develop techniques to enhance ASR robustness under noisy, low-resource, and domain-shifted conditions.
  • Investigate hallucination phenomena in end-to-end ASR models (e.g., Whisper, Wav2Vec2) and propose mitigation strategies.
  • Conduct experiments using large-scale speech datasets and evaluate ASR performance across varying noise levels and linguistic diversity.
  • Contribute to publications, technical reports, or open-source tools as outcomes of the research.

Work Location:

  • Remote

Duration of the paid Internship:

  • This paid internship runs for 6 months, starting in the first week of June 2025.
  • Working hours: 9:00 to 18:00, Monday to Friday.

Qualification:

Currently pursuing or has completed a Master's (Research) or Ph.D. in deep learning/machine learning, with hands-on experience with Transformer models applied to audio/speech.

Must Have Skills:

  • Strong programming skills in Python, and familiarity with PyTorch or TensorFlow.
  • Experience with speech processing libraries (e.g., Torchaudio, ESPnet, Hugging Face Transformers).
  • Prior experience with ASR models like Wav2Vec2, Whisper, or RNN-T is a plus.
  • Ability to read and implement academic papers.
  • Strong foundation in machine learning and signal processing.

Good to Have Skills:

  • Familiarity with prompt tuning, contrastive learning, or multi-modal architectures.
  • Experience with evaluating hallucinations or generating synthetic speech/audio perturbations.
