Transcription Curriculum Learning

Improving data efficiency for large-scale transcription model training

Deepgram uses large datasets to train even larger models. While this enables powerful, high-capacity end-to-end speech transcription systems, it is also computationally expensive and slows experimental R&D. I find that intentionally restricting the training dataset can improve training efficiency by a factor of two to five, and in particular yields substantive absolute improvements in the diminishing-returns regime of large-model training. Read the writeup here.
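
The writeup covers the actual method; as a rough illustration only, here is a minimal sketch of one common way to restrict a dataset with a curriculum: start training on an "easy" subset and widen the admitted pool over epochs. The names (`Utterance`, `difficulty`, `max_difficulty_at`, `curriculum_subset`), the difficulty score, and the linear schedule are all assumptions for illustration, not the approach described above.

```python
# Hypothetical sketch of curriculum-style dataset restriction (not the actual
# Deepgram training code). Assumes each utterance carries a precomputed
# "difficulty" score in [0, 1] (e.g. derived from audio length or the error
# rate of a small seed model).

from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    audio_path: str
    transcript: str
    difficulty: float  # assumed precomputed difficulty score in [0, 1]


def max_difficulty_at(epoch: int, total_epochs: int, start: float = 0.3) -> float:
    """Linear schedule: admit only easy samples early, the full pool by the end."""
    frac = min(1.0, epoch / max(1, total_epochs - 1))
    return start + (1.0 - start) * frac


def curriculum_subset(pool: List[Utterance], epoch: int, total_epochs: int) -> List[Utterance]:
    """Restrict the training set to samples at or below the current difficulty cap."""
    cap = max_difficulty_at(epoch, total_epochs)
    return [u for u in pool if u.difficulty <= cap]


# Usage: each epoch trains only on the currently admitted subset.
# for epoch in range(total_epochs):
#     subset = curriculum_subset(full_pool, epoch, total_epochs)
#     train_one_epoch(model, subset)   # train_one_epoch is a placeholder
```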