"Train the Deep Learning Ahem Detector with two sets of audio files: a negative sample of clean voice/sound (minimum 3 minutes) and a positive sample of concatenated 'ahem' sounds (minimum 10 seconds). Once trained, it will detect 'ahems' in any voice sample."
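
Before training, it can help to confirm that both samples meet the minimum lengths stated above. The following is a minimal sketch, not part of the detector itself. It assumes the samples are uncompressed WAV files, which the description does not specify, and the helper names `wav_duration_seconds` and `check_training_samples` are hypothetical. It uses only Python's standard `wave` module.

```python
import wave

# Minimum lengths taken from the description above.
MIN_NEGATIVE_SECONDS = 3 * 60  # clean voice/sound sample
MIN_POSITIVE_SECONDS = 10      # concatenated "ahem" sample


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (assumes WAV input)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_training_samples(negative_path, positive_path):
    """Return a list of problems; an empty list means both samples are long enough."""
    problems = []
    negative = wav_duration_seconds(negative_path)
    positive = wav_duration_seconds(positive_path)
    if negative < MIN_NEGATIVE_SECONDS:
        problems.append(
            f"negative sample is {negative:.1f}s, need at least {MIN_NEGATIVE_SECONDS}s"
        )
    if positive < MIN_POSITIVE_SECONDS:
        problems.append(
            f"positive sample is {positive:.1f}s, need at least {MIN_POSITIVE_SECONDS}s"
        )
    return problems
```

With two hypothetical files, `check_training_samples("clean_voice.wav", "ahems.wav")` returns an empty list when both samples are long enough, and otherwise one message per sample that falls short.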