Stanley Smith Stevens's scientific contributions

Citations

... Given a waveform x(t) ∈ ℝ^T at 16 kHz, we compute its short-time Fourier transform (STFT) [40] with a Hann window of size 2048 and a hop size δ = 384 (i.e., a time resolution of Δt = 24 ms, based on [34]). We then map it to 229 mel-frequency bins [41] in the 50-8000 Hz range and take the logarithm, yielding an input representation in the form of a log-mel spectrogram X(f, t′) ∈ ℝ^(229×T′), where T′ = T/δ is the resulting "compact" time domain. ...
... Given a waveform x(t) ∈ ℝ^T at 16 kHz, we compute its STFT [14] with a Hann window of size 2048 and a hop size δ = 384 (i.e., a time resolution of Δt = 24 ms). We then map it to 229 mel-frequency bins [30] in the 50-8000 Hz range and take the logarithm, yielding our input representation: a log-mel spectrogram X(f, t′) ∈ ℝ^(229×T′), where T′ = T/δ is the resulting "compact" time domain (see Figure 1). We also compute the first time-derivative Ẋ(f, t′) := X(f, t′) − X(f, t′−1) and concatenate it to X, forming the CNN input. ...
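
The arithmetic behind these excerpts can be checked with a minimal sketch in plain Python. It is not the cited authors' code. The sample rate, window size, hop size, bin count and frequency range are taken from the excerpts; the HTK-style mel formula, the rounding of T′ = T/δ, the zero used for the first frame's difference, and all function names are assumptions made for illustration.

```python
import math

# Values quoted in the excerpts above.
SAMPLE_RATE = 16_000       # Hz
WINDOW_SIZE = 2048         # Hann window length in samples
HOP_SIZE = 384             # hop size δ in samples
N_MELS = 229               # number of mel-frequency bins
F_MIN, F_MAX = 50.0, 8000.0  # Hz


def hz_to_mel(f_hz):
    # HTK-style mel formula. Assumption: the excerpts do not say which
    # variant of the mel scale the authors used.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 frequencies (Hz) equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


def num_frames(n_samples, hop):
    # T' = T / δ, rounded down. Assumption: the exact count depends on the
    # padding convention, which the excerpts do not specify.
    return n_samples // hop


def time_delta(spec):
    """First time difference X(f, t') - X(f, t'-1) for each frequency row.

    Assumption: the first frame has no predecessor, so its difference is 0.
    """
    return [[0.0] + [row[t] - row[t - 1] for t in range(1, len(row))]
            for row in spec]


# Time resolution implied by the hop size: 384 / 16000 s = 24 ms.
time_resolution_ms = 1000.0 * HOP_SIZE / SAMPLE_RATE
edges = mel_band_edges(N_MELS, F_MIN, F_MAX)
```

A ten-second clip, for example, has T = 160000 samples, so num_frames(160000, HOP_SIZE) gives 416 frames under the rounding assumed here. Stacking the spectrogram with the output of time_delta doubles the number of input rows, which is one way to read "concatenate it to X" in the second excerpt.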