Low latency filtering: Partitioned Convolution
A longstanding plan is to move the main audio filtering in UHSDR from time domain filtering to Fast Convolution filtering. However, in order to obtain filters steep enough, the FFT size for the FFT-iFFT audio chain has to be at least 2048 or even 4096 (FIR filter impulse response with 1025 / 2049 coefficients). At a 24ksps sample rate, a 4096-point FFT produces an inherent delay of about 170msec (about 85msec for 2048 points), which is unacceptable for CW operators and can also be annoying for operators in other modes.
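The delay figures follow directly from the time needed to collect one FFT block. A minimal sketch of that arithmetic (the function name `block_delay_ms` is made up for illustration, and processing time is ignored):

```c
/* Time needed to collect one block of fft_size samples, in milliseconds.
   Only the buffering delay is modelled; computation time is ignored. */
double block_delay_ms(unsigned fft_size, double sample_rate_hz)
{
    return 1000.0 * (double)fft_size / sample_rate_hz;
}
```

For example, `block_delay_ms(4096, 24000.0)` gives about 170.7 and `block_delay_ms(2048, 24000.0)` about 85.3.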
A solution to this problem was highlighted by Warren Pratt in his HAMRADIO 2018 talk at the Software Defined Academy: "Partitioned Convolution" (see also Kulp 1988, Armelloni et al. 2003). In Partitioned Convolution, the filter's impulse response is split into separate blocks. Each block gets its own, much shorter convolution, and the partial results are summed, instead of running one big FFT over the whole impulse response.
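Why the partitioning does not change the filter result can be seen in the time domain: convolving the input with each block of the impulse response and adding the partial results, each delayed by the block's offset, gives exactly the full convolution. The following is only an illustrative sketch with made-up function names, not code from wdsp; the real implementation does the per-block convolutions with FFTs.

```c
#include <stddef.h>

/* Direct linear convolution; y must hold nx + nh - 1 samples. */
void convolve(const float *x, size_t nx, const float *h, size_t nh, float *y)
{
    for (size_t n = 0; n < nx + nh - 1; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < nh; k++)
            if (n >= k && n - k < nx)
                acc += h[k] * x[n - k];
        y[n] = acc;
    }
}

/* Same result, computed block by block: each block of part_len taps is
   convolved with x separately, and its contribution is added into y
   delayed by the block's start offset within the impulse response. */
void convolve_partitioned(const float *x, size_t nx, const float *h, size_t nh,
                          size_t part_len, float *y)
{
    for (size_t n = 0; n < nx + nh - 1; n++)
        y[n] = 0.0f;
    for (size_t start = 0; start < nh; start += part_len) {
        size_t len = (nh - start < part_len) ? nh - start : part_len;
        for (size_t n = 0; n < nx + len - 1; n++) {
            float acc = 0.0f;
            for (size_t k = 0; k < len; k++)
                if (n >= k && n - k < nx)
                    acc += h[start + k] * x[n - k];
            y[start + n] += acc;
        }
    }
}
```

Because every block is short, each partial convolution only needs a short FFT, which is where the latency saving comes from.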
For UHSDR running on OVI40 with the STM32F7 processor, we would like to implement Fast Convolution filtering with partitioned convolution in order to minimize filter latency while maintaining a high quality filter with steep filter skirts ("brickwall").
[the following is just notes taken from understanding wdsp, firmin.c, "Standalone Partitioned overlap-save bandpass", Pratt 2018]
Setup (repeat every time the filter is adjusted):
- calculate 2048 complex FIR filter coefficients (= impulse response) with windowing (Kaiser or Blackman-Harris 4-term)
- partition coefficients into 8 blocks of 256 coeffs
- calculate an FFT256 of each block
- store FFT results in fmask[8][512] --> I have to carefully understand how this is done in wdsp --> half of the impulse response is discarded
Real-time filter process:
- accumulate 128 samples
- overlap 50% with previous samples
- FFT of those 256 samples
- copy FFT result into fftout[buffidx]
- k = buffidx
- clear accum[512]
- repeat for (j = 0; j < 8; j++) {
- complex-multiply fftout[k] with fmask[j]
- accumulate result of complex-multiply in accum[512]
- k = (k - 1) mod 8 (step back to the next-older input block) }
- buffidx = (buffidx + 1) mod 8
- inverse FFT on accum[512]
- discard the first half of the result and take the second half (128 complex samples = the last 256 floats) as output [overlap & save]
Benchmark figures could be:
- FFT size 128
- partitioned blocks nfor = 8
- no. of FIR coefficients 1024 (or 1025 ?)
- running at 24ksps
- delay of a 128-point FFT @24ksps -> 5.33msec
- --> estimated memory consumption about 35kbytes
OR:
- FFT size 256
- partitioned blocks nfor = 8
- no. of FIR coefficients 2048 (or 2049 ?)
- running at 24ksps
- delay of a 256-point FFT @24ksps -> 10.67msec
- --> estimated memory consumption about 70kbytes
OR:
- FFT size 128
- partitioned blocks nfor = 16
- no. of FIR coefficients 2048 (or 2049 ?)
- running at 24ksps
- delay of a 128-point FFT @24ksps -> 5.33msec
- --> estimated memory consumption about 70kbytes
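The memory estimates above can be roughly reproduced with a small calculation. This sketch rests on assumptions that are not stated in the notes: the "FFT size" is read as the number of new samples per block, the transform length is twice that, samples are interleaved complex float32, and only four buffers are counted (fmask, fftout, accum and the input frame). The function name `pconv_ram_bytes` is hypothetical.

```c
#include <stddef.h>

/* Rough RAM estimate in bytes for a partitioned overlap-save filter.
   Assumes complex float samples, a transform length of 2 * block_size,
   and the buffers fmask, fftout, accum and one input frame. */
size_t pconv_ram_bytes(size_t block_size, size_t nfor)
{
    const size_t bytes_per_sample = 2 * sizeof(float);   /* re + im */
    const size_t fft_len = 2 * block_size;
    size_t fmask  = nfor * fft_len * bytes_per_sample;
    size_t fftout = nfor * fft_len * bytes_per_sample;
    size_t accum  = fft_len * bytes_per_sample;
    size_t frame  = fft_len * bytes_per_sample;
    return fmask + fftout + accum + frame;
}
```

With 4-byte floats this gives 36864 bytes for (128, 8), 73728 bytes for (256, 8) and 69632 bytes for (128, 16), which is in the same range as the 35 / 70 / 70 kbytes listed above.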
Armelloni et al. (2003): Implementation of real-time partitioned convolution on a DSP board. - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
Kulp, B.D. (1988): Digital Equalization using Fourier Transform Techniques.
Pratt, W. (2018): Open source DSP library wdsp.