---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- speech_enhancement
- noise_suppression
- real_time
- streaming
- causal
- onnx
- tflite
- fullband
---

# DPDFNet
|
|
DPDFNet is a family of **causal, single‑channel** speech enhancement models for **real‑time noise suppression**.\
It builds on **DeepFilterNet2** by adding **Dual‑Path RNN (DPRNN)** blocks in the encoder for stronger long‑range modeling while staying streaming‑friendly.
|
|
**Links**
- Project page (audio samples + architecture): https://ceva-ip.github.io/DPDFNet/
- Paper (arXiv): https://arxiv.org/abs/2512.16420
- Code (GitHub): https://github.com/ceva-ip/DPDFNet
- Demo Space: https://huggingface.co/spaces/Ceva-IP/DPDFNetDemo
- Evaluation set: https://huggingface.co/datasets/Ceva-IP/DPDFNet_EvalSet

---

## What’s in this repo

- **TFLite**: `*.tflite` (root)
- **ONNX**: `onnx/*.onnx`
- **PyTorch checkpoints**: `checkpoints/*.pth`

---

## Model variants

### 16 kHz models

| Model | DPRNN blocks | Params (M) | MACs (G) |
|---|:---:|:---:|:---:|
| `baseline` | 0 | 2.31 | 0.36 |
| `dpdfnet2` | 2 | 2.49 | 1.35 |
| `dpdfnet4` | 4 | 2.84 | 2.36 |
| `dpdfnet8` | 8 | 3.54 | 4.37 |

### 48 kHz fullband models

| Model | DPRNN blocks | Params (M) | MACs (G) |
|---|:---:|:---:|:---:|
| `dpdfnet2_48khz_hr` | 2 | 2.58 | 2.42 |
| `dpdfnet8_48khz_hr` | 8 | 3.63 | 7.17 |
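
The trade-off in the tables can also be queried programmatically when choosing a variant for a compute budget. A minimal sketch (the numbers are copied from the 16 kHz table above; `pick_model` is an illustrative helper, not part of the `dpdfnet` package):

```python
# Params (M) and MACs (G) for the 16 kHz variants, copied from the table above.
MODELS_16K = {
    "baseline": {"dprnn_blocks": 0, "params_m": 2.31, "macs_g": 0.36},
    "dpdfnet2": {"dprnn_blocks": 2, "params_m": 2.49, "macs_g": 1.35},
    "dpdfnet4": {"dprnn_blocks": 4, "params_m": 2.84, "macs_g": 2.36},
    "dpdfnet8": {"dprnn_blocks": 8, "params_m": 3.54, "macs_g": 4.37},
}

def pick_model(max_macs_g: float) -> str:
    """Return the largest variant whose MACs fit the given budget."""
    fitting = [(v["macs_g"], name) for name, v in MODELS_16K.items()
               if v["macs_g"] <= max_macs_g]
    if not fitting:
        raise ValueError(f"no variant fits a {max_macs_g} GMACs budget")
    return max(fitting)[1]

print(pick_model(3.0))  # dpdfnet4
```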

---

## Recommended inference (CPU-only, ONNX)

```bash
pip install dpdfnet
```

### CLI

```bash
# Enhance one file
dpdfnet enhance noisy.wav enhanced.wav --model dpdfnet4

# Enhance a directory (uses all CPU cores by default)
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2

# Enhance a directory with a fixed worker count
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2 --workers 4

# Download models
dpdfnet download
dpdfnet download dpdfnet8
dpdfnet download dpdfnet4 --force
```

### Python API

```python
import soundfile as sf
import dpdfnet

# In-memory enhancement:
audio, sr = sf.read("noisy.wav")
enhanced = dpdfnet.enhance(audio, sample_rate=sr, model="dpdfnet4")
sf.write("enhanced.wav", enhanced, sr)

# Enhance one file:
out_path = dpdfnet.enhance_file("noisy.wav", model="dpdfnet2")
print(out_path)

# Model listing:
for row in dpdfnet.available_models():
    print(row["name"], row["ready"], row["cached"])

# Download models:
dpdfnet.download()            # All models
dpdfnet.download("dpdfnet4")  # Specific model
```

### Real-time microphone enhancement

Install `sounddevice` (not included in `dpdfnet` dependencies):

```bash
pip install sounddevice
```

`StreamEnhancer` processes audio chunk by chunk, preserving RNN state across
calls. Any chunk size works; enhanced samples are returned as soon as enough
data has accumulated for the first model frame (20 ms).

```python
import numpy as np
import sounddevice as sd
import dpdfnet

INPUT_SR = 48000
# Use one model hop (10 ms) as the block size so process() returns
# exactly one hop's worth of enhanced audio on every callback.
BLOCK_SIZE = int(INPUT_SR * 0.010)  # 480 samples at 48 kHz

enhancer = dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr")

def callback(indata, outdata, frames, time, status):
    mono_in = indata[:, 0] if indata.ndim > 1 else indata.ravel()
    enhanced = enhancer.process(mono_in, sample_rate=INPUT_SR)
    n = min(len(enhanced), frames)
    outdata[:n, 0] = enhanced[:n]
    if n < frames:
        outdata[n:] = 0.0  # silence while the first window accumulates

with sd.Stream(
    samplerate=INPUT_SR,
    blocksize=BLOCK_SIZE,
    channels=1,
    dtype="float32",
    callback=callback,
):
    print("Enhancing microphone input - press Ctrl+C to stop")
    try:
        while True:
            sd.sleep(100)
    except KeyboardInterrupt:
        pass

# Optional: drain the final partial window at the end of a recording
tail = enhancer.flush()
```

> [!NOTE]
> **Latency**
> The first enhanced output arrives after one full model window (~20 ms) has been buffered. Subsequent blocks are returned with ~10 ms of additional delay.
>
> **Sample rate**
> `StreamEnhancer` resamples internally. Pass your device's native rate as `sample_rate`; the returned audio is at the same rate.
>
> **Block size**
> Using `BLOCK_SIZE = int(SR * 0.010)` (one model hop) yields exactly one enhanced block per callback. Other sizes also work but may produce empty returns while the buffer fills.
>
> **Multiple streams**
> Create a separate `StreamEnhancer` per stream, and call `enhancer.reset()` between independent audio segments to clear the RNN state.
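
The window/hop accounting behind these notes can be illustrated with a small pure-Python simulation, assuming the 20 ms window and 10 ms hop stated above. `HopBuffer` is an illustrative sketch, not `StreamEnhancer`'s actual implementation:

```python
SR = 48000
WINDOW = int(SR * 0.020)  # 960 samples: one 20 ms model window
HOP = int(SR * 0.010)     # 480 samples: one 10 ms model hop

class HopBuffer:
    """Tracks how many enhanced samples a windowed streamer can return."""

    def __init__(self):
        self.received = 0  # total input samples fed so far
        self.emitted = 0   # total output samples returned so far

    def process(self, n_samples: int) -> int:
        """Feed n_samples of input; return how many samples come back now."""
        self.received += n_samples
        if self.received < WINDOW:
            return 0  # first window still accumulating
        # One hop of output per complete hop past the first full window.
        total_ready = ((self.received - WINDOW) // HOP + 1) * HOP
        out = total_ready - self.emitted
        self.emitted = total_ready
        return out

buf = HopBuffer()
print([buf.process(HOP) for _ in range(4)])  # [0, 480, 480, 480]
```

Feeding one-hop blocks, the first call returns nothing (the window is still filling) and every later call returns exactly one hop, matching the latency and block-size notes above.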

---

## Citation

```bibtex
@article{rika2025dpdfnet,
  title   = {DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN},
  author  = {Rika, Daniel and Sapir, Nino and Gus, Ido},
  journal = {arXiv preprint arXiv:2512.16420},
  year    = {2025}
}
```

---

## License

Apache-2.0