---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- speech_enhancement
- noise_suppression
- real_time
- streaming
- causal
- onnx
- tflite
- fullband
---
# DPDFNet
DPDFNet is a family of **causal, single‑channel** speech enhancement models for **real‑time noise suppression**.\
It builds on **DeepFilterNet2** by adding **Dual‑Path RNN (DPRNN)** blocks in the encoder for stronger long‑range modeling while staying streaming‑friendly.
**Links**
- Project page (audio samples + architecture): https://ceva-ip.github.io/DPDFNet/
- Paper (arXiv): https://arxiv.org/abs/2512.16420
- Code (GitHub): https://github.com/ceva-ip/DPDFNet
- Demo Space: https://huggingface.co/spaces/Ceva-IP/DPDFNetDemo
- Evaluation set: https://huggingface.co/datasets/Ceva-IP/DPDFNet_EvalSet
---
## What’s in this repo
- **TFLite**: `*.tflite` (root)
- **ONNX**: `onnx/*.onnx`
- **PyTorch checkpoints**: `checkpoints/*.pth`
---
## Model variants
### 16 kHz models
| Model | DPRNN blocks | Params (M) | MACs (G) |
|---|:---:|:---:|:---:|
| `baseline` | 0 | 2.31 | 0.36 |
| `dpdfnet2` | 2 | 2.49 | 1.35 |
| `dpdfnet4` | 4 | 2.84 | 2.36 |
| `dpdfnet8` | 8 | 3.54 | 4.37 |
### 48 kHz fullband models
| Model | DPRNN blocks | Params (M) | MACs (G) |
|---|:---:|:---:|:---:|
| `dpdfnet2_48khz_hr` | 2 | 2.58 | 2.42 |
| `dpdfnet8_48khz_hr` | 8 | 3.63 | 7.17 |
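To pick a variant for a given compute budget, the tables above can be queried directly. The snippet below is an illustrative sketch: the `MACS_G` dictionary is transcribed from the tables and is not part of the `dpdfnet` API.

```python
# MACs (G) per model, transcribed from the tables above.
# Note this mixes 16 kHz and 48 kHz variants; filter by suffix if you
# only want one sample rate.
MACS_G = {
    "baseline": 0.36,
    "dpdfnet2": 1.35,
    "dpdfnet4": 2.36,
    "dpdfnet8": 4.37,
    "dpdfnet2_48khz_hr": 2.42,
    "dpdfnet8_48khz_hr": 7.17,
}

def best_under_budget(budget_gmacs: float) -> str:
    """Return the heaviest (typically strongest) model within the budget."""
    fitting = {name: macs for name, macs in MACS_G.items() if macs <= budget_gmacs}
    return max(fitting, key=fitting.get)

print(best_under_budget(2.5))  # heaviest model at or below 2.5 GMACs
```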
---
## Recommended inference (CPU-only, ONNX)
```bash
pip install dpdfnet
```
### CLI
```bash
# Enhance one file
dpdfnet enhance noisy.wav enhanced.wav --model dpdfnet4

# Enhance a directory (uses all CPU cores by default)
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2

# Enhance a directory with a fixed worker count
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2 --workers 4

# Download models
dpdfnet download
dpdfnet download dpdfnet8
dpdfnet download dpdfnet4 --force
```
### Python API
```python
import soundfile as sf

import dpdfnet

# In-memory enhancement:
audio, sr = sf.read("noisy.wav")
enhanced = dpdfnet.enhance(audio, sample_rate=sr, model="dpdfnet4")
sf.write("enhanced.wav", enhanced, sr)

# Enhance one file:
out_path = dpdfnet.enhance_file("noisy.wav", model="dpdfnet2")
print(out_path)

# Model listing:
for row in dpdfnet.available_models():
    print(row["name"], row["ready"], row["cached"])

# Download models:
dpdfnet.download()            # all models
dpdfnet.download("dpdfnet4")  # a specific model
```
### Real-time Microphone Enhancement
Install `sounddevice` (not included in `dpdfnet` dependencies):
```bash
pip install sounddevice
```
`StreamEnhancer` processes audio chunk-by-chunk, preserving RNN state across
calls. Any chunk size works; enhanced samples are returned as soon as enough
data has accumulated for the first model frame (20 ms).
```python
import numpy as np
import sounddevice as sd

import dpdfnet

INPUT_SR = 48000

# Use one model hop (10 ms) as the block size so process() returns
# exactly one hop's worth of enhanced audio on every callback.
BLOCK_SIZE = int(INPUT_SR * 0.010)  # 480 samples at 48 kHz

enhancer = dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr")

def callback(indata, outdata, frames, time, status):
    mono_in = indata[:, 0] if indata.ndim > 1 else indata.ravel()
    enhanced = enhancer.process(mono_in, sample_rate=INPUT_SR)
    n = min(len(enhanced), frames)
    outdata[:n, 0] = enhanced[:n]
    if n < frames:
        outdata[n:] = 0.0  # silence while the first window accumulates

with sd.Stream(
    samplerate=INPUT_SR,
    blocksize=BLOCK_SIZE,
    channels=1,
    dtype="float32",
    callback=callback,
):
    print("Enhancing microphone input - press Ctrl+C to stop")
    try:
        while True:
            sd.sleep(100)
    except KeyboardInterrupt:
        pass

# Optional: drain the final partial window at the end of a recording
tail = enhancer.flush()
```
> [!NOTE]
> **Latency**
> The first enhanced output arrives after one full model window (~20 ms) has been buffered. All subsequent blocks are returned with ~10 ms additional delay.
>
> **Sample rate**
> `StreamEnhancer` resamples internally. Pass your device's native rate as `sample_rate`; the return value is at the same rate.
>
> **Block size**
> Using `BLOCK_SIZE = int(SR * 0.010)` (one model hop) gives one enhanced block per callback. Other sizes also work but may produce empty returns while the buffer fills.
>
> **Multiple streams**
> Create a separate `StreamEnhancer` per stream. Call `enhancer.reset()` between independent audio segments to clear RNN state.
---
## Citation
```bibtex
@article{rika2025dpdfnet,
  title   = {DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN},
  author  = {Rika, Daniel and Sapir, Nino and Gus, Ido},
  journal = {arXiv preprint arXiv:2512.16420},
  year    = {2025}
}
```
---
## License
Apache-2.0