---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- speech_enhancement
- noise_suppression
- real_time
- streaming
- causal
- onnx
- tflite
- fullband
---
# DPDFNet
DPDFNet is a family of **causal, single‑channel** speech enhancement models for **real‑time noise suppression**.\
It builds on **DeepFilterNet2** by adding **Dual‑Path RNN (DPRNN)** blocks in the encoder for stronger long‑range modeling while staying streaming‑friendly.
**Links**
- Project page (audio samples + architecture): https://ceva-ip.github.io/DPDFNet/
- Paper (arXiv): https://arxiv.org/abs/2512.16420
- Code (GitHub): https://github.com/ceva-ip/DPDFNet
- Demo Space: https://huggingface.co/spaces/Ceva-IP/DPDFNetDemo
- Evaluation set: https://huggingface.co/datasets/Ceva-IP/DPDFNet_EvalSet
---
## What’s in this repo
- **TFLite**: `*.tflite` (root)
- **ONNX**: `onnx/*.onnx`
- **PyTorch checkpoints**: `checkpoints/*.pth`
---
## Model variants
### 16 kHz models
| Model | DPRNN blocks | Params (M) | MACs (G) |
|---|:---:|:---:|:---:|
| `baseline` | 0 | 2.31 | 0.36 |
| `dpdfnet2` | 2 | 2.49 | 1.35 |
| `dpdfnet4` | 4 | 2.84 | 2.36 |
| `dpdfnet8` | 8 | 3.54 | 4.37 |
### 48 kHz fullband models
| Model | DPRNN blocks | Params (M) | MACs (G) |
|---|:---:|:---:|:---:|
| `dpdfnet2_48khz_hr` | 2 | 2.58 | 2.42 |
| `dpdfnet8_48khz_hr` | 8 | 3.63 | 7.17 |
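To pick a variant for a given compute budget, the tables above can be queried directly. The snippet below is an illustrative sketch: the `MACS_G` dictionary is transcribed from the tables and is not part of the `dpdfnet` API.

```python
# MACs (G) per model, transcribed from the tables above.
# Note this mixes 16 kHz and 48 kHz variants; filter by suffix if you
# only want one sample rate.
MACS_G = {
    "baseline": 0.36,
    "dpdfnet2": 1.35,
    "dpdfnet4": 2.36,
    "dpdfnet8": 4.37,
    "dpdfnet2_48khz_hr": 2.42,
    "dpdfnet8_48khz_hr": 7.17,
}

def best_under_budget(budget_gmacs: float) -> str:
    """Return the heaviest (typically strongest) model within the budget."""
    fitting = {name: macs for name, macs in MACS_G.items() if macs <= budget_gmacs}
    return max(fitting, key=fitting.get)

print(best_under_budget(2.5))  # heaviest model at or below 2.5 GMACs
```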
---
## Recommended inference (CPU-only, ONNX)
```bash
pip install dpdfnet
```
### CLI
```bash
# Enhance one file
dpdfnet enhance noisy.wav enhanced.wav --model dpdfnet4

# Enhance a directory (uses all CPU cores by default)
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2

# Enhance a directory with a fixed worker count
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2 --workers 4

# Download models
dpdfnet download
dpdfnet download dpdfnet8
dpdfnet download dpdfnet4 --force
```
### Python API
```python
import soundfile as sf

import dpdfnet

# In-memory enhancement:
audio, sr = sf.read("noisy.wav")
enhanced = dpdfnet.enhance(audio, sample_rate=sr, model="dpdfnet4")
sf.write("enhanced.wav", enhanced, sr)

# Enhance one file:
out_path = dpdfnet.enhance_file("noisy.wav", model="dpdfnet2")
print(out_path)

# Model listing:
for row in dpdfnet.available_models():
    print(row["name"], row["ready"], row["cached"])

# Download models:
dpdfnet.download()            # all models
dpdfnet.download("dpdfnet4")  # a specific model
```
### Real-time Microphone Enhancement
Install `sounddevice` (not included in `dpdfnet` dependencies):
```bash
pip install sounddevice
```
`StreamEnhancer` processes audio chunk-by-chunk, preserving RNN state across
calls. Any chunk size works; enhanced samples are returned as soon as enough
data has accumulated for the first model frame (20 ms).
```python
import numpy as np
import sounddevice as sd

import dpdfnet

INPUT_SR = 48000

# Use one model hop (10 ms) as the block size so process() returns
# exactly one hop's worth of enhanced audio on every callback.
BLOCK_SIZE = int(INPUT_SR * 0.010)  # 480 samples at 48 kHz

enhancer = dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr")

def callback(indata, outdata, frames, time, status):
    mono_in = indata[:, 0] if indata.ndim > 1 else indata.ravel()
    enhanced = enhancer.process(mono_in, sample_rate=INPUT_SR)
    n = min(len(enhanced), frames)
    outdata[:n, 0] = enhanced[:n]
    if n < frames:
        outdata[n:] = 0.0  # silence while the first window accumulates

with sd.Stream(
    samplerate=INPUT_SR,
    blocksize=BLOCK_SIZE,
    channels=1,
    dtype="float32",
    callback=callback,
):
    print("Enhancing microphone input - press Ctrl+C to stop")
    try:
        while True:
            sd.sleep(100)
    except KeyboardInterrupt:
        pass

# Optional: drain the final partial window at the end of a recording
tail = enhancer.flush()
```
> [!NOTE]
> **Latency**
> The first enhanced output arrives after one full model window (~20 ms) has been buffered. All subsequent blocks are returned with ~10 ms additional delay.
>
> **Sample rate**
> `StreamEnhancer` resamples internally. Pass your device's native rate as `sample_rate`; the return value is at the same rate.
>
> **Block size**
> Using `BLOCK_SIZE = int(SR * 0.010)` (one model hop) gives one enhanced block per callback. Other sizes also work but may produce empty returns while the buffer fills.
>
> **Multiple streams**
> Create a separate `StreamEnhancer` per stream. Call `enhancer.reset()` between independent audio segments to clear RNN state.
---
## Citation
```bibtex
@article{rika2025dpdfnet,
  title   = {DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN},
  author  = {Rika, Daniel and Sapir, Nino and Gus, Ido},
  journal = {arXiv preprint arXiv:2512.16420},
  year    = {2025}
}
```
---
## License
Apache-2.0