	---
	license: mit
	library_name: mlx
	tags:
	- mlx
	- audio
	- speech-enhancement
	- noise-suppression
	- deepfilternet
	- apple-silicon
	base_model:
	- DeepFilterNet/DeepFilterNet
	- DeepFilterNet/DeepFilterNet2
	- DeepFilterNet/DeepFilterNet3
	pipeline_tag: audio-to-audio
	---

	# DeepFilterNet — MLX

	MLX-compatible weights for [DeepFilterNet](https://github.com/Rikorose/DeepFilterNet), a real-time speech enhancement framework that suppresses background noise in full-band 48 kHz audio.

	This repository contains all three model versions (v1, v2, v3), converted directly from the original PyTorch checkpoints to `safetensors` format for use with [MLX](https://github.com/ml-explore/mlx) on Apple Silicon. No fine-tuning or quantization was applied — the weights are numerically identical to the originals.

	## Models

	Each version is stored in its own subfolder:

	| Version | Subfolder | Weights | Paper |
	|---------|-----------|---------|-------|
	| DeepFilterNet v1 | `v1/` | ~7.2 MB (float32) | [arXiv:2110.05588](https://arxiv.org/abs/2110.05588) |
	| DeepFilterNet v2 | `v2/` | ~8.9 MB (float32) | [arXiv:2205.05474](https://arxiv.org/abs/2205.05474) |
	| DeepFilterNet v3 | `v3/` | ~8.3 MB (float32) | [arXiv:2305.08227](https://arxiv.org/abs/2305.08227) |
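As a rough cross-check of the sizes above: a float32 value occupies 4 bytes, so each file size implies an approximate parameter count. The sketch below assumes 1 MB = 10^6 bytes and that each file is almost entirely tensor data. `approx_params_millions` is a hypothetical helper name, not part of any library.

```python
BYTES_PER_FLOAT32 = 4


def approx_params_millions(size_mb: float) -> float:
    """Estimate the parameter count (in millions) of a float32 file of `size_mb` megabytes."""
    # size_mb * 1e6 bytes / 4 bytes per value / 1e6 values per million = size_mb / 4
    return size_mb / BYTES_PER_FLOAT32


# Approximate sizes taken from the table above.
for name, size_mb in [("v1", 7.2), ("v2", 8.9), ("v3", 8.3)]:
    print(f"{name}: ~{approx_params_millions(size_mb):.1f}M parameters")
```

Under these assumptions, all three versions land at roughly two million parameters.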

	## Model Details

	All versions share the same audio parameters:

	| Parameter | Value |
	|-----------|-------|
	| Sample rate | 48 kHz |
	| FFT size | 960 |
	| Hop size | 480 |
	| ERB bands | 32 |
	| DF bins | 96 |
	| DF order | 5 |

	The embedding hidden dimension depends on the version:

	| Version | Embedding hidden dim |
	|---------|---------------------|
	| v1 | 512 |
	| v2 | 256 |
	| v3 | 256 |
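The shared audio parameters translate into concrete time and frequency resolutions. The following sketch works through that arithmetic. It assumes a one-sided STFT and that "DF bins" counts the lowest STFT bins, which is how the DeepFilterNet papers describe deep filtering. The variable names are illustrative only.

```python
SAMPLE_RATE = 48_000  # Hz
FFT_SIZE = 960        # samples per analysis window
HOP_SIZE = 480        # samples between consecutive frames
DF_BINS = 96          # bins processed by deep filtering (assumed: the lowest ones)

frame_ms = 1000 * FFT_SIZE / SAMPLE_RATE  # window length in milliseconds
hop_ms = 1000 * HOP_SIZE / SAMPLE_RATE    # frame step in milliseconds
n_freq_bins = FFT_SIZE // 2 + 1           # one-sided spectrum, DC through Nyquist
bin_hz = SAMPLE_RATE / FFT_SIZE           # width of one frequency bin
df_upper_hz = DF_BINS * bin_hz            # upper edge of the deep-filtered band

print(f"window {frame_ms} ms, hop {hop_ms} ms")
print(f"{n_freq_bins} bins of {bin_hz} Hz, deep filtering up to {df_upper_hz} Hz")
```

In other words, the model works on 20 ms windows with a 10 ms step, and deep filtering covers the band below about 4.8 kHz under the stated assumption.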

	## Files

	```
	convert_deepfilternet.py   # PyTorch → MLX conversion script
	v1/
	  config.json              # v1 architecture configuration
	  model.safetensors        # v1 weights
	v2/
	  config.json              # v2 architecture configuration
	  model.safetensors        # v2 weights
	v3/
	  config.json              # v3 architecture configuration
	  model.safetensors        # v3 weights
	```
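To inspect one of the `model.safetensors` files without installing `mlx` or `torch`, the header can be read with the standard library alone. This is a minimal sketch that relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). `read_safetensors_header` and `summarize_safetensors` are hypothetical helper names, not part of this repository.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the per-tensor JSON header of a safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as an unsigned little-endian integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize_safetensors(path):
    """Return (total parameter count, sorted list of dtypes) for a safetensors file."""
    header = read_safetensors_header(path)
    n_params = sum(prod(info["shape"]) for info in header.values())
    dtypes = sorted({info["dtype"] for info in header.values()})
    return n_params, dtypes
```

For example, `summarize_safetensors("v3/model.safetensors")` should report a single dtype of `F32` if the file is stored in float32 as listed in the table above.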

	## Usage

	### Python (mlx-audio)

	```python
	from mlx_audio.sts.models.deepfilternet import DeepFilterNetModel

	# Load v3 (default)
	model = DeepFilterNetModel.from_pretrained("mlx-community/DeepFilterNet-mlx")

	# Load a specific version
	model = DeepFilterNetModel.from_pretrained("mlx-community/DeepFilterNet-mlx", subfolder="v1")

	# Enhance a file
	enhanced = model.enhance("noisy.wav")
	```

	### Swift (mlx-audio-swift)

	```swift
	import MLXAudioSTS

	let model = try await DeepFilterNetModel.fromPretrained("mlx-community/DeepFilterNet-mlx", subfolder: "v3")
	let enhanced = try model.enhance(audioArray)
	```

	## Converting from PyTorch

	To re-create these weights from the original DeepFilterNet checkpoints:

	```bash
	# Clone the original repo to get the pretrained checkpoints
	git clone https://github.com/Rikorose/DeepFilterNet

	# Convert each version
	python convert_deepfilternet.py --input DeepFilterNet/DeepFilterNet --output v1 --name DeepFilterNet
	python convert_deepfilternet.py --input DeepFilterNet/DeepFilterNet2 --output v2 --name DeepFilterNet2
	python convert_deepfilternet.py --input DeepFilterNet/DeepFilterNet3 --output v3 --name DeepFilterNet3
	```

	Each input directory should contain a `config.ini` and a `checkpoints/` folder from the original repo.

	The conversion script requires `torch` and `mlx` to be installed.
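Before running the conversion, it can help to confirm that an input directory has the layout described above. The check below is a small standalone sketch. `missing_inputs` is a hypothetical helper and is not part of `convert_deepfilternet.py`.

```python
from pathlib import Path


def missing_inputs(input_dir):
    """Return the names of required entries that are absent from `input_dir`."""
    root = Path(input_dir)
    missing = []
    if not (root / "config.ini").is_file():
        missing.append("config.ini")
    if not (root / "checkpoints").is_dir():
        missing.append("checkpoints/")
    return missing
```

An empty list means both the `config.ini` file and the `checkpoints/` folder were found. Anything else names what still needs to be put in place before conversion.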

	## Origin

	- Original model: [DeepFilterNet](https://github.com/Rikorose/DeepFilterNet) by Hendrik Schroeter
	- License: MIT (same as the original)
	- Conversion: PyTorch → `safetensors` via `convert_deepfilternet.py`

	## Citations

	```bibtex
	@inproceedings{schroeter2022deepfilternet,
	title={{DeepFilterNet}: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering},
	author={Schr{\"o}ter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
	booktitle={ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
	year={2022},
	organization={IEEE}
	}

	@inproceedings{schroeter2022deepfilternet2,
	title={{DeepFilterNet2}: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio},
	author={Schr{\"o}ter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
	booktitle={17th International Workshop on Acoustic Signal Enhancement (IWAENC 2022)},
	year={2022},
	}

	@inproceedings{schroeter2023deepfilternet3,
	title={{DeepFilterNet}: Perceptually Motivated Real-Time Speech Enhancement},
	author={Schr{\"o}ter, Hendrik and Rosenkranz, Tobias and Escalante-B., Alberto N. and Maier, Andreas},
	booktitle={INTERSPEECH},
	year={2023}
	}
	```