OpenCulture Collection: A multilingual dataset of public domain books and newspapers. • 25 items • Updated 9 days ago
The Smol Training Playbook: The secrets to building world-class LLMs
Article: Releasing Common Corpus: the largest public domain dataset for training LLMs • Mar 20, 2024