OpenCulture Collection: A multilingual dataset of public domain books and newspapers. • 25 items • Updated 9 days ago
Article: Releasing Common Corpus: the largest public domain dataset for training LLMs • Mar 20, 2024