social-post-explorers (Social Post Explorers)

ajibawa-2023

posted an update 1 day ago

Ruby-Code-Large
Dataset: ajibawa-2023/Ruby-Code-Large

Ruby-Code-Large is a large-scale corpus of Ruby programming language source code comprising 331,743 code samples stored in .jsonl format. The dataset is designed to support research and development in large language model (LLM) pretraining, static analysis, web application development, and software engineering automation within the Ruby ecosystem.
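Since the samples are stored as .jsonl (one JSON object per line), a file can be streamed record by record rather than loaded whole. The following is a minimal sketch using only the Python standard library; the helper names `iter_jsonl` and `count_records` are hypothetical, and nothing is assumed about the field names inside each record, which the post does not list.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a .jsonl file."""
    with Path(path).open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def count_records(path: str) -> int:
    """Count records lazily, without holding the whole file in memory."""
    return sum(1 for _ in iter_jsonl(path))
```

Calling `count_records` on a locally downloaded shard (the path is up to you) gives a quick sanity check against the sample count quoted above before starting any heavier processing.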

By offering a substantial, language-focused dataset, Ruby-Code-Large enables targeted experimentation in dynamic programming, object-oriented design, and rapid application development—areas where Ruby is widely used, particularly in web frameworks and scripting.

Ruby-Code-Large addresses the lack of large, curated, Ruby-specific datasets, enabling focused research on expressive syntax, metaprogramming, and high-level abstractions.

ajibawa-2023

posted an update 2 days ago

Go-Code-Large
Dataset: ajibawa-2023/Go-Code-Large

Go-Code-Large is a large-scale corpus of Go (Golang) programming language source code, comprising 316,427 code samples stored in .jsonl format. The dataset is designed to support research and development in large language model (LLM) pretraining, static analysis, cloud-native systems, and modern backend software engineering.

By offering a focused and curated dataset for Go, this corpus enables experimentation in concurrent programming, distributed systems, and performance-oriented backend services—domains where Go is widely adopted.

Go-Code-Large addresses the relative scarcity of large, language-specific datasets for Go, enabling targeted research into idiomatic Go patterns, concurrency primitives, and scalable system design.

AdinaY

submitted a paper to Daily Papers 4 days ago

TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification

Paper • 2604.14531 • Published 5 days ago • 6

gagan3012

authored a paper 5 days ago

Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation

Paper • 2604.05083 • Published 15 days ago

fffiloni

posted an update 10 days ago

✨ PASD Magnify is back on Hugging Face Spaces

fffiloni/PASD

PASD isn’t recent, but still delivers strong results — worth restoring rather than replacing.

Getting it to run again wasn’t a simple dependency issue.
It relied on parts of diffusers that no longer exist, while moving to Gradio 6 forced a much newer HF stack — and I couldn’t modify the original source directly.

Recreating the old environment wasn’t practical.
So I patched the downloaded code at runtime before import and made it compatible with today’s stack.

That ended up being the only approach that held without forking or freezing everything to outdated versions.
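The general idea of patching downloaded code before importing it can be sketched as follows. This is not the actual patch used in the PASD Space, which the post does not show; the helper names `patch_source` and `import_from_path` and the replacement strings are hypothetical, and the sketch assumes simple text substitution is enough to bridge the API gap.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Mapping


def patch_source(path: Path, replacements: Mapping[str, str]) -> bool:
    """Rewrite a downloaded source file in place; return True if it changed."""
    original = path.read_text(encoding="utf-8")
    patched = original
    for old, new in replacements.items():
        patched = patched.replace(old, new)
    if patched != original:
        path.write_text(patched, encoding="utf-8")
    return patched != original


def import_from_path(module_name: str, path: Path):
    """Import a module from an explicit file path, after any patching."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {module_name} from {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register before executing the module
    spec.loader.exec_module(module)
    return module
```

Because the substitution runs before the first import, the upstream repository stays untouched and the patch is reapplied on every fresh download. Running it a second time is a no-op once the old strings are gone.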

If you’ve used it before (or are curious), feel free to give it another try.

appvoid

posted an update 11 days ago

Yesterday someone created a fake Anthropic account: https://huggingface.co/Anthropic-ai/claude
Be careful, that's all I'm saying.

Duskfallcrew

posted an update 18 days ago

It turns out "LARGELY RETIRING" was an Autistic response to utter burnout and not understanding our creative and technical workflows.

We apologize for any autistic rage moments that have caused confusion where our models have been shuffled and MAYBE possibly lost?

We've been busy building this: https://github.com/Ktiseos-Nyx/Ktiseos-Nyx-Trainer

And it actually works. This isn't just the AI going "I TESTED IT ON MY LOCAL MACHINE": we actually tested it on a remote GPU last night!

fffiloni

posted an update 19 days ago

✅ Back up and running!

My TIGER app is now fully working again, with fixes and full compatibility with Gradio 6 🚀

It lets you:
- 🎙️ Separate multiple speakers from an audio file
- 🎬 Extract each speaker directly from a video
- 🎧 Split audio into dialog, music, and sound effects (DnR)
- 🎥 Apply DnR separation directly on videos

All powered by lightweight TIGER models for fast and efficient speech separation.

Try it here 👉 fffiloni/TIGER-audio-extraction

fffiloni

posted an update 20 days ago

AniDoc is back 🎉

I’ve fixed the Space and brought it back to life:
- ✅ Working again after being broken for a while
- ✅ Updated to Gradio 6
- ✅ Compatible with ZeroGPU
- ✅ Output videos now preserve original resolution and FPS

I also added advanced controls so you can experiment more (tracking, seed, motion, sketch).

Try it here: fffiloni/AniDoc

AdinaY

submitted a paper to Daily Papers 20 days ago

KAT-Coder-V2 Technical Report

Paper • 2603.27703 • Published 22 days ago • 10

mattmdjaga

authored a paper 28 days ago

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Paper • 2603.15714 • Published Mar 16

gagan3012

authored 2 papers 28 days ago

Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA

Paper • 2603.08501 • Published Mar 9

What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?

Paper • 2603.19017 • Published Mar 19 • 3

gagan3012

submitted 2 papers to Daily Papers about 1 month ago

What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?

Paper • 2603.19017 • Published Mar 19 • 3

Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA

Paper • 2603.08501 • Published Mar 9

fffiloni

posted an update about 1 month ago

I brought DALL·E mini back to life 🤖🎨

You can try it here:
fffiloni/dalle-mini-reboot

And I also built a batch version using Hugging Face Jobs (up to 50 images per prompt):
fffiloni/dalle-mini-via-jobs

The goal was to stay close to the original JAX/Flax pipeline, while integrating it with modern tooling (Gradio + Jobs).

It ended up being a fun way to revisit this model — still weird, still fun 😄

ajibawa-2023

posted an update about 1 month ago

C-Code-Large
Dataset: ajibawa-2023/C-Code-Large

C-Code-Large is a large-scale corpus of C programming language source code comprising more than 4 million code samples stored in .jsonl format. The dataset is designed to support research and development in large language model (LLM) pretraining, static analysis, and software engineering automation for the C ecosystem.

By offering a high-volume, language-focused dataset, C-Code-Large enables targeted experimentation in low-level programming, memory-constrained environments, and performance-critical systems, where C continues to be a dominant language.

C-Code-Large addresses the lack of large, curated, C-specific datasets, making it possible to conduct focused research on procedural programming paradigms, manual memory management, and system-level abstractions.

fffiloni

posted an update about 1 month ago

A clearer demo for TADA (now multilingual) 🔊🌍

I improved the public demo for TADA — a generative framework for speech modeling via text–acoustic dual alignment.

TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.

The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.

This updated demo makes the process clearer:

• load the model
• prepare a reference voice (optionally with a transcript or Whisper auto-transcription)
• generate speech conditioned on that reference

It also adds multilingual support.

Presets are included for a few languages, but the model supports more:

English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese

Feel free to try different voices, accents, or languages and see how the alignment behaves.

👉 fffiloni/tada-dual-alignment-tts-demo

Paper
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)

AdinaY

submitted 2 papers to Daily Papers about 1 month ago

Training Language Models via Neural Cellular Automata

Paper • 2603.10055 • Published Mar 9 • 8

Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

Paper • 2603.05438 • Published Mar 5 • 40

Team members 852

social-post-explorers's activity