Mitko Vasilev
mitkox
AI & ML interests
Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
Recent Activity
Posted an update · 2 days ago
134,614 tok/sec max input prefill
1,031 tok/sec max output generation
At these local AI speeds, there is no User Interface for humans. My human UI is the Radicle distributed Git issues queue
On my GPU workstation:
- Z8 Fury G5 4x A6000
- MiniMax-M2.5
- Claude Code to localhost:8000
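A setup like this is typically launched with something along these lines (a minimal sketch; the exact model repo id and flag values are assumptions, not confirmed in the post):

```shell
# Hypothetical sketch: serve a model locally with vLLM's OpenAI-compatible
# server on port 8000, split across the 4 GPUs with tensor parallelism.
# "MiniMaxAI/MiniMax-M2" is an assumed repo id for illustration.
vllm serve MiniMaxAI/MiniMax-M2 \
  --tensor-parallel-size 4 \
  --port 8000
```

Claude Code can then be pointed at localhost:8000 as in the post; since vLLM speaks the OpenAI chat API rather than Anthropic's, this is usually done through a small translation proxy.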
Posted an update · 12 days ago
I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.
With local AI, I don't have the /fast CC switch, but I have /absurdlyfast:
- 100,499 tokens/second read (yeah, 100k, not a typo) | 811 tok/sec generation
- KV cache: 707,200 tokens
- Hardware: 5+ year old GPUs, 4x A6000 gen 1. It's not the car. It's the driver.
Qwen3 Coder Next AWQ with the KV cache at BF16. It scores 82.1% in C# on a 29-years-in-development codebase vs Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.
My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.
Posted an update · 22 days ago
 ▐▛███▜▌   Claude Code v2.1.23
▝▜█████▛▘  Kimi-K2.5 · API Usage Billing
  ▘▘ ▝▝    ~/dev/vllm
/model to try Opus 4.5
❯ hey
⏺ Hello! How can I help you today?
❯ what model are you?
⏺ I'm Claude Kimi-K2.5, running in a local environment on Linux.
It took some time to download, plus some vLLM hybrid-inference magic, to get it running on my desktop workstation.
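For context, the transcript above is standard OpenAI-style chat traffic under the hood. A minimal sketch of the kind of request body sent to a local vLLM server at http://localhost:8000/v1/chat/completions (the model name is taken from the post; the payload shape is the standard OpenAI chat format that vLLM serves):

```python
import json

def build_chat_request(prompt: str, model: str = "Kimi-K2.5") -> dict:
    # Standard OpenAI-compatible chat-completions payload; max_tokens here
    # is an arbitrary illustrative value, not from the post.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

payload = build_chat_request("what model are you?")
print(json.dumps(payload, indent=2))
```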