Thinking tags
I am not seeing any thinking tags generated for llama.cpp ui to separate out using the gguf and the latest chat template from tencent. Are you able to? Really cool responses for a 2b model so far!
@twoxfh Hi, you can check server example for enable/disable the thinking mode: https://huggingface.co/tencent/Youtu-LLM-2B-GGUF#server-example
Thanks. Tencent just updated their GGUF, and it's a little larger than yours. I can turn the tags on and off with theirs. With the GGUF you made, I can enable reasoning, but the output doesn't include the thinking tag; disabling reasoning works fine since no tag is needed there. Not sure if the llama.cpp GGUF conversion script was updated, or if they plan to submit changes.
You could try re-quantizing based on tencent/Youtu-LLM-2B-GGUF with this command:

```shell
llama-quantize Youtu-LLM-2B-F16.gguf Youtu-LLM-2B-Q4_K_M.gguf Q4_K_M
```
It might help align the functionality (like the thinking tag for reasoning) with the updated version from Tencent.
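For anyone following along, here is a sketch of the surrounding steps: fetching the F16 source file before quantizing, and serving the result afterward. The F16 file name is an assumption (check the repo's file listing), and `--jinja` is the llama-server flag that enables the chat template embedded in the GGUF, which is where the thinking-tag handling lives.

```shell
# Fetch the F16 GGUF that carries Tencent's updated chat template.
# (Exact file name is an assumption; check the repo's file listing.)
huggingface-cli download tencent/Youtu-LLM-2B-GGUF \
  Youtu-LLM-2B-F16.gguf --local-dir .

# After running the llama-quantize command above, serve the quantized
# file with --jinja so llama-server applies the embedded chat template
# and the thinking tags are emitted/parsed.
llama-server -m Youtu-LLM-2B-Q4_K_M.gguf --jinja
```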