Thinking tags

#1
by twoxfh - opened

I am not seeing any thinking tags generated for llama.cpp ui to separate out using the gguf and the latest chat template from tencent. Are you able to? Really cool responses for a 2b model so far!

@twoxfh Hi, you can check the server example for enabling/disabling thinking mode: https://huggingface.co/tencent/Youtu-LLM-2B-GGUF#server-example
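For reference, a minimal sketch of what that server setup might look like. The flag names below assume a recent llama.cpp build; check the linked server example for the exact options used with this model:

```shell
# Serve the GGUF with Jinja chat-template support so the model's
# reasoning markup is parsed into separate thinking content
# (assumes a recent llama.cpp build with --jinja support).
llama-server -m Youtu-LLM-2B-Q4_K_M.gguf --jinja

# Thinking can then be toggled via chat-template kwargs,
# e.g. to disable it (kwarg name assumed; verify against the
# model's chat template):
#   --chat-template-kwargs '{"enable_thinking": false}'
```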

Thanks. Tencent just updated their GGUF, and it's a little larger than yours. I can turn thinking tags on and off with theirs. With the GGUF you made, I can enable reasoning, but not reasoning with the thinking tag; disabling reasoning, which needs no tag, works fine. Not sure if the llama.cpp GGUF conversion script was updated, or whether they plan to submit changes upstream.

You could try re-quantizing based on tencent/Youtu-LLM-2B-GGUF with this command:

llama-quantize Youtu-LLM-2B-F16.gguf Youtu-LLM-2B-Q4_K_M.gguf Q4_K_M

It might help align the functionality (like the thinking tag for reasoning) with the updated version from Tencent.
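A sketch of the full re-quantization workflow, starting from Tencent's updated repo. The F16 filename is taken from the command above, but verify it against the repo's actual file list before downloading:

```shell
# Fetch the updated F16 GGUF from Tencent's repo (filename assumed
# from the repo naming above; check the repo's Files tab to confirm).
huggingface-cli download tencent/Youtu-LLM-2B-GGUF \
    Youtu-LLM-2B-F16.gguf --local-dir .

# Re-quantize to Q4_K_M. Quantizing from the updated F16 file means
# the new quant inherits its metadata (chat template, tokenizer
# config), which is what carries the thinking-tag behavior.
llama-quantize Youtu-LLM-2B-F16.gguf Youtu-LLM-2B-Q4_K_M.gguf Q4_K_M
```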

twoxfh changed discussion status to closed
