Thinking tags
I am not seeing any thinking tags generated for llama.cpp ui to separate out using the gguf and the latest chat template from tencent. Are you able to? Really cool responses for a 2b model so far!
@twoxfh Hi, you can check server example for enable/disable the thinking mode: https://huggingface.co/tencent/Youtu-LLM-2B-GGUF#server-example
Thanks. Tencent just updated their GGUF, and it's a little larger than yours. I can turn the tags on and off with theirs. With the GGUF you made, I can enable reasoning, but the output doesn't include the thinking tag; disabling reasoning works fine since no tag is needed there. Not sure if the llama.cpp GGUF conversion script was updated, or if they plan to submit changes.
You could try re-quantizing based on tencent/Youtu-LLM-2B-GGUF with this command:

```shell
llama-quantize Youtu-LLM-2B-F16.gguf Youtu-LLM-2B-Q4_K_M.gguf Q4_K_M
```
It might help align the functionality (like the thinking tag for reasoning) with the updated version from Tencent.
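For anyone following along, here is a sketch of the surrounding steps: fetching the F16 source file before quantizing, and serving the result afterward. The F16 file name is an assumption (check the repo's file listing), and `--jinja` is the llama-server flag that enables the chat template embedded in the GGUF, which is where the thinking-tag handling lives.

```shell
# Fetch the F16 GGUF that carries Tencent's updated chat template.
# (Exact file name is an assumption; check the repo's file listing.)
huggingface-cli download tencent/Youtu-LLM-2B-GGUF \
  Youtu-LLM-2B-F16.gguf --local-dir .

# After running the llama-quantize command above, serve the quantized
# file with --jinja so llama-server applies the embedded chat template
# and the thinking tags are emitted/parsed.
llama-server -m Youtu-LLM-2B-Q4_K_M.gguf --jinja
```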