Something is wrong with the chat - it breaks after some time
#4
by alexaione - opened
Using the default llama.cpp web UI, the chat works at first but stops responding randomly after a few conversations.
It also stops working with Open WebUI.
Tested with KiloCode in VS Code: it does not respond at all.
On the log side, I am not sure what to look for, so I am adding the output below (not sure if it is helpful):
======================
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
slot print_timing: id 3 | task 0 |
prompt eval time = 574.50 ms / 16 tokens ( 35.91 ms per token, 27.85 tokens per second)
eval time = 355.56 ms / 26 tokens ( 13.68 ms per token, 73.12 tokens per second)
total time = 930.06 ms / 42 tokens
slot release: id 3 | task 0 | stop processing: n_tokens = 41, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-native
slot get_availabl: id 2 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 2 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 2 | task 27 | processing task, is_child = 0
slot update_slots: id 2 | task 27 | new prompt, n_ctx_slot = 25600, n_keep = 0, task.n_tokens = 9373
slot update_slots: id 2 | task 27 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 2 | task 27 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.218500
slot update_slots: id 2 | task 27 | n_tokens = 2048, memory_seq_rm [2048, end)
slot update_slots: id 2 | task 27 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.437000
slot update_slots: id 2 | task 27 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id 2 | task 27 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.655500
slot update_slots: id 2 | task 27 | n_tokens = 6144, memory_seq_rm [6144, end)
slot update_slots: id 2 | task 27 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.874000
slot update_slots: id 2 | task 27 | n_tokens = 8192, memory_seq_rm [8192, end)
slot init_sampler: id 2 | task 27 | init sampler, took 0.72 ms, tokens: text = 9373, total = 9373
slot update_slots: id 2 | task 27 | prompt processing done, n_tokens = 9373, batch.n_tokens = 1181
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
srv stop: cancel task, id_task = 27
slot release: id 2 | task 27 | stop processing: n_tokens = 12938, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-native
slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.661 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 3599 | processing task, is_child = 0
slot update_slots: id 3 | task 3599 | new prompt, n_ctx_slot = 25600, n_keep = 0, task.n_tokens = 62
slot update_slots: id 3 | task 3599 | n_tokens = 41, memory_seq_rm [41, end)
slot init_sampler: id 3 | task 3599 | init sampler, took 0.02 ms, tokens: text = 62, total = 62
slot update_slots: id 3 | task 3599 | prompt processing done, n_tokens = 62, batch.n_tokens = 21
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
======================
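For what it's worth, the per-request timing lines in the log can be parsed to spot the last request that completed before the hang. A minimal sketch (the regex targets the `print_timing` format shown in the excerpt above; anything beyond that format is an assumption):

```python
import re

# Matches llama-server timing lines such as:
#   prompt eval time = 574.50 ms / 16 tokens ( 35.91 ms per token, 27.85 tokens per second)
#   eval time        = 355.56 ms / 26 tokens
#   total time       = 930.06 ms / 42 tokens
TIMING_RE = re.compile(
    r"(?P<phase>prompt eval|eval|total) time\s*=\s*"
    r"(?P<ms>[\d.]+) ms\s*/\s*(?P<tokens>\d+) tokens"
)

def parse_timings(log_text):
    """Return a list of (phase, milliseconds, token_count) tuples found in log_text."""
    return [
        (m.group("phase"), float(m.group("ms")), int(m.group("tokens")))
        for m in TIMING_RE.finditer(log_text)
    ]
```

A request that logs `launch_slot_`/`processing task` but never produces a matching `print_timing` block would be a candidate for the one that hung.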
llama.cpp has introduced regressions in the parser over the past few weeks, and this could be a consequence. I can't run Mistral Small 3.2 on the current llama.cpp code anymore (I get 500 errors from llama-server; it has trouble with the template/parser), so I have to use an old commit from early March. The additional problem with mistral 4 is that, I think, some recent fixes were necessary to make it run at all, so you probably can't revert cleanly either.
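Until the regression is tracked down, a quick way to tell a hung server from a client-side issue is a timed probe against the completions endpoint. A sketch, assuming the server listens on 127.0.0.1:8080 (adjust host, port, and timeout to your setup):

```python
import json
import urllib.error
import urllib.request

def probe_server(base_url="http://127.0.0.1:8080", timeout=10.0):
    """POST a tiny chat request; return True if the server answers within the timeout."""
    payload = json.dumps({
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    }).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError, OSError):
        # A timeout here, while the server log shows no new activity,
        # points at the server (or its parser) being stuck rather than the client.
        return False
```

If this returns False while the web UI also hangs, the problem is on the server side; if it returns True, the issue is more likely in the specific client (Open WebUI, KiloCode, etc.) or its chat template handling.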