Regarding the error "RuntimeError: The size of tensor a (32) must match the size of tensor b (0) at non-singleton dimension 0"
#1
by kurugai - opened
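Hello.
I was able to load the maywell/KoMultiGen-General model with load-in-4bit in Text generation web UI, but as soon as I actually generate text, the error below occurs and generation stops.
If this is a problem with Text generation web UI, could you provide sample Python code for running this model?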
2024-03-21 22:38:40 text-generation-webui | 13:38:40-105185 INFO WARPERS=
2024-03-21 22:38:40 text-generation-webui | ['TemperatureLogitsWarperCustom', 'TopKLogitsWarper', 'TopPLogitsWarper']
2024-03-21 22:38:40 text-generation-webui |
2024-03-21 22:38:41 text-generation-webui | Traceback (most recent call last):
2024-03-21 22:38:41 text-generation-webui | File "/app/modules/callbacks.py", line 61, in gentask
2024-03-21 22:38:41 text-generation-webui | ret = self.mfunc(callback=_callback, *args, **self.kwargs)
2024-03-21 22:38:41 text-generation-webui | File "/app/modules/text_generation.py", line 390, in generate_with_callback
2024-03-21 22:38:41 text-generation-webui | shared.model.generate(**kwargs)
2024-03-21 22:38:41 text-generation-webui | File "/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-03-21 22:38:41 text-generation-webui | return func(*args, **kwargs)
2024-03-21 22:38:41 text-generation-webui | File "/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1592, in generate
2024-03-21 22:38:41 text-generation-webui | return self.sample(
2024-03-21 22:38:41 text-generation-webui | File "/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2696, in sample
2024-03-21 22:38:41 text-generation-webui | outputs = self(
2024-03-21 22:38:41 text-generation-webui | File "/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
2024-03-21 22:38:41 text-generation-webui | return self._call_impl(*args, **kwargs)
2024-03-21 22:38:41 text-generation-webui | File "/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
2024-03-21 22:38:41 text-generation-webui | return forward_call(*args, **kwargs)
2024-03-21 22:38:41 text-generation-webui | File "/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
2024-03-21 22:38:41 text-generation-webui | output = module._old_forward(*args, **kwargs)
2024-03-21 22:38:41 text-generation-webui | File "/venv/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 1392, in forward
2024-03-21 22:38:41 text-generation-webui | aux_loss = load_balancing_loss_func(
2024-03-21 22:38:41 text-generation-webui | File "/venv/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 132, in load_balancing_loss_func
2024-03-21 22:38:41 text-generation-webui | tokens_per_expert = torch.sum(expert_mask.float() * expert_attention_mask, dim=0) / torch.sum(
2024-03-21 22:38:41 text-generation-webui | RuntimeError: The size of tensor a (32) must match the size of tensor b (0) at non-singleton dimension 0
2024-03-21 22:38:41 text-generation-webui | Output generated in 1.36 seconds (0.73 tokens/s, 1 tokens, context 52, seed 583721504)
I have confirmed that it works correctly on a 2x 3090 setup with load_in_4bit and use_double_quant.
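For anyone who wants to try the same settings from plain Python, here is a minimal sketch, not a verified recipe. It assumes the two options map to the `load_in_4bit` and `bnb_4bit_use_double_quant` arguments of `BitsAndBytesConfig` in Transformers, that `bitsandbytes` and `accelerate` are installed, and that the model ships a chat template. The helper names (`build_quant_options`, `load_model`, `generate`) and `device_map="auto"` are my own choices for illustration.

```python
MODEL_ID = "maywell/KoMultiGen-General"


def build_quant_options():
    """Return the two 4-bit options from the post as keyword arguments.

    Assumption: these correspond to BitsAndBytesConfig parameters.
    """
    return {
        "load_in_4bit": True,
        "bnb_4bit_use_double_quant": True,
    }


def load_model(model_id=MODEL_ID):
    """Load tokenizer and 4-bit quantized model (hypothetical helper)."""
    # Heavy imports stay inside the function so the options above
    # can be inspected on a machine without a GPU.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(**build_quant_options())
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # assumption: let accelerate place layers on the GPUs
    )
    return tokenizer, model


def generate(prompt, max_new_tokens=40):
    """Generate a reply to a single user message (hypothetical helper)."""
    tokenizer, model = load_model()
    messages = [{"role": "user", "content": prompt}]
    # Assumption: the tokenizer provides a chat template for this model.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `print(generate("Who are you?"))` would download the weights and run one generation. I have only confirmed the settings themselves on my machine, so treat the rest as a starting point.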
kurugai changed discussion status to closed