---
base_model: SeaLLMs/SeaLLM3-7B-Chat
language:
- en
- vi
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
datasets:
- lightontech/tech-viet-translation
pipeline_tag: text-generation
---

# Uploaded model

- **Developed by:** lightontech
- **License:** apache-2.0
- **Finetuned from model:** SeaLLMs/SeaLLM3-7B-Chat

This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

To use the GGUF format with llama.cpp, or to run the model in LM Studio, Jan, or other local software, see [lightontech/SeaLightSum3_GGUF](https://huggingface.co/lightontech/SeaLightSum3_GGUF).

# How to use

For a quick start, check out the [example notebook](https://colab.research.google.com/drive/1h6NyOBCzSYrx-nBoRA1X40loIe2oTioA?usp=sharing).

## Install Unsloth

This sample installs the Colab build of Unsloth; you can switch to the plain `unsloth` package if you prefer.

```
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
```

## Run inference

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Loading settings (Unsloth's usual defaults; adjust for your GPU)
max_seq_length = 2048  # maximum context length in tokens
dtype = None           # None = auto-detect (float16 or bfloat16)
load_in_4bit = True    # 4-bit quantization to reduce GPU memory use

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lightontech/SeaLightSum3-Adapter",  # fine-tuned adapter (base: SeaLLMs/SeaLLM3-7B-Chat)
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!

inputs = tokenizer(
    [
        alpaca_prompt.format(
            # Instruction: "Translate the following passage into Vietnamese:" plus the English source text
            "Dịch đoạn văn sau sang tiếng Việt:\nOnce you have trained a model using either the SFTTrainer, PPOTrainer, or DPOTrainer, you will have a fine-tuned model that can be used for text generation. In this section, we’ll walk through the process of loading the fine-tuned model and generating text. If you need to run an inference server with the trained model, you can explore libraries such as text-generation-inference.",  # instruction
            "",  # input
            "",  # output: leave blank so the model generates it
        )
    ], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1000)
```
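
For a decoder-only model, `model.generate` returns the prompt tokens followed by the new tokens, so the decoded text repeats the whole prompt. The sketch below, assuming the Alpaca template from the example and a decoded string that may end with the tokenizer's EOS string, fills the template and cuts out only the response. The helpers `build_prompt` and `extract_response` and the `<|im_end|>` string in the demo are illustrative assumptions, not part of Unsloth or this repository.

```python
# Hypothetical helpers for illustration; not part of Unsloth or this repository.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

RESPONSE_MARKER = "### Response:\n"


def build_prompt(instruction: str, context: str = "") -> str:
    """Fill the template and leave the response slot empty for generation."""
    return alpaca_prompt.format(instruction, context, "")


def extract_response(decoded: str, eos: str = "") -> str:
    """Return only the generated answer from a decoded prompt-plus-completion string."""
    # The decoded output repeats the prompt, so keep what follows the first marker.
    _, marker, answer = decoded.partition(RESPONSE_MARKER)
    if not marker:
        answer = decoded  # marker missing: fall back to the whole string
    if eos:
        # Drop the EOS string and anything generated after it.
        answer = answer.split(eos, 1)[0]
    return answer.strip()


# Demo with a simulated completion; "<|im_end|>" is an assumed EOS string.
prompt = build_prompt("Dịch đoạn văn sau sang tiếng Việt:\nHello, world.")
simulated = prompt + "Xin chào, thế giới.<|im_end|>"
print(extract_response(simulated, eos="<|im_end|>"))  # prints: Xin chào, thế giới.
```

Without the streamer, passing `tokenizer.batch_decode(outputs)[0]` from `outputs = model.generate(**inputs, max_new_tokens = 1000)` together with `eos = tokenizer.eos_token` should yield just the translation, since the real EOS string then comes from the tokenizer rather than being hard-coded.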