Instructions for using TechxGenus/CodeGemma-7b-AWQ with libraries and local apps.

Libraries

How to use TechxGenus/CodeGemma-7b-AWQ with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TechxGenus/CodeGemma-7b-AWQ")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TechxGenus/CodeGemma-7b-AWQ")
model = AutoModelForCausalLM.from_pretrained("TechxGenus/CodeGemma-7b-AWQ")

Local Apps

vLLM

How to use TechxGenus/CodeGemma-7b-AWQ with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TechxGenus/CodeGemma-7b-AWQ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TechxGenus/CodeGemma-7b-AWQ",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
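
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server from the previous step is listening on localhost:8000 and returns the usual OpenAI-style `choices[0].text` field. The helper names `build_completion_request` and `complete` are hypothetical and not part of vLLM or any other library.

```python
import json
import urllib.request

# Assumed values, copied from the curl example; adjust to your own server.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "TechxGenus/CodeGemma-7b-AWQ"


def build_completion_request(prompt, max_tokens=512, temperature=0.5,
                             base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for the /completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response body with a "choices" list.
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Requires the server started with `vllm serve` to be running.
    print(complete("def fibonacci(n):", max_tokens=128))
```

Because the request is built separately from the network call, the payload can be inspected without a running server. Passing `base_url="http://localhost:30000/v1"` should point the same helper at the SGLang server described further down, since it exposes the same completions route.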

Use Docker

docker model run hf.co/TechxGenus/CodeGemma-7b-AWQ

SGLang

How to use TechxGenus/CodeGemma-7b-AWQ with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TechxGenus/CodeGemma-7b-AWQ" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TechxGenus/CodeGemma-7b-AWQ",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TechxGenus/CodeGemma-7b-AWQ" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TechxGenus/CodeGemma-7b-AWQ",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Would you be willing to fine-tune a much more capable base gemma model?

by rombodawg - opened Mar 11, 2024

Discussion

rombodawg

Mar 11, 2024

Hello, I have created a much more capable base Gemma model using precise and highly refined merging techniques. The model is much higher quality than base Gemma-7b and performs exceptionally well at coding. I think it would be much better suited for a coding fine-tune. You can find the model linked below, along with more information in its model card.

https://huggingface.co/rombodawg/EveryoneLLM-7b-Gemma-Base

TechxGenus

Owner Mar 12, 2024

This seems to be a merged model of many fine-tuned models, and fine-tuning again often does not achieve good results. I will update if there are any major improvements.

rombodawg

Mar 12, 2024

Thank you, I appreciate you taking this seriously. I'm very confident in my merges, as I've spent months perfecting my techniques, and I believe they will achieve good results with a fine-tune. I look forward to hearing your results. 🙂

rombodawg

Mar 12, 2024

You have to check this out, @TechxGenus. These findings should mean massive improvements for Gemma fine-tuning:

https://www.reddit.com/r/LocalLLaMA/comments/1bd18y8/gemma_finetuning_should_be_much_better_now/

rombodawg

Mar 12, 2024

I've opened an official issue for transformers to implement a fix:
https://github.com/huggingface/transformers/issues/29616

rombodawg

Mar 17, 2024 (edited Mar 17, 2024)

@TechxGenus I would like to share that we at Replete-AI have created a new model called Mistral-11b-v0.1, which expands on the size and pretraining of the Mistral-7b model. Feel free to check it out. I would love to see a coding variant if your team is at all interested.

https://huggingface.co/Replete-AI/Mistral-11b-v0.1
