Instructions for using Phind/Phind-CodeLlama-34B-Python-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Phind/Phind-CodeLlama-34B-Python-v1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Phind/Phind-CodeLlama-34B-Python-v1")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Phind/Phind-CodeLlama-34B-Python-v1")
model = AutoModelForCausalLM.from_pretrained("Phind/Phind-CodeLlama-34B-Python-v1")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Phind/Phind-CodeLlama-34B-Python-v1 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Phind/Phind-CodeLlama-34B-Python-v1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Phind/Phind-CodeLlama-34B-Python-v1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/Phind/Phind-CodeLlama-34B-Python-v1
```
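Once the vLLM server is up, the OpenAI-compatible endpoint can also be called from Python instead of curl. A minimal sketch using only the standard library, assuming the default port 8000 from the command above; the payload mirrors the curl example, and the actual network call is gated behind an environment variable so nothing happens unless a server is running:

```python
import json
import os
import urllib.request

def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> dict:
    """Build the JSON body for the OpenAI-compatible /v1/completions route."""
    return {"model": model, "prompt": prompt,
            "max_tokens": max_tokens, "temperature": temperature}

body = build_completion_request("Phind/Phind-CodeLlama-34B-Python-v1",
                                "def quicksort(arr):")

if os.environ.get("VLLM_SERVER_UP"):  # only call when a server is actually running
    req = urllib.request.Request(
        "http://localhost:8000/v1/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["text"])
```

The same request body works against the SGLang server below, only the port differs.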
- SGLang
How to use Phind/Phind-CodeLlama-34B-Python-v1 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Phind/Phind-CodeLlama-34B-Python-v1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Phind/Phind-CodeLlama-34B-Python-v1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "Phind/Phind-CodeLlama-34B-Python-v1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Phind/Phind-CodeLlama-34B-Python-v1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use Phind/Phind-CodeLlama-34B-Python-v1 with Docker Model Runner:
```shell
docker model run hf.co/Phind/Phind-CodeLlama-34B-Python-v1
```
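Whichever backend you choose, keep in mind that this is a completion model: it continues code rather than following chat-style instructions. A minimal sketch of that prompting pattern; the prompt format here is an illustrative assumption (not documented behavior), and the model load is gated behind an environment variable because the 34B checkpoint is a heavyweight download:

```python
import os

def build_prompt(signature: str, docstring: str) -> str:
    """Format a completion-style prompt: a function signature plus a
    docstring, which the model is expected to continue with a body."""
    return f'{signature}\n    """{docstring}"""\n'

prompt = build_prompt("def fizzbuzz(n: int) -> list[str]:",
                      "Return the FizzBuzz strings for 1..n.")
print(prompt)

if os.environ.get("RUN_PHIND"):  # heavyweight: downloads the 34B checkpoint
    from transformers import pipeline
    pipe = pipeline("text-generation",
                    model="Phind/Phind-CodeLlama-34B-Python-v1",
                    device_map="auto")
    out = pipe(prompt, max_new_tokens=256, do_sample=False)
    print(out[0]["generated_text"])
```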
This is an absolute gem! Can't thank you enough.
My OpenAI bill is going to get so much smaller... not sure that the same can be said about my GPU compute bill :-)))
It's code-specialized. It hasn't been properly instruction-tuned to be an all-purpose model. We'll have new models coming soon that will address these deficiencies.
It is a completion model trained and fine-tuned for Python coding. It absolutely excels on the HumanEval benchmark, where it beats the March version of GPT-4. Low scores on other benchmarks are to be expected. Based on its HumanEval score of 69.5, it is the best open-source model out there by a large margin; the closest is WizardCoder 15B at 57. I will have some tests completed by the end of today.
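For context, HumanEval scores like the 69.5 quoted above are typically pass@1 values computed with the unbiased estimator from the original Codex paper: generate n samples per problem, count the c that pass the unit tests, and estimate pass@k as 1 - C(n-c, k)/C(n, k), averaged over problems. A small sketch of that arithmetic; the per-problem numbers below are made-up illustrations, not anyone's actual evaluation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n (of which c are correct) passes the tests."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem results: (samples generated, samples passing)
results = [(10, 7), (10, 0), (10, 10), (10, 3)]
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"pass@1 = {score:.3f}")  # → pass@1 = 0.500
```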
Are you planning to release an instruct model?
Thanks for answering, I'm looking forward to trying it!
HumanEval for WizardCoder-Python-34B is 73.2... so it looks even better. Much better.
Despite the test results above, in practice I could not get anything intelligible out of the model. It is worse than some models with fewer parameters. Maybe the next version will be better...
By the way, WizardCoder-Python-34B, built on the same 34B base model, works almost flawlessly.
As a noob, how would I go about downloading, installing and trying this model?
I agree that WizardCoder-Python-34B is, for now, the new benchmark for open-source coding models. Even the 15B version released a while ago was quite impressive. Phind would be more useful as an instruct model; for now it is just a nice experiment. It is kind of OK with shorter prompts, but as soon as you throw something longer at it, it kind of gives up... at least for me.
Thanks, Phind, for the model. Very helpful. What are the hardware requirements here?
Right now I am trying to run inference on an A100 in Colab and it takes forever.
How many A100s or H100s would you recommend for near-instant inference (within 5-10 seconds)?
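As a rough back-of-the-envelope for the hardware question: weight memory is approximately parameter count × bytes per parameter, plus some allowance for the KV cache and activations. A sketch of that arithmetic, using 1 GB ≈ 1e9 bytes; the 20% overhead figure is an illustrative assumption, not a measured value:

```python
def weight_memory_gb(params_b: float, bytes_per_param: float,
                     overhead: float = 0.2) -> float:
    """Estimate GPU memory (GB) for model weights plus a rough
    overhead allowance for KV cache and activations."""
    return params_b * bytes_per_param * (1 + overhead)

# A 34B-parameter model at common precisions:
for name, bpp in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{weight_memory_gb(34, bpp):.0f} GB")
# → fp16: ~82 GB, int8: ~41 GB, int4: ~20 GB
```

By this estimate, full fp16 inference does not comfortably fit a single 80 GB A100, which is consistent with slow, offloaded inference in Colab; quantized weights change the picture considerably.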
I have an RTX 3090.
On CPU I get 4 t/s; with the RTX 3090, 30 t/s.
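Those throughput figures translate directly into wall-clock time per response: time ≈ tokens ÷ (tokens per second). A quick sketch using the numbers above, ignoring prompt-processing (prefill) time:

```python
def generation_seconds(n_tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock time to decode n_tokens at a given throughput,
    ignoring prefill time for the prompt."""
    return n_tokens / tokens_per_sec

# Reported speeds: CPU at 4 t/s, RTX 3090 at 30 t/s
for device, tps in [("CPU", 4), ("RTX 3090", 30)]:
    print(f"{device}: 512 tokens in ~{generation_seconds(512, tps):.0f} s")
# → CPU: 512 tokens in ~128 s, RTX 3090: 512 tokens in ~17 s
```

So a 5-10 second response budget at these speeds only allows a few hundred tokens even on the GPU.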
