Instructions for using Phind/Phind-CodeLlama-34B-Python-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Phind/Phind-CodeLlama-34B-Python-v1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Phind/Phind-CodeLlama-34B-Python-v1")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Phind/Phind-CodeLlama-34B-Python-v1")
model = AutoModelForCausalLM.from_pretrained("Phind/Phind-CodeLlama-34B-Python-v1")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Phind/Phind-CodeLlama-34B-Python-v1 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Phind/Phind-CodeLlama-34B-Python-v1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Phind/Phind-CodeLlama-34B-Python-v1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/Phind/Phind-CodeLlama-34B-Python-v1
```
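Once the vLLM server is up, the OpenAI-compatible endpoint can also be called from Python instead of curl. A minimal sketch using only the standard library, assuming the default port 8000 from the command above; the payload mirrors the curl example, and the actual network call is gated behind an environment variable so nothing happens unless a server is running:

```python
import json
import os
import urllib.request

def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> dict:
    """Build the JSON body for the OpenAI-compatible /v1/completions route."""
    return {"model": model, "prompt": prompt,
            "max_tokens": max_tokens, "temperature": temperature}

body = build_completion_request("Phind/Phind-CodeLlama-34B-Python-v1",
                                "def quicksort(arr):")

if os.environ.get("VLLM_SERVER_UP"):  # only call when a server is actually running
    req = urllib.request.Request(
        "http://localhost:8000/v1/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["text"])
```

The same request body works against the SGLang server below, only the port differs.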
- SGLang
How to use Phind/Phind-CodeLlama-34B-Python-v1 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Phind/Phind-CodeLlama-34B-Python-v1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Phind/Phind-CodeLlama-34B-Python-v1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "Phind/Phind-CodeLlama-34B-Python-v1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Phind/Phind-CodeLlama-34B-Python-v1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use Phind/Phind-CodeLlama-34B-Python-v1 with Docker Model Runner:
```shell
docker model run hf.co/Phind/Phind-CodeLlama-34B-Python-v1
```
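Whichever backend you choose, keep in mind that this is a completion model: it continues code rather than following chat-style instructions. A minimal sketch of that prompting pattern; the prompt format here is an illustrative assumption (not documented behavior), and the model load is gated behind an environment variable because the 34B checkpoint is a heavyweight download:

```python
import os

def build_prompt(signature: str, docstring: str) -> str:
    """Format a completion-style prompt: a function signature plus a
    docstring, which the model is expected to continue with a body."""
    return f'{signature}\n    """{docstring}"""\n'

prompt = build_prompt("def fizzbuzz(n: int) -> list[str]:",
                      "Return the FizzBuzz strings for 1..n.")
print(prompt)

if os.environ.get("RUN_PHIND"):  # heavyweight: downloads the 34B checkpoint
    from transformers import pipeline
    pipe = pipeline("text-generation",
                    model="Phind/Phind-CodeLlama-34B-Python-v1",
                    device_map="auto")
    out = pipe(prompt, max_new_tokens=256, do_sample=False)
    print(out[0]["generated_text"])
```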
This is an absolute gem! Can't thank you enough.
My OpenAI bill is going to get so much smaller... not sure that the same can be said about my GPU compute bill :-)))
It's code-specialized. It hasn't been properly instruction-tuned to be an all-purpose model. We'll have new models coming soon that will address these deficiencies.
It is a completion model trained and fine-tuned for Python coding. It absolutely excels on the HumanEval benchmark, where it beats the March version of GPT-4. Low scores on other benchmarks are to be expected. Based on its HumanEval score of 69.5, it is the best open-source model out there by a large margin; the closest is WizardCoder 15B at 57. I will have some tests completed by the end of today.
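For context, HumanEval scores like the 69.5 quoted above are typically pass@1 values computed with the unbiased estimator from the original Codex paper: generate n samples per problem, count the c that pass the unit tests, and estimate pass@k as 1 - C(n-c, k)/C(n, k), averaged over problems. A small sketch of that arithmetic; the per-problem numbers below are made-up illustrations, not anyone's actual evaluation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n (of which c are correct) passes the tests."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem results: (samples generated, samples passing)
results = [(10, 7), (10, 0), (10, 10), (10, 3)]
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"pass@1 = {score:.3f}")  # → pass@1 = 0.500
```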
Are you planning to release an instruct model?
Thanks for answering, I'm looking forward to trying it!
HumanEval for WizardCoder-Python-34B is 73.2... so it looks even better. Much better.
Despite the test results above, in practice I could not get anything intelligible out of the model. It is worse than some models with fewer parameters. Maybe the next version will be better...
By the way, WizardCoder-Python-34B, built on the same 34B base model, works almost flawlessly.
As a noob, how would I go about downloading, installing and trying this model?
I agree that WizardCoder-Python-34B is, for now, the new benchmark for open-source coding models. Even the 15B version released a while ago was quite impressive. Phind would be more useful as an instruct model; for now it is just a nice experiment. It is kind of OK with shorter prompts, but as soon as you throw something longer at it, it kind of gives up... at least for me.
Thanks, Phind, for the model. Very helpful. What are the hardware requirements here?
Right now I am trying to run inference on an A100 in Colab and it takes forever.
How many A100s or H100s would you recommend for near-instant inference (within 5-10 seconds)?
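As a rough back-of-the-envelope for the hardware question: weight memory is approximately parameter count × bytes per parameter, plus some allowance for the KV cache and activations. A sketch of that arithmetic, using 1 GB ≈ 1e9 bytes; the 20% overhead figure is an illustrative assumption, not a measured value:

```python
def weight_memory_gb(params_b: float, bytes_per_param: float,
                     overhead: float = 0.2) -> float:
    """Estimate GPU memory (GB) for model weights plus a rough
    overhead allowance for KV cache and activations."""
    return params_b * bytes_per_param * (1 + overhead)

# A 34B-parameter model at common precisions:
for name, bpp in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{weight_memory_gb(34, bpp):.0f} GB")
# → fp16: ~82 GB, int8: ~41 GB, int4: ~20 GB
```

By this estimate, full fp16 inference does not comfortably fit a single 80 GB A100, which is consistent with slow, offloaded inference in Colab; quantized weights change the picture considerably.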
I have an RTX 3090.
On CPU I get 4 t/s; with the RTX 3090, 30 t/s.
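Those throughput figures translate directly into wall-clock time per response: time ≈ tokens ÷ (tokens per second). A quick sketch using the numbers above, ignoring prompt-processing (prefill) time:

```python
def generation_seconds(n_tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock time to decode n_tokens at a given throughput,
    ignoring prefill time for the prompt."""
    return n_tokens / tokens_per_sec

# Reported speeds: CPU at 4 t/s, RTX 3090 at 30 t/s
for device, tps in [("CPU", 4), ("RTX 3090", 30)]:
    print(f"{device}: 512 tokens in ~{generation_seconds(512, tps):.0f} s")
# → CPU: 512 tokens in ~128 s, RTX 3090: 512 tokens in ~17 s
```

So a 5-10 second response budget at these speeds only allows a few hundred tokens even on the GPU.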
