| --- |
| license: apache-2.0 |
| tags: |
| - llm |
| - tinyllama |
| - function-calling |
| - cpu-optimized |
| - low-resource |
| --- |
| |
| # TinyLlama Function Calling (CPU Optimized) |
|
|
| This is a CPU-optimized version of TinyLlama that has been fine-tuned for function calling capabilities. |
|
|
| ## Model Details |
|
|
| - **Base Model**: TinyLlama-1.1B-Chat-v1.0 |
| - **Parameters**: 1.1 billion |
| - **Fine-tuning Method**: LoRA (Low-Rank Adaptation) |
| - **Training Data**: Function calling examples from Glaive Function Calling v2 dataset |
| - **Optimization**: Merged LoRA weights, converted to float32 for CPU deployment |
|
|
| ## Key Features |
|
|
| 1. **Function Calling Capabilities**: The model can identify when functions should be called and generate appropriate function call syntax |
| 2. **CPU Optimized**: Ready to run efficiently on low-end hardware without GPUs |
| 3. **Lightweight**: Only 1.1B parameters, making it suitable for older hardware |
| 4. **Low Resource Requirements**: Requires only 4-6 GB RAM for loading |
|
|
| ## Usage |
|
|
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| import torch |
| |
| # Load the model |
| model = AutoModelForCausalLM.from_pretrained("tinyllama-function-calling-cpu-optimized") |
| tokenizer = AutoTokenizer.from_pretrained("tinyllama-function-calling-cpu-optimized") |
| |
| # Example prompt for function calling |
| prompt = """### Instruction: |
| Given the available functions and the user query, determine which function(s) to call and with what arguments. |
| |
| Available functions: |
| { |
| "name": "get_exchange_rate", |
| "description": "Get the exchange rate between two currencies", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "base_currency": { |
| "type": "string", |
| "description": "The currency to convert from" |
| }, |
| "target_currency": { |
| "type": "string", |
| "description": "The currency to convert to" |
| } |
| }, |
| "required": [ |
| "base_currency", |
| "target_currency" |
| ] |
| } |
| } |
| |
| User query: What is the exchange rate from USD to EUR? |
| |
| ### Response:""" |
| |
| # Tokenize and generate response |
| inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512) |
| with torch.no_grad(): |
| outputs = model.generate( |
| **inputs, |
| max_new_tokens=150, |
| do_sample=True, |
| temperature=0.7, |
| top_k=50, |
| top_p=0.95 |
| ) |
| |
| response = tokenizer.decode(outputs[0], skip_special_tokens=True) |
| print(response) |
| ``` |
|
|
| ## Performance on Low-End Hardware |
|
|
| The CPU-optimized model requires approximately: |
| - 4-6 GB RAM for loading |
| - 2-4 CPU cores for inference |
| - No GPU required |
|
|
| This makes it suitable for: |
| - Older laptops (2018 and newer) |
| - Low-end desktops |
| - Edge devices with ARM processors |
|
|
| ## Training Process |
|
|
| The model was fine-tuned using LoRA (Low-Rank Adaptation) on the Glaive Function Calling v2 dataset. Only a subset of 50 examples was used for demonstration purposes. |
|
|
| ## License |
|
|
| This model is licensed under the Apache 2.0 license. |