Text Generation
PEFT
Safetensors
Transformers
English
phi3
axolotl
lora
conversational
custom_code
text-generation-inference
Instructions to use DannyAI/phi4_lora_axolotl with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use DannyAI/phi4_lora_axolotl with PEFT:
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-mini-instruct")
    model = PeftModel.from_pretrained(base_model, "DannyAI/phi4_lora_axolotl")
- Transformers
How to use DannyAI/phi4_lora_axolotl with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="DannyAI/phi4_lora_axolotl", trust_remote_code=True)
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("DannyAI/phi4_lora_axolotl", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("DannyAI/phi4_lora_axolotl", trust_remote_code=True)
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use DannyAI/phi4_lora_axolotl with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "DannyAI/phi4_lora_axolotl"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "DannyAI/phi4_lora_axolotl",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/DannyAI/phi4_lora_axolotl
- SGLang
How to use DannyAI/phi4_lora_axolotl with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "DannyAI/phi4_lora_axolotl" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "DannyAI/phi4_lora_axolotl",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "DannyAI/phi4_lora_axolotl" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "DannyAI/phi4_lora_axolotl",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use DannyAI/phi4_lora_axolotl with Docker Model Runner:
docker model run hf.co/DannyAI/phi4_lora_axolotl
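
The training log that follows records `micro_batch_size` 2, `gradient_accumulation_steps` 4, and `sequence_len` 2048 on a single GPU, and each logged step reports `tokens/total` and a perplexity (`ppl`). As a rough cross-check, the sketch below relates those numbers. It assumes every sequence in a batch fills all 2048 positions, which matches the totals logged at steps 5 and 10; later logged totals fall slightly below this product. The function names are illustrative and are not part of Axolotl.

```python
import math

# Values copied from the logged Axolotl config.
MICRO_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
WORLD_SIZE = 1
SEQUENCE_LEN = 2048


def effective_batch_size(micro_batch: int, grad_accum: int, world_size: int) -> int:
    """Sequences consumed per optimizer step across all ranks."""
    return micro_batch * grad_accum * world_size


def tokens_seen(step: int, batch_size: int, sequence_len: int) -> int:
    """Token slots processed after `step` optimizer steps, assuming every
    sequence is packed or padded to exactly `sequence_len` positions."""
    return step * batch_size * sequence_len


def perplexity(loss: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss."""
    return math.exp(loss)


batch = effective_batch_size(MICRO_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, WORLD_SIZE)
print(batch)                                 # the config reports batch_size 8
print(tokens_seen(5, batch, SEQUENCE_LEN))   # the log reports tokens/total 81920 at step 5
print(perplexity(2.118363380432129))         # the log reports eval_ppl 8.31751
```

The same `perplexity` helper applied to the first logged training loss, 6.0638, gives a value close to the logged `ppl` of 430.00636.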
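
The config in the log below sets `learning_rate` 2e-05, `warmup_steps` 20, `lr_scheduler` cosine, and `max_steps` 650. A plain-Python sketch of linear warmup followed by cosine decay to zero (not the trainer's own scheduler code) appears to reproduce the `learning_rate` values printed at steps 5 through 30, provided the printed value is taken to lag the optimizer step by one. That one-step offset is an assumption inferred from the logged numbers, and `cosine_with_warmup` is a hypothetical helper name.

```python
import math

# Values copied from the logged Axolotl config.
PEAK_LR = 2e-5
WARMUP_STEPS = 20
MAX_STEPS = 650


def cosine_with_warmup(
    scheduler_step: int,
    peak_lr: float = PEAK_LR,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = MAX_STEPS,
) -> float:
    """Linear warmup to `peak_lr`, then half-cosine decay to zero."""
    if scheduler_step < warmup_steps:
        return peak_lr * scheduler_step / warmup_steps
    progress = (scheduler_step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption inferred from the log: the value printed at optimizer step N
# corresponds to scheduler step N - 1.
for logged_step in (5, 10, 20, 25, 30):
    print(logged_step, cosine_with_warmup(logged_step - 1))
```

Under these assumptions the warmup ends at the configured peak after step 20, and the decay over the remaining 630 steps is slow enough that the rate is still within 1% of the peak at step 50.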
| [2026-01-24 13:25:03,029] [DEBUG] [axolotl.utils.config.log_gpu_memory_usage:127] [PID:9359] baseline 0.000GB () | |
| [2026-01-24 13:25:03,032] [INFO] [axolotl.cli.config.load_cfg:259] [PID:9359] config: | |
| { | |
| "activation_offloading": false, | |
| "adapter": "lora", | |
| "axolotl_config_path": "using_axolotl/lora.yml", | |
| "base_model": "microsoft/Phi-4-mini-instruct", | |
| "base_model_config": "microsoft/Phi-4-mini-instruct", | |
| "batch_size": 8, | |
| "bf16": true, | |
| "capabilities": { | |
| "bf16": true, | |
| "compute_capability": "sm_86", | |
| "fp8": false, | |
| "n_gpu": 1, | |
| "n_node": 1 | |
| }, | |
| "chat_template": "tokenizer_default", | |
| "context_parallel_size": 1, | |
| "dataloader_num_workers": 1, | |
| "dataloader_pin_memory": true, | |
| "dataloader_prefetch_factor": 256, | |
| "dataset_num_proc": 9, | |
| "datasets": [ | |
| { | |
| "message_property_mappings": { | |
| "content": "content", | |
| "role": "role" | |
| }, | |
| "path": "DannyAI/African-History-QA-Dataset", | |
| "split": "train", | |
| "trust_remote_code": false, | |
| "type": "alpaca_chat.load_qa" | |
| } | |
| ], | |
| "ddp": false, | |
| "device": "cuda:0", | |
| "dion_rank_fraction": 1.0, | |
| "dion_rank_multiple_of": 1, | |
| "env_capabilities": { | |
| "torch_version": "2.9.1" | |
| }, | |
| "eval_batch_size": 2, | |
| "eval_causal_lm_metrics": [ | |
| "sacrebleu", | |
| "comet", | |
| "ter", | |
| "chrf" | |
| ], | |
| "eval_max_new_tokens": 128, | |
| "eval_sample_packing": false, | |
| "eval_steps": 50, | |
| "eval_strategy": "steps", | |
| "eval_table_size": 0, | |
| "experimental_skip_move_to_device": true, | |
| "fp16": false, | |
| "gradient_accumulation_steps": 4, | |
| "gradient_checkpointing": false, | |
| "hub_model_id": "DannyAI/phi4_lora_axolotl", | |
| "include_tkps": true, | |
| "is_falcon_derived_model": false, | |
| "is_llama_derived_model": false, | |
| "is_mistral_derived_model": false, | |
| "learning_rate": 2e-05, | |
| "lisa_layers_attribute": "model.layers", | |
| "load_best_model_at_end": false, | |
| "load_in_4bit": false, | |
| "load_in_8bit": false, | |
| "local_rank": 0, | |
| "logging_steps": 5, | |
| "lora_alpha": 16, | |
| "lora_dropout": 0.05, | |
| "lora_r": 8, | |
| "lora_target_modules": [ | |
| "q_proj", | |
| "v_proj", | |
| "k_proj", | |
| "o_proj" | |
| ], | |
| "loraplus_lr_embedding": 1e-06, | |
| "lr_scheduler": "cosine", | |
| "max_steps": 650, | |
| "mean_resizing_embeddings": false, | |
| "micro_batch_size": 2, | |
| "model_config_type": "phi3", | |
| "num_epochs": 1.0, | |
| "optimizer": "adamw_torch", | |
| "otel_metrics_host": "localhost", | |
| "otel_metrics_port": 8000, | |
| "output_dir": "./phi4_african_history_lora_out", | |
| "pad_to_sequence_len": true, | |
| "pretrain_multipack_attn": true, | |
| "profiler_steps_start": 0, | |
| "qlora_sharded_model_loading": false, | |
| "ray_num_workers": 1, | |
| "remove_unused_columns": false, | |
| "resources_per_worker": { | |
| "GPU": 1 | |
| }, | |
| "sample_packing": true, | |
| "sample_packing_bin_size": 200, | |
| "sample_packing_group_size": 100000, | |
| "save_only_model": false, | |
| "save_safetensors": true, | |
| "save_steps": 100, | |
| "save_strategy": "steps", | |
| "sequence_len": 2048, | |
| "shuffle_before_merging_datasets": false, | |
| "shuffle_merged_datasets": true, | |
| "skip_prepare_dataset": false, | |
| "streaming_multipack_buffer_size": 10000, | |
| "strict": false, | |
| "tensor_parallel_size": 1, | |
| "test_datasets": [ | |
| { | |
| "message_property_mappings": { | |
| "content": "content", | |
| "role": "role" | |
| }, | |
| "path": "DannyAI/African-History-QA-Dataset", | |
| "split": "validation", | |
| "trust_remote_code": false, | |
| "type": "alpaca_chat.load_qa" | |
| } | |
| ], | |
| "tiled_mlp_use_original_mlp": true, | |
| "tokenizer_config": "microsoft/Phi-4-mini-instruct", | |
| "tokenizer_save_jinja_files": true, | |
| "tokenizer_type": "AutoTokenizer", | |
| "torch_dtype": "torch.bfloat16", | |
| "train_on_inputs": false, | |
| "trl": { | |
| "log_completions": false, | |
| "mask_truncated_completions": false, | |
| "ref_model_mixup_alpha": 0.9, | |
| "ref_model_sync_steps": 64, | |
| "scale_rewards": true, | |
| "sync_ref_model": false, | |
| "use_vllm": false, | |
| "vllm_server_host": "0.0.0.0", | |
| "vllm_server_port": 8000 | |
| }, | |
| "type_of_model": "AutoModelForCausalLM", | |
| "use_otel_metrics": false, | |
| "use_ray": false, | |
| "use_wandb": true, | |
| "val_set_size": 0.0, | |
| "vllm": { | |
| "device": "auto", | |
| "dtype": "auto", | |
| "gpu_memory_utilization": 0.9, | |
| "host": "0.0.0.0", | |
| "port": 8000 | |
| }, | |
| "wandb_name": "phi4_lora_axolotl", | |
| "wandb_project": "phi4_african_history", | |
| "warmup_steps": 20, | |
| "weight_decay": 0.0, | |
| "world_size": 1 | |
| } | |
| [2026-01-24 13:25:04,559] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:285] [PID:9359] EOS: 199999 / <|endoftext|> | |
| [2026-01-24 13:25:04,559] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:286] [PID:9359] BOS: 199999 / <|endoftext|> | |
| [2026-01-24 13:25:04,559] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:287] [PID:9359] PAD: 199999 / <|endoftext|> | |
| [2026-01-24 13:25:04,559] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:288] [PID:9359] UNK: 199999 / <|endoftext|> | |
| [2026-01-24 13:25:04,560] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:481] [PID:9359] Unable to find prepared dataset in last_run_prepared/89363fb9438bda5d225c172d067e1ebf | |
| [2026-01-24 13:25:04,560] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:9359] Loading raw datasets... | |
| [2026-01-24 13:25:04,560] [WARNING] [axolotl.utils.data.sft._load_raw_datasets:322] [PID:9359] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset using `axolotl preprocess path/to/config.yml`. | |
| [2026-01-24 13:25:06,680] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:9359] Loading dataset: DannyAI/African-History-QA-Dataset with base_type: alpaca_chat.load_qa and prompt_style: None | |
| [2026-01-24 13:25:06,878] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:224] [PID:9359] min_input_len: 52 | |
| [2026-01-24 13:25:06,879] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:226] [PID:9359] max_input_len: 179 | |
| Dropping Long Sequences (>2048) (num_proc=9): 100%|█| 2114/2114 [00:00<00 | |
| Drop Samples with Zero Trainable Tokens (num_proc=9): 100%|█| 2114/2114 [ | |
| Add position_id column (Sample Packing) (num_proc=9): 100%|█| 2114/2114 [ | |
| Saving the dataset (8/8 shards): 100%|█| 2114/2114 [00:00<00:00, 5928.96 | |
| [2026-01-24 13:25:09,029] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:481] [PID:9359] Unable to find prepared dataset in last_run_prepared/1affaed26259409613b775fd6050f3a2 | |
| [2026-01-24 13:25:09,030] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:9359] Loading raw datasets... | |
| [2026-01-24 13:25:09,030] [WARNING] [axolotl.utils.data.sft._load_raw_datasets:322] [PID:9359] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset using `axolotl preprocess path/to/config.yml`. | |
| [2026-01-24 13:25:10,146] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:9359] Loading dataset: DannyAI/African-History-QA-Dataset with base_type: alpaca_chat.load_qa and prompt_style: None | |
| [2026-01-24 13:25:10,349] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:224] [PID:9359] min_input_len: 54 | |
| [2026-01-24 13:25:10,349] [INFO] [axolotl.utils.data.utils.handle_long_seq_in_dataset:226] [PID:9359] max_input_len: 169 | |
| Dropping Long Sequences (>2048) (num_proc=9): 100%|█| 200/200 [00:00<00:0 | |
| Drop Samples with Zero Trainable Tokens (num_proc=9): 100%|█| 200/200 [00 | |
| Add position_id column (Sample Packing) (num_proc=9): 100%|█| 200/200 [00 | |
| Saving the dataset (1/1 shards): 100%|█| 200/200 [00:00<00:00, 905.63 exa | |
| [2026-01-24 13:25:12,212] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:417] [PID:9359] total_num_tokens: 205_770 | |
| [2026-01-24 13:25:12,246] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:435] [PID:9359] `total_supervised_tokens: 94_469` | |
| [2026-01-24 13:25:12,274] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:9359] Using single process for pack_parallel, running sequentially. | |
| [2026-01-24 13:25:13,265] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:9359] Using single process for pack_parallel, running sequentially. | |
| [2026-01-24 13:25:13,472] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:9359] generate_batches time: 0.20749449729919434 | |
| [2026-01-24 13:25:13,473] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:9359] Using single process for pack_parallel, running sequentially. | |
| [2026-01-24 13:25:13,678] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:9359] generate_batches time: 0.20551633834838867 | |
| [2026-01-24 13:25:13,679] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:9359] Using single process for pack_parallel, running sequentially. | |
| [2026-01-24 13:25:13,870] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:9359] generate_batches time: 0.19202852249145508 | |
| [2026-01-24 13:25:13,871] [DEBUG] [axolotl.utils.samplers.multipack.pack_parallel:177] [PID:9359] Using single process for pack_parallel, running sequentially. | |
| [2026-01-24 13:25:14,081] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:9359] generate_batches time: 0.21112775802612305 | |
| [2026-01-24 13:25:14,120] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:9359] gather_len_batches: [51] | |
| [2026-01-24 13:25:14,120] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:494] [PID:9359] data_loader_len: 12 | |
| [2026-01-24 13:25:14,120] [INFO] [axolotl.utils.trainer.calc_sample_packing_eff_est:510] [PID:9359] sample_packing_eff_est across ranks: [0.9850356158088235] | |
| [2026-01-24 13:25:14,120] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:522] [PID:9359] sample_packing_eff_est: 0.99 | |
| [2026-01-24 13:25:14,120] [DEBUG] [axolotl.utils.trainer.calculate_total_num_steps:533] [PID:9359] total_num_steps: 12 | |
| [2026-01-24 13:25:14,121] [INFO] [axolotl.utils.data.sft._prepare_standard_dataset:121] [PID:9359] Maximum number of steps set at 12 | |
| [2026-01-24 13:25:14,169] [DEBUG] [axolotl.train.setup_model_and_tokenizer:70] [PID:9359] loading tokenizer... microsoft/Phi-4-mini-instruct | |
| [2026-01-24 13:25:15,574] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:285] [PID:9359] EOS: 199999 / <|endoftext|> | |
| [2026-01-24 13:25:15,574] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:286] [PID:9359] BOS: 199999 / <|endoftext|> | |
| [2026-01-24 13:25:15,574] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:287] [PID:9359] PAD: 199999 / <|endoftext|> | |
| [2026-01-24 13:25:15,574] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:288] [PID:9359] UNK: 199999 / <|endoftext|> | |
| [2026-01-24 13:25:15,574] [DEBUG] [axolotl.train.setup_model_and_tokenizer:82] [PID:9359] Loading model | |
| [2026-01-24 13:25:15,776] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_evaluation_loop:87] [PID:9359] Patched Trainer.evaluation_loop with nanmean loss calculation | |
| [2026-01-24 13:25:15,781] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_maybe_log_save_evaluate:138] [PID:9359] Patched Trainer._maybe_log_save_evaluate with nanmean loss calculation | |
| [2026-01-24 13:25:15,782] [INFO] [axolotl.loaders.patch_manager._apply_multipack_patches:345] [PID:9359] Applying multipack dataloader patch for sample packing... | |
| Loading checkpoint shards: 100%|█| 2/2 [00:00<00:00, 66.49it/s] | |
| [2026-01-24 13:25:17,426] [INFO] [axolotl.loaders.model._configure_embedding_dtypes:347] [PID:9359] Converting modules to torch.bfloat16 | |
| [2026-01-24 13:25:18,248] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:9359] Memory usage after model load 0.000GB () | |
| trainable params: 1,572,864 || all params: 3,837,594,624 || trainable%: 0.0410 | |
| [2026-01-24 13:25:18,300] [DEBUG] [axolotl.loaders.model.log_gpu_memory_usage:127] [PID:9359] after adapters 0.000GB () | |
| [2026-01-24 13:25:26,968] [INFO] [axolotl.train.save_initial_configs:413] [PID:9359] Pre-saving adapter config to ./phi4_african_history_lora_out... | |
| [2026-01-24 13:25:26,968] [INFO] [axolotl.train.save_initial_configs:417] [PID:9359] Pre-saving tokenizer to ./phi4_african_history_lora_out... | |
| [2026-01-24 13:25:27,149] [INFO] [axolotl.train.save_initial_configs:422] [PID:9359] Pre-saving model config to ./phi4_african_history_lora_out... | |
| [2026-01-24 13:25:27,153] [INFO] [axolotl.train.execute_training:212] [PID:9359] Starting trainer... | |
| [2026-01-24 13:25:28,505] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:9359] generate_batches time: 0.47576427459716797 | |
| [2026-01-24 13:25:28,983] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:9359] generate_batches time: 0.477489709854126 | |
| [2026-01-24 13:25:29,462] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:9359] generate_batches time: 0.47864699363708496 | |
| [2026-01-24 13:25:29,967] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:9359] generate_batches time: 0.5052089691162109 | |
| [2026-01-24 13:25:29,968] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:9359] gather_len_batches: [51] | |
| wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from /root/.netrc. | |
| wandb: Currently logged in as: dannyai (dannyai-danny-the-analyst) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin | |
| wandb: Waiting for wandb.init()... | |
| wandb: setting up run 80snmd5o (0.2s) | |
| wandb: Tracking run with wandb version 0.24.0 | |
| wandb: Run data is saved locally in /workspace/wandb/run-20260124_132530-80snmd5o | |
| wandb: Run `wandb offline` to turn off syncing. | |
| wandb: Syncing run phi4_lora_axolotl | |
| wandb: ⭐️ View project at https://wandb.ai/dannyai-danny-the-analyst/phi4_african_history | |
| wandb: 🚀 View run at https://wandb.ai/dannyai-danny-the-analyst/phi4_african_history/runs/80snmd5o | |
| wandb: Detected [huggingface_hub.inference] in use. | |
| wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script. | |
| wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/ | |
| wandb: WARNING Saving files without folders. If you want to preserve subdirectories pass base_path to wandb.save, i.e. wandb.save("/mnt/folder/file.h5", base_path="/mnt") | |
| wandb: WARNING Symlinked 1 file into the W&B run directory; call wandb.save again to sync new files. | |
| [2026-01-24 13:25:33,022] [INFO] [axolotl.utils.callbacks.on_train_begin:757] [PID:9359] The Axolotl config has been saved to the WandB run under files. | |
| [2026-01-24 13:25:33,028] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| 100%|█| 100/100 [00:51<00:00, 1.85it/s] | |
| {'eval_loss': 2.118363380432129, 'eval_runtime': 55.7531, 'eval_samples_per_second': 3.587, 'eval_steps_per_second': 1.794, 'eval_ppl': 8.31751, 'memory/max_active (GiB)': 14.82, 'memory/max_allocated (GiB)': 14.82, 'memory/device_reserved (GiB)': 15.37, 'epoch': 0} | |
| 1%| | 5/650 [01:38<2:22:09, 13.22s/it] {'loss': 6.0638, 'grad_norm': 0.5395623445510864, 'learning_rate': 4.000000000000001e-06, 'ppl': 430.00636, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.82, 'tokens/train_per_sec_per_gpu': 215.98207092285156, 'tokens/total': 81920, 'tokens/trainable': 36915, 'epoch': 0.39} | |
| 2%| | 10/650 [02:20<1:35:45, 8.98s/it] {'loss': 6.0041, 'grad_norm': 0.5103967785835266, 'learning_rate': 9e-06, 'ppl': 405.08625, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.82, 'tokens/train_per_sec_per_gpu': 218.7120819091797, 'tokens/total': 163840, 'tokens/trainable': 74494, 'epoch': 0.78} | |
| 2%| | 15/650 [03:01<1:29:34, 8.46s/it] {'loss': 6.0541, 'grad_norm': 0.6589650511741638, 'learning_rate': 1.4e-05, 'ppl': 425.85546, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.82, 'tokens/train_per_sec_per_gpu': 198.2006378173828, 'tokens/total': 241664, 'tokens/trainable': 109022, 'epoch': 1.16} | |
| 3%| | 20/650 [03:42<1:27:04, 8.29s/it] {'loss': 6.0239, 'grad_norm': 0.7923386096954346, 'learning_rate': 1.9e-05, 'ppl': 413.18689, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.82, 'tokens/train_per_sec_per_gpu': 237.59417724609375, 'tokens/total': 323584, 'tokens/trainable': 146859, 'epoch': 1.55} | |
| 4%| | 25/650 [04:23<1:26:06, 8.27s/it] {'loss': 5.9298, 'grad_norm': 0.9688937664031982, 'learning_rate': 1.9998010727705237e-05, 'ppl': 376.07929, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.82, 'tokens/train_per_sec_per_gpu': 212.3710479736328, 'tokens/total': 405504, 'tokens/trainable': 183924, 'epoch': 1.94} | |
| 5%| | 30/650 [05:05<1:25:56, 8.32s/it] {'loss': 5.8764, 'grad_norm': 1.120764136314392, 'learning_rate': 1.9989930665413148e-05, 'ppl': 356.52344, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.82, 'tokens/train_per_sec_per_gpu': 214.12718200683594, 'tokens/total': 483328, 'tokens/trainable': 218753, 'epoch': 2.31} | |
| 5%| | 35/650 [05:46<1:24:45, 8.27s/it] {'loss': 5.8105, 'grad_norm': 1.2133527994155884, 'learning_rate': 1.9975640502598243e-05, 'ppl': 333.78598, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.82, 'tokens/train_per_sec_per_gpu': 220.15711975097656, 'tokens/total': 565248, 'tokens/trainable': 256149, 'epoch': 2.71} | |
| 6%| | 40/650 [06:27<1:25:27, 8.41s/it] {'loss': 5.6921, 'grad_norm': 1.2973392009735107, 'learning_rate': 1.995514912254015e-05, 'ppl': 296.51565, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.82, 'tokens/train_per_sec_per_gpu': 220.37948608398438, 'tokens/total': 643072, 'tokens/trainable': 290596, 'epoch': 3.08} | |
| 7%| | 45/650 [07:08<1:23:30, 8.28s/it] {'loss': 5.5982, 'grad_norm': 1.3150535821914673, 'learning_rate': 1.9928469263418376e-05, 'ppl': 269.94008, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.82, 'tokens/train_per_sec_per_gpu': 223.76809692382812, 'tokens/total': 724992, 'tokens/trainable': 328311, 'epoch': 3.47} | |
| 8%| | 50/650 [07:50<1:22:33, 8.26s/it] {'loss': 5.394, 'grad_norm': 1.2739847898483276, 'learning_rate': 1.9895617510393773e-05, 'ppl': 220.08196, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.82, 'tokens/train_per_sec_per_gpu': 209.72315979003906, 'tokens/total': 806912, 'tokens/trainable': 365622, 'epoch': 3.86} | |
| [2026-01-24 13:33:23,098] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| 100% 100/100 [00:53<00:00, 1.83it/s] | |
| {'eval_loss': 2.1003942489624023, 'eval_runtime': 53.9477, 'eval_samples_per_second': 3.707, 'eval_steps_per_second': 1.854, 'eval_ppl': 8.16939, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.82, 'epoch': 3.86, 'tokens/train_per_sec_per_gpu': 0.0} | |
| 8% 55/650 [09:24<1:59:13, 12.02s/it] {'loss': 5.3139, 'grad_norm': 1.290380597114563, 'learning_rate': 1.985661428529863e-05, 'ppl': 203.14094, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 238.74021911621094, 'tokens/total': 884736, 'tokens/trainable': 400183, 'epoch': 4.24} | |
| 9% 60/650 [10:04<1:25:58, 8.74s/it] {'loss': 5.2056, 'grad_norm': 1.3334054946899414, 'learning_rate': 1.9811483833941726e-05, 'ppl': 182.29021, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 226.33343505859375, 'tokens/total': 966656, 'tokens/trainable': 437640, 'epoch': 4.63} | |
| 10% 65/650 [10:43<1:15:16, 7.72s/it] {'loss': 5.0582, 'grad_norm': 1.161880612373352, 'learning_rate': 1.9760254211036245e-05, 'ppl': 157.30711, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 187.5677032470703, 'tokens/total': 1044480, 'tokens/trainable': 472345, 'epoch': 5.0} | |
| 11% 70/650 [11:26<1:20:12, 8.30s/it] {'loss': 4.9713, 'grad_norm': 1.14654541015625, 'learning_rate': 1.9702957262759964e-05, 'ppl': 144.21424, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 228.5047607421875, 'tokens/total': 1126400, 'tokens/trainable': 509679, 'epoch': 5.39} | |
| 12% 75/650 [12:08<1:19:07, 8.26s/it] {'loss': 4.8927, 'grad_norm': 1.1394518613815308, 'learning_rate': 1.9639628606958535e-05, 'ppl': 133.31303, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 227.03880310058594, 'tokens/total': 1208320, 'tokens/trainable': 546748, 'epoch': 5.78} | |
| 12% 80/650 [12:49<1:19:05, 8.33s/it] {'loss': 4.8296, 'grad_norm': 1.182919979095459, 'learning_rate': 1.9570307611004124e-05, 'ppl': 125.16089, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 229.55503845214844, 'tokens/total': 1286144, 'tokens/trainable': 581786, 'epoch': 6.16} | |
| 13% 85/650 [13:30<1:17:50, 8.27s/it] {'loss': 4.7205, 'grad_norm': 1.1635547876358032, 'learning_rate': 1.9495037367323264e-05, 'ppl': 112.22435, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 238.2696990966797, 'tokens/total': 1368064, 'tokens/trainable': 619252, 'epoch': 6.55} | |
| 14% 90/650 [14:11<1:17:04, 8.26s/it] {'loss': 4.6447, 'grad_norm': 1.1828676462173462, 'learning_rate': 1.9413864666609036e-05, 'ppl': 104.03215, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 216.6095428466797, 'tokens/total': 1449984, 'tokens/trainable': 656250, 'epoch': 6.94} | |
| 15% 95/650 [14:52<1:16:51, 8.31s/it] {'loss': 4.6145, 'grad_norm': 1.196416974067688, 'learning_rate': 1.9326839968734278e-05, 'ppl': 100.93735, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 227.1728973388672, 'tokens/total': 1527808, 'tokens/trainable': 691447, 'epoch': 7.31} | |
| 15% 100/650 [15:34<1:15:46, 8.27s/it] {'loss': 4.4484, 'grad_norm': 1.2008439302444458, 'learning_rate': 1.9234017371383946e-05, 'ppl': 85.49005, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 229.3255615234375, 'tokens/total': 1609728, 'tokens/trainable': 728444, 'epoch': 7.71} | |
| [2026-01-24 13:41:07,232] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| 100% 100/100 [00:53<00:00, 1.83it/s] | |
| {'eval_loss': 2.0366907119750977, 'eval_runtime': 53.964, 'eval_samples_per_second': 3.706, 'eval_steps_per_second': 1.853, 'eval_ppl': 7.6652, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.84, 'epoch': 7.71, 'tokens/train_per_sec_per_gpu': 0.0} | |
| [2026-01-24 13:42:01,206] [INFO] [axolotl.core.trainers.base._save:721] [PID:9359] Saving model checkpoint to ./phi4_african_history_lora_out/checkpoint-100 | |
| 16% 105/650 [17:10<1:52:23, 12.37s/it] {'loss': 4.3593, 'grad_norm': 1.1998684406280518, 'learning_rate': 1.913545457642601e-05, 'ppl': 78.20237, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 239.9269256591797, 'tokens/total': 1687552, 'tokens/trainable': 763330, 'epoch': 8.08} | |
| 17% 110/650 [17:51<1:20:31, 8.95s/it] {'loss': 4.3202, 'grad_norm': 1.1768964529037476, 'learning_rate': 1.903121285404192e-05, 'ppl': 75.20367, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 234.99725341796875, 'tokens/total': 1769472, 'tokens/trainable': 800603, 'epoch': 8.47} | |
| 18% 115/650 [18:32<1:14:37, 8.37s/it] {'loss': 4.2426, 'grad_norm': 1.25551438331604, 'learning_rate': 1.8921357004638837e-05, 'ppl': 69.58855, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 223.63258361816406, 'tokens/total': 1851392, 'tokens/trainable': 837677, 'epoch': 8.86} | |
| 18% 120/650 [19:13<1:13:38, 8.34s/it] {'loss': 4.2269, 'grad_norm': 1.2445223331451416, 'learning_rate': 1.880595531856738e-05, 'ppl': 68.50454, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 202.2104034423828, 'tokens/total': 1929216, 'tokens/trainable': 873147, 'epoch': 9.24} | |
| 19% 125/650 [19:55<1:12:19, 8.27s/it] {'loss': 4.1258, 'grad_norm': 1.2367068529129028, 'learning_rate': 1.868507953366989e-05, 'ppl': 61.91732, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 223.10665893554688, 'tokens/total': 2011136, 'tokens/trainable': 910155, 'epoch': 9.63} | |
| 20% 130/650 [20:34<1:06:21, 7.66s/it] {'loss': 3.9942, 'grad_norm': 1.3809951543807983, 'learning_rate': 1.855880479068559e-05, 'ppl': 54.2824, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 225.6050567626953, 'tokens/total': 2088960, 'tokens/trainable': 944690, 'epoch': 10.0} | |
| 21% 135/650 [21:17<1:11:10, 8.29s/it] {'loss': 3.9903, 'grad_norm': 1.2650110721588135, 'learning_rate': 1.8427209586540392e-05, 'ppl': 54.07111, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 212.85777282714844, 'tokens/total': 2170880, 'tokens/trainable': 982000, 'epoch': 10.39} | |
| 22% 140/650 [21:57<1:09:38, 8.19s/it] {'loss': 3.8609, 'grad_norm': 1.3191193342208862, 'learning_rate': 1.8290375725550417e-05, 'ppl': 47.50809, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 233.4993133544922, 'tokens/total': 2252800, 'tokens/trainable': 1019240, 'epoch': 10.78} | |
| 22% 145/650 [22:38<1:10:01, 8.32s/it] {'loss': 3.8391, 'grad_norm': 1.24160635471344, 'learning_rate': 1.8148388268569453e-05, 'ppl': 46.48362, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 240.01336669921875, 'tokens/total': 2330624, 'tokens/trainable': 1054003, 'epoch': 11.16} | |
| 23% 150/650 [23:19<1:08:20, 8.20s/it] {'loss': 3.7583, 'grad_norm': 1.2470208406448364, 'learning_rate': 1.8001335480112067e-05, 'ppl': 42.87548, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 215.87060546875, 'tokens/total': 2412544, 'tokens/trainable': 1091234, 'epoch': 11.55} | |
| [2026-01-24 13:48:52,339] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| 100%|ββββββββββββββββββββββββββββββββββ| 100/100 [00:52<00:00, 1.82it/s][A | |
| {'eval_loss': 1.9784648418426514, 'eval_runtime': 53.9896, 'eval_samples_per_second': 3.704, 'eval_steps_per_second': 1.852, 'eval_ppl': 7.23163, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.84, 'epoch': 11.55, 'tokens/train_per_sec_per_gpu': 0.0} | |
| 155/650 [24:54<1:40:07, 12.14s/it] {'loss': 3.7622, 'grad_norm': 1.3049488067626953, 'learning_rate': 1.7849308773485226e-05, 'ppl': 43.04302, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 229.1610565185547, 'tokens/total': 2494464, 'tokens/trainable': 1128575, 'epoch': 11.94} | |
| 160/650 [25:35<1:13:12, 8.96s/it] {'loss': 3.6854, 'grad_norm': 1.3209326267242432, 'learning_rate': 1.769240265396249e-05, 'ppl': 39.86106, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 222.77145385742188, 'tokens/total': 2572288, 'tokens/trainable': 1163532, 'epoch': 12.31} | |
| 165/650 [26:17<1:07:41, 8.37s/it] {'loss': 3.6321, 'grad_norm': 1.3130707740783691, 'learning_rate': 1.7530714660036112e-05, 'ppl': 37.7921, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 239.2020721435547, 'tokens/total': 2654208, 'tokens/trainable': 1200592, 'epoch': 12.71} | |
| 170/650 [26:58<1:07:31, 8.44s/it] {'loss': 3.5313, 'grad_norm': 1.2986565828323364, 'learning_rate': 1.736434530278362e-05, 'ppl': 34.16836, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 219.5823516845703, 'tokens/total': 2732032, 'tokens/trainable': 1235455, 'epoch': 13.08} | |
| 175/650 [27:39<1:05:37, 8.29s/it] {'loss': 3.5739, 'grad_norm': 1.4056479930877686, 'learning_rate': 1.7193398003386514e-05, 'ppl': 35.65538, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 217.5475616455078, 'tokens/total': 2813952, 'tokens/trainable': 1272869, 'epoch': 13.47} | |
| 180/650 [28:20<1:04:42, 8.26s/it] {'loss': 3.5038, 'grad_norm': 1.2493181228637695, 'learning_rate': 1.7017979028839918e-05, 'ppl': 33.24153, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 231.85951232910156, 'tokens/total': 2895872, 'tokens/trainable': 1309958, 'epoch': 13.86} | |
| 185/650 [29:02<1:04:36, 8.34s/it] {'loss': 3.4501, 'grad_norm': 1.3328948020935059, 'learning_rate': 1.68381974258932e-05, 'ppl': 31.50354, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 223.9250030517578, 'tokens/total': 2973696, 'tokens/trainable': 1344982, 'epoch': 14.24} | |
| 190/650 [29:43<1:03:24, 8.27s/it] {'loss': 3.4112, 'grad_norm': 1.340157389640808, 'learning_rate': 1.6654164953262614e-05, 'ppl': 30.30158, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 220.52084350585938, 'tokens/total': 3055616, 'tokens/trainable': 1382265, 'epoch': 14.63} | |
| 195/650 [30:22<58:06, 7.66s/it] {'loss': 3.37, 'grad_norm': 1.4470051527023315, 'learning_rate': 1.6465996012157996e-05, 'ppl': 29.07853, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 232.44625854492188, 'tokens/total': 3133440, 'tokens/trainable': 1417035, 'epoch': 15.0} | |
| 200/650 [31:05<1:02:13, 8.30s/it] {'loss': 3.363, 'grad_norm': 1.4995049238204956, 'learning_rate': 1.6273807575166927e-05, 'ppl': 28.87569, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 232.96780395507812, 'tokens/total': 3215360, 'tokens/trainable': 1454761, 'epoch': 15.39} | |
| [2026-01-24 13:56:38,979] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| {'eval_loss': 1.9298608303070068, 'eval_runtime': 54.001, 'eval_samples_per_second': 3.704, 'eval_steps_per_second': 1.852, 'eval_ppl': 6.88855, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.84, 'epoch': 15.39, 'tokens/train_per_sec_per_gpu': 0.0} | |
| [2026-01-24 13:57:32,990] [INFO] [axolotl.core.trainers.base._save:721] [PID:9359] Saving model checkpoint to ./phi4_african_history_lora_out/checkpoint-200 | |
| 205/650 [32:42<1:30:36, 12.22s/it] {'loss': 3.3266, 'grad_norm': 1.821424961090088, 'learning_rate': 1.6077719113540303e-05, 'ppl': 27.84351, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 225.10531616210938, 'tokens/total': 3297280, 'tokens/trainable': 1491439, 'epoch': 15.78} | |
| 210/650 [33:23<1:06:15, 9.03s/it] {'loss': 3.2791, 'grad_norm': 1.4399441480636597, 'learning_rate': 1.5877852522924733e-05, 'ppl': 26.55187, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 213.16677856445312, 'tokens/total': 3375104, 'tokens/trainable': 1526436, 'epoch': 16.16} | |
| 215/650 [34:04<1:00:50, 8.39s/it] {'loss': 3.2551, 'grad_norm': 1.3437100648880005, 'learning_rate': 1.567433204758782e-05, 'ppl': 25.92221, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 217.27838134765625, 'tokens/total': 3457024, 'tokens/trainable': 1563597, 'epoch': 16.55} | |
| 220/650 [34:45<57:23, 8.01s/it] {'loss': 3.2498, 'grad_norm': 1.3698214292526245, 'learning_rate': 1.5467284203183437e-05, 'ppl': 25.78518, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 252.31300354003906, 'tokens/total': 3538944, 'tokens/trainable': 1601156, 'epoch': 16.94} | |
| 225/650 [35:26<58:33, 8.27s/it] {'loss': 3.163, 'grad_norm': 1.3603277206420898, 'learning_rate': 1.5256837698105047e-05, 'ppl': 23.64141, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 231.8495330810547, 'tokens/total': 3616768, 'tokens/trainable': 1635980, 'epoch': 17.31} | |
| 230/650 [36:07<57:49, 8.26s/it] {'loss': 3.1506, 'grad_norm': 1.3459270000457764, 'learning_rate': 1.5043123353475944e-05, 'ppl': 23.35007, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 247.0294189453125, 'tokens/total': 3698688, 'tokens/trainable': 1673180, 'epoch': 17.71} | |
| 235/650 [36:48<58:08, 8.41s/it] {'loss': 3.1503, 'grad_norm': 1.4214606285095215, 'learning_rate': 1.482627402182611e-05, 'ppl': 23.34307, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 231.67904663085938, 'tokens/total': 3776512, 'tokens/trainable': 1708085, 'epoch': 18.08} | |
| 240/650 [37:29<56:37, 8.29s/it] {'loss': 3.1352, 'grad_norm': 1.3624988794326782, 'learning_rate': 1.4606424504506325e-05, 'ppl': 22.99323, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 242.47784423828125, 'tokens/total': 3858432, 'tokens/trainable': 1745240, 'epoch': 18.47} | |
| 245/650 [38:11<55:49, 8.27s/it] {'loss': 3.0854, 'grad_norm': 1.3335026502609253, 'learning_rate': 1.4383711467890776e-05, 'ppl': 21.87622, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 225.49703979492188, 'tokens/total': 3940352, 'tokens/trainable': 1782485, 'epoch': 18.86} | |
| 250/650 [38:52<55:32, 8.33s/it] {'loss': 3.0568, 'grad_norm': 1.3789132833480835, 'learning_rate': 1.415827335842033e-05, 'ppl': 21.25942, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.84, 'tokens/train_per_sec_per_gpu': 224.20953369140625, 'tokens/total': 4018176, 'tokens/trainable': 1817262, 'epoch': 19.24} | |
| [2026-01-24 14:04:25,521] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| [A{'eval_loss': 1.8664345741271973, 'eval_runtime': 53.9853, 'eval_samples_per_second': 3.705, 'eval_steps_per_second': 1.852, 'eval_ppl': 6.4652, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.84, 'epoch': 19.24, 'tokens/train_per_sec_per_gpu': 0.0} | |
| 38%|βββββββββββββββββββββββ | 250/650 [39:46<55:32, 8.33s/it] | |
| 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 100/100 [00:53<00:00, 1.82it/s][A | |
| 255/650 [40:27<1:20:04, 12.16s/it] {'loss': 3.0257, 'grad_norm': 1.3327895402908325, 'learning_rate': 1.3930250316539237e-05, 'ppl': 20.60843, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 229.9409942626953, 'tokens/total': 4100096, 'tokens/trainable': 1854578, 'epoch': 19.63} | |
| 260/650 [41:07<53:59, 8.31s/it] {'loss': 3.0287, 'grad_norm': 1.5805764198303223, 'learning_rate': 1.3699784089578791e-05, 'ppl': 20.67034, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 196.57861328125, 'tokens/total': 4177920, 'tokens/trainable': 1889380, 'epoch': 20.0} | |
| 265/650 [41:50<54:01, 8.42s/it] {'loss': 3.0286, 'grad_norm': 1.3351317644119263, 'learning_rate': 1.3467017943642074e-05, 'ppl': 20.66828, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 215.64198303222656, 'tokens/total': 4259840, 'tokens/trainable': 1927132, 'epoch': 20.39} | |
| 270/650 [42:31<52:34, 8.30s/it] {'loss': 2.9528, 'grad_norm': 1.3210023641586304, 'learning_rate': 1.3232096574544602e-05, 'ppl': 19.15953, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 214.79986572265625, 'tokens/total': 4341760, 'tokens/trainable': 1964160, 'epoch': 20.78} | |
| 275/650 [43:13<52:26, 8.39s/it] {'loss': 2.968, 'grad_norm': 1.3026108741760254, 'learning_rate': 1.2995166017866194e-05, 'ppl': 19.45297, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 220.983642578125, 'tokens/total': 4419584, 'tokens/trainable': 1998717, 'epoch': 21.16} | |
| 280/650 [43:54<51:06, 8.29s/it] {'loss': 2.9471, 'grad_norm': 1.598791241645813, 'learning_rate': 1.2756373558169992e-05, 'ppl': 19.05063, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 220.04495239257812, 'tokens/total': 4501504, 'tokens/trainable': 2035786, 'epoch': 21.55} | |
| 285/650 [44:34<49:54, 8.20s/it] {'loss': 2.8982, 'grad_norm': 1.4100048542022705, 'learning_rate': 1.2515867637445088e-05, 'ppl': 18.14146, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 222.9599609375, 'tokens/total': 4583424, 'tokens/trainable': 2073333, 'epoch': 21.94} | |
| 290/650 [45:16<49:49, 8.30s/it] {'loss': 2.8945, 'grad_norm': 1.2952361106872559, 'learning_rate': 1.2273797762829615e-05, 'ppl': 18.07446, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 226.50791931152344, 'tokens/total': 4661248, 'tokens/trainable': 2108119, 'epoch': 22.31} | |
| 295/650 [45:57<48:55, 8.27s/it] {'loss': 2.8828, 'grad_norm': 1.2925727367401123, 'learning_rate': 1.2030314413671763e-05, 'ppl': 17.86422, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 210.02886962890625, 'tokens/total': 4743168, 'tokens/trainable': 2144924, 'epoch': 22.71} | |
| 300/650 [46:38<49:05, 8.42s/it] {'loss': 2.8736, 'grad_norm': 1.3145809173583984, 'learning_rate': 1.1785568947986368e-05, 'ppl': 17.70063, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 220.417236328125, 'tokens/total': 4820992, 'tokens/trainable': 2180039, 'epoch': 23.08} | |
| [2026-01-24 14:12:11,621] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| {'eval_loss': 1.8134173154830933, 'eval_runtime': 53.9413, 'eval_samples_per_second': 3.708, 'eval_steps_per_second': 1.854, 'eval_ppl': 6.13136, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.79, 'epoch': 23.08, 'tokens/train_per_sec_per_gpu': 0.0} | |
| [2026-01-24 14:13:05,572] [INFO] [axolotl.core.trainers.base._save:721] [PID:9359] Saving model checkpoint to ./phi4_african_history_lora_out/checkpoint-300 | |
| 305/650 [48:14<1:10:15, 12.22s/it] {'loss': 2.8713, 'grad_norm': 1.3867206573486328, 'learning_rate': 1.1539713508365336e-05, 'ppl': 17.65996, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 234.4234619140625, 'tokens/total': 4902912, 'tokens/trainable': 2217599, 'epoch': 23.47} | |
| 310/650 [48:55<50:31, 8.92s/it] {'loss': 2.8459, 'grad_norm': 1.3547266721725464, 'learning_rate': 1.1292900927400334e-05, 'ppl': 17.21705, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 218.23565673828125, 'tokens/total': 4984832, 'tokens/trainable': 2255036, 'epoch': 23.86} | |
| 315/650 [49:37<47:07, 8.44s/it] {'loss': 2.806, 'grad_norm': 1.2744239568710327, 'learning_rate': 1.1045284632676535e-05, 'ppl': 16.54361, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 227.40797424316406, 'tokens/total': 5062656, 'tokens/trainable': 2289674, 'epoch': 24.24} | |
| 320/650 [50:18<45:30, 8.27s/it] {'loss': 2.8271, 'grad_norm': 1.492186427116394, 'learning_rate': 1.0797018551396527e-05, 'ppl': 16.89639, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 228.06138610839844, 'tokens/total': 5144576, 'tokens/trainable': 2327003, 'epoch': 24.63} | |
| 325/650 [50:57<41:26, 7.65s/it] {'loss': 2.8195, 'grad_norm': 1.6219910383224487, 'learning_rate': 1.0548257014693602e-05, 'ppl': 16.76846, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 186.30055236816406, 'tokens/total': 5222400, 'tokens/trainable': 2361725, 'epoch': 25.0} | |
| 330/650 [51:40<44:11, 8.28s/it] {'loss': 2.7829, 'grad_norm': 1.4858134984970093, 'learning_rate': 1.0299154661693987e-05, 'ppl': 16.16583, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 210.44517517089844, 'tokens/total': 5304320, 'tokens/trainable': 2398647, 'epoch': 25.39} | |
| 335/650 [52:21<43:23, 8.26s/it] {'loss': 2.7914, 'grad_norm': 1.3230637311935425, 'learning_rate': 1.0049866343387582e-05, 'ppl': 16.30383, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 243.39698791503906, 'tokens/total': 5386240, 'tokens/trainable': 2436590, 'epoch': 25.78} | |
| 340/650 [53:03<43:16, 8.38s/it] {'loss': 2.8088, 'grad_norm': 1.4489848613739014, 'learning_rate': 9.800547026367022e-06, 'ppl': 16.59, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 224.92234802246094, 'tokens/total': 5464064, 'tokens/trainable': 2471039, 'epoch': 26.16} | |
| 345/650 [53:43<40:42, 8.01s/it] {'loss': 2.7648, 'grad_norm': 1.5243545770645142, 'learning_rate': 9.551351696494854e-06, 'ppl': 15.87586, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 254.18601989746094, 'tokens/total': 5545984, 'tokens/trainable': 2508382, 'epoch': 26.55} | |
| 350/650 [54:23<40:44, 8.15s/it] {'loss': 2.7646, 'grad_norm': 1.346397042274475, 'learning_rate': 9.302435262558748e-06, 'ppl': 15.87269, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 239.00399780273438, 'tokens/total': 5627904, 'tokens/trainable': 2545727, 'epoch': 26.94} | |
| [2026-01-24 14:19:56,877] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| 0%| | 0/100 [00:00<?, ?it/s][A | |
| 2%|ββ | 2/100 [00:00<00:25, 3.85it/s][A | |
| 3%|ββ | 3/100 [00:01<00:35, 2.71it/s][A | |
| 4%|βββ | 4/100 [00:01<00:40, 2.35it/s][A | |
| 5%|βββ | 5/100 [00:02<00:54, 1.75it/s][A | |
| 6%|ββββ | 6/100 [00:02<00:52, 1.80it/s][A | |
| 7%|βββββ | 7/100 [00:03<00:50, 1.84it/s][A | |
| 8%|βββββ | 8/100 [00:03<00:49, 1.86it/s][A | |
| 9%|ββββββ | 9/100 [00:04<00:49, 1.83it/s][A | |
| 10%|ββββββ | 10/100 [00:05<00:48, 1.86it/s][A | |
| 11%|βββββββ | 11/100 [00:05<00:47, 1.88it/s][A | |
| 12%|ββββββββ | 12/100 [00:06<00:46, 1.89it/s][A | |
| 13%|ββββββββ | 13/100 [00:06<00:46, 1.85it/s][A | |
| 14%|βββββββββ | 14/100 [00:07<00:45, 1.88it/s][A | |
| 15%|ββββββββββ | 15/100 [00:07<00:45, 1.89it/s][A | |
| 16%|ββββββββββ | 16/100 [00:08<00:44, 1.89it/s][A | |
| 17%|βββββββββββ | 17/100 [00:09<00:59, 1.40it/s][A | |
| 18%|βββββββββββ | 18/100 [00:09<00:53, 1.53it/s][A | |
| 19%|ββββββββββββ | 19/100 [00:10<00:49, 1.63it/s][A | |
| 20%|βββββββββββββ | 20/100 [00:10<00:46, 1.70it/s][A | |
| 21%|βββββββββββββ | 21/100 [00:11<00:45, 1.73it/s][A | |
| 22%|ββββββββββββββ | 22/100 [00:12<00:43, 1.78it/s][A | |
| 23%|ββββββββββββββ | 23/100 [00:12<00:42, 1.82it/s][A | |
| 24%|βββββββββββββββ | 24/100 [00:13<00:41, 1.85it/s][A | |
| 25%|ββββββββββββββββ | 25/100 [00:13<00:41, 1.82it/s][A | |
| 26%|ββββββββββββββββ | 26/100 [00:14<00:39, 1.86it/s][A | |
| 27%|βββββββββββββββββ | 27/100 [00:14<00:38, 1.87it/s][A | |
| 28%|βββββββββββββββββ | 28/100 [00:15<00:38, 1.88it/s][A | |
| 29%|ββββββββββββββββββ | 29/100 [00:15<00:38, 1.85it/s][A | |
| 30%|βββββββββββββββββββ | 30/100 [00:16<00:37, 1.87it/s][A | |
| 31%|βββββββββββββββββββ | 31/100 [00:16<00:36, 1.88it/s][A | |
| 32%|ββββββββββββββββββββ | 32/100 [00:17<00:35, 1.89it/s][A | |
| 33%|βββββββββββββββββββββ | 33/100 [00:17<00:36, 1.86it/s][A | |
| 34%|βββββββββββββββββββββ | 34/100 [00:18<00:35, 1.88it/s][A | |
| 35%|ββββββββββββββββββββββ | 35/100 [00:18<00:34, 1.89it/s][A | |
| 36%|ββββββββββββββββββββββ | 36/100 [00:19<00:33, 1.90it/s][A | |
| 37%|βββββββββββββββββββββββ | 37/100 [00:20<00:33, 1.86it/s][A | |
| 38%|ββββββββββββββββββββββββ | 38/100 [00:20<00:32, 1.88it/s][A | |
| 39%|ββββββββββββββββββββββββ | 39/100 [00:21<00:32, 1.89it/s][A | |
| 40%|βββββββββββββββββββββββββ | 40/100 [00:21<00:31, 1.89it/s][A | |
| 41%|βββββββββββββββββββββββββ | 41/100 [00:22<00:31, 1.86it/s][A | |
| 42%|ββββββββββββββββββββββββββ | 42/100 [00:22<00:30, 1.88it/s][A | |
| 43%|βββββββββββββββββββββββββββ | 43/100 [00:23<00:30, 1.89it/s][A | |
| 44%|βββββββββββββββββββββββββββ | 44/100 [00:23<00:29, 1.89it/s][A | |
| 45%|ββββββββββββββββββββββββββββ | 45/100 [00:24<00:29, 1.86it/s][A | |
| 46%|ββββββββββββββββββββββββββββ | 46/100 [00:24<00:28, 1.88it/s][A | |
| 47%|βββββββββββββββββββββββββββββ | 47/100 [00:25<00:28, 1.89it/s][A | |
| 48%|ββββββββββββββββββββββββββββββ | 48/100 [00:25<00:27, 1.89it/s][A | |
| 49%|ββββββββββββββββββββββββββββββ | 49/100 [00:26<00:27, 1.86it/s][A | |
| 50%|βββββββββββββββββββββββββββββββ | 50/100 [00:26<00:26, 1.88it/s][A | |
| 51%|βββββββββββββββββββββββββββββββ | 51/100 [00:27<00:25, 1.89it/s][A | |
| 52%|ββββββββββββββββββββββββββββββββ | 52/100 [00:27<00:25, 1.89it/s][A | |
| 53%|βββββββββββββββββββββββββββββββββ | 53/100 [00:28<00:25, 1.86it/s][A | |
| 54%|βββββββββββββββββββββββββββββββββ | 54/100 [00:29<00:24, 1.88it/s][A | |
| 55%|ββββββββββββββββββββββββββββββββββ | 55/100 [00:29<00:23, 1.89it/s][A | |
| 56%|βββββββββββββββββββββββββββββββββββ | 56/100 [00:30<00:23, 1.89it/s][A | |
| 57%|βββββββββββββββββββββββββββββββββββ | 57/100 [00:30<00:23, 1.86it/s][A | |
| 58%|ββββββββββββββββββββββββββββββββββββ | 58/100 [00:31<00:22, 1.88it/s][A | |
| 59%|ββββββββββββββββββββββββββββββββββββ | 59/100 [00:31<00:21, 1.89it/s][A | |
| 60%|βββββββββββββββββββββββββββββββββββββ | 60/100 [00:32<00:21, 1.89it/s][A | |
| 61%|ββββββββββββββββββββββββββββββββββββββ | 61/100 [00:32<00:20, 1.86it/s][A | |
| 62%|ββββββββββββββββββββββββββββββββββββββ | 62/100 [00:33<00:20, 1.88it/s][A | |
| 63%|βββββββββββββββββββββββββββββββββββββββ | 63/100 [00:33<00:19, 1.89it/s][A | |
| 64%|βββββββββββββββββββββββββββββββββββββββ | 64/100 [00:34<00:19, 1.89it/s][A | |
| 65%|ββββββββββββββββββββββββββββββββββββββββ | 65/100 [00:34<00:18, 1.86it/s][A | |
| 66%|βββββββββββββββββββββββββββββββββββββββββ | 66/100 [00:35<00:18, 1.88it/s][A | |
| 67%|βββββββββββββββββββββββββββββββββββββββββ | 67/100 [00:35<00:17, 1.89it/s][A | |
| 68%|ββββββββββββββββββββββββββββββββββββββββββ | 68/100 [00:36<00:16, 1.89it/s][A | |
| 69%|ββββββββββββββββββββββββββββββββββββββββββ | 69/100 [00:37<00:16, 1.86it/s][A | |
| 70%|βββββββββββββββββββββββββββββββββββββββββββ | 70/100 [00:37<00:15, 1.88it/s][A | |
| 71%|ββββββββββββββββββββββββββββββββββββββββββββ | 71/100 [00:38<00:15, 1.89it/s][A | |
| 72%|ββββββββββββββββββββββββββββββββββββββββββββ | 72/100 [00:38<00:14, 1.90it/s][A | |
| 73%|βββββββββββββββββββββββββββββββββββββββββββββ | 73/100 [00:39<00:14, 1.86it/s][A | |
| 74%|ββββββββββββββββββββββββββββββββββββββββββββββ | 74/100 [00:39<00:13, 1.88it/s][A | |
| 75%|ββββββββββββββββββββββββββββββββββββββββββββββ | 75/100 [00:40<00:13, 1.89it/s][A | |
| 76%|βββββββββββββββββββββββββββββββββββββββββββββββ | 76/100 [00:40<00:12, 1.89it/s][A | |
| 77%|βββββββββββββββββββββββββββββββββββββββββββββββ | 77/100 [00:41<00:12, 1.86it/s][A | |
| 78%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 78/100 [00:41<00:11, 1.88it/s][A | |
| {'eval_loss': 1.7851392030715942, 'eval_runtime': 54.6859, 'eval_samples_per_second': 3.657, 'eval_steps_per_second': 1.829, 'eval_ppl': 5.96041, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.79, 'epoch': 26.94, 'tokens/train_per_sec_per_gpu': 0.0} | |
| 355/650 [55:59<1:00:13, 12.25s/it] {'loss': 2.729, 'grad_norm': 1.6997756958007812, 'learning_rate': 9.05395245997463e-06, 'ppl': 15.31756, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 241.9293975830078, 'tokens/total': 5705728, 'tokens/trainable': 2580343, 'epoch': 27.31} | |
| 360/650 [56:41<43:09, 8.93s/it] {'loss': 2.7679, 'grad_norm': 1.5107603073120117, 'learning_rate': 8.806057754597559e-06, 'ppl': 15.92516, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 228.39883422851562, 'tokens/total': 5787648, 'tokens/trainable': 2617759, 'epoch': 27.71} | |
| 365/650 [57:22<40:31, 8.53s/it] {'loss': 2.7165, 'grad_norm': 1.5286332368850708, 'learning_rate': 8.558905246700202e-06, 'ppl': 15.12728, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 241.69508361816406, 'tokens/total': 5865472, 'tokens/trainable': 2652851, 'epoch': 28.08} | |
| 370/650 [58:03<38:42, 8.29s/it] {'loss': 2.755, 'grad_norm': 1.4682552814483643, 'learning_rate': 8.312648575178552e-06, 'ppl': 15.72104, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 231.88661193847656, 'tokens/total': 5947392, 'tokens/trainable': 2689749, 'epoch': 28.47} | |
| 375/650 [58:44<37:49, 8.25s/it] {'loss': 2.7209, 'grad_norm': 1.2960118055343628, 'learning_rate': 8.06744082204447e-06, 'ppl': 15.19399, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 236.8324432373047, 'tokens/total': 6029312, 'tokens/trainable': 2727282, 'epoch': 28.86} | |
| 380/650 [59:25<36:16, 8.06s/it] {'loss': 2.7082, 'grad_norm': 1.470637321472168, 'learning_rate': 7.823434417264378e-06, 'ppl': 15.00225, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 263.5998840332031, 'tokens/total': 6107136, 'tokens/trainable': 2761812, 'epoch': 29.24} | |
| 385/650 [1:00:06<36:16, 8.21s/it] {'loss': 2.7407, 'grad_norm': 1.5776325464248657, 'learning_rate': 7.580781044003324e-06, 'ppl': 15.49783, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 217.9058380126953, 'tokens/total': 6189056, 'tokens/trainable': 2799357, 'epoch': 29.63} | |
| 390/650 [1:00:45<33:08, 7.65s/it] {'loss': 2.6897, 'grad_norm': 1.5815974473953247, 'learning_rate': 7.33963154433325e-06, 'ppl': 14.72726, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 200.30467224121094, 'tokens/total': 6266880, 'tokens/trainable': 2834070, 'epoch': 30.0} | |
| 395/650 [1:01:28<35:16, 8.30s/it] {'loss': 2.7001, 'grad_norm': 1.572016954421997, 'learning_rate': 7.100135825464138e-06, 'ppl': 14.88122, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 236.17343139648438, 'tokens/total': 6348800, 'tokens/trainable': 2871943, 'epoch': 30.39} | |
| 400/650 [1:02:10<34:25, 8.26s/it] {'loss': 2.6891, 'grad_norm': 1.3674300909042358, 'learning_rate': 6.862442766556297e-06, 'ppl': 14.71842, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 229.60092163085938, 'tokens/total': 6430720, 'tokens/trainable': 2908815, 'epoch': 30.78} | |
| [2026-01-24 14:27:43,271] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| {'eval_loss': 1.7668417692184448, 'eval_runtime': 53.9832, 'eval_samples_per_second': 3.705, 'eval_steps_per_second': 1.852, 'eval_ppl': 5.85234, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.79, 'epoch': 30.78, 'tokens/train_per_sec_per_gpu': 0.0} | |
| [2026-01-24 14:28:37,265] [INFO] [axolotl.core.trainers.base._save:721] [PID:9359] Saving model checkpoint to ./phi4_african_history_lora_out/checkpoint-400 | |
| 405/650 [1:03:46<50:18, 12.32s/it] {'loss': 2.6584, 'grad_norm': 1.4385159015655518, 'learning_rate': 6.6267001261717015e-06, 'ppl': 14.27343, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 242.539306640625, 'tokens/total': 6508544, 'tokens/trainable': 2943355, 'epoch': 31.16} | |
| 410/650 [1:04:27<35:44, 8.93s/it] {'loss': 2.6745, 'grad_norm': 1.3860982656478882, 'learning_rate': 6.393054450421963e-06, 'ppl': 14.5051, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 227.2841339111328, 'tokens/total': 6590464, 'tokens/trainable': 2980426, 'epoch': 31.55} | |
| 415/650 [1:05:08<32:45, 8.36s/it] {'loss': 2.6757, 'grad_norm': 1.506183385848999, 'learning_rate': 6.1616509818699975e-06, 'ppl': 14.52251, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 237.92295837402344, 'tokens/total': 6672384, 'tokens/trainable': 3017993, 'epoch': 31.94} | |
| 420/650 [1:05:49<31:51, 8.31s/it] {'loss': 2.6824, 'grad_norm': 1.30038321018219, 'learning_rate': 5.932633569242e-06, 'ppl': 14.62014, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 236.77151489257812, 'tokens/total': 6750208, 'tokens/trainable': 3052970, 'epoch': 32.31} | |
| 425/650 [1:06:31<30:58, 8.26s/it] {'loss': 2.6681, 'grad_norm': 1.412518858909607, 'learning_rate': 5.706144578005908e-06, 'ppl': 14.41256, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 218.9322052001953, 'tokens/total': 6832128, 'tokens/trainable': 3089967, 'epoch': 32.71} | |
| 430/650 [1:07:12<30:46, 8.40s/it] {'loss': 2.6753, 'grad_norm': 1.5622631311416626, 'learning_rate': 5.4823248018719184e-06, 'ppl': 14.5167, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 231.12832641601562, 'tokens/total': 6909952, 'tokens/trainable': 3125039, 'epoch': 33.08} | |
| 435/650 [1:07:53<29:38, 8.27s/it] {'loss': 2.7232, 'grad_norm': 2.0204951763153076, 'learning_rate': 5.2613133752700145e-06, 'ppl': 15.22898, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 220.20835876464844, 'tokens/total': 6991872, 'tokens/trainable': 3162004, 'epoch': 33.47} | |
| 440/650 [1:08:34<28:54, 8.26s/it] {'loss': 2.6203, 'grad_norm': 1.6616003513336182, 'learning_rate': 5.043247686859024e-06, 'ppl': 13.73984, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 234.8267364501953, 'tokens/total': 7073792, 'tokens/trainable': 3199441, 'epoch': 33.86} | |
| 445/650 [1:09:15<28:26, 8.33s/it] {'loss': 2.5927, 'grad_norm': 1.363121509552002, 'learning_rate': 4.8282632941208725e-06, 'ppl': 13.36581, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 225.45587158203125, 'tokens/total': 7151616, 'tokens/trainable': 3234035, 'epoch': 34.24} | |
| 450/650 [1:09:57<27:32, 8.26s/it] {'loss': 2.6843, 'grad_norm': 1.3428306579589844, 'learning_rate': 4.616493839093179e-06, 'ppl': 14.64794, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 217.40847778320312, 'tokens/total': 7233536, 'tokens/trainable': 3271329, 'epoch': 34.63} | |
| [2026-01-24 14:35:30,194] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| {'eval_loss': 1.758092999458313, 'eval_runtime': 54.0627, 'eval_samples_per_second': 3.699, 'eval_steps_per_second': 1.85, 'eval_ppl': 5.80136, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.79, 'epoch': 34.63, 'tokens/train_per_sec_per_gpu': 0.0} | |
| 455/650 [1:11:30<37:31, 11.55s/it] {'loss': 2.6419, 'grad_norm': 1.5269215106964111, 'learning_rate': 4.408070965292534e-06, 'ppl': 14.03985, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 233.68751525878906, 'tokens/total': 7311360, 'tokens/trainable': 3306415, 'epoch': 35.0} | |
| 460/650 [1:12:13<28:17, 8.94s/it] {'loss': 2.6498, 'grad_norm': 1.5623174905776978, 'learning_rate': 4.203124235880179e-06, 'ppl': 14.15121, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 231.97872924804688, 'tokens/total': 7393280, 'tokens/trainable': 3343588, 'epoch': 35.39} | |
| 465/650 [1:12:54<25:46, 8.36s/it] {'loss': 2.6411, 'grad_norm': 1.2584384679794312, 'learning_rate': 4.001781053120863e-06, 'ppl': 14.02863, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 231.45700073242188, 'tokens/total': 7475200, 'tokens/trainable': 3381149, 'epoch': 35.78} | |
| 470/650 [1:13:35<25:07, 8.37s/it] {'loss': 2.6171, 'grad_norm': 2.3391544818878174, 'learning_rate': 3.804166579185018e-06, 'ppl': 13.69595, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 235.0704803466797, 'tokens/total': 7553024, 'tokens/trainable': 3415846, 'epoch': 36.16} | |
| 475/650 [1:14:17<24:07, 8.27s/it] {'loss': 2.632, 'grad_norm': 1.43043053150177, 'learning_rate': 3.610403658343443e-06, 'ppl': 13.90155, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 205.83262634277344, 'tokens/total': 7634944, 'tokens/trainable': 3453459, 'epoch': 36.55} | |
| 480/650 [1:14:58<23:21, 8.24s/it] {'loss': 2.6387, 'grad_norm': 1.3542659282684326, 'learning_rate': 3.4206127406028744e-06, 'ppl': 13.995, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 209.72723388671875, 'tokens/total': 7716864, 'tokens/trainable': 3490192, 'epoch': 36.94} | |
| 485/650 [1:15:39<22:47, 8.29s/it] {'loss': 2.6439, 'grad_norm': 1.367038369178772, 'learning_rate': 3.234911806829948e-06, 'ppl': 14.06796, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 213.08050537109375, 'tokens/total': 7794688, 'tokens/trainable': 3524600, 'epoch': 37.31} | |
| 490/650 [1:16:20<22:00, 8.25s/it] {'loss': 2.6147, 'grad_norm': 1.3298630714416504, 'learning_rate': 3.0534162954100264e-06, 'ppl': 13.66312, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 223.60806274414062, 'tokens/total': 7876608, 'tokens/trainable': 3561920, 'epoch': 37.71} | |
| 495/650 [1:17:01<21:41, 8.40s/it] {'loss': 2.6059, 'grad_norm': 1.354701042175293, 'learning_rate': 2.876239030486554e-06, 'ppl': 13.54341, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 218.70867919921875, 'tokens/total': 7954432, 'tokens/trainable': 3597435, 'epoch': 38.08} | |
| 500/650 [1:17:43<20:39, 8.27s/it] {'loss': 2.6048, 'grad_norm': 1.7885570526123047, 'learning_rate': 2.703490151825492e-06, 'ppl': 13.52852, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 218.56715393066406, 'tokens/total': 8036352, 'tokens/trainable': 3634384, 'epoch': 38.47} | |
| [2026-01-24 14:43:16,073] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| {'eval_loss': 1.7533553838729858, 'eval_runtime': 53.9658, 'eval_samples_per_second': 3.706, 'eval_steps_per_second': 1.853, 'eval_ppl': 5.77394, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.79, 'epoch': 38.47, 'tokens/train_per_sec_per_gpu': 0.0} | |
| [2026-01-24 14:44:10,048] [INFO] [axolotl.core.trainers.base._save:721] [PID:9359] Saving model checkpoint to ./phi4_african_history_lora_out/checkpoint-500 | |
| 505/650 [1:19:19<29:29, 12.20s/it] {'loss': 2.6406, 'grad_norm': 1.8489106893539429, 'learning_rate': 2.5352770463484986e-06, 'ppl': 14.02161, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 229.4249725341797, 'tokens/total': 8118272, 'tokens/trainable': 3671583, 'epoch': 38.86} | |
| 510/650 [1:19:59<20:19, 8.71s/it] {'loss': 2.629, 'grad_norm': 1.4269434213638306, 'learning_rate': 2.371704281377335e-06, 'ppl': 13.8599, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 256.02679443359375, 'tokens/total': 8196096, 'tokens/trainable': 3706605, 'epoch': 39.24} | |
| 515/650 [1:20:40<18:43, 8.33s/it] {'loss': 2.6052, 'grad_norm': 1.502051830291748, 'learning_rate': 2.2128735396310606e-06, 'ppl': 13.53393, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 218.72364807128906, 'tokens/total': 8278016, 'tokens/trainable': 3744295, 'epoch': 39.63} | |
| 520/650 [1:21:19<16:35, 7.66s/it] {'loss': 2.6347, 'grad_norm': 1.6698100566864014, 'learning_rate': 2.05888355601639e-06, 'ppl': 13.93913, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 208.06512451171875, 'tokens/total': 8355840, 'tokens/trainable': 3778760, 'epoch': 40.0} | |
| 525/650 [1:22:02<17:16, 8.29s/it] {'loss': 2.6084, 'grad_norm': 1.3377642631530762, 'learning_rate': 1.9098300562505266e-06, 'ppl': 13.57731, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 231.1595916748047, 'tokens/total': 8437760, 'tokens/trainable': 3816182, 'epoch': 40.39} | |
| 530/650 [1:22:44<16:31, 8.26s/it] {'loss': 2.6119, 'grad_norm': 1.3317139148712158, 'learning_rate': 1.765805697354608e-06, 'ppl': 13.62491, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 231.95037841796875, 'tokens/total': 8519680, 'tokens/trainable': 3853350, 'epoch': 40.78} | |
| 535/650 [1:23:24<15:38, 8.16s/it] {'loss': 2.6077, 'grad_norm': 1.3852050304412842, 'learning_rate': 1.6269000100547682e-06, 'ppl': 13.56781, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 216.89588928222656, 'tokens/total': 8597504, 'tokens/trainable': 3888209, 'epoch': 41.16} | |
| 540/650 [1:24:05<15:05, 8.24s/it] {'loss': 2.6187, 'grad_norm': 1.2737482786178589, 'learning_rate': 1.4931993431266056e-06, 'ppl': 13.71788, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 240.7917938232422, 'tokens/total': 8679424, 'tokens/trainable': 3925107, 'epoch': 41.55} | |
| 545/650 [1:24:46<14:25, 8.25s/it] {'loss': 2.6447, 'grad_norm': 1.2583863735198975, 'learning_rate': 1.364786809717692e-06, 'ppl': 14.07922, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 229.8025360107422, 'tokens/total': 8761344, 'tokens/trainable': 3962486, 'epoch': 41.94} | |
| 550/650 [1:25:28<13:49, 8.30s/it] {'loss': 2.6118, 'grad_norm': 1.8680920600891113, 'learning_rate': 1.2417422356814345e-06, 'ppl': 13.62355, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 198.96746826171875, 'tokens/total': 8839168, 'tokens/trainable': 3997476, 'epoch': 42.31} | |
| [2026-01-24 14:51:01,152] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| 92%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 92/100 [00:48<00:04, 1.90it/s][A | |
| 93%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 93/100 [00:49<00:03, 1.86it/s][A | |
| 94%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 94/100 [00:49<00:03, 1.88it/s][A | |
| 95%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 95/100 [00:50<00:02, 1.89it/s][A | |
| 96%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 96/100 [00:50<00:02, 1.90it/s][A | |
| 97%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 97/100 [00:51<00:01, 1.86it/s][A | |
| 98%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 98/100 [00:51<00:01, 1.88it/s][A | |
| 99%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 99/100 [00:52<00:00, 1.89it/s][A | |
| 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 100/100 [00:52<00:00, 1.82it/s][A | |
| {'eval_loss': 1.7504650354385376, 'eval_runtime': 54.0358, 'eval_samples_per_second': 3.701, 'eval_steps_per_second': 1.851, 'eval_ppl': 5.75728, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.79, 'epoch': 42.31, 'tokens/train_per_sec_per_gpu': 0.0} | |
| 555/650 [1:27:03<19:14, 12.15s/it] {'loss': 2.5915, 'grad_norm': 2.939488410949707, 'learning_rate': 1.124142109954459e-06, 'ppl': 13.34978, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 226.67120361328125, 'tokens/total': 8921088, 'tokens/trainable': 4034699, 'epoch': 42.71} | |
| 560/650 [1:27:44<13:34, 9.05s/it] {'loss': 2.6333, 'grad_norm': 1.4904693365097046, 'learning_rate': 1.012059537008332e-06, 'ppl': 13.91963, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 211.8202667236328, 'tokens/total': 8998912, 'tokens/trainable': 4069449, 'epoch': 43.08} | |
| 565/650 [1:28:25<11:52, 8.38s/it] {'loss': 2.6157, 'grad_norm': 1.5149389505386353, 'learning_rate': 9.055641914051783e-07, 'ppl': 13.67679, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 223.85400390625, 'tokens/total': 9080832, 'tokens/trainable': 4106635, 'epoch': 43.47} | |
| 570/650 [1:29:07<11:02, 8.28s/it] {'loss': 2.607, 'grad_norm': 1.4548736810684204, 'learning_rate': 8.047222744854943e-07, 'ppl': 13.55831, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 230.1149139404297, 'tokens/total': 9162752, 'tokens/trainable': 4144144, 'epoch': 43.86} | |
| 575/650 [1:29:48<10:25, 8.33s/it] {'loss': 2.571, 'grad_norm': 1.3647756576538086, 'learning_rate': 7.095964732149741e-07, 'ppl': 13.0789, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 229.0497283935547, 'tokens/total': 9240576, 'tokens/trainable': 4178948, 'epoch': 44.24} | |
| 580/650 [1:30:29<09:38, 8.26s/it] {'loss': 2.6009, 'grad_norm': 1.5256224870681763, 'learning_rate': 6.202459212160638e-07, 'ppl': 13.47586, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 223.7174835205078, 'tokens/total': 9322496, 'tokens/trainable': 4216437, 'epoch': 44.63} | |
| 585/650 [1:31:08<08:16, 7.64s/it] {'loss': 2.6057, 'grad_norm': 1.5354381799697876, 'learning_rate': 5.367261620083575e-07, 'ppl': 13.5407, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 213.19332885742188, 'tokens/total': 9400320, 'tokens/trainable': 4251105, 'epoch': 45.0} | |
| 590/650 [1:31:51<08:17, 8.28s/it] {'loss': 2.5728, 'grad_norm': 1.498939037322998, 'learning_rate': 4.5908911448075746e-07, 'ppl': 13.10246, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 208.5529327392578, 'tokens/total': 9482240, 'tokens/trainable': 4288708, 'epoch': 45.39} | |
| 595/650 [1:32:33<07:33, 8.25s/it] {'loss': 2.6226, 'grad_norm': 1.6790930032730103, 'learning_rate': 3.8738304061681107e-07, 'ppl': 13.77148, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 224.23695373535156, 'tokens/total': 9564160, 'tokens/trainable': 4325772, 'epoch': 45.78} | |
| 600/650 [1:33:14<06:57, 8.35s/it] {'loss': 2.6024, 'grad_norm': 1.36123788356781, 'learning_rate': 3.2165251549333585e-07, 'ppl': 13.49609, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 226.9879608154297, 'tokens/total': 9641984, 'tokens/trainable': 4360627, 'epoch': 46.16} | |
| [2026-01-24 14:58:47,274] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| {'eval_loss': 1.750334620475769, 'eval_runtime': 54.0532, 'eval_samples_per_second': 3.7, 'eval_steps_per_second': 1.85, 'eval_ppl': 5.75653, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.79, 'epoch': 46.16, 'tokens/train_per_sec_per_gpu': 0.0} | |
| [2026-01-24 14:59:41,336] [INFO] [axolotl.core.trainers.base._save:721] [PID:9359] Saving model checkpoint to ./phi4_african_history_lora_out/checkpoint-600 | |
| 605/650 [1:34:49<08:57, 11.95s/it] {'loss': 2.6019, 'grad_norm': 1.2960628271102905, 'learning_rate': 2.6193839957093683e-07, 'ppl': 13.48934, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 255.81251525878906, 'tokens/total': 9723904, 'tokens/trainable': 4397575, 'epoch': 46.55} | |
| 610/650 [1:35:30<05:55, 8.88s/it] {'loss': 2.6131, 'grad_norm': 1.4798760414123535, 'learning_rate': 2.082778132936858e-07, 'ppl': 13.64127, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 211.33847045898438, 'tokens/total': 9805824, 'tokens/trainable': 4435252, 'epoch': 46.94} | |
| 615/650 [1:36:12<04:54, 8.42s/it] {'loss': 2.6039, 'grad_norm': 1.4365811347961426, 'learning_rate': 1.6070411401370335e-07, 'ppl': 13.51635, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 219.77572631835938, 'tokens/total': 9883648, 'tokens/trainable': 4469887, 'epoch': 47.31} | |
| 620/650 [1:36:53<04:08, 8.29s/it] {'loss': 2.5701, 'grad_norm': 1.370497226715088, 'learning_rate': 1.192468752550402e-07, 'ppl': 13.06713, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 240.24427795410156, 'tokens/total': 9965568, 'tokens/trainable': 4507458, 'epoch': 47.71} | |
| 625/650 [1:37:34<03:29, 8.38s/it] {'loss': 2.5994, 'grad_norm': 1.9178601503372192, 'learning_rate': 8.393186832969746e-08, 'ppl': 13.45566, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 219.11024475097656, 'tokens/total': 10043392, 'tokens/trainable': 4541957, 'epoch': 48.08} | |
| 630/650 [1:38:15<02:45, 8.27s/it] {'loss': 2.6103, 'grad_norm': 1.330645203590393, 'learning_rate': 5.4781046317267103e-08, 'ppl': 13.60313, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 238.9775848388672, 'tokens/total': 10125312, 'tokens/trainable': 4578931, 'epoch': 48.47} | |
| 635/650 [1:38:56<02:03, 8.25s/it] {'loss': 2.6172, 'grad_norm': 1.4630374908447266, 'learning_rate': 3.181253041809052e-08, 'ppl': 13.69732, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 236.7567596435547, 'tokens/total': 10207232, 'tokens/trainable': 4616278, 'epoch': 48.86} | |
| 640/650 [1:39:38<01:23, 8.31s/it] {'loss': 2.5952, 'grad_norm': 1.9492207765579224, 'learning_rate': 1.5040598688482732e-08, 'ppl': 13.39927, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 232.969970703125, 'tokens/total': 10285056, 'tokens/trainable': 4651282, 'epoch': 49.24} | |
| 645/650 [1:40:19<00:41, 8.27s/it] {'loss': 2.6223, 'grad_norm': 1.9795691967010498, 'learning_rate': 4.475677164966774e-09, 'ppl': 13.76735, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 218.2692413330078, 'tokens/total': 10366976, 'tokens/trainable': 4688745, 'epoch': 49.63} | |
| 650/650 [1:40:58<00:00, 7.68s/it] {'loss': 2.5727, 'grad_norm': 1.5955945253372192, 'learning_rate': 1.2433338308137645e-10, 'ppl': 13.10115, 'memory/max_active (GiB)': 30.06, 'memory/max_allocated (GiB)': 30.06, 'memory/device_reserved (GiB)': 31.79, 'tokens/train_per_sec_per_gpu': 236.20530700683594, 'tokens/total': 10444800, 'tokens/trainable': 4723450, 'epoch': 50.0} | |
| [2026-01-24 15:06:31,727] [INFO] [axolotl.core.trainers.base.evaluate:400] [PID:9359] Running evaluation step... | |
| 100%|██████████| 100/100 [00:53<00:00, 1.82it/s] | |
| {'eval_loss': 1.7479416131973267, 'eval_runtime': 54.1236, 'eval_samples_per_second': 3.695, 'eval_steps_per_second': 1.848, 'eval_ppl': 5.74277, 'memory/max_active (GiB)': 14.84, 'memory/max_allocated (GiB)': 14.84, 'memory/device_reserved (GiB)': 31.79, 'epoch': 50.0, 'tokens/train_per_sec_per_gpu': 0.0} | |
| [2026-01-24 15:07:25,861] [INFO] [axolotl.core.trainers.base._save:721] [PID:9359] Saving model checkpoint to ./phi4_african_history_lora_out/checkpoint-650 | |
| {'train_runtime': 6116.7339, 'train_samples_per_second': 0.85, 'train_steps_per_second': 0.106, 'train_loss': 3.3152668556800253, 'memory/max_active (GiB)': 7.18, 'memory/max_allocated (GiB)': 7.18, 'memory/device_reserved (GiB)': 7.19, 'epoch': 50.0, 'tokens/train_per_sec_per_gpu': 0.0} | |
| 100%|██████████| 650/650 [1:41:53<00:00, 9.41s/it] | |
| [2026-01-24 15:07:34,109] [INFO] [axolotl.train.save_trained_model:233] [PID:9359] Training completed! Saving trained model to ./phi4_african_history_lora_out. | |
| [2026-01-24 15:07:34,468] [INFO] [axolotl.train.save_trained_model:351] [PID:9359] Model successfully saved to ./phi4_african_history_lora_out | |
| [2026-01-24 15:07:34,702] [INFO] [axolotl.core.trainers.base._save:721] [PID:9359] Saving model checkpoint to ./phi4_african_history_lora_out | |
| Processing Files (3 / 3) : 100%|██████████| 21.8MB / 21.8MB, 0.00B/s | |
| New Data Upload : 0.00B / 0.00B, 0.00B/s | |
| ...ora_out/training_args.bin: 100%|██████████| 7.76kB / 7.76kB | |
| ...adapter_model.safetensors: 100%|██████████| 6.30MB / 6.30MB | |
| ...y_lora_out/tokenizer.json: 100%|██████████| 15.5MB / 15.5MB |