| Building profile model... |
| Device: cuda |
| Batch: 256, Dim: 256, Anchors: 256, Comp: 8×64 |
| Parameters: 4,334,244 |
|
|
| ================================================================================ |
| SECTION 1: FORWARD PASS COMPONENTS |
| ================================================================================ |
|
|
|
|
| ================================================================================ |
| SECTION 2: INDIVIDUAL LOSS TERMS (forward only) |
| ================================================================================ |
|
|
|
|
| ================================================================================ |
| SECTION 3: CV LOSS - OLD SEQUENTIAL vs BATCHED |
| ================================================================================ |
|
|
|
|
| ================================================================================ |
| SECTION 4: BACKWARD COSTS (forward + backward) |
| ================================================================================ |
|
|
|
|
|
|
| ================================================================================ |
| FULL TIMING REPORT (sorted by cost) |
| ================================================================================ |
|
|
| CV metric OLD n=200          117.339ms   17.3% |
| CV OLD n=200                 117.218ms   17.3% |
| CV OLD n=128                  75.116ms   11.1% |
| fwd+bwd NCE_pw                71.622ms   10.5% |
| fwd+bwd NCE_emb               70.474ms   10.4% |
| fwd+bwd CV old                48.412ms    7.1% |
| CV OLD n=64                   36.944ms    5.4% |
| fwd+bwd CE                    35.398ms    5.2% |
| fwd+bwd Bridge                35.372ms    5.2% |
| FULL forward (both views)     22.355ms    3.3% |
| CV OLD n=32                   20.031ms    2.9% |
| fwd+bwd CV batch              11.265ms    1.7% |
| encoder(v1)                   10.891ms    1.6% |
| patchwork(tri)                 1.022ms    0.2% |
| CV BATCH n=128                 0.847ms    0.1% |
| CV metric BATCH n=200          0.830ms    0.1% |
| CV BATCH n=32                  0.818ms    0.1% |
| CV BATCH n=200                 0.814ms    0.1% |
| CV BATCH n=64                  0.811ms    0.1% |
| Spread (A×A + relu)            0.365ms    0.1% |
| NCE_pw (norm + B×B + CE)       0.245ms    0.0% |
| task_head(feat)                0.243ms    0.0% |
| NCE_tri (norm + B×B + CE)      0.240ms    0.0% |
| kNN (B×B + argmax)             0.170ms    0.0% |
| Assign BCE                     0.139ms    0.0% |
| NCE_assign (B×B + CE)          0.102ms    0.0% |
| NCE_emb (B×B + CE)             0.099ms    0.0% |
| Bridge (soft CE)               0.082ms    0.0% |
| Attraction (max + mean)        0.068ms    0.0% |
| bridge(pw)                     0.058ms    0.0% |
| triangulation (emb@A.T)        0.040ms    0.0% |
| soft_assign (softmax)          0.037ms    0.0% |
| CE (cross_entropy)             0.033ms    0.0% |
| ---------------------------------------------- |
| SUM                          679.501ms |
|
|
| ================================================================================ |
| CV SPEEDUP SUMMARY |
| ================================================================================ |
| n= 128: 75.12ms → 0.85ms (88.6x speedup) |
| n= 200: 117.34ms → 0.83ms (141.3x speedup) |
| n=  32: 20.03ms → 0.82ms (24.5x speedup) |
| n=  64: 36.94ms → 0.81ms (45.6x speedup) |
|
|
| ================================================================================ |
| PER-STEP ESTIMATE |
| ================================================================================ |
| Forward (both views): 22.35ms |
| fwd+bwd CE: 35.40ms |
| fwd+bwd CV (old): 48.41ms |
| fwd+bwd CV (batched): 11.26ms |
| CV savings per step: 37.15ms (77%) |
|
|
| ================================================================================ |
| PROFILING COMPLETE |
| ================================================================================ |
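
The derived figures in the report (speedup factors and per-step savings) can be cross-checked from the raw timings. The snippet below is a minimal sketch of that arithmetic, not the profiler's own code: the helper names `savings` and `speedup` are hypothetical, and the inputs are the millisecond values copied from the timing report above.

```python
# Timings copied from the report above (milliseconds).
cv_old_ms = 48.412      # "fwd+bwd CV old"
cv_batched_ms = 11.265  # "fwd+bwd CV batch"


def savings(old_ms, new_ms):
    """Return (absolute saving in ms, saving as a fraction of the old cost)."""
    saved = old_ms - new_ms
    return saved, saved / old_ms


def speedup(old_ms, new_ms):
    """Return how many times faster the new timing is than the old one."""
    return old_ms / new_ms


saved_ms, saved_frac = savings(cv_old_ms, cv_batched_ms)
print(f"CV savings per step: {saved_ms:.2f}ms ({saved_frac:.0%})")

# Forward-only comparison for n=32: "CV OLD n=32" vs "CV BATCH n=32".
print(f"n=32 speedup: {speedup(20.031, 0.818):.1f}x")
```

Because the report prints rounded values, a factor recomputed this way may differ from the logged one in the last digit.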