PixelModel 🖼️
A neural network where the weights are the image.
🧪 Dataset vs Outputs
Ground truth dataset images compared with generated outputs.
| Red | Green | Blue |
|---|---|---|
| dataset / output | dataset / output | dataset / output |

| White | Yellow | Dark |
|---|---|---|
| dataset / output | dataset / output | dataset / output |
What is this?
model.png is not a picture of anything: it is the model.
Every pixel's RGB values encode neural network weights:
- R channel → weight magnitude
- B channel → weight sign (≥128 = positive)
- G channel → bias values
At inference, the pixels are parsed into three weight matrices that form a tiny MLP. The prompt is embedded into a vector, and a forward pass then generates a 32×32 image. Training optimizes the pixel values directly via gradient descent until the PNG itself becomes the model.
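As a rough illustration of the channel encoding described above, the sketch below decodes one RGB pixel into a signed weight and a bias value. The scaling constants (`MAX_MAGNITUDE`, and centring G on 128) and the function name `decode_pixel` are assumptions made for this example; the actual mapping lives in model.py and may differ.

```python
# Hypothetical decoding of one pixel; scaling constants are assumptions,
# not taken from model.py.
MAX_MAGNITUDE = 1.0  # assumed: R = 255 maps to |weight| = 1.0


def decode_pixel(r: int, g: int, b: int) -> tuple[float, float]:
    """Return (weight, bias) for one RGB pixel under the assumed encoding."""
    magnitude = r / 255.0 * MAX_MAGNITUDE   # R channel: weight magnitude
    sign = 1.0 if b >= 128 else -1.0        # B channel: >= 128 means positive
    bias = (g - 128) / 128.0                # G channel: assumed centred on 128
    return sign * magnitude, bias


weight, bias = decode_pixel(255, 128, 0)
print(weight, bias)  # -1.0 0.0
```

Under this reading, a bright red pixel with a low blue value is a strongly negative weight, while the same pixel with blue at 128 or above is strongly positive.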
Files
- model.png: THE MODEL (64×3200 px)
- main.py: inference
- train.py: training
- model.py: architecture
- dataset/: training data
  - cat.png
  - cat.txt: prompt "a cat"
  - ...
⚙️ Usage
Train
    python train.py
    python train.py --epochs 500 --lr 0.05
Generate
    python main.py "red"
    python main.py "a cat" --out cat_out.png --scale 8
--scale 8 upscales 32×32 → 256×256 using nearest-neighbour interpolation.
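Nearest-neighbour upscaling simply repeats each pixel `scale` times along both axes, so no new colours are introduced. The sketch below shows the idea on a nested list of RGB tuples; `upscale_nearest` is a hypothetical helper written for this example, and main.py may well use an image library for this step instead.

```python
def upscale_nearest(image, scale):
    """Repeat every pixel `scale` times horizontally and vertically."""
    out = []
    for row in image:
        # Widen the row by repeating each pixel `scale` times.
        wide = [px for px in row for _ in range(scale)]
        # Repeat the widened row `scale` times (as separate list copies).
        out.extend(list(wide) for _ in range(scale))
    return out


tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (255, 255, 255)]]
big = upscale_nearest(tiny, 8)
print(len(big), len(big[0]))  # 16 16
```

Applied to a 32×32 image with a scale of 8, the same function yields 256×256 pixels.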
🧠 Architecture
    prompt string
      → char-level embedding → 32-dim vector
      → W1 (64×32) → tanh
      → W2 (64×64) → tanh
      → W3 (3072×64) → sigmoid
      → reshape → 32×32×3 image
All weights live inside model.png. Opening the PNG is literally opening the neural network.
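The pipeline above can be sketched in plain Python to make the shapes concrete. This is a minimal illustration, not the code in model.py: the embedding scheme in `embed_prompt` is an assumption, biases are left out for brevity, and random stand-in weights replace the values that would be decoded from model.png.

```python
import math
import random


def embed_prompt(prompt, dim=32):
    """Assumed char-level embedding: fold character codes into `dim` slots."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += (ord(ch) % 128) / 128.0
    return vec


def matvec(matrix, vector):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, w1, w2, w3):
    """Map a 32-dim prompt vector to a 32x32x3 image with values in [0, 1]."""
    h1 = [math.tanh(v) for v in matvec(w1, x)]     # 64 hidden values
    h2 = [math.tanh(v) for v in matvec(w2, h1)]    # 64 hidden values
    flat = [sigmoid(v) for v in matvec(w3, h2)]    # 3072 output values
    # Reshape the flat vector into 32 rows of 32 RGB triples.
    return [[flat[(r * 32 + c) * 3:(r * 32 + c) * 3 + 3] for c in range(32)]
            for r in range(32)]


def random_matrix(rows, cols, rng):
    """Stand-in weights; the real ones would be decoded from model.png."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
w1 = random_matrix(64, 32, rng)
w2 = random_matrix(64, 64, rng)
w3 = random_matrix(3072, 64, rng)
image = forward(embed_prompt("a cat"), w1, w2, w3)
print(len(image), len(image[0]), len(image[0][0]))  # 32 32 3
```

The sigmoid at the end keeps every channel value between 0 and 1, so multiplying by 255 gives a displayable pixel.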
Dataset Tips
- 6–20 image-prompt pairs are enough
- Simple targets converge fastest (solid colors, gradients, shapes)
- 200–500 epochs are typically sufficient
- Loss below 0.001 is good for simple datasets
- Model capacity is fixed (~600K implicit parameters)
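
One plausible reading of the ~600K figure is the total number of channel values in model.png, which is larger than the number of weights the three matrices actually use. The arithmetic below checks this against the stated shapes; the bias count assumes one bias per output unit, which the source does not confirm.

```python
# Shapes as stated in the architecture: (rows, cols) for W1, W2, W3.
layers = [(64, 32), (64, 64), (3072, 64)]

weights = sum(rows * cols for rows, cols in layers)
biases = sum(rows for rows, _ in layers)   # assumed: one bias per output unit
pixels = 64 * 3200                         # model.png dimensions
channel_values = pixels * 3                # R, G and B per pixel

print(weights)         # 202752
print(biases)          # 3200
print(pixels)          # 204800
print(channel_values)  # 614400
```

On these numbers, one pixel per weight fits inside the 204,800 available pixels, and the three channels together account for roughly 600K stored values.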
It's a toy. It's not useful. But it's cool that it works.
Seton Labs · Coordinate · Evaluate · Upgrade