architecture
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
Paper
• 2312.05605
• Published
• 3
VMamba: Visual State Space Model
Paper
• 2401.10166
• Published
• 40
Rethinking Patch Dependence for Masked Autoencoders
Paper
• 2401.14391
• Published
• 26
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Paper
• 2401.14404
• Published
• 18
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
Paper
• 2403.12019
• Published
• 10
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper
• 2404.02905
• Published
• 74
On the Scalability of Diffusion-based Text-to-Image Generation
Paper
• 2404.02883
• Published
• 19
ViTAR: Vision Transformer with Any Resolution
Paper
• 2403.18361
• Published
• 55
When Do We Not Need Larger Vision Models?
Paper
• 2403.13043
• Published
• 26
Paper
• 2405.18407
• Published
• 48
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper
• 2406.09415
• Published
• 51
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper
• 2407.08083
• Published
• 32
FAN: Fourier Analysis Networks
Paper
• 2410.02675
• Published
• 29
Paper
• 2410.05258
• Published
• 180
Align Your Flow: Scaling Continuous-Time Flow Map Distillation
Paper
• 2506.14603
• Published
• 19