~aniket


research papers

Jun 17, 2026 · 2 min read

orbis
flow-matching

Orbis - Overcoming Challenges of Long-Horizon Prediction in Driving World Models

→ existing problems:

existing driving world models (Vista, GEM, Cosmos) have issues
bad at turns
discrete models (ex - GAIA-1) are better at long rollouts but images lose resolution; continuous models give higher-res images but struggle with long rollouts

→ Orbis MVP

imagines and generates realistic video of the next frames
long-horizon generation even in turns and chaotic moments
favors continuous latents (flow matching) over discrete tokens
single-cam video and basic driving actions - steering, speed
469M params only
trained on only 280 h of video

→ discrete vs continuous

continuous - image latents stay as real-valued vectors (no quantization) - diffusion models, flow matching
discrete - image is broken into patches and each patch is mapped to a token id from a finite codebook

→ hybrid tokenizer

high res video frame → encoder → latent space
latent space → quantizer (turns data into discrete ids) → Discrete models
latent space → continuous vector (keeps data fluid) → flow matching models
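
A minimal sketch of the two branches above, assuming a toy linear encoder and a random codebook. All names and sizes (`encode`, `quantize`, `NUM_POSITIONS`, `CODEBOOK_SIZE`) are hypothetical stand-ins, not the Orbis tokenizer itself. It only shows the idea: the same latent is either kept as continuous vectors or snapped to the nearest codebook entry to get discrete ids.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_POSITIONS, LATENT_DIM, CODEBOOK_SIZE = 16, 8, 32

def encode(frame: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Stand-in encoder: split the frame into equal chunks and project each one."""
    chunks = frame.reshape(NUM_POSITIONS, -1)
    return chunks @ proj  # shape (NUM_POSITIONS, LATENT_DIM)

def quantize(latent: np.ndarray, codebook: np.ndarray):
    """Discrete branch: nearest codebook entry for every latent position."""
    # Squared distances between each latent vector and each codebook vector.
    dists = ((latent[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    ids = dists.argmin(axis=1)
    return ids, codebook[ids]

frame = rng.random((32, 32, 3))                    # fake video frame
proj = rng.normal(size=(frame.size // NUM_POSITIONS, LATENT_DIM))
codebook = rng.normal(size=(CODEBOOK_SIZE, LATENT_DIM))

latent = encode(frame, proj)                       # continuous branch: used as-is
token_ids, quantized = quantize(latent, codebook)  # discrete branch: integer ids
```

`latent` is what a flow matching model would consume, `token_ids` is what a discrete (token-based) model would consume.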

architecture:

initial real frame → encoder → latent rep
pure noise → flow matching (conditioned on the latent rep of the real frame + driving actions - steering, throttle, brake) → next frame latent rep
next frame latent → decoder → pixels
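
A rough sketch of the middle step (noise → next frame latent), assuming the common straight-line flow matching path x_t = (1 - t) * noise + t * target. The trained network is replaced by a hypothetical `oracle_velocity` built on made-up `toy_next_latent` dynamics, purely so the loop can be run and checked. The real model, its conditioning and its sampler may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8  # hypothetical latent size

def flow_matching_pair(noise, target, t):
    """Training pair on the straight path: the point x_t and the velocity to regress."""
    x_t = (1.0 - t) * noise + t * target
    velocity = target - noise
    return x_t, velocity

def toy_next_latent(context, action):
    """Made-up ground-truth dynamics, used only to make the sketch checkable."""
    return context + 0.1 * action.sum()

def oracle_velocity(x, t, context, action):
    """Stand-in for the trained network v(x, t | context latent, actions)."""
    target = toy_next_latent(context, action)
    return (target - x) / (1.0 - t)  # exact velocity on the straight path

def sample_next_latent(velocity_fn, noise, num_steps=10):
    """Euler integration of dx/dt = v(x, t) from t = 0 (noise) to t = 1."""
    x = noise.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

context = rng.normal(size=LATENT_DIM)   # latent rep of the real frame
action = np.array([0.2, 1.0])           # hypothetical [steering, speed]
noise = rng.normal(size=LATENT_DIM)     # pure noise starting point

next_latent = sample_next_latent(
    lambda x, t: oracle_velocity(x, t, context, action), noise
)
```

For a long rollout, `next_latent` would be fed back in as the new `context` and the sampling repeated frame by frame, which is where errors can accumulate.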

→ failure modes and eval

failure modes of the model
categories of objects in anomaly detection
quantify different failure modes
OOD datasets
failed2drive
synthetic eval
real - DOTA, search for more datasets
eval - rollout, based on classifier, kNN, linear probe
15-30 frames

Links:

202606171509
