Deployment takes a model from "working in my notebook" to "working in production, serving millions of users." This is much harder than it sounds: development and production are entirely different worlds.
In development, you might have one model running on your laptop. In production, you might have 100 model instances distributed across multiple regions, serving thousands of requests per second, with high availability requirements.
A simple deployment might be: upload the model file to a server, run an inference server, expose an API. That works at small scale. At production scale, you also need load balancing (distributing requests across multiple instances), auto-scaling (adding instances when traffic is high, removing them when it is low), health checks (detecting broken instances), monitoring (understanding what is happening), logging (recording all activity), caching (avoiding redundant computation), and fallbacks (a plan for when the primary instance fails).
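Several of the pieces above, health checks, routing, and fallbacks, can be sketched together in a few lines. This is a minimal client-side round-robin balancer, not a real load-balancer API; the `Instance` and `LoadBalancer` names are illustrative.

```python
import itertools

class Instance:
    """A toy model instance with a health flag (illustrative, not a real API)."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def infer(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return f"{self.name} handled {request}"

class LoadBalancer:
    def __init__(self, instances):
        self.instances = instances
        self._cycle = itertools.cycle(instances)  # round-robin order

    def route(self, request):
        # Try each instance at most once, skipping unhealthy ones (the fallback).
        for _ in range(len(self.instances)):
            inst = next(self._cycle)
            if inst.healthy:  # health check before routing
                return inst.infer(request)
        raise RuntimeError("no healthy instances")

lb = LoadBalancer([Instance("a"), Instance("b", healthy=False), Instance("c")])
```

In practice the health flag would be set by a background prober hitting each instance's health endpoint, but the routing logic is the same: skip what is broken, fall back to what is alive.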
Blue-green deployments are common. You have two production environments: blue (current), green (new). You deploy to green, test it, then switch traffic to green. If something goes wrong, you can quickly switch back to blue.
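The key property of blue-green is that the cutover is a single pointer flip, and rollback is the same flip in reverse. A minimal sketch, with illustrative names (`Router`, `switch_to`):

```python
class Router:
    """Routes all traffic to exactly one of two environments."""
    def __init__(self, blue, green):
        self.envs = {"blue": blue, "green": green}
        self.live = "blue"  # current production environment

    def handle(self, request):
        return self.envs[self.live](request)

    def switch_to(self, color):
        assert color in self.envs
        self.live = color  # instant cutover; rollback is the same call

# Toy "environments": callables standing in for deployed model services.
router = Router(blue=lambda r: f"v1:{r}", green=lambda r: f"v2:{r}")
```

A real switch would happen at a load balancer or DNS layer, but the shape is the same: both environments stay running, so reverting never requires a redeploy.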
Canary deployments (gradually roll out to a small percentage of traffic, monitor closely, then increase) are safer than big-bang deployments. You can catch problems with 1% of traffic before affecting 100% of users.
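One common implementation detail: bucket users deterministically (by hashing their id) rather than flipping a coin per request, so a given user consistently sees either the canary or the stable model. A sketch, with an assumed bucketing scheme:

```python
import hashlib

def bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return digest[0] * 100 // 256

def pick_model(user_id: str, canary_percent: int) -> str:
    """Users in the first canary_percent buckets get the canary model."""
    return "canary" if bucket(user_id) < canary_percent else "stable"
```

Raising `canary_percent` from 1 to 5 to 25 then grows the canary population without reshuffling users who are already in it.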
Model versioning is essential. You might have Model v1 and Model v2 running simultaneously. Some users get v1, others get v2. You compare performance and migrate traffic gradually. This enables safe rollouts and quick rollback if needed.
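Serving two versions side by side often comes down to a registry with adjustable traffic weights; migration is raising one weight, and rollback is zeroing it. A minimal sketch (all names illustrative):

```python
import random

class ModelRegistry:
    def __init__(self):
        self.versions = {}
        self.weights = {}

    def register(self, name, model, weight=0.0):
        self.versions[name] = model
        self.weights[name] = weight

    def set_weight(self, name, weight):
        # Gradual migration: nudge weights over time. Rollback: set to 0.
        self.weights[name] = weight

    def pick(self, rng=random.random):
        """Weighted choice over registered versions (rng injectable for tests)."""
        total = sum(self.weights.values())
        r = rng() * total
        for name, w in self.weights.items():
            r -= w
            if r <= 0:
                return name
        return name

registry = ModelRegistry()
registry.register("v1", model=lambda x: x, weight=0.9)
registry.register("v2", model=lambda x: x * 2, weight=0.1)
```

A production registry would also record which version served each request, so the v1-versus-v2 performance comparison has clean per-version metrics.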
Configuration management: your model might have multiple configurations (different hyperparameters, different data sources). Changing a configuration might affect performance. Configurations should be version-controlled and tested before deployment.
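One way to keep configurations version-controlled and testable is to make them typed, immutable values that validate themselves on construction; the serialized form is what gets committed and diffed in review. A sketch with illustrative field names:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ModelConfig:
    version: str       # e.g. a git tag for this configuration
    temperature: float
    data_source: str

    def __post_init__(self):
        # Validation runs before the config can ever reach deployment.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature out of range")

def serialize(config: ModelConfig) -> dict:
    """The dict form is what would be committed to version control."""
    return asdict(config)
```

Because the dataclass is frozen, a running service cannot mutate its configuration in place; changing anything means producing (and reviewing) a new versioned value.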
Database and infrastructure: the model isn't just code. It might need specific database structures, specific amounts of memory, specific GPU hardware. Infrastructure as code (defining infrastructure in version-controlled files) is increasingly standard.
Testing before deployment is essential. You test with realistic data and realistic load. If you test with 10 requests per second and production receives 10,000 per second, you'll discover problems.
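A pre-deployment load check can be as simple as firing many requests at the handler and comparing a latency percentile against a budget. This sketch is sequential and single-process; a realistic test would generate production-shaped, concurrent traffic. The budget and handler are illustrative.

```python
import statistics
import time

def load_test(handler, n_requests=100, p95_budget_s=0.05):
    """Measure per-request latency and check the 95th percentile vs. a budget."""
    latencies = []
    for i in range(n_requests):
        start = time.perf_counter()
        handler(f"req-{i}")
        latencies.append(time.perf_counter() - start)
    p95 = statistics.quantiles(latencies, n=20)[-1]  # 95th percentile
    return {"p95": p95, "passed": p95 <= p95_budget_s}

report = load_test(lambda r: r.upper())
```

The number that matters is the tail (p95 or p99), not the average: at 10,000 requests per second, a slow 5% is hundreds of unhappy users every second.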
Hotpatching (fixing production without shutting down) is important for high-availability systems. You can't just stop production for an hour to deploy a fix; you need mechanisms to swap components without interrupting service.
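One common mechanism for swapping a component without interrupting service: requests read the model through a holder whose reference can be replaced atomically under a lock, so in-flight requests finish on the old model while new requests get the new one. A sketch (names illustrative):

```python
import threading

class ModelHolder:
    """Indirection layer: callers never hold the model directly."""
    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def predict(self, x):
        with self._lock:
            model = self._model  # grab a consistent reference
        return model(x)          # run inference outside the lock

    def swap(self, new_model):
        with self._lock:
            self._model = new_model  # hotpatch: no restart, no dropped requests

holder = ModelHolder(lambda x: x + 1)
```

The lock is held only for the reference read or write, never during inference, so the swap does not stall traffic.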
Observability is crucial. After deployment, you need to know immediately if something goes wrong. Is latency degrading? Are errors increasing? Are models hallucinating more? Dashboards and alerting are essential.
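The alerting half of observability reduces to rules over scraped metrics: compare current values to thresholds and fire on violations. The metric names and thresholds below are illustrative assumptions, not a real monitoring system's schema.

```python
# Each rule: metric name -> (comparison, threshold). Illustrative values.
RULES = {
    "latency_p95_s": ("gt", 0.5),   # alert if p95 latency exceeds 500 ms
    "error_rate": ("gt", 0.01),     # alert if more than 1% of requests fail
}

def evaluate(metrics: dict) -> list:
    """Return a human-readable alert for every rule the metrics violate."""
    alerts = []
    for name, (op, limit) in RULES.items():
        value = metrics.get(name)
        if value is not None and op == "gt" and value > limit:
            alerts.append(f"{name}={value} exceeds {limit}")
    return alerts
```

A dashboard answers "what is happening?"; rules like these answer "should someone be paged right now?", and both read from the same metrics.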
Rollback capability is critical. If a deployment breaks production, you need to quickly revert to the previous version. This should be fast and automatic (if the error rate spikes, roll back automatically).
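The automatic case can be sketched as a sliding window of request outcomes: when the windowed error rate crosses a threshold, the guard flips traffic back to the previous version on its own. The 50% threshold and version names are illustrative.

```python
from collections import deque

class AutoRollback:
    """Reverts to the previous version when the recent error rate spikes."""
    def __init__(self, current="v2", previous="v1", window=10, threshold=0.5):
        self.current, self.previous = current, previous
        self.outcomes = deque(maxlen=window)  # sliding window of True/False
        self.threshold = threshold

    def record(self, ok: bool) -> str:
        self.outcomes.append(ok)
        error_rate = self.outcomes.count(False) / len(self.outcomes)
        if error_rate >= self.threshold:
            self.current = self.previous  # automatic rollback
        return self.current

guard = AutoRollback()
```

Production systems usually add a minimum sample size and a cool-down so a handful of early errors, or a flapping metric, cannot trigger repeated rollbacks.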
Why It Matters
A great model that doesn't deploy successfully is useless. Deployment is the bridge between research and impact. Organizations with strong deployment infrastructure can move fast and deploy safely.
Example
A company deploys a new recommendation model: they use blue-green deployment (old model on blue, new model on green), route 5% of traffic to green, monitor closely for 4 hours, see that green performs 3% better, gradually increase traffic (10%, 25%, 50%, 100% over 2 days), and finally shut down blue. If they had discovered performance degradation, they would have quickly switched traffic back to blue.