Strategies for Speed -> Faster Inference
An introduction to architecture-led AI product experience
Nothing is more annoying than a laggy user interface. Is it loading, is it thinking, is it broken? As a user, you usually can’t tell the difference.
Pro tip: don’t get in the way of your users and the problem they are trying to solve.
Especially in the early stages of your product journey, it makes sense to give some thought to your future architecture.
To make this road easier for you, here are a couple of ideas that might help you get your product experience up to speed (quite literally). I will be using this post as an introductory overview of what can be done rather than a technical or architectural deep dive. I plan to cover these later in my tech deep dives.
Cloud platforms
While VPS providers like Linode or on-premise servers are tempting, cloud platforms offer more computing power and a broader range of managed services. Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) are the leading providers, and all three offer a variety of services for running generative AI models in production.
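Before comparing providers on paper, it helps to measure what your users actually feel: end-to-end latency of a model call. A minimal, provider-agnostic sketch is below; the `client.generate(...)` call in the comment is a hypothetical placeholder for whichever SDK or HTTP client you end up using.

```python
import time
from statistics import mean, median

def measure_latency(call, runs=5):
    """Time a model call several times and return summary stats in seconds.

    `call` is any zero-argument callable that performs one inference request.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {"mean": mean(samples), "median": median(samples), "max": max(samples)}

# Hypothetical usage with your provider's SDK of choice:
# stats = measure_latency(lambda: client.generate(prompt="Hello"))
# print(f"median latency: {stats['median'] * 1000:.0f} ms")
```

Run the same benchmark against each candidate provider (from the region your users are in) and you get a like-for-like number to compare, rather than relying on marketing pages.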
Let’s compare them.