<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>7-Days-of-Ai-Platform-Engineering on @milinddethe15</title><link>https://milinddethe15.tech/tags/7-days-of-ai-platform-engineering/</link><description>Recent content in 7-Days-of-Ai-Platform-Engineering on @milinddethe15</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 10 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://milinddethe15.tech/tags/7-days-of-ai-platform-engineering/index.xml" rel="self" type="application/rss+xml"/><item><title>Day 7: What Does a Complete AI Platform Actually Look Like?</title><link>https://milinddethe15.tech/posts/day-7-ai-platform-engineering/</link><pubDate>Wed, 10 Jun 2026 00:00:00 +0000</pubDate><guid>https://milinddethe15.tech/posts/day-7-ai-platform-engineering/</guid><description>Over the last 6 days, I&amp;rsquo;ve learned about:
GPUs, Kubernetes, GPU scheduling, Ray, and vLLM.
Each solved a specific problem.
But I still had a question:
How do all these pieces fit together in a real AI platform?
The answer became much clearer when I stopped looking at individual tools and started looking at the entire system.
Layer 1: AI Applications
At the top are the products users interact with.</description></item><item><title>Day 6: Why vLLM Exists</title><link>https://milinddethe15.tech/posts/day-6-ai-platform-engineering/</link><pubDate>Tue, 09 Jun 2026 00:00:00 +0000</pubDate><guid>https://milinddethe15.tech/posts/day-6-ai-platform-engineering/</guid><description>Over the last few days, I&amp;rsquo;ve learned how AI platforms manage GPUs:
Kubernetes discovers GPUs using Device Plugins. GPU utilization is hard. Scheduling AI workloads is different. Ray distributes computation across a cluster.
That led me to another question:
Why does everyone seem to use vLLM for serving LLMs?
Why not just load a model and expose an API?
The Naive Approach
Suppose we deploy an LLM.</description></item><item><title>Day 5: Why Ray Exists</title><link>https://milinddethe15.tech/posts/day-5-ai-platform-engineering/</link><pubDate>Mon, 08 Jun 2026 00:00:00 +0000</pubDate><guid>https://milinddethe15.tech/posts/day-5-ai-platform-engineering/</guid><description>Over the last few days, I&amp;rsquo;ve learned that AI infrastructure introduces challenges that traditional applications rarely face:
GPUs are expensive. GPU utilization matters. AI workloads often require multiple GPUs at once. Scheduling becomes much harder.
That led me to another question:
If Kubernetes already schedules containers, why does Ray exist?
What Kubernetes Does Well
Kubernetes is excellent at managing infrastructure.
It answers questions like:
Which node should this Pod run on?</description></item><item><title>Day 4: Why AI Needs Different Scheduling</title><link>https://milinddethe15.tech/posts/day-4-ai-platform-engineering/</link><pubDate>Sun, 07 Jun 2026 00:00:00 +0000</pubDate><guid>https://milinddethe15.tech/posts/day-4-ai-platform-engineering/</guid><description>In Day 3, I learned that GPU utilization is one of the biggest challenges in AI infrastructure.
That led me to another question:
Why can&amp;rsquo;t Kubernetes scheduling alone solve this problem?
After all, Kubernetes has been scheduling workloads for years.
The answer is that AI workloads have very different requirements from traditional applications.
Traditional Scheduling
A typical application might look like this:
Frontend ↓ Backend ↓ Database
Each component can run independently.</description></item><item><title>Day 3: Why GPU Sharing Is Hard</title><link>https://milinddethe15.tech/posts/day-3-ai-platform-engineering/</link><pubDate>Sat, 06 Jun 2026 00:00:00 +0000</pubDate><guid>https://milinddethe15.tech/posts/day-3-ai-platform-engineering/</guid><description>In Day 2, I learned how Kubernetes discovers GPUs using Device Plugins.
Once a GPU is exposed as a resource, a Pod can request it just like CPU or memory.
That led me to a new question:
If Kubernetes can schedule GPUs, why is GPU utilization still such a big problem?
The answer is simple:
Most workloads don&amp;rsquo;t use an entire GPU.
The GPU Utilization Problem
Imagine you have an NVIDIA A100 GPU with 80 GB of memory.</description></item><item><title>Day 2: How Kubernetes Sees GPUs</title><link>https://milinddethe15.tech/posts/day-2-ai-platform-engineering/</link><pubDate>Fri, 05 Jun 2026 00:00:00 +0000</pubDate><guid>https://milinddethe15.tech/posts/day-2-ai-platform-engineering/</guid><description>Yesterday I learned that AI infrastructure is fundamentally about managing expensive compute resources.
That led me to a practical question:
How does Kubernetes even know a GPU exists?
For CPUs and memory, Kubernetes can discover resources directly from the node.
GPUs are different.
A GPU isn&amp;rsquo;t automatically visible to Kubernetes.
There is an extra layer that makes everything work.
The Journey from GPU to Pod
The path looks like this:</description></item><item><title>Day 1: Why AI Workloads Are Different from Traditional Applications</title><link>https://milinddethe15.tech/posts/day-1-ai-platform-engineering/</link><pubDate>Thu, 04 Jun 2026 00:00:00 +0000</pubDate><guid>https://milinddethe15.tech/posts/day-1-ai-platform-engineering/</guid><description>As someone who has spent most of my time with Kubernetes, observability, and platform engineering, I started my AI Platform Engineering journey with a simple question:
Why can&amp;rsquo;t AI workloads run like normal microservices?
At first glance, they seem similar.
Both are packaged into containers.
Both run on Kubernetes.
Both expose APIs.
Both scale based on demand.
So why is there an entirely new ecosystem around AI infrastructure?
After digging deeper, I realized the answer lies in where the bottleneck exists.</description></item></channel></rss>