
AIOps / MLOps / LLMOps

1. Medium - Exploring MLflow experiments with a powerful UI

  • This is a great article on combining MLflow, a popular experiment-tracking and model-registry tool, with Aim (https://github.com/aimhubio/aim) to create a complete MLOps lifecycle. A complete MLOps setup also needs a UI for tracking and visualizing model information, and Aim addresses this need with an enhanced experiment-tracking experience.
  • The article provides a full tutorial for setting up this stack, along with a detailed example illustrating experimentation. The Aim integration with MLflow looks very effective for both tracking and visualizing model weights and parameters.

Architecture

1. Linkedin - A Deep Dive into NVIDIA GPU Virtualization: Passthrough, MIG, vGPU, and Time-Slicing 🌟 (Recommended)

  • This excellent article presents several GPU virtualization approaches, especially those involving NVIDIA technologies, which have become the backbone of many systems worldwide. For each type, you will find a summary of its use cases, mechanisms, and more.
  • Finally, the article provides a detailed comparison of these methods, outlining their respective supported features. This offers a comprehensive overview of how to utilize GPUs effectively in any system, particularly for inference workloads.
  • To explore this topic further, you can also check out my related article on vGPU in a Kubernetes environment: vGPU and Kubernetes Story.

2. Medium - How to Efficiently Use GPUs for Distributed Machine Learning in MLOps 🌟 (Recommended)

3. Medium - Simplifying Kubernetes Multi-Region Management: Challenges & Solutions 🌟 (Recommended)

  • The article emphasizes the importance of Kubernetes for multi-region purposes, but underscores that this is a significant undertaking. It is a true challenge to ensure several critical aspects are managed effectively across regions, including networking, high availability, ingress, and more. The Kubernetes community designed the platform to focus on multi-cluster or multi-zone capabilities rather than true multi-region spanning. This is largely due to issues with the underlying etcd and Raft consensus algorithm, where operating high availability under high network latency is highly inefficient.
  • The author lists several other challenges and reasons to carefully consider before implementing Kubernetes for multi-region use, specifically focusing on the complexities of Kubernetes features and CNI (Container Network Interface) implementation necessary to make it work.
  • In the next section, the author provides information on how to pursue a multi-region path despite the numerous trade-offs. While not impossible, handling this use case requires combining various tools, techniques, and deep expertise in the Kubernetes API, networking, and other related concepts. The author asserts that this is one of the hardest problems in Kubernetes, unlike simple scaling via HPA (Horizontal Pod Autoscaler) or Node Autoscaler. Therefore, one must fully understand and prepare a robust strategy before introducing it into a production environment.
  • The article concludes by referencing the author’s platform, Plural (https://www.plural.sh/product), as a Kubernetes platform solution designed to centralize all clusters into a single portal, enabling unified management and operation of tasks like CI/CD and IaC (Infrastructure as Code).

4. Youtube - Pods Everywhere! InterLink: A Virtual Kubelet Abstraction Streamlining HPC… - Diego Ciangottini 🌟 (Recommended)

  • This excellent video presents a unique architecture for those looking to implement cross-cluster or hybrid Kubernetes orchestration, utilizing Virtual Kubelet (https://github.com/virtual-kubelet/virtual-kubelet) and InterLink (https://github.com/interlink-hq/interLink). These are two technologies that have recently piqued my curiosity, offering a novel approach to Kubernetes orchestration, especially for AI workloads that demand massive resources for operation, inference, and training.
  • The author effectively showcases the impressive potential of Virtual Kubelet in scenarios where traditional Kubernetes struggles. This technology opens up design possibilities, allowing for the addition of distinct environments to a cluster via an abstraction layer. While this approach is cool, it is also technically challenging. If you seek to experiment with it and believe it can solve problems for your business, its impact could be significant.

5. Uber - From Predictive to Generative – How Michelangelo Accelerates Uber’s AI Journey

  • This segment is about the massive and truly impressive journey of building Uber’s ML Platform, Michelangelo (https://www.uber.com/en-VN/blog/michelangelo-machine-learning-platform/). The article reveals why it took them several years to evolve their platform capabilities, starting from basic models like Linear Prediction all the way up to Deep Learning (DL) and, eventually, Generative AI (LLMs). Within this article, you will learn and see the actual challenges they faced, the methods they used to overcome them, and how they subsequently scaled their platform.
  • Beyond just the development journey, they share their comprehensive architecture. This allows us to understand their core definitions, the technologies used, and the theoretical approaches they adapted to maximize value. You are sure to enjoy digesting this architecture and gaining significant experience for your own work.

6. AWS - Hybrid and Multi-Region Kubernetes Orchestration Using Kublr 🌟 (Recommended)

  • Following the discussion on multi-region Kubernetes challenges, Kublr (https://docs.kublr.com/overview/) emerges as an alternative platform to Plural (https://www.plural.sh/product) for handling complex infrastructure scaling. It is difficult to definitively say which is superior, but Kublr can significantly streamline the process of bootstrapping architecture on AWS, automating the provisioning of resources like VPC, EBS, and EKS.
  • The Kublr Architecture is composed of two main elements:
    • Kublr Platform (Kublr Control Plane): The centralized management platform.
    • Kublr Kubernetes Clusters: The clusters provisioned and managed by the Control Plane.
  • Using these elements, Kublr manages crucial networking components in the AWS Cloud by using a single CloudFormation template. This template automatically provisions all necessary resources for the Kubernetes cluster, including the VPC, subnets, scaling groups, Internet Gateway, NAT Gateway, routing tables, security groups, load balancers, and other underlying resources.
  • The post also details a tutorial on a multi-region cluster bootstrap: the control plane runs in us-east-1, with managed Kubernetes clusters spanning both us-east-1 and us-west-1. The detailed, step-by-step process is said to take only 15–20 minutes to complete. Therefore, for those approaching a multi-region Kubernetes solution, Kublr is a platform worth investigating.

7. DigitalOcean - A Deep Dive into Cloud Auto Scaling Techniques

  • This article provides a great overview and vision of auto-scaling techniques across various types of architectures, emphasizing how to leverage them effectively with major Cloud resources. These techniques are constantly being developed and refined by cloud providers, from small to large scale.
  • You will learn how auto-scaling works, the underlying mechanisms used to build these solutions, and how they are implemented across platforms like AWS, Azure, GCP, DigitalOcean, and various Kubernetes Platforms.
  • The blog concludes with a valuable comparison table that highlights the differences between traditional scaling and automated scaling. It also details different auto-scaling policies that can be adopted and points out the common mistakes often made when implementing these solutions.

8. AWS - Running AWS Fargate with virtual-kubelet

  • For more information on Virtual Kubelet (https://github.com/virtual-kubelet/virtual-kubelet), I found another blog post from AWS focusing on its integration with the Fargate platform, a truly cool implementation. The author provides an excellent tutorial on using Virtual Kubelet with a Kubernetes cluster provisioned by KOps (https://github.com/kubernetes/kops) on EC2, allowing workloads to run on both the EC2 nodes and Fargate.
  • This approach offers a standard Kubernetes mapping mechanism, giving you the ability to interact with Fargate for deployment, rollout, and introspection using familiar Kubernetes commands. The post suggests trying it right away. However, while Virtual Kubelet itself has been around since 2018, its adoption and available documentation are still limited. Therefore, you should exercise caution when considering it for production environments. Nonetheless, exploring this technology gives you a unique opportunity to differentiate your offering, especially when working with various small-to-midsize cloud providers.
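To make the scheduling mechanism concrete: a Virtual Kubelet node registers with a taint (by default under the virtual-kubelet.io/provider key) so that only pods which explicitly tolerate it land there. A minimal sketch of such a pod, assuming the project's default node label type: virtual-kubelet (pod name and image are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fargate-demo              # hypothetical name
spec:
  # Only schedule onto the Virtual Kubelet "node"
  nodeSelector:
    type: virtual-kubelet
  # Tolerate the taint the Virtual Kubelet registers with
  tolerations:
    - key: virtual-kubelet.io/provider
      operator: Exists
  containers:
    - name: web
      image: nginx:1.27
```

From there, kubectl shows the pod bound to the virtual node while the actual container runs on the backing platform (Fargate, in this post's setup).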

9. Blog - Scheduling simulations and ghosts in the cluster 🪄

  • This original blog post delves into the creation and purpose of Virtual Kubelet. Upon reviewing the article, it’s clear the author’s primary goal was to demonstrate how to simulate Kubernetes scheduling without needing to run the actual workloads on a real cluster. This aligns with the modern serverless architecture approach where many providers offer a higher abstraction layer (Container Instance or Serverless) above the bare machine layer, which is the foundational purpose behind their tool, SimKube (https://github.com/acrlabs/simkube).
  • I am truly curious about their project, as it involves a comprehensive set of tools, including skctl (CLI), sk-tracer, sk-ctrl, sk-driver, sk-vnode, and sk-cloudprov. These components are designed to introduce a sophisticated, new approach to Kubernetes orchestration that is serverless yet entirely Kubernetes native, truly an innovative concept. The article also mentions KWOK (Kubernetes WithOut Kubelet) (https://github.com/kubernetes-sigs/kwok), suggesting a potential integration with SimKube through collaboration among maintainers. This is intriguing, and I hope to gain more experience or get closer to these technologies soon.

10. Blog - Supernatural abilities of a virtual kubelet 🌀

  • This blog post is a continuation of the discussion on the problems above. In this segment, the author elaborates on using Virtual Kubelet as a "ghost" node within the Kubernetes cluster, specifically via the provider InterLink (https://github.com/interlink-hq/interLink), which functions as a Virtual Kubelet Plugin Engine. Throughout the article, you’ll have the chance to examine the architecture and understand how these tools work together.
  • To detail the deployment, the article outlines various approaches to setting up InterLink. It goes into detail for each method, providing command walkthroughs and explanations, even noting potential failures. This approach, which allows for the practice and implementation of native Kubernetes orchestration without involving any physical server or Kubelet, promises to open a new wave of deployment strategies.

DevOps

1. VictoriaMetrics - Prometheus Alerting 101: Rules, Recording Rules, and Alertmanager

  • This article is part of a monitoring series written by VictoriaMetrics (https://victoriametrics.com/) that clarifies monitoring concepts for both new and experienced engineers. You can explore other articles in the series for more interesting information. This specific article focuses on providing a detailed vision of how to manage, define, and set up alert rules based on metrics within the Prometheus Ecosystem (https://prometheus.io/). It details each component with clear illustrations, making it accessible even for newcomers.
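As a quick illustration of the concepts the article covers, a Prometheus rule file typically pairs a recording rule (precomputing an expensive expression under a new series name) with an alerting rule built on top of it. A minimal sketch, using standard node-exporter metric names; the threshold and labels are illustrative:

```yaml
groups:
  - name: node-alerts
    rules:
      # Recording rule: precompute per-instance CPU utilization over 5m
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      # Alerting rule: fire only after the condition holds for 10m
      - alert: HighCpuUsage
        expr: instance:node_cpu_utilisation:rate5m > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 90% on {{ $labels.instance }}"
```

Alertmanager then picks up anything with severity: warning and handles grouping, routing, and silencing according to its own configuration.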

Kubernetes

1. ScaleOps - Kubernetes HPA: Use Cases, Limitations & Best Practices in 2025 🌟 (Recommended)

  • This is a great article for learning about the Horizontal Pod Autoscaler (HPA) and related operational stories, offering insight into scaling Kubernetes with real-world examples. It covers implementing custom metrics to enhance scaling efficiency, provides best practices for adopting HPA, and discusses common issues you may encounter.
  • The section on implementing custom metrics (e.g., requests_per_second, kafka_consumer_lag_seconds) via the Prometheus Adapter (https://github.com/kubernetes-sigs/prometheus-adapter) is particularly interesting. It allows you to define demand-driven metrics, enabling more tactical and efficient scaling compared to relying solely on basic metrics (CPU, Memory).
  • Throughout the article, the author presents several case studies detailing the pitfalls of HPA (such as slow scaling, P99 latency issues, memory-based scaling complexity, and budget waste). They provide practical insights and solutions to improve the operational picture and enhance scaling efficiency, including techniques like the Buffer Pod (N+2) Strategy and Pre-warming with CronJobs.
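As a sketch of the custom-metrics pattern described here: once the Prometheus Adapter exposes requests_per_second through the custom metrics API, an autoscaling/v2 HPA can target it per pod. The Deployment name, replica bounds, and target value below are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                    # hypothetical Deployment
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: requests_per_second   # served by the Prometheus Adapter
        target:
          type: AverageValue
          averageValue: "100"         # scale so each pod averages ~100 rps
```

Demand-driven signals like this typically react faster than CPU-based scaling, which tends to lag behind actual load.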

2. Medium - GitOps at Scale: Mastering Multi-Cluster Management and Advanced Patterns 🌟 (Recommended)

  • GitOps Implementation with ArgoCD: This is an interesting article that provides a fundamental understanding of how and where to begin implementing GitOps strategies using ArgoCD. It covers various patterns, such as App of Apps and ApplicationSet.
  • Best Practices and Evidence: Throughout the article, the author provides clear evidence and use cases detailing the best practices for adopting these patterns in your ArgoCD setup. This helps you bootstrap your Kubernetes environment and deploy new applications efficiently.
  • Multi-Cluster Management: The article emphasizes the value of leveraging ArgoCD to serve as a management plane for controlling your multi-cluster environment. This capability is extremely streamlined and useful given the enormous landscape of Kubernetes clusters today.
  • Complete GitOps Strategy: Ultimately, the article gives you a complete vision for bootstrapping your GitOps strategy, including a robust file and folder structure approach. You can use this knowledge to start building your own system for controlling multiple Kubernetes clusters.
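To give a flavor of the multi-cluster pattern, ArgoCD's ApplicationSet cluster generator stamps out one Application per cluster registered in ArgoCD from a single template. A minimal sketch; the repo URL and app name are hypothetical:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: guestbook
  namespace: argocd
spec:
  generators:
    - clusters: {}               # one Application per registered cluster
  template:
    metadata:
      name: '{{name}}-guestbook' # cluster name is injected by the generator
    spec:
      project: default
      source:
        repoURL: https://github.com/example/gitops-repo.git  # hypothetical repo
        targetRevision: HEAD
        path: apps/guestbook
      destination:
        server: '{{server}}'     # each cluster's API server URL
        namespace: guestbook
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Adding a new cluster to ArgoCD is then enough to have the application deployed there, with no per-cluster manifests to maintain.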

3. Akuity - Which Argo CD Architecture is Best? Comparing Single, Per Cluster, and Hybrid Models

  • This article provides a walkthrough of the three most common ArgoCD architectures: Single Instance, Instance Per Cluster, and Hybrid (Instance Per Logical Group). The goal is to help you determine the best and most compatible option for your specific situation. For each model, the author details the advantages, disadvantages, how the architecture works, and shares practical challenges and experiences. This makes it a perfect resource for anyone struggling to find the optimal deployment strategy for their system.
  • Finally, Akuity also introduces its product, designed to support the control plane and serve as a centralized platform for managing ArgoCD Per Cluster deployments. The summary covers the advantages and key trade-offs of their solution, which is built upon an Agent-Based Architecture.

4. Medium - Building a Kubernetes Platform – Think Big, Think in Planes 🌟 (Recommended)

  • We’re back to discussing Internal Developer Platform (IDP) concepts in Kubernetes. A big shoutout goes to Artem Lajko (https://www.linkedin.com/in/lajko) for publishing a high-quality article for the community, which directly addresses the big question: Why do we need a platform? The article provides evidence that adopting a platform isn’t easy; it’s complicated, success is difficult to achieve on the first try, and it costs a lot, often with results that are hard to immediately validate.
  • In this article, the author introduces the “Planes” layer concept in IDP (as defined by PlatformEngineering.org), instead of merely listing tool layers. These planes include: Developer Control, Integration & Delivery, Monitoring & Logging, Security, and Resource. This framework attempts to solve the problem of engineers drowning in the pool of tools found in the CNCF landscape (https://landscape.cncf.io/), which isn’t a great starting point.
  • The core takeaway is to understand the progression: from the complexity of the CNCF Landscape to the principles of Platform Engineering and finally to the defined Planes. Each phase has meaning, and together they lead to the desired overall infrastructure. Read this article if you want to understand how to start defining each “plane” in your platform, moving from a jungle of tools to a bootstrapped infrastructure for your system.

5. AWS - Maximizing GPU utilization with NVIDIA’s Multi-Instance GPU (MIG) on Amazon EKS: Running more pods per GPU for enhanced performance

  • This article approaches the concept of Multi-Instance GPU (MIG) from NVIDIA for optimizing GPU reservation. This technique splits a large GPU into smaller, isolated partitions, allowing each segment to work independently while remaining efficient. The core topic revolves around implementing MIG specifically with Amazon EKS.
  • Currently, MIG is only available on select GPU models such as the A100, H100, and H800. The author references the substantial benefits of this technique, especially when used with Kubernetes. Many customers prefer Kubernetes for ML tasks due to its inherent features, particularly scheduling. However, working with Kubernetes requires attention to a multitude of components, including the GPU driver, container runtime, and device plugin.
  • The article provides a full walkthrough demonstrating how to use EKS with MIG, utilizing an A100 GPU and the NVIDIA GPU Operator. This configuration gives a clear picture of the problems involved and illustrates the complexity of using Kubernetes to manage GPUs. Ultimately, bridging the gap from initial deployment to managing your ML workload always points back to the power of Kubernetes orchestration.
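Once the GPU Operator has partitioned the card (in the mixed MIG strategy), each slice is advertised as its own extended resource, and a pod simply requests it. A minimal sketch, assuming a 1g.5gb profile on an A100; the pod name and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-inference            # hypothetical name
spec:
  containers:
    - name: inference
      image: nvcr.io/nvidia/pytorch:24.01-py3   # hypothetical image tag
      resources:
        limits:
          # One 1g.5gb MIG slice (1 compute unit, 5 GB memory) instead of a whole GPU
          nvidia.com/mig-1g.5gb: 1
```

Because each slice is hardware-isolated, several such pods can share one A100 without interfering with each other's memory or faults.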

6. AWS - GPU sharing on Amazon EKS with NVIDIA time-slicing and accelerated EC2 instances

  • This post is another article related to EKS and GPU reservation, but it offers a different approach by providing several choices for working with GPU sharing, such as Time-Slicing and vGPU, instead of only MIG. This particular post focuses on the Time-Slicing technique for sharing GPU resources in small time intervals, thereby ensuring efficient utilization and task concurrency.
  • Similar to the MIG article, this post provides a detailed tutorial for setting up the full component stack of EKS to effectively work with the Time-Slicing method of NVIDIA GPUs.
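For reference, the device-plugin side of time-slicing is driven by a small config in which each physical GPU is advertised as N replicas of nvidia.com/gpu. A minimal sketch of the ConfigMap the NVIDIA GPU Operator consumes; the namespace and replica count are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator        # wherever the GPU Operator is installed
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4          # each physical GPU appears as 4 schedulable GPUs
```

After pointing the operator's ClusterPolicy at this ConfigMap, a node with one physical GPU advertises nvidia.com/gpu: 4, so four pods can share it. Unlike MIG, the sharing is purely temporal: there is no memory or fault isolation between the pods.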

7. Youtube - Unlocking the Power of Kubernetes: Create your own Resources with CRDs

8. Blog - 10 + 1 Things I wish I knew about operators before I wrote one

9. Enix - Kubebuilder: Easily create a Kubernetes operator

  • These articles and videos focus on the topic of implementing your own Custom Resource Definitions (CRDs) (https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) and Operators (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/). This allows you to integrate your custom logic and business strategies directly into the Kubernetes cluster, automating complex, multi-step tasks.
  • You can leverage frameworks like Operator SDK (https://sdk.operatorframework.io/) or KubeBuilder (https://github.com/kubernetes-sigs/kubebuilder) to significantly reduce the complexity of bootstrapping the project structure from scratch. The second blog post offers valuable experience sharing from others who have written their own Operators, detailing the problems they encountered. This allows you to learn from their lessons and prevent similar issues when defining your first version.
  • Building these components requires a deep understanding of the Kubernetes API, the mindset of a Kubernetes Engineer, and clear target objectives. While developing custom Operators is certainly a high-end solution and represents a key advantage when implementing complex logic in Kubernetes, it is not easy. However, it offers a great chance to innovate and differentiate your platform, so it is a worthwhile endeavor to start with simple, focused tasks.
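To ground the idea: a CRD is just a schema registered with the API server, and an operator then reconciles objects of that kind. A minimal hand-written sketch; the Backup kind, group, and fields are hypothetical, and Kubebuilder or the Operator SDK would generate this scaffolding for you:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # Must be <plural>.<group>
  name: backups.example.com
spec:
  group: example.com             # hypothetical API group
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:        # e.g. a cron expression
                  type: string
                retention:       # how many backups to keep
                  type: integer
```

Once applied, `kubectl get backups` works immediately; the operator's controller loop is what supplies the actual behavior behind each object.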