
AIOps / MLOps / LLMOps

1. Alibaba - Model Service Mesh: Model Service Management in Cloud-native Scenario

  • Service mesh isn’t just for microservices anymore—it’s also a powerful tool for optimizing AI workloads. By using a service mesh, you can significantly streamline the versioning, scaling, and deployment of your machine learning models. While Alibaba Cloud Service Mesh (ASM) offers a robust, high-performance solution for managing and scheduling model services, you can achieve similar results with the open-source KServe.
  • The article provides two complete, hands-on guides. It walks through using KServe and PVCs to deploy your models effectively, an approach that supports custom model runtimes like PyTorch, TensorRT, and ONNX. It also shows how to use a model mesh to run Large Language Model (LLM) inference at scale, giving you the distributed and scalable functionality needed for high-workload scenarios.
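
To make the KServe-plus-PVC idea concrete, here is a minimal sketch that builds an InferenceService manifest pointing at a model stored on a PVC. It is not taken from the article: the names `demo-model`, `model-pvc`, and `models/demo` are hypothetical placeholders, and the field layout assumes the KServe `serving.kserve.io/v1beta1` API with a `pvc://` storage URI. The output is JSON, which `kubectl apply -f` accepts in place of YAML.

```python
import json


def inference_service(name, model_format, pvc_name, model_path):
    """Build a KServe InferenceService manifest that loads a model from a PVC.

    All argument values used below are hypothetical examples.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    # Model format the serving runtime should handle, e.g. "onnx".
                    "modelFormat": {"name": model_format},
                    # pvc://<claim-name>/<path inside the volume>
                    "storageUri": f"pvc://{pvc_name}/{model_path}",
                }
            }
        },
    }


if __name__ == "__main__":
    manifest = inference_service("demo-model", "onnx", "model-pvc", "models/demo")
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` is one way to try this on a cluster where KServe is already installed.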

2. Blog - Your AI workloads still need a service mesh

  • Whether a service mesh is the right choice for an AI workload depends entirely on your specific needs. While it might seem like a solution for everything, a service mesh may not always be the best option—and sometimes, a different approach is better.
  • The author of this article provides strong evidence to explain why blindly adopting a service mesh can be a mistake. Sometimes, a technology is more of a trend than a truly effective solution, and I completely agree with that perspective.
  • Before you adopt any new technology for a production environment, you need to think critically about whether it’s the right fit. The most successful people in this industry always ask the hard questions, practice with the technology, and honestly evaluate the results.
  • One more suggestion for hosting inference: check out the Gateway API Inference Extension.

Architecture

1. Alibaba - Cloud-Native Encountering Hybrid Cloud: How to Balance between Change and Stability

  • Hybrid cloud was one of the most hyped topics in cloud a couple of years ago, and it has kept growing since. It has become a way to satisfy requirements around security, cost-effectiveness, technological innovation, and more.
  • The article also tries to show how cloud native, represented by Kubernetes, supports the evolution of hybrid cloud. Kubernetes changed the game, and there is no doubt that the cloud-native era and containerization have made this industry more interesting. It is more complex, of course, but you have more and more to build on for your enterprise’s success with cloud native.

2. Alibaba - Start Hybrid Cloud and Multi-Cloud with Alibaba Cloud ACK One (Part One) 🌟 (Recommended)

3. Alibaba - Start Hybrid Cloud and Multi-Cloud with Alibaba Cloud ACK One (Part Two) 🌟 (Recommended)

  • This is the story of SoftBank, a Japanese company that chose to transition its architecture from an on-premises model to a multi-cloud environment using both Alibaba Cloud and Azure. This case study is a great source of inspiration for anyone who wants to understand how a major enterprise centralizes and controls its operations in a multi-cloud world.
  • Throughout the article, you’ll get a detailed look at how SoftBank organized its ACK (Alibaba Cloud Container Service for Kubernetes) clusters and connected them with AKS (Azure Kubernetes Service). It dives into their pipelines, networking, security protocols, and more.
  • This article truly impressed me. It provides a comprehensive, well-illustrated overview of their approach and explains their core strategy. What’s more, it goes deep into their network infrastructure, detailing how they leveraged VPN, SD-WAN, and OnePort, a network service offered by SoftBank, to connect directly to these major cloud providers.
  • This is an incredible resource for anyone looking to explore advanced system architecture. It’s a goldmine of information, and I highly recommend it.

Career / Story

1. Medium - Senior DevOps Engineer Interview at Microsoft

  • This is a good story about the questions asked in a Microsoft interview, which require you to prepare a lot of concepts. Honestly, the questions are so complex that I didn’t know much of the material, and they cover many topics related to Azure Cloud. If you don’t have a chance to prepare or need some keywords, this article will be helpful.
  • By the way, the complexity of the questions doesn’t really prove whether you’re good or not; it just shows your preparation for the interview. For me, it’s better to come with your own knowledge, show what you’ve got, and be confident. I’ve found that this is a great way to start any job.

Data Engineer

1. Medium - Icebox: Your Five-Minute Flight Simulator for Apache Iceberg

  • Meet the Icebox, your local simulator for building a modern data lakehouse with less effort. Designed for developer ergonomics above all else, Icebox gives you a faster, easier way to learn and experiment. With sensible defaults, you’re productive immediately. It’s packed with features like Zero-Friction Liftoff, a Full Avionics Suite, True Iceberg Power, and a Portable & Shareable setup.
  • After you set up Icebox, you get a complete guide to working with an impressive tech stack, including Go, Iceberg, Arrow, DuckDB, SQLite, and MinIO. Icebox is designed to eliminate the complexity and pain points of working with data lakehouses. If you want to dive deeper into Iceberg and surrounding technologies, be sure to check out my article The First Data Lakehouse and Challenge.

Kubernetes

1. Medium - Kubernetes multi-cluster implementation in under 10 minutes

  • If you’re looking to dive into the complexities and high-level advantages of Kubernetes, you’ve come to the right place. This article is a great starting point for learning about multi-cluster Kubernetes using Kind and Cilium—two truly incredible tools.
  • This guide is about more than just the technical aspects. It highlights the author’s journey and their deliberate choice to use the cilium-cli instead of Helm. The article provides a full walkthrough for setting up Kind on your local machine and integrating both Cilium and MetalLB. I was personally curious and fascinated by this topic, and I think you will be too.
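
As a rough illustration of the Kind side of such a setup, the sketch below generates two Kind cluster configs with the default CNI disabled (so Cilium can be installed afterwards) and checks that their pod and service CIDRs do not overlap, which Cilium's Cluster Mesh documentation asks for. The cluster names and CIDR ranges are assumptions chosen for this example, not values from the article, and the article's own steps may differ.

```python
import ipaddress
import json


def kind_cluster_config(name, pod_subnet, service_subnet):
    """Build a Kind cluster config that leaves CNI installation to Cilium."""
    return {
        "kind": "Cluster",
        "apiVersion": "kind.x-k8s.io/v1alpha4",
        "name": name,
        "networking": {
            # Skip kindnet so another CNI (here: Cilium) can be installed later.
            "disableDefaultCNI": True,
            "podSubnet": pod_subnet,
            "serviceSubnet": service_subnet,
        },
        "nodes": [{"role": "control-plane"}, {"role": "worker"}],
    }


def subnets_overlap(cidrs):
    """Return True if any two CIDR ranges in the list overlap."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])


# Hypothetical address plan: one /16 per cluster for pods and for services.
PLAN = {
    "cluster1": ("10.10.0.0/16", "10.11.0.0/16"),
    "cluster2": ("10.20.0.0/16", "10.21.0.0/16"),
}

if __name__ == "__main__":
    all_cidrs = [cidr for pair in PLAN.values() for cidr in pair]
    if subnets_overlap(all_cidrs):
        raise SystemExit("CIDR ranges overlap; pick a different address plan")
    for name, (pods, services) in PLAN.items():
        print(json.dumps(kind_cluster_config(name, pods, services), indent=2))
```

Each printed config can be written to a file and passed to `kind create cluster --config <file>`; installing Cilium and MetalLB then follows the article's walkthrough.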

2. Medium - Top 30 Argo CD Anti-Patterns to Avoid When Adopting Gitops

  • This is an excellent checklist for everyone using ArgoCD. It provides a straightforward, evidence-based approach to double-checking your work and preventing common, but crucial, errors when you’re using this modern and popular GitOps solution.
  • You can go through each category based on its specific pattern. This will help you understand which concepts apply to what, ensuring you’re ready to tackle and resolve any ArgoCD issues that come your way. This is a very helpful resource for pinpointing and solving problems efficiently.

3. CNCF - Karmada and Open Cluster Management: two new approaches to the multicluster fleet management challenge

  • Continuing the journey of KubeFed, two projects have emerged to provide multi-cluster orchestration platforms with active and vibrant communities: Karmada and Open Cluster Management (OCM).
  • First, a quick shout-out to KubeFed. While it has a legacy of its own, its feature incompatibilities, lack of extensibility, and steep learning curve made it a difficult platform to manage, especially when it came to migrations.
  • That’s where these new tools come in. Both Karmada and OCM were created to address the challenges of KubeFed by providing a core set of features for multi-cluster management. There’s real hope for what these two platforms will bring in the future.

4. Medium - Securing Kubernetes Layer by Layer: An OSI Approach (Part 1: L2 & L3)

5. Medium - Securing Kubernetes Layer by Layer: An OSI Approach (Part 2 : L4)

  • These articles are truly impressive. They tackle one of the biggest challenges in any system: how to ensure security and effective remediation. You will encounter many issues as you set up and operate a system, and in a world where a cyberattack is always a possibility, your policies must be set up correctly to defend your system.

  • To build a strong defense, you need to understand the OSI Model. This is especially true when working with Cloud Native platforms like Kubernetes, one of today’s most popular architectures. This article provides a compelling visual guide to network defense, implementing security at each layer:

    • L2 (MAC Address): Using a CNI
    • L3 (IP & Routing): With NetworkPolicies
    • L4 (TCP/UDP): Using a Service Mesh, Authorization, Namespace Communication, and mTLS
  • This is an invaluable opportunity for both you and me to learn something new. As a reminder, adding more security is a good thing, but it always comes with trade-offs. You must understand those trade-offs before you push anything to a production environment. It’s incredibly difficult to roll back if you don’t know what you’re doing.

  • Don’t blindly adopt modern architectures if you don’t fully understand them. Instead, learn, research, and roll out slowly to reach your goals while continuously improving your knowledge. This approach will be far more valuable than carrying technical debt forever.
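
For the L3 layer mentioned above, a common starting point is a default-deny NetworkPolicy plus narrow allow rules. The sketch below builds both as plain dictionaries using the `networking.k8s.io/v1` NetworkPolicy API. The namespace, labels, and port are hypothetical, and this is one possible pattern rather than the configuration used in the articles. Enforcement also depends on a CNI that supports NetworkPolicies.

```python
import json


def default_deny(namespace):
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects all pods in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_ingress(namespace, app, client_app, port):
    """Allow TCP traffic to pods labeled app=<app> only from app=<client_app>."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{client_app}-to-{app}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": client_app}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace, labels, and port for illustration only.
    for policy in (default_deny("demo"), allow_ingress("demo", "api", "web", 8080)):
        print(json.dumps(policy, indent=2))
```

Rolling out the default-deny policy in a test namespace first makes the trade-off visible early: anything not explicitly allowed, including DNS egress, stops working until a matching allow rule exists.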