AIOps / MLOps / LLMOps

1. Medium - Getting Started With MLOps For DevOps Engineers 🌟 (Recommended)

  • This article provides an excellent introduction to MLOps, a relatively new discipline that has gained significant traction in recent years. You’ll learn several prominent tools and technologies widely used in MLOps, such as Docker/Podman, MLflow, Streamlit, and FastAPI, applied to a classic house-price prediction problem with regression (a minimal MLflow sketch follows this list). The article includes great use cases and examples to guide your understanding.
  • MLOps is rapidly becoming a highly sought-after profession, gaining recognition across numerous technology companies and the broader industry. With the ongoing expansion of AI technologies, MLOps is poised for continuous growth, cementing its position as a top domain within the xOps marketplace.
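
For a flavor of the MLflow piece, here is a minimal sketch of tracking a regression experiment. The tracking URI, experiment name, and the California-housing stand-in dataset are my own assumptions, not values from the article:

```python
# Minimal MLflow tracking sketch for a house-price-style regression,
# assuming a local MLflow server is running on port 5000.
import mlflow
import mlflow.sklearn
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)  # stand-in housing dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_tracking_uri("http://localhost:5000")  # assumed local MLflow server
mlflow.set_experiment("house-price-regression")   # hypothetical experiment name

with mlflow.start_run():
    model = LinearRegression().fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    mlflow.log_metric("mse", mse)                 # record the test error
    mlflow.sklearn.log_model(model, "model")      # store the fitted model
```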

2. Medium - 10 Must-Try Small Local LLMs That Run on Less Than 8GB RAM/VRAM 🌟 (Recommended)

  • This brilliant article provides compelling evidence that you can run LLMs locally on machines with less than 8GB of RAM/VRAM, such as laptops, IoT devices, or ordinary PCs. This is truly remarkable for anyone curious about running LLMs at home.
  • The article explains how this feat is achieved through model quantization, offering a brief definition of quantization and its crucial role in model optimization.
  • Furthermore, you’ll find several suggested tools, including Ollama, LM Studio, and llama.cpp, along with a variety of models to help you get started, such as Llama 3.1 8B (Quantized), Gemma 3:4B (Quantized), DeepSeek R1 7B/8B (Quantized), BitNet b1.58 2B4T (Impressive), and more; a small llama-cpp-python sketch follows this list.
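
To show how little code this takes, here is a minimal sketch using llama-cpp-python, one of the suggested tools. The GGUF file path is a placeholder for a quantized model you download first (e.g., a Q4_K_M build of Llama 3.1 8B from Hugging Face):

```python
# Minimal local-inference sketch with llama-cpp-python and a quantized model.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3.1-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,       # context window; keep modest to stay under 8GB
    n_gpu_layers=0,   # 0 = CPU only; raise this if you have spare VRAM
)

out = llm("Q: Why does 4-bit quantization shrink a model? A:", max_tokens=128)
print(out["choices"][0]["text"])
```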

Data Engineer

1. Medium - Build a Data Lakehouse with Apache Iceberg, Polaris, Trino & MinIO 🌟 (Recommended)

  • This tutorial provides a comprehensive walkthrough for building a data lakehouse using a modern data tech stack, including Iceberg, Polaris, and Trino. The guide includes a Docker Compose file, significantly simplifying the self-hosting of these complex data technologies.
  • After bringing up the stack, you can follow the tutorial to configure and interact with Polaris (with Iceberg) and Trino, aided by excellent code snippets. Throughout the setup, the author provides abundant context and demonstrates interaction with these technologies through simple examples; a small Trino client sketch follows this list.
  • This article contains even more valuable content; come and explore it to discover how to build your first lakehouse.
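
As a taste of the interaction step, here is a minimal sketch of querying the lakehouse through Trino’s Python client. The host, port, catalog, and schema names are my assumptions for a typical local setup, not values from the article:

```python
# Minimal Trino query sketch against a local lakehouse stack.
import trino

conn = trino.dbapi.connect(
    host="localhost",
    port=8080,
    user="admin",
    catalog="iceberg",   # assumed name of the Iceberg catalog in Trino
    schema="demo",       # hypothetical schema
)
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS events (id BIGINT, name VARCHAR)")
cur.execute("INSERT INTO events VALUES (1, 'hello-lakehouse')")
cur.execute("SELECT * FROM events")
print(cur.fetchall())
```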

2. Medium - Build a Streaming Data Lakehouse with Apache Flink, Kafka, Iceberg and Polaris

  • This is the sequel to the article mentioned previously, focusing on streaming data into the lakehouse by adding Kafka and Flink to the stack. Just like the last blog, the author provides a docker-compose file, which significantly simplifies the painful aspects of self-hosting this data stack in a local environment.
  • This article covers numerous techniques for working with Kafka and Flink, such as building a UI for Flink, generating random transactions with jr, and testing real-time pipelines (a Kafka producer sketch follows this list). By the end, you’ll achieve real-time ingestion with Kafka, ACID guarantees with Iceberg and Polaris, and query and analytics capabilities with Trino, among other benefits.
  • Come and check out this article! It’s completely free and an excellent choice for anyone looking to learn new things in data engineering.
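
To illustrate the ingestion side, here is a minimal sketch that generates random transactions with kafka-python, standing in for jr, which the article uses. The broker address and topic name are assumptions:

```python
# Minimal random-transaction producer sketch using kafka-python.
import json
import random
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed local broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for i in range(100):
    txn = {"id": i, "amount": round(random.uniform(1, 500), 2), "ts": time.time()}
    producer.send("transactions", txn)   # hypothetical topic name
    time.sleep(0.1)                      # ~10 events/second

producer.flush()
```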

3. Medium - Introduction to REST Catalogs for Apache Iceberg 🌟 (Recommended)

  • This article is a walkthrough into Iceberg, which is becoming a popular table format for lakehouse implementations. A significant aspect of the platform is its metadata: the catalog acts as the source of truth, holding a pointer to the current metadata file for a given table, which is what provides atomic commits. Catalog backends include REST, Hive, JDBC, Nessie, and more.
  • The article explains and illustrates how Iceberg catalogs work. You can find more information about the evolution of catalogs in Iceberg, from Hive Metastore to REST, and a comprehensive comparison at Onehouse - Comprehensive Data Catalog Comparison.
  • Finally, you can delve deeper into REST catalogs to understand their advantages and disadvantages (a PyIceberg sketch follows this list). This is a truly interesting topic that I am currently experimenting with and exploring, and I will share more details in a separate discussion.
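
Here is a minimal sketch of what “the catalog holds the pointer” looks like in practice with PyIceberg. The endpoint, warehouse, and namespace are assumptions for a local Polaris-style setup:

```python
# Minimal PyIceberg REST-catalog sketch.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "rest",
    **{
        "type": "rest",
        "uri": "http://localhost:8181",  # assumed REST catalog endpoint
        "warehouse": "demo",             # hypothetical warehouse name
    },
)

# The catalog holds the pointer to each table's current metadata file,
# which is what makes commits atomic.
for table in catalog.list_tables("db"):  # assumes a namespace named "db"
    print(table)
```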

4. MinIO - The Definitive Guide to Lakehouse Architecture with Iceberg and MinIO

5. Blog - Iceberg Lakehouse on Docker Using Spark, MinIO, PyIceberg, Jupyter Notebooks, and REST Catalog

6. Dremio - Quick Start with Apache Iceberg and Apache Polaris on your Laptop (quick setup notebook environment)

7. Stackable - About Data Lakehouse with iceberg, trino & spark

  • These articles are a collection of my experiments and research into building a data lakehouse with Iceberg and its surrounding tech stack. They will provide you with a lot of information, examples, and good tutorials to self-host and practice with these modern data engineering technologies.

  • We explore multiple interesting fusions, such as:

    • Iceberg + MinIO: To build the data lakehouse.
    • Spark + Iceberg + MinIO + PyIceberg + Jupyter Notebook: To build the lakehouse and interact with it via libraries/tools and the Iceberg Catalog.
    • Iceberg + Polaris: An alternative REST Catalog for Iceberg implemented by Apache Polaris.
    • Iceberg + Trino + Spark: An alternative version where Spark becomes a tool for analytics and interaction with the Iceberg REST Catalog (a Spark configuration sketch follows this list).
  • P.S.: This has been a pretty interesting journey as I’ve learned about Iceberg and the tech stack surrounding it. I will return in another session to share more of my story.
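
To make the Spark fusion concrete, here is a minimal configuration sketch that points a Spark session at an Iceberg REST catalog. The catalog name, endpoint, and warehouse path are assumptions for a local MinIO/Polaris-style stack, and the iceberg-spark-runtime jar must be on Spark’s classpath:

```python
# Minimal PySpark + Iceberg REST-catalog configuration sketch.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-lakehouse")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "rest")
    .config("spark.sql.catalog.lake.uri", "http://localhost:8181")  # assumed endpoint
    .config("spark.sql.catalog.lake.warehouse", "s3://warehouse/")  # assumed bucket
    .getOrCreate()
)

spark.sql(
    "CREATE TABLE IF NOT EXISTS lake.db.events (id BIGINT, name STRING) USING iceberg"
)
spark.sql("SELECT * FROM lake.db.events").show()
```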

8. Medium - Modern Data Stack for Newbies: Don’t Panic, Just Read This 🌟 (Recommended)

  • This article addresses a prevalent topic in data engineering: the overwhelming number of tools available today. Data engineers often find themselves “drowning” in options, so the key is to break down the process, focus on smaller pieces, and identify the most suitable tools for individual needs to foster learning and growth in the field.
  • Therefore, if you read this article, you’ll gain insights into the various stages of a data pipeline, from ingestion, transformation, and storage to orchestration, along with corresponding tools like Airbyte and Fivetran for ingestion, DuckDB for storage/processing, dbt for transformation, and Dagster for orchestration (a DuckDB sketch follows this list). The article provides illustrations, code snippets, and explanations for each tool and concept, making it genuinely interesting for beginners in the field.
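
As a taste of the storage/processing stage, here is a minimal DuckDB sketch. The input CSV and its column names are placeholders for whatever your ingestion step produces:

```python
# Minimal DuckDB load-and-aggregate sketch.
import duckdb

con = duckdb.connect("warehouse.duckdb")  # persisted local database file
con.execute("""
    CREATE OR REPLACE TABLE orders AS
    SELECT * FROM read_csv_auto('orders.csv')  -- placeholder input file
""")
print(con.execute(
    # customer_id and amount are hypothetical column names
    "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
).fetchall())
```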

Kubernetes

1. Ngrok - Introducing the ngrok Kubernetes Operator 🌟 (Recommended)

  • This article offers a valuable introduction and tutorial on Kubernetes tunneling, specifically how to expose your TCP/UDP/HTTP services to the internet. This approach is beneficial for sandbox environments or testing APIs/servers without modifying firewalls, port-forwarding, or NAT configurations, leveraging Ngrok and its operator.

  • This blog expands your knowledge of tools and setup methods for Kubernetes tunneling techniques. It’s a great, freely accessible option for testing and implementation (a tiny tunneling sketch follows the list below). You can also explore alternatives to Ngrok, such as:

    • inlets-operator: Get public TCP LoadBalancers for local Kubernetes clusters.
    • cloudflare-operator: A Kubernetes Operator to create and manage Cloudflare Tunnels and DNS records for (HTTP/TCP/UDP) Service Resources.
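
This is not the Kubernetes operator itself, but a minimal sketch of the underlying tunneling idea using pyngrok, which is what the operator automates for in-cluster Services. The local port is an assumption:

```python
# Minimal ngrok tunneling sketch: expose a local port to the internet
# without touching firewalls, port-forwarding, or NAT.
from pyngrok import ngrok

tunnel = ngrok.connect(8000, "http")  # assumes a service on local port 8000
print("Public URL:", tunnel.public_url)

input("Serving... press Enter to close the tunnel.")
ngrok.kill()  # tear down all tunnels and the ngrok process
```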

2. Sysbee - Testing Kubernetes deployments with KIND

3. KuberMatic - Running Kubernetes In The CI Pipeline For Integration & E2E Tests

  • These articles present an excellent approach to running Kubernetes tests with Kind in CI. It makes perfect sense to automatically spin up a Kubernetes cluster, deploy your application for initial testing, run tests and gather results, or verify that your Kubernetes CD tools (e.g., Kustomize, Helm, or Jsonnet) are functioning correctly.

  • When integrating Kind into your CI, you should thoroughly double-check your End-to-End (E2E) setup and network configuration, as these can be quite complex to manage. However, if you can successfully operate Kind as a dynamic sandbox environment that’s well-suited for testing, it presents a great opportunity to integrate into your pipeline (a minimal CI-flow sketch follows this list). Read more if you’re looking for examples to apply to your pipeline:

    • GitHub - Kind CI Examples: A repository providing samples and testing for running sigs.k8s.io/kind on various CI services 🌟 (Recommended - but somewhat outdated).
    • GitHub - kind-example-config.yaml: According to the official Kind documentation, this provides a solid configuration to delve deeper with Kind 🌟 (Recommended). NOTE: Pay special attention to how the kind API changes over time; this file will show you how to patch kubeadm using kubeadmConfigPatchesJSON6902.
    • Kind - Configuration: This provides the full Kind configuration, opening up most options for modifying your Kind cluster. NOTE: Remember that this Kind configuration will be saved in /kind within the control plane container.
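
Here is a minimal sketch of the CI flow these articles describe: create a throwaway Kind cluster, deploy, verify, tear down. It assumes the kind and kubectl binaries are on PATH; the config file, manifest path, and deployment name are placeholders:

```python
# Minimal "Kind in CI" flow sketch driven from Python.
import subprocess

def run(cmd):
    """Echo and run a command, failing the pipeline on a non-zero exit."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

try:
    run(["kind", "create", "cluster", "--name", "ci",
         "--config", "kind-config.yaml",  # placeholder config file
         "--wait", "60s"])                # wait for the control plane to be ready
    run(["kubectl", "apply", "-f", "deploy/manifests.yaml"])  # placeholder path
    run(["kubectl", "rollout", "status", "deployment/my-app",  # hypothetical app
         "--timeout=120s"])
    # ... run your E2E tests against the cluster here ...
finally:
    run(["kind", "delete", "cluster", "--name", "ci"])  # always clean up
```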

Security

1. Medium - Building Honeypots for Internal Network Monitoring

  • This article focuses on honeypots, highlighting their current popularity and importance in network security. It emphasizes their benefits for early detection of intrusions, insider threat monitoring, and gaining behavioral insights.

  • The article explains how honeypots work and the advantages of deploying them. Furthermore, it delves into practical tips for working with honeypots, suggests various tools for their operation, and provides examples of real-world implementation scenarios.
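
To make the idea concrete, here is a toy sketch of a honeypot: a fake service listening on an otherwise unused port that logs every connection attempt, since no legitimate traffic should ever reach it. The port, banner, and log file are my own choices; real deployments should use the dedicated tools the article suggests:

```python
# Toy honeypot sketch: log every connection attempt to an unused port.
import logging
import socket

logging.basicConfig(filename="honeypot.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 2222))  # looks like SSH on a non-standard port
srv.listen(5)

while True:
    conn, addr = srv.accept()
    logging.info("connection attempt from %s:%d", addr[0], addr[1])
    conn.sendall(b"SSH-2.0-OpenSSH_8.9\r\n")  # fake banner to entice probes
    conn.close()
```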