Purpose
This is where I store everything I know, learn, and explore about AI, ML, and DA (Data Analysis), building it up into consistent collections.
General Knowledge
Landscape
- AI Agent Landscape
- LF AI & Data Foundation Interactive Landscape
- The 2024 MAD (Machine Learning, AI and Data) Landscape: Get it in PDF
Organization
- DeepSeek: LLM Model for Prompting
- Machine Learning Tooling: Provide multiple collections about ML
- MLFoundations: Community for implementing ML models
- OpenAI: OpenAI GitHub Community
- Qwen: Alibaba Cloud's general-purpose AI models
- Seldon: Machine Learning Deployment for Kubernetes
- Triton Inference Server: provides a cloud and edge inferencing solution optimized for both CPUs and GPUs.
Page
- Feature Stores for ML: A collection of pages about feature stores for ML
- Hugging Face: The open-source AI community
- Kaggle: Your Machine Learning and Data Science Community
- LF AI & DATA Projects: Linux Foundation project about Open Source Innovation in Artificial Intelligence and Data
- Made With ML: Learning how to responsibly deliver value with ML!
- NGC Catalog: GPU-accelerated AI models and SDKs that help you infuse AI into your applications at the speed of light (NVIDIA)
- Papers With Code: The latest in Machine Learning
- TensorFlow Hub: A repository of trained machine learning models.
Artificial Intelligence / Machine Learning
Articles
- Nanonets - Tesseract OCR in Python with Pytesseract & OpenCV
- Milvus - Deploy a Milvus Cluster on EKS
- Medium - Understanding Milvus: Key Concepts and Potential Applications
- CNCF - CNCF Cloud Native AI White Paper
- Medium - Why You Shouldn't Invest In Vector Databases?
- Datacamp - The Top 7 Vector Databases in 2025
Awesome Repositories
- applied-ml: Papers & tech blogs by companies sharing their work on data science & machine learning in production.
- awesome-deep-learning: A curated list of awesome Deep Learning tutorials, projects and communities.
- awesome-local-ai: An awesome repository of local AI tools
- awesome-model-quantization: A list of papers, docs, codes about model quantization
- awesome-production-machine-learning: A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning. Website
- best-of-ml-python: A ranked list of awesome machine learning Python libraries. Updated weekly.
Blogs
- DigitalOcean - AI/ML Topics: Articles and Community about AI/ML
- Machine Learning Blog: ML@CMU | Carnegie Mellon University
- Machine Learning cơ bản: Vietnamese forum and community about ML
- Machine Learning Mastery: The best resources for approaching ML
- MarkTechPost: ML and AI Tech News
- Medium - Marvelous MLOps
- Mì AI - Học AI theo cách mì ăn liền!: Learn about AI with the Vietnamese community
- Neptune.ai: Learn from AI/ML engineers, researchers, and folks building foundation models: best practices, tool reviews, and real-world examples.
- PyImageSearch: Computer Vision, Deep Learning, and OpenCV resources
Topic
Youtube Channel
- NeuralNine: Educational brand focusing on programming, machine learning and computer science
- sentdex: A funny guy who teaches you how to build cool stuff with Python, such as AI
- MLOps.community: The MLOps Community fills the swiftly growing need to share real-world Machine Learning Operations best practices from engineers in the field
Data Analysis
Articles
- LakeFS - The State of Data Engineering 2024
- Practical Data Engineering - Open Source Data Engineering Landscape 2024
- Medium - Data Pipeline Development with MinIO, Iceberg, Nessie, Polars, StarRocks, Mage, and Docker
- Medium - ETL and ELT
- Medium - ELT with Fabric, Azure and Databricks
- Medium - Apache Airflow Overview
- Blog - Change Data Capture (CDC)
- Blog - Data Lake vs Data Warehouse
- ML4Devs - Scalable Efficient Big Data Pipeline Architecture
Awesome Repositories
- awesome-bigdata: A curated list of awesome big data frameworks, resources and other awesomeness.
- awesome-datascience: An awesome Data Science repository to learn and apply to real-world problems.
- awesome-open-source-data-engineering
- data-engineering-roadmap: A comprehensive roadmap tailored for data engineering professionals at all levels
Blogs
Topic
MLOps
Articles
- Neptune.ai - MLOps Landscape in 2024: Top Tools and Platforms
- Neptune.ai - MLOps Principles and How to Implement Them
- Google Cloud - MLOps: Continuous delivery and automation pipelines in machine learning
- Datacamp - 25 Top MLOps Tools You Need to Know in 2025
Awesome Repositories
- awesome-mlops: A curated list of references for MLOps
Blogs
- Machine Learning Operations: Provide an end-to-end machine learning development process to design, build and manage reproducible, testable, and evolvable ML-powered software.
- Neptune.ai - MLOps Learning Hub: Strategies, tools, practical insights, and example projects on MLOps
Topics
AI/ML/Data/MLOps Tools
Computer Vision
- opencv: Open Source Computer Vision Library. Website Version
Data Orchestration Workflow
- airbyte: The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
- airflow: A platform to programmatically author, schedule, and monitor workflows
Labeling and Annotation
- Argilla: a collaboration tool for AI engineers and domain experts to build high-quality datasets
LLM
- langfuse: Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets
- litellm: Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format
- TaskingAI: The open source platform for AI-native application development.
Models
- trl: Train transformer language models with reinforcement learning. Hugging Face
Model Management and Serving
- KServe: A standard Model Inference Platform on Kubernetes, built for highly scalable use cases.
- mlflow: Open source platform for the machine learning lifecycle
- MLServer: An open source inference server for your machine learning models.
- ray: an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Streaming / CDC
Toolkits
- openvino: OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
- pachyderm: Data-Centric Pipelines and Data Versioning
Training
- open_flamingo: An open-source framework for training large multimodal models.