How can we work with EKS



When you spin up a new EKS cluster, I believe it's easy to get a bit confused about how to set up authentication so that kubectl or the aws CLI can retrieve information from the cluster through the Kubernetes API. It's a really interesting question.

I needed to spend a couple of hours figuring out how this works, because I had to get into the cluster to see what was happening inside the EKS Milvus cluster. You can explore more at First EKS Cluster with Milvus DB.

Methodology to authenticate

When I first used EKS, it felt totally different from AKS, so I looked into how to configure permissions to get past this blocker.

There are many ways to authenticate, but they really come down to two types of identity

  • An AWS Identity and Access Management (IAM) principal (role or user) – This type requires authentication to IAM. Users can sign in to AWS as an IAM user or with a federated identity by using credentials provided through an identity source.
  • A user in your own OpenID Connect (OIDC) provider – This type requires authentication to your OIDC provider. For more information about setting up your own OIDC provider with your Amazon EKS cluster, see Grant users access to Kubernetes with an external OIDC provider.


In my situation, I prefer AWS IAM because it's fairly easy to configure: I'm using IAM Identity Center, which lets my shell assume a role and use a kubeconfig with kubectl.

To enable this, follow the steps in the next part.

Cluster Authentication Mode and IAM Identities

If you choose to use an IAM principal (role or user) to access Kubernetes, you need to use one of the following methods to allow that principal to access Kubernetes objects in your cluster

  • Creating access entries
  • Adding entries to the aws-auth ConfigMap

For my part, I use access entries, but which of the two options you can choose depends on your cluster version and platform version. Explore more at Associate IAM Identities with Kubernetes Permissions. If you want the easier path, use access entries rather than aws-auth.

To make access entries available, you have to deal with the cluster authentication mode, which determines how your cluster permits IAM principals to access Kubernetes objects. There are three modes you can set for your cluster

  • The aws-auth ConfigMap inside the cluster (Original)
  • Both the ConfigMap and access entries
  • Access entries only

From my perspective, choosing both is the safe option. I don't want to break anything, and it's tough to troubleshoot problems between the cluster and the AWS services integrated with it, like the ones behind MilvusDB. So to be safe I enabled both, and once you apply it you can see the cluster's authentication mode showing both the EKS API and the ConfigMap.

Awesome, now you can go back and add entries based on your own definitions. If you want to explore more ways to configure the EKS cluster authentication mode, don't forget to check the blog A deep dive into simplified Amazon EKS access management controls.
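If you'd rather not click through the console, here is a minimal sketch of the same switch with the AWS CLI. The cluster name and region are just the example values used later in this post, and the two function names are made up for illustration, so adjust them to your setup.

```shell
# Example values only; replace with your own cluster name and region.
CLUSTER_NAME="milvus-cluster"
CLUSTER_REGION="ap-southeast-1"

# Ask EKS to accept both the aws-auth ConfigMap and access entries.
enable_both_auth_modes() {
  aws eks update-cluster-config \
    --region "$CLUSTER_REGION" \
    --name "$CLUSTER_NAME" \
    --access-config authenticationMode=API_AND_CONFIG_MAP
}

# Print the authentication mode the cluster currently reports.
show_auth_mode() {
  aws eks describe-cluster \
    --region "$CLUSTER_REGION" \
    --name "$CLUSTER_NAME" \
    --query 'cluster.accessConfig.authenticationMode' \
    --output text
}

# Usage (not run here):
#   enable_both_auth_modes
#   show_auth_mode
```

Keep in mind that, as far as the AWS docs describe it, this change is one-way: once access entries are enabled you can't go back to the ConfigMap-only mode, so try it on a non-critical cluster first.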

Configure IAM Access Entry


As you know, I use IAM Identity Center to configure authentication and authorization for users and groups accessing AWS resources. IAM access entries for EKS are no exception: I need to add policies so the account can describe the EKS cluster configuration and retrieve a kubeconfig for the current shell.

First of all, go back to the Terraform configuration to add the extra policies. Consider attaching the AmazonEKSWorkerNodePolicy policy, because you need the eks:DescribeCluster permission to get a kubeconfig and this policy actually contains it. So head over to Terraform and do it yourself. Explore at Reuse with your AWS SSO module.

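module "sso_identity" {
  source  = "gitlab.com/awesome_terraform_practice/aws-iam-identity-center/aws"
  version = "0.0.1"
  # Assumption: the permission_sets wrapper was lost from the snippet; check the module's inputs.
  permission_sets = {
    DeveloperServiceAccess = {
      description          = "Provides AWS Developers permissions.",
      session_duration     = "PT3H"
      aws_managed_policies = [
        "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
      ]
      tags                 = { ManagedBy = "Terraform" }
    }
  }
}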

Now apply Terraform, and the account you get from the AWS portal can describe your EKS cluster. That gives you permission to retrieve a kubeconfig with this command

aws eks update-kubeconfig --region <CLUSTER_REGION> --name <NAME_CLUSTER>

You only need to set this up for developers, because if you already have administrator access to AWS you will likely have permission to view and configure plenty of things in EKS. By the way, you need to ask your DevOps engineer or AWS admin to create access entries for your DeveloperServiceAccess role. You can follow the steps at AWS Docs - Create access entries.

I prefer creating these entries through the AWS portal, which is quite convenient. Once that's done, anyone with DeveloperServiceAccess can use it to access Kubernetes objects. You also need to consider which access policy to give this role inside the Kubernetes cluster; check AWS Docs - Review access policy permissions.


In my situation, I give AmazonEKSClusterAdminPolicy to administrators of the EKS cluster and AmazonEKSViewPolicy to developers.
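For anyone who prefers the CLI over the portal, the same access entry can be sketched roughly like this. The account ID and role ARN below are placeholders (Identity Center generates role names that start with AWSReservedSSO_, so look up the real ARN in IAM first), and grant_cluster_access is a hypothetical helper name, not something from the AWS docs.

```shell
# Example values only; replace with your own cluster and role.
CLUSTER_NAME="milvus-cluster"
# Placeholder ARN: look up the real role created by IAM Identity Center.
PRINCIPAL_ARN="arn:aws:iam::111122223333:role/DeveloperServiceAccess"

# Create an access entry for the role and attach one EKS access policy to it.
# $1 is the access policy name, e.g. AmazonEKSViewPolicy.
grant_cluster_access() {
  policy_arn="arn:aws:eks::aws:cluster-access-policy/$1"

  aws eks create-access-entry \
    --cluster-name "$CLUSTER_NAME" \
    --principal-arn "$PRINCIPAL_ARN"

  # Cluster-wide scope; a namespace scope is also possible.
  aws eks associate-access-policy \
    --cluster-name "$CLUSTER_NAME" \
    --principal-arn "$PRINCIPAL_ARN" \
    --policy-arn "$policy_arn" \
    --access-scope type=cluster
}

# Usage (not run here):
#   grant_cluster_access AmazonEKSViewPolicy          # developers
#   grant_cluster_access AmazonEKSClusterAdminPolicy  # cluster administrators
```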

Now you can try setting up a profile with AWS SSO and see what your developer permissions allow

aws configure sso --profile DeveloperAccess

And it works: you can retrieve pods and logs inside your target EKS cluster. If you want to retrieve metrics through metrics.k8s.io, you need to upgrade the permission from AmazonEKSViewPolicy to AmazonEKSAdminViewPolicy (note: Secrets can be viewed with this permission).
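To double-check what the developer profile can really do, a quick sketch like the one below may help. It assumes the DeveloperAccess profile from above plus the example cluster name and region used in this post; check_developer_access is just an illustrative name.

```shell
# Example values only; replace with your own profile, cluster and region.
PROFILE="DeveloperAccess"
CLUSTER_NAME="milvus-cluster"
CLUSTER_REGION="ap-southeast-1"

check_developer_access() {
  # Refresh the SSO session and write the kubeconfig entry for this profile.
  aws sso login --profile "$PROFILE"
  aws eks update-kubeconfig \
    --region "$CLUSTER_REGION" \
    --name "$CLUSTER_NAME" \
    --profile "$PROFILE"

  # Reading pods should work with a view-style policy.
  kubectl get pods --all-namespaces

  # Ask the API server what this identity may do; each call prints yes or no.
  kubectl auth can-i delete pods --all-namespaces
  kubectl auth can-i get secrets --all-namespaces
}

# Usage (not run here):
#   check_developer_access
```

With AmazonEKSViewPolicy I would expect the last two checks to come back as "no", but treat that as something to verify on your own cluster rather than a guarantee.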


Now you can access and practice with the EKS cluster from both the AWS portal and your local machine with kubectl.

EKS Cluster Monitoring and Observability



As you can see, once you have access to Kubernetes you can use kubectl to view both metrics and logs. But for developers facing kubectl and Kubernetes for the first time, reading all of that output in a shell can be tough. AWS offers other ways to monitor and observe an EKS cluster, through CloudWatch or Grafana/Prometheus, and it's up to you to choose which one to operate for your cluster.

For me, CloudWatch is a good fit, so let's use it to build a full stack that monitors the EKS cluster directly. Let's check it out.

Intercept metrics and logs with CloudWatch Agents

First of all, I needed to figure out how to do it, so I went through a couple of AWS blog posts and documentation pages to explore the options.

After spending a bit of time reading and looking at results from other implementations, I decided to install a stack to monitor the cluster, including

  • CloudWatch Agent - Scrape metrics
  • Fluent Bit - Scrape logs

To make the implementation easier, I grabbed a full manifest that combines both the CloudWatch agent and Fluent Bit. Double-check it before installing it in your cluster. Explore it at cwagent-fluent-bit-quickstart-enhanced.yaml

# create amazon-cloudwatch namespace
apiVersion: v1
kind: Namespace
metadata:
  name: amazon-cloudwatch
  labels:
    name: amazon-cloudwatch
---
# create cwagent service account and role binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cloudwatch-agent-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets", "daemonsets", "deployments", "statefulsets"]
    verbs: ["list", "watch"]
  - apiGroups: [ "" ]
    resources: [ "services" ]
    verbs: [ "list", "watch" ]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/stats", "configmaps", "events"]
    verbs: ["create", "get"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["cwagent-clusterleader"]
    verbs: ["get","update"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get", "list", "watch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cloudwatch-agent-role-binding
subjects:
  - kind: ServiceAccount
    name: cloudwatch-agent
    namespace: amazon-cloudwatch
roleRef:
  kind: ClusterRole
  name: cloudwatch-agent-role
  apiGroup: rbac.authorization.k8s.io
---
# create configmap for cwagent config
apiVersion: v1
data:
  # Configuration is in Json format. No matter what configure change you make,
  # please keep the Json blob valid.
  cwagentconfig.json: |
    {
      "agent": {
        "region": "{{region_name}}"
      },
      "logs": {
        "metrics_collected": {
          "kubernetes": {
            "cluster_name": "{{cluster_name}}",
            "metrics_collection_interval": 60,
            "enhanced_container_insights": true
          }
        },
        "force_flush_interval": 5
      }
    }
kind: ConfigMap
metadata:
  name: cwagentconfig
  namespace: amazon-cloudwatch
---
# deploy cwagent as daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
spec:
  selector:
    matchLabels:
      name: cloudwatch-agent
  template:
    metadata:
      labels:
        name: cloudwatch-agent
    spec:
      containers:
        - name: cloudwatch-agent
          image: public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.300032.3b392
          #ports:
          #  - containerPort: 8125
          #    hostPort: 8125
          #    protocol: UDP
          resources:
            limits:
              cpu:  400m
              memory: 400Mi
            requests:
              cpu: 400m
              memory: 400Mi
          # Please don't change below envs
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: K8S_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: CI_VERSION
              value: "k8s/1.3.20"
          # Please don't change the mountPath
          volumeMounts:
            - name: cwagentconfig
              mountPath: /etc/cwagentconfig
            - name: rootfs
              mountPath: /rootfs
              readOnly: true
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: containerdsock
              mountPath: /run/containerd/containerd.sock
              readOnly: true
            - name: sys
              mountPath: /sys
              readOnly: true
            - name: devdisk
              mountPath: /dev/disk
              readOnly: true
      nodeSelector:
        kubernetes.io/os: linux
      volumes:
        - name: cwagentconfig
          configMap:
            name: cwagentconfig
        - name: rootfs
          hostPath:
            path: /
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: containerdsock
          hostPath:
            path: /run/containerd/containerd.sock
        - name: sys
          hostPath:
            path: /sys
        - name: devdisk
          hostPath:
            path: /dev/disk/
      terminationGracePeriodSeconds: 60
      serviceAccountName: cloudwatch-agent
---
# create configmap for cluster name and aws region for CloudWatch Logs
# need to replace the placeholders {{cluster_name}} and {{region_name}}
# and need to replace {{http_server_toggle}} and {{http_server_port}}
# and need to replace {{read_from_head}} and {{read_from_tail}}
apiVersion: v1
data:
  cluster.name: {{cluster_name}}
  logs.region: {{region_name}}
  http.server: {{http_server_toggle}}
  http.port: {{http_server_port}}
  read.head: {{read_from_head}}
  read.tail: {{read_from_tail}}
kind: ConfigMap
metadata:
  name: fluent-bit-cluster-info
  namespace: amazon-cloudwatch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: amazon-cloudwatch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-role
rules:
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - pods/logs
      - nodes
      - nodes/proxy
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-role
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: amazon-cloudwatch
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush                     5
        Grace                     30
        Log_Level                 error
        Daemon                    off
        Parsers_File              parsers.conf
        HTTP_Server               ${HTTP_SERVER}
        HTTP_Port                 ${HTTP_PORT}
        storage.path              /var/fluent-bit/state/flb-storage/
        storage.sync              normal
        storage.checksum          off
        storage.backlog.mem_limit 5M
    @INCLUDE application-log.conf
    @INCLUDE dataplane-log.conf
    @INCLUDE host-log.conf
  application-log.conf: |
    [INPUT]
        Name                tail
        Tag                 application.*
        Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
        Path                /var/log/containers/*.log
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_container.db
        Mem_Buf_Limit       50MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
        Read_from_Head      ${READ_FROM_HEAD}
    [INPUT]
        Name                tail
        Tag                 application.*
        Path                /var/log/containers/fluent-bit*
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_log.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Read_from_Head      ${READ_FROM_HEAD}
    [INPUT]
        Name                tail
        Tag                 application.*
        Path                /var/log/containers/cloudwatch-agent*
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_cwagent.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Read_from_Head      ${READ_FROM_HEAD}
    [FILTER]
        Name                kubernetes
        Match               application.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     application.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              Off
        Annotations         Off
        Use_Kubelet         On
        Kubelet_Port        10250
        Buffer_Size         0
    [OUTPUT]
        Name                cloudwatch_logs
        Match               application.*
        region              ${AWS_REGION}
        log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
        log_stream_prefix   ${HOST_NAME}-
        auto_create_group   true
        extra_user_agent    container-insights
  dataplane-log.conf: |
    [INPUT]
        Name                systemd
        Tag                 dataplane.systemd.*
        Systemd_Filter      _SYSTEMD_UNIT=docker.service
        Systemd_Filter      _SYSTEMD_UNIT=containerd.service
        Systemd_Filter      _SYSTEMD_UNIT=kubelet.service
        DB                  /var/fluent-bit/state/systemd.db
        Path                /var/log/journal
        Read_From_Tail      ${READ_FROM_TAIL}
    [INPUT]
        Name                tail
        Tag                 dataplane.tail.*
        Path                /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_dataplane_tail.db
        Mem_Buf_Limit       50MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
        Read_from_Head      ${READ_FROM_HEAD}
    [FILTER]
        Name                modify
        Match               dataplane.systemd.*
        Rename              _HOSTNAME                   hostname
        Rename              _SYSTEMD_UNIT               systemd_unit
        Rename              MESSAGE                     message
        Remove_regex        ^((?!hostname|systemd_unit|message).)*$
    [FILTER]
        Name                aws
        Match               dataplane.*
        imds_version        v2
    [OUTPUT]
        Name                cloudwatch_logs
        Match               dataplane.*
        region              ${AWS_REGION}
        log_group_name      /aws/containerinsights/${CLUSTER_NAME}/dataplane
        log_stream_prefix   ${HOST_NAME}-
        auto_create_group   true
        extra_user_agent    container-insights
  host-log.conf: |
    [INPUT]
        Name                tail
        Tag                 host.dmesg
        Path                /var/log/dmesg
        Key                 message
        DB                  /var/fluent-bit/state/flb_dmesg.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Read_from_Head      ${READ_FROM_HEAD}
    [INPUT]
        Name                tail
        Tag                 host.messages
        Path                /var/log/messages
        Parser              syslog
        DB                  /var/fluent-bit/state/flb_messages.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Read_from_Head      ${READ_FROM_HEAD}
    [INPUT]
        Name                tail
        Tag                 host.secure
        Path                /var/log/secure
        Parser              syslog
        DB                  /var/fluent-bit/state/flb_secure.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Read_from_Head      ${READ_FROM_HEAD}
    [FILTER]
        Name                aws
        Match               host.*
        imds_version        v2
    [OUTPUT]
        Name                cloudwatch_logs
        Match               host.*
        region              ${AWS_REGION}
        log_group_name      /aws/containerinsights/${CLUSTER_NAME}/host
        log_stream_prefix   ${HOST_NAME}.
        auto_create_group   true
        extra_user_agent    container-insights
  parsers.conf: |
    [PARSER]
        Name                syslog
        Format              regex
        Regex               ^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key            time
        Time_Format         %b %d %H:%M:%S
    [PARSER]
        Name                container_firstline
        Format              regex
        Regex               (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ
    [PARSER]
        Name                cwagent_firstline
        Format              regex
        Regex               (?<log>(?<="log":")\d{4}[\/-]\d{1,2}[\/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluent-bit
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit
  template:
    metadata:
      labels:
        k8s-app: fluent-bit
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: fluent-bit
        image: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
        imagePullPolicy: Always
        env:
            - name: AWS_REGION
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: logs.region
            - name: CLUSTER_NAME
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: cluster.name
            - name: HTTP_SERVER
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: http.server
            - name: HTTP_PORT
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: http.port
            - name: READ_FROM_HEAD
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: read.head
            - name: READ_FROM_TAIL
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: read.tail
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: CI_VERSION
              value: "k8s/1.3.20"
        resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 500m
              memory: 100Mi
        volumeMounts:
        # Please don't change below read-only permissions
        - name: fluentbitstate
          mountPath: /var/fluent-bit/state
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        - name: runlogjournal
          mountPath: /run/log/journal
          readOnly: true
        - name: dmesg
          mountPath: /var/log/dmesg
          readOnly: true
      terminationGracePeriodSeconds: 10
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      volumes:
      - name: fluentbitstate
        hostPath:
          path: /var/fluent-bit/state
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      - name: runlogjournal
        hostPath:
          path: /run/log/journal
      - name: dmesg
        hostPath:
          path: /var/log/dmesg
      serviceAccountName: fluent-bit
      nodeSelector:
        kubernetes.io/os: linux


You need to replace two groups of placeholders inside the manifest

  1. CWAgent
  • {{cluster_name}} by your cluster name (e.g: milvus-cluster)
  • {{region_name}} by the region where your EKS cluster is deployed (e.g: ap-southeast-1)
  2. Fluent Bit: Explore at Setting up Fluent Bit
  • cluster.name: {{cluster_name}} by your cluster name (e.g: milvus-cluster)
  • logs.region: {{region_name}} by the region where your EKS cluster is deployed (e.g: ap-southeast-1)
  • http.server: {{http_server_toggle}} (e.g: 'Off')
  • http.port: {{http_server_port}} (e.g: '2020')
  • read.head: {{read_from_head}} (e.g: 'Off')
  • read.tail: {{read_from_tail}} (e.g: 'On')
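Editing six placeholders by hand is error-prone, so here is a small sketch that fills them in with sed, in the same spirit as the AWS quick start command. The values are the examples listed above, and render_manifest plus the variable names are my own hypothetical choices. The Fluent Bit values are wrapped in double quotes so YAML keeps them as strings instead of booleans or numbers.

```shell
# Example values taken from the list above; replace with your own.
CLUSTER_NAME="milvus-cluster"
CLUSTER_REGION="ap-southeast-1"
HTTP_SERVER="Off"
HTTP_PORT="2020"
READ_FROM_HEAD="Off"
READ_FROM_TAIL="On"

# Read a manifest on stdin and print it with every placeholder replaced.
render_manifest() {
  sed \
    -e "s/{{cluster_name}}/${CLUSTER_NAME}/g" \
    -e "s/{{region_name}}/${CLUSTER_REGION}/g" \
    -e "s/{{http_server_toggle}}/\"${HTTP_SERVER}\"/g" \
    -e "s/{{http_server_port}}/\"${HTTP_PORT}\"/g" \
    -e "s/{{read_from_head}}/\"${READ_FROM_HEAD}\"/g" \
    -e "s/{{read_from_tail}}/\"${READ_FROM_TAIL}\"/g"
}

# Usage (not run here): render to a file first so you can review it.
#   render_manifest < cwagent-fluent-bit-quickstart-enhanced.yaml > rendered.yaml
```

Rendering to a separate file keeps the downloaded manifest untouched, and you can then apply rendered.yaml instead of the original.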

Now you can apply the manifest and see the results from your EKS cluster, but remember to log in with the Administrator role so you have permission to write to the cluster

kubectl apply -f cwagent-fluent-bit-quickstart-enhanced.yaml


Take a guess: it fails completely, because we forgot an important prerequisite, which is granting the EKS cluster permission to put logs and metrics into CloudWatch. Go back to this configuration at AWS Docs - Verifying prerequisites for Container Insights in CloudWatch.
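If you want to see the failure for yourself, a rough sketch like this one should surface it. It only relies on the namespace and pod labels defined in the manifest; the two function names are hypothetical.

```shell
# Namespace used by the quick start manifest.
MONITORING_NAMESPACE="amazon-cloudwatch"

# List the agent and Fluent Bit pods, one per node if the DaemonSets are healthy.
show_monitoring_pods() {
  kubectl get pods -n "$MONITORING_NAMESPACE" -o wide
}

# Tail recent logs from both DaemonSets, selected by their pod labels.
show_monitoring_logs() {
  kubectl logs -n "$MONITORING_NAMESPACE" -l name=cloudwatch-agent --tail=50
  kubectl logs -n "$MONITORING_NAMESPACE" -l k8s-app=fluent-bit --tail=50
}

# Usage (not run here):
#   show_monitoring_pods
#   show_monitoring_logs
```

When the node role is missing permissions, I would expect the pods to be running while their logs complain about denied CloudWatch calls, though the exact messages depend on the agent version.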

As you can see, we need to add permissions to the role attached to the EKS cluster's nodes so that logs and metrics can be sent to CloudWatch through it: the CloudWatchAgentServerPolicy policy needs to be attached.

After you make sure everything works with your node group, apply the manifest again and see the result.
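As a sketch of that step with the AWS CLI, assuming a managed node group: first look up the IAM role the nodes use, then attach the managed policy to it. The node group name and role name below are placeholders I made up, as are the function names.

```shell
# Example values only; replace with your own cluster, node group and role.
CLUSTER_NAME="milvus-cluster"
CLUSTER_REGION="ap-southeast-1"
NODEGROUP_NAME="milvus-nodes"          # hypothetical node group name
NODE_ROLE_NAME="milvus-node-role"      # hypothetical IAM role name

# Print the ARN of the IAM role used by the node group.
show_node_role() {
  aws eks describe-nodegroup \
    --region "$CLUSTER_REGION" \
    --cluster-name "$CLUSTER_NAME" \
    --nodegroup-name "$NODEGROUP_NAME" \
    --query 'nodegroup.nodeRole' \
    --output text
}

# Attach the AWS managed policy that lets the agents write to CloudWatch.
attach_cloudwatch_policy() {
  aws iam attach-role-policy \
    --role-name "$NODE_ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
}

# Usage (not run here):
#   show_node_role
#   attach_cloudwatch_policy
```

If your node role is managed by Terraform, it is probably cleaner to add the policy attachment there so the change is not lost on the next apply.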


Boom, success: you can now view both metrics and logs in CloudWatch.

A couple of results from the successful deployment

Workload of EKS Monitoring

For metrics, CWAgent sends them to CloudWatch under the ContainerInsights metrics namespace.

For logs, you get the following log groups in CloudWatch. Fluent Bit writes the application, host and dataplane groups, while the performance group comes from the CloudWatch agent itself.

  • /aws/containerinsights/Cluster_Name/application
  • /aws/containerinsights/Cluster_Name/host
  • /aws/containerinsights/Cluster_Name/dataplane
  • /aws/containerinsights/Cluster_Name/performance
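To confirm the log groups really exist without opening the console, something like the following sketch can be used. The cluster name and region are the example values from this post, and both function names are hypothetical.

```shell
# Example region; replace with the region of your cluster.
CLUSTER_REGION="ap-southeast-1"

# Print the four log group names expected for a cluster ($1 = cluster name).
expected_log_groups() {
  for suffix in application host dataplane performance; do
    echo "/aws/containerinsights/$1/$suffix"
  done
}

# List the Container Insights log groups that actually exist for the cluster.
list_log_groups() {
  aws logs describe-log-groups \
    --region "$CLUSTER_REGION" \
    --log-group-name-prefix "/aws/containerinsights/$1/" \
    --query 'logGroups[].logGroupName' \
    --output text
}

# Usage (not run here): compare the two outputs by eye.
#   expected_log_groups milvus-cluster
#   list_log_groups milvus-cluster
```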


Now you can inspect and debug your cluster directly from the CloudWatch console, which is truly convenient, but keep in mind the cost you pay for it. If you think it works for you, go for it; it's a really cool thing to add to your EKS cluster.




