14 October 2015

In this article I install a k8s cluster, try to set up all the services needed to host a real web service, and then give a summary.

Install Kubernetes Master and Minions

To install Kubernetes, I followed Severalnines’ guide. Note that

  • firewalld is turned off
  • Kubernetes packages are installed via yum
  • I use Kubernetes release-1.0

I set up a deployment with 1 master and 3 minions. If you are using VMware/VirtualBox, be sure to enable promiscuous mode on the vswitch. I didn’t implement HA; for an HA deployment of Kubernetes, refer to the official HA guide. My k8s version is below

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"0+", GitVersion:"v1.0.0-290-gb2dafdaef5acea", GitCommit:"b2dafdaef5aceafad503ab56254b60f80da9e980", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"0+", GitVersion:"v1.0.0-290-gb2dafdaef5acea", GitCommit:"b2dafdaef5aceafad503ab56254b60f80da9e980", GitTreeState:"clean"}

$ uname -r
3.10.0-229.el7.x86_64

To start the services on the master node

$ for SERVICES in etcd kube-apiserver kube-controller-manager kube-scheduler; do 
    systemctl restart $SERVICES
    systemctl enable $SERVICES
    systemctl status $SERVICES 
done

To start the services on the minion nodes. Note that flanneld must be launched before docker, otherwise docker may not use the flannel network.

$ for SERVICES in kube-proxy kubelet flanneld docker; do     # launch flanneld before docker
    systemctl restart $SERVICES
    systemctl enable $SERVICES
    systemctl status $SERVICES 
done
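
Once both loops finish, a quick sanity check from the master (node names will differ in your cluster):

kubectl get nodes    # all three minions should be listed with status Ready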

Verify the Network

Basically, to let docker use the flannel network, you need to add --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU} to the docker daemon startup options. The installation above does that for you automatically.

source /var/run/flannel/subnet.env
docker -d --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}

Use ps -ef | grep docker to check the docker daemon options. In the systemd unit files you should be able to see that docker requires flanneld

ps -ef | grep docker
cat /etc/systemd/system/docker.service.requires/flanneld.service

Once the flannel network is running, containers on different hosts should get different IPs, and they should be able to ping each other’s IP addresses. Use the command below to launch temporary containers to test

docker run -d ubuntu:trusty /bin/bash -c "while true; do echo hello; sleep 10; done"
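
To see each container’s IP and test cross-host connectivity (the container ID comes from docker ps):

docker ps    # note the container id on each host
docker inspect -f '{{ .NetworkSettings.IPAddress }}' <container-id>
ping <ip-of-the-container-on-the-other-host>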

References: [1][2]

ServiceAccount Error

If you hit the error below, remove ServiceAccount from KUBE_ADMISSION_CONTROL in /etc/kubernetes/apiserver (issue 11222), then restart the services on the master node. A sketch of the edit follows the error message.

$ kubectl create -f redis-master1.yaml
Error from server: error when creating "redis-master1.yaml": Pod "redis-master1" is forbidden: no API token found for service account default/default, retry after the token is automatically created and added to the service account
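
A sketch of the edit (the exact admission-control list shipped with your package may differ):

# /etc/kubernetes/apiserver, before
KUBE_ADMISSION_CONTROL="--admission_control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
# after: ServiceAccount removed
KUBE_ADMISSION_CONTROL="--admission_control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ResourceQuota"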

Play with K8S

By setting up a series of common services on k8s, I try to figure out how to host a production-level web site on k8s. That requires multinode databases, volumes, load balancing, caching (redis/memcached), DNS and monitoring.

Run Kubernetes Guestbook (Without GCE, without DNS)

I’m using kubernetes release-1.0 and running the guestbook without GCE and without DNS, just on CentOS 7, following the official guide.

git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
git checkout 5adae4e4a35202abe1c130e32240d0461b3a1c36    # the version which I experimented with

Create redis master pods and services

kubectl create -f examples/guestbook/redis-master-controller.yaml
kubectl get pods -o wide --all-namespaces=true
kubectl create -f examples/guestbook/redis-master-service.yaml

Create redis slave

vim examples/guestbook/redis-slave-controller.yaml
...    # comment out 'value: dns', uncomment `value: env` under GET_HOSTS_FROM
kubectl create -f examples/guestbook/redis-slave-controller.yaml
kubectl create -f examples/guestbook/redis-slave-service.yaml
kubectl logs redis-slave-*    # substitute the actual pod name; the log should show a successful sync with the master

Create frontend controller

vim examples/guestbook/frontend-controller.yaml
...    # comment out 'value: dns', uncomment `value: env` under GET_HOSTS_FROM
kubectl create -f examples/guestbook/frontend-controller.yaml
kubectl create -f examples/guestbook/frontend-service.yaml

To expose the frontend service externally, I use NodePort. See publishing services.

vim examples/guestbook/frontend-service.yaml
...    # add `type: NodePort` under `spec`
kubectl delete service frontend
kubectl create -f examples/guestbook/frontend-service.yaml

# to see which port is mapped
kubectl describe service frontend | grep NodePort    # in my case the nodeport is 30363

After you open the firewall or iptables, you should be able to access the web frontend at http://<any-minion-ip>:<nodeport>.
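
For reference, roughly what the edited frontend service manifest looks like (labels and selectors follow the stock guestbook example and may differ slightly in your checkout):

# examples/guestbook/frontend-service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: frontend
  labels:
    name: frontend
spec:
  type: NodePort
  ports:
  - port: 80
  selector:
    name: frontend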

To check what has been saved in redis

# on a k8s minion
$ docker exec -it k8s_worker.8bef144a_redis-slave-usfge_default_88895109-57a2-11e5-9a9e-005056957d30_3156e0e8 /bin/bash
$ redis-cli keys \*
1) "messages"
$ redis-cli get messages
",hello world,123,456"

Or just fetch it from a web browser via http://<any-minion-ip>:<nodeport>/guestbook.php?cmd=get&key=messages.

Run Multinode Galera Mysql on Kubernetes

To run galera mysql on k8s, one solution is Planet Mysql’s (I think the blog content is a bit out-of-date compared to its code repo). Check out its entrypoint.sh. The key point is how each galera instance finds the others’ addresses. Planet Mysql’s solution uses the k8s service environment variables to locate the other hosts and writes them into the gcomm:// address. Each mysql galera instance comprises a pod and a service.

Some solutions I can think of, besides Planet Mysql’s, to help a galera mysql instance locate its peers

  • Each mysql galera instance comprises a pod and a service. We just hardcode the IP addresses in the pod YAML. If one pod goes down, we manually fill in the latest IP and launch a new instance.
  • We launch an etcd cluster on k8s, and all galera mysql instances find peer info from etcd.

Anyway, I will use Planet Mysql’s solution for now. P.S. To host a mysql cluster, besides galera, there are also mysql sharding solutions on k8s, such as Vitess by YouTube.

First, download the code base

git clone https://github.com/CaptTofu/mysql_replication_kubernetes.git
cd mysql_replication_kubernetes
git checkout f7c2bc4f411d6950ca575e804189114026b2ba69     # the version I experimented with
cd galera_sync_replication

I think we should set WSREP_CLUSTER_ADDRESS to gcomm:// in each pxc-nodeN, according to the container image’s entrypoint.sh.

vim pxc-node1.yaml
...    # change variable WSREP_CLUSTER_ADDRESS to gcomm://
vim pxc-node2.yaml
...    # same as above
vim pxc-node3.yaml
...    # same as above
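
A sketch of the relevant fragment after the edit (assuming the manifest passes WSREP_CLUSTER_ADDRESS to the container as an environment variable; other fields omitted):

env:
  - name: WSREP_CLUSTER_ADDRESS
    value: "gcomm://"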

Next we launch the galera mysql instances one by one

kubectl create -f pxc-node1.yaml
kubectl create -f pxc-node1-service.yaml

Wait until the first galera mysql instance is running, then launch the rest below. This is because galera mysql needs at least one running instance to form the cluster.

kubectl create -f pxc-node2.yaml
kubectl create -f pxc-node2-service.yaml
kubectl create -f pxc-node3.yaml
kubectl create -f pxc-node3-service.yaml

To verify correctness, see the galera mysql status guide.

# check the pod log
kubectl logs pxc-node3

# check galera mysql status
kubectl get pods -o wide
docker exec -it 98d568b88aac  /bin/bash
mysql -uroot -p
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_%';    # You should be able to see cluster_size = 3

# write something to mysql and check the other node reading it
...
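
A minimal sketch of such a check (the database and table names are made up):

# inside the pxc-node1 container
mysql -uroot -p -e "CREATE DATABASE demo; CREATE TABLE demo.t (id INT); INSERT INTO demo.t VALUES (42);"

# inside the pxc-node3 container
mysql -uroot -p -e "SELECT * FROM demo.t;"    # should return 42 if replication works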

I think the k8s service design may not be really appropriate for peer-to-peer clusters like galera mysql, redis, memcached, etc., because

  • In such a cluster, e.g. galera mysql, why should I be forced to couple a service to each of the instances?
  • If an app wants to use the service provided by that kind of cluster, e.g. memcached, it needs to know every IP/hostname in the cluster, because consistent hashing, load balancing and failover are performed on the app side. It is hard to just hide the cluster behind one service, which exposes only one IP.

So I think the more appropriate way is to provide those services, such as mysql, redis and memcached, from outside of k8s. Stateless web apps, however, are a better fit for k8s.

Debugger Pod

Sometimes I need a simple pod to debug and test network connectivity to other pods. Here’s my pod YAML.

# debugvm.yaml
apiVersion: v1
kind: Pod
metadata:
  name: debugvm
  labels:
    name: debugvm
spec:
  containers:
    - name: debugvm
      image: ubuntu:trusty
      command: [ "/bin/bash", "-c", "while true; do echo 'hello world'; sleep 60; done" ]

Once it is launched in k8s, log in to debugvm and use nc to test network connectivity to other pods

docker exec -it k8s_debugvm.* /bin/bash    # substitute the actual container name from `docker ps`
nc <host> <port>

Run Multinode Redis for Memory Caching on Kubernetes

It is popular for web apps to use a cluster of Redis or Memcached for in-memory caching. I launch a 3-node redis cluster; each redis instance has no idea of its peers. Each redis instance comprises a pod and a service. Here’s redis instance 1

# the redis-master1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: redis-master1
  labels:
    name: redis-master1
spec:
  containers:
  - name: redis-master1
    image: redis
    ports:
    - containerPort: 6379

# the redis-master1-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-master1
  labels:
    name: redis-master1
spec:
  ports:
  # the default redis serving port
  - port: 6379
    targetPort: 6379
  selector:
    name: redis-master1
  type: NodePort

For redis instances 2 and 3, just change the 1 above to 2 or 3. To use the 3 redis instances as a caching cluster, the web app needs to perform consistent hashing itself and select which redis instance to access. If you use PHP, the Predis library can do this.

Why do I need to attach a k8s service to each redis instance? The client needs to know the IP of each redis instance in order to perform consistent hashing, but one k8s service exposes only one IP. For a cluster that needs client-side sharding, load balancing, failover or other client-side work, the client needs to know multiple IPs. The default k8s service model doesn’t fit that well.
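
Since each instance has its own service, an app pod can at least discover the per-instance IPs from the service environment variables that k8s injects (only for pods started after the services were created):

# inside any app pod
env | grep REDIS_MASTER    # REDIS_MASTER1_SERVICE_HOST, REDIS_MASTER2_SERVICE_HOST, ...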

Setup DNS for Kubernetes

Kubernetes can use DNS for its service discovery; see the official doc. DNS is a k8s addon. I followed this guide to set up DNS. The cluster_dns address should be in the range specified by KUBE_SERVICE_ADDRESSES in /etc/kubernetes/apiserver.

# on each kubelet server

# add '--cluster_dns=10.254.0.10 --cluster_domain=cluster.local' to KUBELET_ARGS
vim /etc/kubernetes/kubelet

# restart kubelet
systemctl daemon-reload
systemctl restart kubelet
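
The relevant line in /etc/kubernetes/kubelet ends up looking like this (if KUBELET_ARGS already contains other flags, append to them):

KUBELET_ARGS="--cluster_dns=10.254.0.10 --cluster_domain=cluster.local"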

Launch the DNS service; below is the manifest, copied and edited from the k8s repo.

apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: default
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.254.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP

Launch the DNS pods; below is the manifest. I tried several different manifests before getting it working.

This one was copied and edited from here. Don’t use it; it has issue 12534.

apiVersion: v1
kind: ReplicationController
metadata:
    name: kube-dns
    namespace: default
    labels:
      k8s-app: kube-dns
      kubernetes.io/cluster-service: "true"
spec:
    replicas: 1
    selector:
        k8s-app: kube-dns
    template:
        metadata:
            labels:
                k8s-app: kube-dns
                kubernetes.io/cluster-service: "true"
        spec:
            dnsPolicy: "Default"  # Don't use cluster DNS.
            containers:
              - name: etcd
                image: quay.io/coreos/etcd:latest
                command: [
                        "/etcd",
                        "--listen-client-urls",
                        "http://127.0.0.1:2379,http://127.0.0.1:4001",
                        "--advertise-client-urls",
                        "http://127.0.0.1:2379,http://127.0.0.1:4001",
                ]
              - name: kube2sky
                image: gcr.io/google_containers/kube2sky:1.11
                args: [
                        # entrypoint = "/kube2sky",
                        "-domain=cluster.local",
                ]
              - name: skydns
                image: kubernetes/skydns:2014-12-23-001
                args: [
                        # entrypoint = "/skydns",
                        "-machines=http://localhost:4001",
                        "-addr=0.0.0.0:53",
                        "-domain=cluster.local",
                ]
                ports:
                  - name: dns
                    containerPort: 53
                    protocol: UDP

This one was copied and edited from the k8s repo version. It has the same issue 12534.

apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v9
  namespace: default
  labels:
    k8s-app: kube-dns
    version: v9
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v9
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v9
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: etcd
        image: gcr.io/google_containers/etcd:2.0.9
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        command:
        - /usr/local/bin/etcd
        - -data-dir
        - /var/etcd/data
        - -listen-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -advertise-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -initial-cluster-token
        - skydns-etcd
        volumeMounts:
        - name: etcd-storage
          mountPath: /var/etcd/data
      - name: kube2sky
        image: gcr.io/google_containers/kube2sky:1.11
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/kube2sky"
        - -domain=cluster.local
      - name: skydns
        image: gcr.io/google_containers/skydns:2015-03-11-001
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/skydns"
        - -machines=http://localhost:4001
        - -addr=0.0.0.0:53
        - -domain=cluster.local.
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 1
          timeoutSeconds: 5
      - name: healthz
        image: gcr.io/google_containers/exechealthz:1.0
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
        args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local localhost >/dev/null
        - -port=8080
        ports:
        - containerPort: 8080
          protocol: TCP
      volumes:
      - name: etcd-storage
        emptyDir: {}
      dnsPolicy: Default  # Don't use cluster DNS.

After studying the solution provided by guybrush, I pieced together one that works. The key is the missing command option -kube_master_url=http://<your-master-host-ip>:8080; without it, kube2sky cannot access the kubernetes master node and raises the errors in 12534. Simple.

apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v9
  namespace: default
  labels:
    k8s-app: kube-dns
    version: v9
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v9
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v9
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: etcd
        image: gcr.io/google_containers/etcd:2.0.9
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        command:
        - /usr/local/bin/etcd
        - -data-dir
        - /var/etcd/data
        - -listen-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -advertise-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -initial-cluster-token
        - skydns-etcd
        volumeMounts:
        - name: etcd-storage
          mountPath: /var/etcd/data
      - name: kube2sky
        image: gcr.io/google_containers/kube2sky:1.11
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/kube2sky"
        - -kube_master_url=http://10.62.98.245:8080
        - -domain=cluster.local
      - name: skydns
        image: gcr.io/google_containers/skydns:2015-03-11-001
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/skydns"
        - -machines=http://localhost:4001
        - -addr=0.0.0.0:53
        - -domain=cluster.local.
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 1
          timeoutSeconds: 5
      - name: healthz
        image: gcr.io/google_containers/exechealthz:1.0
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
        args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local localhost >/dev/null
        - -port=8080
        ports:
        - containerPort: 8080
          protocol: TCP
      volumes:
      - name: etcd-storage
        emptyDir: {}
      dnsPolicy: Default  # Don't use cluster DNS.

To verify whether DNS works, I followed this guide. First, check each pod’s logs. You can ignore this error

skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns) [2]

After that, create a busybox pod

# busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always

After the pod is running, run the commands below and you should see

$ kubectl exec busybox -- nslookup kubernetes
Server:    10.254.0.10
Address 1: 10.254.0.10

Name:      kubernetes
Address 1: 10.254.0.1

$ kubectl exec busybox -- nslookup kubernetes.default.svc.cluster.local
Server:    10.254.0.10
Address 1: 10.254.0.10

Name:      kubernetes.default.svc.cluster.local
Address 1: 10.254.0.1

Run Heapster to Monitor Kubernetes Pods

I followed the official guide to set up Heapster.

git clone https://github.com/kubernetes/heapster.git
cd heapster
kubectl create -f deploy/kube-config/influxdb/

If you hit the error below

Error from server: error when creating "deploy/kube-config/influxdb/grafana-service.json": Service "monitoring-grafana" is forbidden: Namespace kube-system does not exist
Error from server: error when creating "deploy/kube-config/influxdb/heapster-controller.json": ReplicationController "heapster" is forbidden: Namespace kube-system does not exist
Error from server: error when creating "deploy/kube-config/influxdb/heapster-service.json": Service "heapster" is forbidden: Namespace kube-system does not exist
Error from server: error when creating "deploy/kube-config/influxdb/influxdb-grafana-controller.json": ReplicationController "infludb-grafana" is forbidden: Namespace kube-system does not exist
Error from server: error when creating "deploy/kube-config/influxdb/influxdb-service.json": Service "monitoring-influxdb" is forbidden: Namespace kube-system does not exist

This is because your k8s doesn’t have the kube-system namespace. The solution is to change every kube-system to default in each manifest.
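
One way to do the replacement in bulk (assuming the manifests sit under deploy/kube-config/influxdb/):

sed -i 's/kube-system/default/g' deploy/kube-config/influxdb/*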

Heapster relies on k8s DNS to work. You have to enable DNS in k8s before starting to install Heapster. A similar issue was found here.

If you check the heapster pod logs and find that heapster keeps crashing with the error below

$ kubectl logs heapster-yrh5i
...
open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

The issue is that heapster is missing a serviceaccount to access the apiserver. Refer to this guide. My solution is to “use a heapster-only serviceaccount”. First, run the command below to create the serviceaccount

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: heapster
EOF

After that, we need to add serviceAccount: "heapster" and --source=kubernetes:http://kubernetes-ro?inClusterConfig=false&useServiceAccount=true&auth= to the spec.

"spec": {
    "serviceAccount": "heapster",
    "containers": [
        {
            "image": "kubernetes/heapster:v0.18.0",
            "name": "heapster",
            "command": [
                "/heapster",
                "--source=kubernetes:http://kubernetes-ro?inClusterConfig=false&useServiceAccount=true&auth=",
                "--sink=influxdb:http://monitoring-influxdb:8086"
            ]
        }
    ]
}

If you hit the error below

$ kubectl logs heapster-yrh5i
...
Failed to list *api.Node: Get http://kubernetes-ro/api/v1/nodes: dial tcp: lookup kubernetes-ro: no such host
Failed to list *api.Namespace: Get http://kubernetes-ro/api/v1/namespaces: dial tcp: lookup kubernetes-ro: no such host
Failed to list *api.Pod: Get http://kubernetes-ro/api/v1/pods?fieldSelector=spec.nodeName%21%3D: dial tcp: lookup kubernetes-ro: no such host

This is because kubernetes-ro above is not DNS-resolvable; try kubectl exec busybox -- nslookup kubernetes-ro to verify. This thread indicates that kubernetes-ro is deprecated, so change it to <kubernetes-master-ip>:8080 (I wish I could avoid the hardcoded IP).

{
    "apiVersion": "v1",
    "kind": "ReplicationController",
    "metadata": {
        "labels": {
            "k8s-app" : "heapster",
            "name": "heapster",
            "version": "v6"
        },
        "name": "heapster",
        "namespace": "default"
    },
    "spec": {
        "replicas": 1,
        "selector": {
            "k8s-app": "heapster",
            "version": "v6"
        },
        "template": {
            "metadata": {
                "labels": {
                    "k8s-app": "heapster",
                    "version": "v6"
                }
            },
            "spec": {
                "serviceAccount": "heapster",
                "containers": [
                    {
                        "image": "kubernetes/heapster:v0.18.0",
                        "name": "heapster",
                        "command": [
                            "/heapster",
                            "--source=kubernetes:http://10.62.98.245:8080?inClusterConfig=false&useServiceAccount=true&auth=",
                            "--sink=influxdb:http://monitoring-influxdb:8086",
                            "-v=20"
                        ]
                    }
                ]
            }
        }
    }
}

Next, I need to change each service manifest: set the service type to ‘NodePort’ so that, without a GCE load balancer, I can still access them from outside. Then restart all the services.

"spec": {
        "type": "NodePort",

Access Grafana through the external NodePort mapping. The username and password are admin:admin. After you log in to Grafana, add a data source of type InfluxDB 0.8.x with URL http://monitoring-influxdb:8086, database name ‘k8s’, and database username and password root:root.

If you hit the error below, heapster cannot resolve monitoring-influxdb, but busybox still can. That’s weird.

$ kubectl logs heapster-k30m4
...
failed to sync data to sinks - encountered the following errors: Post http://monitoring-influxdb:8086/db/k8s/series?u=root&p=root&time_precision=s: dial tcp: lookup monitoring-influxdb: no such host ;
Post http://monitoring-influxdb:8086/db/k8s/series?u=root&p=root&time_precision=m: dial tcp: lookup monitoring-influxdb: no such host

# try troubleshooting. why busybox can resolve dns, but heapster cannot?
$ kubectl exec heapster-k30m4 -- nslookup monitoring-influxdb
Server:    (null)
nslookup: can't resolve 'monitoring-influxdb': Try again
Address 1: ::1 localhost
Address 2: 127.0.0.1 localhost

$ kubectl exec busybox -- nslookup monitoring-influxdb
Server:    10.254.0.10
Address 1: 10.254.0.10

Name:      monitoring-influxdb
Address 1: 10.254.113.143

$ kubectl exec heapster-k30m4 -- nslookup monitoring-influxdb 10.254.0.10
Server:    10.254.0.10
Address 1: 10.254.0.10

nslookup: can't resolve 'monitoring-influxdb': Try again

$ kubectl exec busybox -- nslookup monitoring-influxdb 10.254.0.10
Server:    10.254.0.10
Address 1: 10.254.0.10

Name:      monitoring-influxdb
Address 1: 10.254.113.143

# check skydns log, found
$ kubectl logs kube-dns-v9-f9j9m skydns
...
skydns: can not forward, name too short (less than 2 labels): `monitoring-influxdb.'

I didn’t find the cause or a solution, but there is a workaround: write the hostname directly into /etc/hosts.

kubectl exec heapster-fk6xo -- /bin/sh -c 'echo ${MONITORING_INFLUXDB_SERVICE_HOST} monitoring-influxdb >> /etc/hosts'
kubectl exec infludb-grafana-9wosh -- /bin/sh -c 'echo ${MONITORING_INFLUXDB_SERVICE_HOST} monitoring-influxdb >> /etc/hosts'

Now if you check the heapster log, it should be working. Let’s get back to Grafana to set up the dashboard, following this guide. There are example queries here, but they are quite outdated. Some queries that work on the influxdb are

select derivative(value/1000000) from "cpu/usage_ns_cumulative" where $timeFilter and container_name = 'pxc-node1' group by time($interval) order asc
select mean(value/1024/1024) from "memory/usage_bytes_gauge" where $timeFilter and container_name = 'pxc-node1' group by time($interval) order asc

Tips on how to use influxdb (on its web portal); a combined example follows the list

  • root:root is the cluster admin. Create a database user (with admin permission) and log in with it before you access the database.
  • Don’t give the database user the same name as the cluster admin, otherwise strange permission problems will occur. (Gosh, this took me a great deal of time to troubleshoot.)
  • Use list series to see what tables exist
  • Series names with special chars, e.g. cpu/limit_gauge, should be wrapped in double quotes, like select * from "cpu/limit_gauge" limit 10
  • String values in the where clause should be wrapped in single quotes, e.g. where container_name = 'pxc-node1'
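
Putting the tips together, a session in the web portal looks roughly like this (series and column names follow the ones used earlier in this section):

list series
select * from "cpu/limit_gauge" where container_name = 'pxc-node1' limit 10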

At the time I set up Heapster, container network IO metrics could not be collected (always zero)

  • Known issue, pending a fix. On CentOS, docker stats <container-id> cannot capture network IO either.

Disk IO metrics cannot be collected in Heapster either

  • Known issue. Heapster currently doesn’t pull disk IO.
  • Raw disk IO can be retrieved by querying the kubelet stats API, e.g. curl 'http://<kubelet-ip>:10255/stats/default/pxc-node2/e2cf51fc-586e-11e5-9a9e-005056957d30/pxc-node2' | python -m json.tool. The ID in that URL can be found in the docker ps container name.
  • Another way is to query the docker daemon socket, e.g. echo -e "GET /containers/<container-id>/stats HTTP/1.1\r\n" | nc -U /var/run/docker.sock | grep -e "^[[:space:]]*{.*}". See this article.
  • To understand what the disk IO metrics mean, note that disk partitions are named by Major:Minor number (use lsblk to see them). See here.

Run Docker Registry

A docker registry is needed if you want to build your own images and run them on k8s. I launched the docker registry on my master node with just one line, referring to the official guide.

docker run -d -p 5000:5000 --name registry registry:2

The docker registry should be running now. To allow dockerd to use our “insecure” docker registry, we need to add --insecure-registry <docker-registry>:5000 to OPTIONS in /etc/sysconfig/docker on each node, then restart dockerd.
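
For example, on a minion (the registry host and the --selinux-enabled flag are placeholders for whatever your file already contains):

# /etc/sysconfig/docker, the OPTIONS line after the edit
OPTIONS='--selinux-enabled --insecure-registry <docker-registry>:5000'

# quick sanity check: push an image and pull it back
docker pull busybox
docker tag busybox <docker-registry>:5000/busybox
docker push <docker-registry>:5000/busybox
docker pull <docker-registry>:5000/busybox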

Enable Ceph for K8S Volume

Ceph provides a persistent volume service for k8s. Next I will enable k8s to use ceph as the volume backend.

Install Ceph

I fetched a new CentOS 7 VM to install a 1-node ceph on it. First, make sure of the following (a quick check follows the list)

  • The Kubernetes master node and the ceph node can ssh to each other without a password
  • The Kubernetes master node can resolve the hostname of the ceph node
  • The ceph-deploy tool should be run on a separate node from where you install ceph
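
A quick check of the first two items (hostnames are placeholders):

# on kube master node
ssh <ceph-node> hostname    # should log in without prompting for a password
getent hosts <ceph-node>    # should resolve to the ceph node's IP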

Install the ceph-deploy tool

# on kube master node
yum install -y ceph-deploy
mkdir -p ~/workspace/ceph/
cd ~/workspace/ceph

Install and launch ceph on the ceph node, following the official tutorial

# on kube master node
# clean old installation
ceph-deploy purgedata <ceph-node>
ceph-deploy forgetkeys
#ceph-deploy purge <ceph-node>    # to purge the ceph packages too

# install ceph
ceph-deploy install <ceph-node>

# create the cluster
ceph-deploy new <ceph-node>

# change ceph config
vim ./ceph.conf
...    # add `osd pool default size = 1`

# launch monitor
ceph-deploy mon create-initial

# launch osd
ceph-deploy disk list <ceph-node>
ceph-deploy osd prepare <ceph-node>:sdb
ceph-deploy osd activate <ceph-node>:/dev/sdb1

# push admin keys to the ceph node so that I can login without specifying monitor address and key file
ceph-deploy admin <ceph-node>

Verify the ceph health status.

# on the ceph node
# check ceph health status
ceph health
ceph pg dump

# test ceph read write
ceph osd pool create data 128
echo hello world $(date) > test.txt
rados put test test.txt --pool data
rados get test test.txt.out --pool data
cat test.txt.out

Next, enable the ceph rbd client on each k8s node

# on kube master node
# install ceph so that `rbd` is installed
ceph-deploy install 127.0.0.1 <kubelet1-ip> <kubelet2-ip> <kubelet3-ip>

# copy ceph admin keys to each node
cd ~/workspace/ceph
ceph-deploy admin 127.0.0.1 <kubelet1-ip> <kubelet2-ip> <kubelet3-ip>

To verify rbd is working

# on a randomly picked k8s node
rbd create foo --size 1024
modprobe rbd
rbd map foo

mkfs.ext4 -m0 /dev/rbd/rbd/foo
mkdir /mnt/test-ceph-block-device
mount /dev/rbd/rbd/foo /mnt/test-ceph-block-device
cd /mnt/test-ceph-block-device
echo hello world $(date) > test.txt
cd ..
umount /dev/rbd/rbd/foo

# on another k8s node
modprobe rbd
rbd map foo
mkdir /mnt/test-ceph-block-device
mount /dev/rbd/rbd/foo /mnt/test-ceph-block-device
cd /mnt/test-ceph-block-device
cat test.txt    # should print what we echoed before
cd ..
umount /dev/rbd/rbd/foo

Enable Ceph in K8S

Following the official guide. First make sure you have /etc/ceph/ceph.client.admin.keyring on every k8s node, so that k8s can authenticate to ceph.

We need to create the volume vol1 and mkfs it before k8s can use it.

# on kube master node
rbd create vol1 --size 1024 --pool data
rbd map vol1 --pool data    # map it so /dev/rbd/data/vol1 exists
mkfs.ext4 -m0 /dev/rbd/data/vol1
rbd unmap /dev/rbd/data/vol1

Next, we create a busybox pod which mounts the volume. The pod manifest is as follows. Start it with kubectl create -f busyboxv.yaml

# busyboxv.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busyboxv
  namespace: default
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busyboxv
    volumeMounts: 
      - mountPath: /mnt/rbd
        name: rbdpd
  volumes:
      - name: rbdpd
        rbd: 
          monitors: 
            - <ceph-node>:6789
          pool: data
          image: vol1
          user: admin
          keyring: /etc/ceph/ceph.client.admin.keyring
          fsType: ext4
          readOnly: false
  restartPolicy: Always

We write a file inside the busyboxv pod’s volume

# on kube master node
kubectl exec busyboxv -- sh -c 'echo hello world $(date) > /mnt/rbd/hello.txt'
kubectl exec busyboxv -- sh -c 'cat /mnt/rbd/hello.txt'

Next, kill the pod and test whether the data in the volume persists

# on kube master node
kubectl delete pod busyboxv
sleep 5
rbd map vol1 --pool data
mkdir -p /mnt/rbd/data/vol1
mount /dev/rbd/data/vol1 /mnt/rbd/data/vol1
cat /mnt/rbd/data/vol1/hello.txt
umount /dev/rbd/data/vol1
rbd unmap /dev/rbd/data/vol1

So k8s pods can use ceph volumes this way. We have persistent data storage now.

Thoughts on K8S Drawbacks

Kubernetes looks fancy and easy at first glance. But from my experience these days, I found many key features missing from k8s if you want to set up a complete web service, especially when you don’t run it on GCE (Google Compute Engine):

  • What if I want to use an external or custom load balancer, such as HAProxy? Openstack has LBaaS, and I wish to see the equivalent in k8s. A k8s service does some load balancing, but it is not a full-featured load balancer. When you run k8s on GCE, GCE gives you an external load balancer; if you run k8s yourself, you don’t have one. Searching on Google, I found people have already started to build their own LBs for k8s.

  • Many people run k8s on virtual machines. The VM overlay network and the flannel overlay network wrap the actual traffic twice, which results in performance degradation. Openstack Kuryr is trying to solve this problem; by then, running k8s on Magnum would definitely be an advantage.

  • Multi-tenancy requires network separation. In the k8s network model, if you use flannel, every container is effectively connected as if on the same network. This is not acceptable for enterprise security. Openstack Neutron, however, allows users to create multiple networks, separated at least at L3, and connect them with routers.

  • The classic k8s service model hides a group of pods behind a service and exposes one IP to its users. But many real-life services don’t fit that model; see the examples below. A solution for this is to use a headless service. See [1][2]. However, users still need to implement their own service discovery and perform IP registration; it is not really automated.

    • Memcached or Redis as a caching cluster. The client performs consistent hashing and decides which Memcached or Redis instance to access, so it needs to know the IP of each instance. If you hide the whole Memcached or Redis cluster behind one service, that is not possible. One workaround is to attach a service to each instance.

    • Mysql Galera cluster. Similarly, the client needs to know the IP of each mysql instance so that it can fail over when one instance is down. In the k8s service model, if you hide the whole Mysql Galera cluster behind one service (one IP), that is not possible.


