Kubernetes¶
kubectl¶
get context¶
[2019-05-22 10:02] jgebhardt@jd-laptop: ~ $ kubectl config get-contexts
CURRENT   NAME            CLUSTER         AUTHINFO        NAMESPACE
          k8s-int         k8s-int         k8s-int
*         k8s-jgebhardt   k8s-jgebhardt   k8s-jgebhardt
          minikube        minikube        minikube
switch context¶
[2019-05-22 10:02] jgebhardt@jd-laptop: ~ $ kubectl config use-context k8s-int
Switched to context "k8s-int".
switch namespace¶
[2020-02-08 14:07:01 (k8s.jd:default)] jgebhardt@jd-laptop: ~ $ kubectl config set-context --current --namespace=gitlab-managed-apps
Context "k8s.jd" modified.
use kubectl with evil loadbalancer config¶
If the apiserver is only reachable through a load balancer that breaks certificate verification, the TLS check can be skipped per cluster:
kubectl --insecure-skip-tls-verify --kubeconfig=cluster1.conf get nodes
kubectl --insecure-skip-tls-verify --kubeconfig=cluster2.conf get nodes
kubectl --insecure-skip-tls-verify --kubeconfig=cluster3.conf get nodes
port forwarding¶
kubectl port-forward <pod> <port>
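For example, to reach port 80 inside a pod (or behind a service) on local port 8080; the pod and service names here are placeholders:
kubectl port-forward mypod 8080:80
kubectl port-forward svc/myservice 8080:80
# local port 8080 now forwards to port 80 in the pod / behind the service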
Issues encountered¶
Pod Networking failing¶
Recently, pods had difficulties reaching each other.
The logs showed that the affected containers could not reach the API either.
As it turned out, Calico on the minions had somehow dropped its IP.
Restarting Docker on these nodes brought Calico back to work.
Strange events
ubuntu@jump01:~$ kl get events --all-namespaces
NAMESPACE LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
[...]
monitoring 16s 16s 1 alertmanager-main-0.1560323e91212d48 Pod spec.containers{config-reloader} Normal Created kubelet, node-2-k8s Created container
monitoring 15s 15s 1 alertmanager-main-0.1560323eb1eac58d Pod spec.containers{config-reloader} Normal Started kubelet, node-2-k8s Started container
monitoring 10s 10s 1 alertmanager-main-0.1560323fed89810e Pod spec.containers{alertmanager} Warning Unhealthy kubelet, node-2-k8s Liveness probe failed: Get http://10.233.93.11:9093/api/v1/status: dial tcp 10.233.93.11:9093: getsockopt: connection refused
monitoring 1s 11s 3 alertmanager-main-0.1560323face43dee Pod spec.containers{alertmanager} Warning Unhealthy kubelet, node-2-k8s Readiness probe failed: Get http://10.233.93.11:9093/api/v1/status: dial tcp 10.233.93.11:9093: getsockopt: connection refused
Pods can not reach each other
ubuntu@jump01:~$ kl logs -n monitoring prometheus-k8s-0 prometheus
[...]
level=error ts=2018-10-22T16:52:07.65126041Z caller=notifier.go:473 component=notifier alertmanager=http://10.233.1.197:9093/api/v1/alerts count=1 msg="Error sending alert" err="context deadline exceeded"
level=error ts=2018-10-22T16:52:07.651293742Z caller=notifier.go:473 component=notifier alertmanager=http://10.233.1.194:9093/api/v1/alerts count=1 msg="Error sending alert" err="Post http://10.233.1.194:9093/api/v1/alerts: dial tcp 10.233.1.194:9093: i/o timeout"
[...]
No more calico tunl address on minions
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-1-k8s ip a s dev tunl0
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
inet 10.233.1.192/32 brd 10.233.1.192 scope global tunl0
valid_lft forever preferred_lft forever
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-2-k8s ip a s dev tunl0
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-3-k8s ip a s dev tunl0
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
restart docker on affected nodes
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-2-k8s sudo systemctl restart docker
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-3-k8s sudo systemctl restart docker
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-2-k8s ip a s dev tunl0
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
inet 10.233.93.13/32 brd 10.233.93.13 scope global tunl0
valid_lft forever preferred_lft forever
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-3-k8s ip a s dev tunl0
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
inet 10.233.97.70/32 brd 10.233.97.70 scope global tunl0
valid_lft forever preferred_lft forever
ubuntu@jump01:~$
Node unable to send updates to apiserver¶
Recently I stumbled upon many failing API requests. One node tried to send status updates, but the apiserver refused them:
log
Oct 25 10:16:09 node-2-k8s kubelet[12567]: E1025 10:16:09.461473 12567 kubelet_node_status.go:377] Error updating node status, will retry: failed to patch status "{\"status\":{\"$setElementOrder/addresses\":[{\"type\":\"InternalIP\"},{\"type\":\"InternalIP\"},{\"type\":\"Hostname\"}],\"$setElementOrder/conditions\":[{\"type\":\"OutOfDisk\"},{\"type\":\"MemoryPressure\"},{\"type\":\"DiskPressure\"},{\"type\":\"PIDPressure\"},{\"type\":\"Ready\"}],\"addresses\":[{\"address\":\"10.64.43.8\",\"type\":\"InternalIP\"},{\"address\":\"10.150.28.15\",\"type\":\"InternalIP\"}],\"conditions\":[{\"lastHeartbeatTime\":\"2018-10-25T10:16:09Z\",\"type\":\"OutOfDisk\"},{\"lastHeartbeatTime\":\"2018-10-25T10:16:09Z\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2018-10-25T10:16:09Z\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2018-10-25T10:16:09Z\",\"type\":\"PIDPressure\"},{\"lastHeartbeatTime\":\"2018-10-25T10:16:09Z\",\"type\":\"Ready\"}]}}" for node "node-2-k8s": Node "node-2-k8s" is invalid: status.addresses[1]: Duplicate value: core.NodeAddress{Type:"InternalIP", Address:"10.150.28.15"}
Oct 25 10:16:09 node-2-k8s kubelet[12567]: E1025 10:16:09.461646 12567 kubelet_node_status.go:377] Error updating node status, will retry: error getting node "node-2-k8s": Get https://10.150.28.9:6443/api/v1/nodes/node-2-k8s?timeout=10s: write tcp 10.150.28.15:45940->10.150.28.9:6443: use of closed network connection
Oct 25 10:16:09 node-2-k8s kubelet[12567]: E1025 10:16:09.461665 12567 kubelet_node_status.go:366] Unable to update node status: update node status exceeds retry count
Oct 25 10:16:09 node-2-k8s kubelet[12567]: W1025 10:16:09.461703 12567 reflector.go:341] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: watch of *v1.Pod ended with: very short watch: k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Unexpected watch close - watch lasted less than a second and no items received
Oct 25 10:16:09 node-2-k8s kubelet[12567]: W1025 10:16:09.461709 12567 reflector.go:341] k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: watch of *v1.Node ended with: very short watch: k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Unexpected watch close - watch lasted less than a second and no items received
Oct 25 10:16:09 node-2-k8s kubelet[12567]: W1025 10:16:09.461764 12567 reflector.go:341] k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: watch of *v1.Service ended with: very short watch: k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Unexpected watch close - watch lasted less than a second and no items received
This node has two interfaces. It seems that the apiserver gets confused by this: GitHub Issue #54492
Adding the --node-ip flag to the kubelet made the problem vanish...
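A minimal sketch of how the flag can be set on a kubeadm-provisioned node; the drop-in file and the chosen address are assumptions, use the address the node is supposed to report:
# assumed: /etc/default/kubelet is sourced by the kubeadm systemd drop-in
echo 'KUBELET_EXTRA_ARGS=--node-ip=10.150.28.15' >> /etc/default/kubelet
systemctl restart kubelet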
PV and PVC hanging in Terminate state forever¶
There might be leftover finalizers. After removing them, the PV/PVC gets deleted immediately. See GitHub Issue #69697
remove finalizer
kubectl edit pv <pvname>
# remove section containing "finalizer"
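The same can be done non-interactively with a merge patch (same effect as the edit above):
kubectl patch pv <pvname> --type=merge -p '{"metadata":{"finalizers":null}}'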
Renew Certificates prior to 1.13¶
Back up your config
tar -cvf kube.conf.tar /etc/kubernetes/
Renew all certificates
This will remove all certificates and all files for authentication
mv /etc/kubernetes/admin.conf{,.old}
mv /etc/kubernetes/controller-manager.conf{,.old}
mv /etc/kubernetes/kubelet.conf{,.old}
mv /etc/kubernetes/scheduler.conf{,.old}
find /etc/kubernetes/pki/ -mindepth 1 -type f -not -iname "*ca*" -and -not -iname "*sa*" -print -delete
Shut down all containers and the kubelet, then reissue all certificates
systemctl stop kubelet; systemctl restart docker;
kubeadm init phase certs all --apiserver-advertise-address 10.42.0.15 --apiserver-cert-extra-sans 10.42.0.11,10.42.0.26,10.42.0.32 --ignore-preflight-errors=all --node-name $HOSTNAME
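Note that the kubeconfig files moved aside above are not recreated by the certs phase. If they are not restored from the backup, they can be regenerated as well; a sketch, assuming the same kubeadm init phases as above are available:
kubeadm init phase kubeconfig all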
Renew etcd certificates
For some reason the etcd certificates get issued the wrong way, so reissue them...
find /etc/kubernetes/pki/etcd/ -mindepth 1 -type f -not -iname "*ca*" -and -not -iname "*sa*" -print -delete
kubeadm init phase certs etcd-peers --ignore-preflight-errors=all --node-name $HOSTNAME
systemctl stop kubelet; systemctl restart docker; systemctl start kubelet;
Restore previous config
The configuration gets messed up by this. Restore your backed-up manifests and restart the kubelet
tar -C / -xvf /root/kube.conf.tar etc/kubernetes/manifests/
systemctl restart kubelet
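A quick sanity check that the renewed certificates are in place and the apiserver answers again (optional; paths from the standard kubeadm layout):
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate
kubectl get nodes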
Pin packages, update, and reboot
cat > /etc/apt/preferences.d/kubelet <<EOF
Package: kube*
Pin: version 1.12.7*
Pin-Priority: 1000
EOF
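Optionally confirm the pin is honoured before upgrading (the kube* packages should show pin-priority 1000):
apt-cache policy kubelet kubeadm kubectl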
apt update; apt -y dist-upgrade; reboot
move cni from weave to calico¶
Remove weave¶
kubectl delete -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
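Deleting the manifest leaves weave's CNI config and bridge behind on the nodes; on each node something like the following is usually needed too (file and interface names assumed from a default weave-net install):
rm -f /etc/cni/net.d/10-weave.conflist
ip link delete weave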
Update pod cidr¶
kubeadm init phase control-plane controller-manager --pod-network-cidr=192.168.0.0/16
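To check that the regenerated manifest picked up the new CIDR (run on the control-plane node):
grep cluster-cidr /etc/kubernetes/manifests/kube-controller-manager.yaml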
Install calico¶
Either apply the operator manifests directly:
kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml
or install the operator via Helm:
helm repo add projectcalico https://docs.projectcalico.org/charts
helm install calico projectcalico/tigera-operator --version v3.20.0
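The operator then rolls out Calico itself; a quick way to watch it come up (namespace names from the operator defaults):
kubectl get pods -n tigera-operator
kubectl get pods -n calico-system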
If you want to use a CIDR other than Calico's default, change it in the calico-node DaemonSet:
...
- name: CALICO_IPV4POOL_CIDR
  value: "<your_podCIDR>"
...
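Note: with the operator-based install above, the pool CIDR is taken from the Installation resource in custom-resources.yaml rather than from the DaemonSet; a minimal sketch of the relevant part (the CIDR value is a placeholder):
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - cidr: "<your_podCIDR>"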