Kubernetes¶
kubectl¶
get context¶
[2019-05-22 10:02] jgebhardt@jd-laptop: ~ $ kubectl config get-contexts
CURRENT   NAME            CLUSTER         AUTHINFO        NAMESPACE
          k8s-int         k8s-int         k8s-int
*         k8s-jgebhardt   k8s-jgebhardt   k8s-jgebhardt
          minikube        minikube        minikube
switch context¶
[2019-05-22 10:02] jgebhardt@jd-laptop: ~ $ kubectl config use-context k8s-int
Switched to context "k8s-int".
switch namespace¶
[2020-02-08 14:07:01 (k8s.jd:default)] jgebhardt@jd-laptop: ~ $ kubectl config set-context --current --namespace=gitlab-managed-apps
Context "k8s.jd" modified.
use kubectl with evil loadbalancer config¶
If the apiserver is only reachable through a load balancer that breaks certificate verification, the TLS check can be skipped per cluster:
kubectl --insecure-skip-tls-verify --kubeconfig=cluster1.conf get nodes
kubectl --insecure-skip-tls-verify --kubeconfig=cluster2.conf get nodes
kubectl --insecure-skip-tls-verify --kubeconfig=cluster3.conf get nodes
port forwarding¶
kubectl port-forward <pod> <port>
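For example, to reach port 80 inside a pod (or behind a service) on local port 8080; the pod and service names here are placeholders:
kubectl port-forward mypod 8080:80
kubectl port-forward svc/myservice 8080:80
# local port 8080 now forwards to port 80 in the pod / behind the service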
Issues encountered¶
Pod Networking failing¶
Recently, pods had difficulties reaching each other.
The logs showed that the affected containers could not reach the API either.
As it turned out, Calico on the minions had somehow dropped its IP.
Restarting Docker on these nodes brought Calico back to work.
Strange events
ubuntu@jump01:~$ kl get events --all-namespaces
NAMESPACE LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
[...]
monitoring 16s 16s 1 alertmanager-main-0.1560323e91212d48 Pod spec.containers{config-reloader} Normal Created kubelet, node-2-k8s Created container
monitoring 15s 15s 1 alertmanager-main-0.1560323eb1eac58d Pod spec.containers{config-reloader} Normal Started kubelet, node-2-k8s Started container
monitoring 10s 10s 1 alertmanager-main-0.1560323fed89810e Pod spec.containers{alertmanager} Warning Unhealthy kubelet, node-2-k8s Liveness probe failed: Get http://10.233.93.11:9093/api/v1/status: dial tcp 10.233.93.11:9093: getsockopt: connection refused
monitoring 1s 11s 3 alertmanager-main-0.1560323face43dee Pod spec.containers{alertmanager} Warning Unhealthy kubelet, node-2-k8s Readiness probe failed: Get http://10.233.93.11:9093/api/v1/status: dial tcp 10.233.93.11:9093: getsockopt: connection refused
Pods can not reach each other
ubuntu@jump01:~$ kl logs -n monitoring prometheus-k8s-0 prometheus
[...]
level=error ts=2018-10-22T16:52:07.65126041Z caller=notifier.go:473 component=notifier alertmanager=http://10.233.1.197:9093/api/v1/alerts count=1 msg="Error sending alert" err="context deadline exceeded"
level=error ts=2018-10-22T16:52:07.651293742Z caller=notifier.go:473 component=notifier alertmanager=http://10.233.1.194:9093/api/v1/alerts count=1 msg="Error sending alert" err="Post http://10.233.1.194:9093/api/v1/alerts: dial tcp 10.233.1.194:9093: i/o timeout"
[...]
No more calico tunl address on minions
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-1-k8s ip a s dev tunl0
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
inet 10.233.1.192/32 brd 10.233.1.192 scope global tunl0
valid_lft forever preferred_lft forever
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-2-k8s ip a s dev tunl0
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-3-k8s ip a s dev tunl0
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
restart docker on affected nodes
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-2-k8s sudo systemctl restart docker
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-3-k8s sudo systemctl restart docker
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-2-k8s ip a s dev tunl0
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
inet 10.233.93.13/32 brd 10.233.93.13 scope global tunl0
valid_lft forever preferred_lft forever
ubuntu@jump01:~$ ssh -i .ssh/priv.pem node-3-k8s ip a s dev tunl0
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
inet 10.233.97.70/32 brd 10.233.97.70 scope global tunl0
valid_lft forever preferred_lft forever
ubuntu@jump01:~$
Node unable to send updates to apiserver¶
Recently I stumbled upon many failing API requests. One node tried to send status updates, but the apiserver refused them:
log
Oct 25 10:16:09 node-2-k8s kubelet[12567]: E1025 10:16:09.461473 12567 kubelet_node_status.go:377] Error updating node status, will retry: failed to patch status "{\"status\":{\"$setElementOrder/addresses\":[{\"type\":\"InternalIP\"},{\"type\":\"InternalIP\"},{\"type\":\"Hostname\"}],\"$setElementOrder/conditions\":[{\"type\":\"OutOfDisk\"},{\"type\":\"MemoryPressure\"},{\"type\":\"DiskPressure\"},{\"type\":\"PIDPressure\"},{\"type\":\"Ready\"}],\"addresses\":[{\"address\":\"10.64.43.8\",\"type\":\"InternalIP\"},{\"address\":\"10.150.28.15\",\"type\":\"InternalIP\"}],\"conditions\":[{\"lastHeartbeatTime\":\"2018-10-25T10:16:09Z\",\"type\":\"OutOfDisk\"},{\"lastHeartbeatTime\":\"2018-10-25T10:16:09Z\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2018-10-25T10:16:09Z\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2018-10-25T10:16:09Z\",\"type\":\"PIDPressure\"},{\"lastHeartbeatTime\":\"2018-10-25T10:16:09Z\",\"type\":\"Ready\"}]}}" for node "node-2-k8s": Node "node-2-k8s" is invalid: status.addresses[1]: Duplicate value: core.NodeAddress{Type:"InternalIP", Address:"10.150.28.15"}
Oct 25 10:16:09 node-2-k8s kubelet[12567]: E1025 10:16:09.461646 12567 kubelet_node_status.go:377] Error updating node status, will retry: error getting node "node-2-k8s": Get https://10.150.28.9:6443/api/v1/nodes/node-2-k8s?timeout=10s: write tcp 10.150.28.15:45940->10.150.28.9:6443: use of closed network connection
Oct 25 10:16:09 node-2-k8s kubelet[12567]: E1025 10:16:09.461665 12567 kubelet_node_status.go:366] Unable to update node status: update node status exceeds retry count
Oct 25 10:16:09 node-2-k8s kubelet[12567]: W1025 10:16:09.461703 12567 reflector.go:341] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: watch of *v1.Pod ended with: very short watch: k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Unexpected watch close - watch lasted less than a second and no items received
Oct 25 10:16:09 node-2-k8s kubelet[12567]: W1025 10:16:09.461709 12567 reflector.go:341] k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: watch of *v1.Node ended with: very short watch: k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Unexpected watch close - watch lasted less than a second and no items received
Oct 25 10:16:09 node-2-k8s kubelet[12567]: W1025 10:16:09.461764 12567 reflector.go:341] k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: watch of *v1.Service ended with: very short watch: k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Unexpected watch close - watch lasted less than a second and no items received
This node has two interfaces. It seems that the apiserver gets confused by this: GitHub Issue #54492
Adding the --node-ip flag to the kubelet made the problem vanish...
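A minimal sketch of how the flag can be set on a kubeadm-provisioned node; the drop-in file and the chosen address are assumptions, use the address the node is supposed to report:
# assumed: /etc/default/kubelet is sourced by the kubeadm systemd drop-in
echo 'KUBELET_EXTRA_ARGS=--node-ip=10.150.28.15' >> /etc/default/kubelet
systemctl restart kubelet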
PV and PVC hanging in Terminate state forever¶
There might be leftover finalizers. After removing them, the PV/PVC gets deleted immediately. See GitHub Issue #69697
remove finalizer
kubectl edit pv <pvname>
# remove section containing "finalizer"
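The same can be done non-interactively with a merge patch (same effect as the edit above):
kubectl patch pv <pvname> --type=merge -p '{"metadata":{"finalizers":null}}'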
Renew Certificates prior to 1.13¶
Back up your config
tar -cvf kube.conf.tar /etc/kubernetes/
Renew all certificates
This will remove all certificates and all files for authentication
mv /etc/kubernetes/admin.conf{,.old}
mv /etc/kubernetes/controller-manager.conf{,.old}
mv /etc/kubernetes/kubelet.conf{,.old}
mv /etc/kubernetes/scheduler.conf{,.old}
find /etc/kubernetes/pki/ -mindepth 1 -type f -not -iname "*ca*" -and -not -iname "*sa*" -print -delete
Shut down all containers and the kubelet, then reissue all certificates
systemctl stop kubelet; systemctl restart docker;
kubeadm init phase certs all --apiserver-advertise-address 10.42.0.15 --apiserver-cert-extra-sans 10.42.0.11,10.42.0.26,10.42.0.32 --ignore-preflight-errors=all --node-name $HOSTNAME
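Note that the kubeconfig files moved aside above are not recreated by the certs phase. If they are not restored from the backup, they can be regenerated as well; a sketch, assuming the same kubeadm init phases as above are available:
kubeadm init phase kubeconfig all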
Renew etcd certificates
For some reason the etcd certificates get issued the wrong way, so reissue them...
find /etc/kubernetes/pki/etcd/ -mindepth 1 -type f -not -iname "*ca*" -and -not -iname "*sa*" -print -delete
kubeadm init phase certs etcd-peers --ignore-preflight-errors=all --node-name $HOSTNAME
systemctl stop kubelet; systemctl restart docker; systemctl start kubelet;
Restore previous config
The configuration gets messed up by this. Restore your backed-up manifests and restart the kubelet
tar -C / -xvf /root/kube.conf.tar etc/kubernetes/manifests/
systemctl restart kubelet
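A quick sanity check that the renewed certificates are in place and the apiserver answers again (optional; paths from the standard kubeadm layout):
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate
kubectl get nodes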
Pin packages, update, and reboot
cat > /etc/apt/preferences.d/kubelet <<EOF
Package: kube*
Pin: version 1.12.7*
Pin-Priority: 1000
EOF
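Optionally confirm the pin is honoured before upgrading (the kube* packages should show pin-priority 1000):
apt-cache policy kubelet kubeadm kubectl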
apt update; apt -y dist-upgrade; reboot
move cni from weave to calico¶
Remove weave¶
kubectl delete -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
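Deleting the manifest leaves weave's CNI config and bridge behind on the nodes; on each node something like the following is usually needed too (file and interface names assumed from a default weave-net install):
rm -f /etc/cni/net.d/10-weave.conflist
ip link delete weave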
Update pod cidr¶
kubeadm init phase control-plane controller-manager --pod-network-cidr=192.168.0.0/16
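To check that the regenerated manifest picked up the new CIDR (run on the control-plane node):
grep cluster-cidr /etc/kubernetes/manifests/kube-controller-manager.yaml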
Install calico¶
Either apply the operator manifests directly:
kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml
or install the operator via Helm:
helm repo add projectcalico https://docs.projectcalico.org/charts
helm install calico projectcalico/tigera-operator --version v3.20.0
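The operator then rolls out Calico itself; a quick way to watch it come up (namespace names from the operator defaults):
kubectl get pods -n tigera-operator
kubectl get pods -n calico-system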
If you want to use a CIDR other than Calico's default, change it in the calico-node DaemonSet:
...
- name: CALICO_IPV4POOL_CIDR
  value: "<your_podCIDR>"
...
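Note: with the operator-based install above, the pool CIDR is taken from the Installation resource in custom-resources.yaml rather than from the DaemonSet; a minimal sketch of the relevant part (the CIDR value is a placeholder):
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - cidr: "<your_podCIDR>"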