Error updating view attempt to reopen an open container

Orchestrator (e.g. Kubernetes, DC/OS, Swarm): Kubernetes
What happened: A node is marked as Not Ready and a cascading effect takes place on my k8s cluster.
What you expected to happen:
How to reproduce it (as minimally and precisely as possible): Not sure.
Anything else we need to know: In /var/log/ on my k8s master server, I can see that one node has been marked as Not Ready. A node is marked as Not Ready by my master almost at random.

Is there a reason logged somewhere as to why this was done? I was also unable to SSH into the node while it was marked as NotReady.
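
The NotReady reason is not only in the kubelet's own logs; the control plane also records it on the Node object itself (the Ready condition's reason/message) and in Node events. As a quick way to pull that out, here is a minimal sketch using the official kubernetes Python client; the client choice and the field selector are my own assumptions, not something taken from this thread:

# Sketch: print each node's conditions and the Node events, which is where the
# reason/message for a NotReady transition is recorded.
# Assumes the `kubernetes` Python client and a working kubeconfig.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run inside the cluster
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    print(f"Node: {node.metadata.name}")
    for cond in node.status.conditions or []:
        # The Ready condition carries a reason such as KubeletNotReady or
        # NodeStatusUnknown plus a human-readable message.
        print(f"  {cond.type}={cond.status} reason={cond.reason} "
              f"since={cond.last_transition_time} msg={cond.message}")

# Events recorded for nodes (NodeNotReady, NodeHasSufficientMemory, ...).
events = v1.list_event_for_all_namespaces(field_selector="involvedObject.kind=Node")
for ev in events.items:
    print(f"{ev.last_timestamp} {ev.involved_object.name} {ev.reason}: {ev.message}")

The same information is what kubectl describe node <name> shows under Conditions and Events.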


Dec 7 k8s-agent-B3E1E1AA-1 docker[23202]: E1207 .770011 23237 kubelet_node_status.go:302] Error updating node status, will retry: failed to get node address from cloud provider: instance not found
Dec 7 k8s-master-B3E1E1AA-0 docker[1806]: E1207 .770756 1874 kubelet_node_status.go:302] Error updating node status, will retry: failed to get node address from cloud provider: instance not found

From the follow-up thread on December 20, 2017 (Re: [Azure/acs-engine] Reason why node is marked as Not Ready (#863)):

"I also went back to the logs from our previous incident where we lost 3 out of 4 nodes."

"Can you send me the subscription/resource ID of a VM that corresponds to k8s-master-B3E1E1AA-0? I'd like to cross-check with the ARM logs. Thanks --brendan"

"Since the problem started at the same time on 3 different clusters, it must be linked to something external to Kubernetes, like migrating Data Disks provisioned on Azure, DNS rules, or ingress? My clusters are not OOMed or CPU overloaded; see kubernetes/kubernetes#43516. We are also facing similar issues since this morning/yesterday evening (GMT) on our production cluster, because I have a lot of critical applications (Kafka, Mongo, etc.)."
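
Since the kubelet error says the cloud provider reports "instance not found", one way to follow up on the suggestion to cross-check with the ARM logs is to ask Azure directly whether the VM still exists and is running. The sketch below assumes the azure-identity and azure-mgmt-compute Python packages; the subscription ID and resource group are placeholders, and only the VM name is taken from the log excerpt above:

# Sketch: check from the Azure side whether the VM the kubelet complains about
# ("instance not found") is still present and what its power/provisioning state is.
# Assumes azure-identity and azure-mgmt-compute; subscription and resource group
# values are placeholders, not taken from this thread.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
vm_name = "k8s-master-B3E1E1AA-0"  # node name from the log excerpt above

compute = ComputeManagementClient(DefaultAzureCredential(), subscription_id)
try:
    vm = compute.virtual_machines.get(resource_group, vm_name, expand="instanceView")
except Exception as exc:  # e.g. a ResourceNotFoundError if the VM really is gone
    print(f"VM lookup failed: {exc}")
else:
    for status in vm.instance_view.statuses:
        # Typically ProvisioningState/succeeded and PowerState/running
        print(status.code, status.display_status)

If the VM is present and running while the kubelet still reports "instance not found", that points at the cloud-provider integration (credentials, VM naming, or the instance lookup) rather than at the VM itself.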
