Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

kubernetes - HashiCorp Vault on k8s: getting error "1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity"

I'm deploying HA Vault on k8s (EKS) and getting this error on one of the Vault pods, which I think is also causing the other pods to fail. This is the output of kubectl get events:

26m         Normal    Created                        pod/vault-1                                 Created container vault
26m         Normal    Started                        pod/vault-1                                 Started container vault
26m         Normal    Pulled                         pod/vault-1                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
7m40s       Warning   BackOff                        pod/vault-1                                 Back-off restarting failed container
2m38s       Normal    Scheduled                      pod/vault-1                                 Successfully assigned vault-foo/vault-1 to ip-10-101-0-103.ec2.internal
2m35s       Normal    SuccessfulAttachVolume         pod/vault-1                                 AttachVolume.Attach succeeded for volume "pvc-acfc7e26-3616-4075-ab79-0c3f7b0f6470"
2m35s       Normal    SuccessfulAttachVolume         pod/vault-1                                 AttachVolume.Attach succeeded for volume "pvc-19d03d48-1de2-41f8-aadf-02d0a9f4bfbd"
48s         Normal    Pulled                         pod/vault-1                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
48s         Normal    Created                        pod/vault-1                                 Created container vault
99s         Normal    Started                        pod/vault-1                                 Started container vault
60s         Warning   BackOff                        pod/vault-1                                 Back-off restarting failed container
27m         Normal    TaintManagerEviction           pod/vault-2                                 Cancelling deletion of Pod vault-foo/vault-2
28m         Warning   FailedScheduling               pod/vault-2                                 0/4 nodes are available: 1 Insufficient memory, 4 Insufficient cpu.
28m         Warning   FailedScheduling               pod/vault-2                                 0/5 nodes are available: 1 Insufficient memory, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.
27m         Normal    Scheduled                      pod/vault-2                                 Successfully assigned vault-foo/vault-2 to ip-10-101-0-103.ec2.internal
27m         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-fb91141d-ebd9-4767-b122-da8c98349cba"
27m         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-95effe76-6e01-49ad-9bec-14e091e1a334"
27m         Normal    Pulling                        pod/vault-2                                 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
27m         Normal    Pulled                         pod/vault-2                                 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
26m         Normal    Created                        pod/vault-2                                 Created container vault
26m         Normal    Started                        pod/vault-2                                 Started container vault
26m         Normal    Pulled                         pod/vault-2                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
7m26s       Warning   BackOff                        pod/vault-2                                 Back-off restarting failed container
2m36s       Warning   FailedScheduling               pod/vault-2                                 0/7 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 4 Insufficient cpu.
114s        Warning   FailedScheduling               pod/vault-2                                 0/8 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 4 Insufficient cpu.
104s        Warning   FailedScheduling               pod/vault-2                                 0/9 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.
93s         Normal    Scheduled                      pod/vault-2                                 Successfully assigned vault-foo/vault-2 to ip-10-101-0-82.ec2.internal
88s         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-fb91141d-ebd9-4767-b122-da8c98349cba"
88s         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-95effe76-6e01-49ad-9bec-14e091e1a334"
83s         Normal    Pulling                        pod/vault-2                                 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
81s         Normal    Pulled                         pod/vault-2                                 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
38s         Normal    Created                        pod/vault-2                                 Created container vault
37s         Normal    Started                        pod/vault-2                                 Started container vault
38s         Normal    Pulled                         pod/vault-2                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
4s          Warning   BackOff                        pod/vault-2                                 Back-off restarting failed container
2m38s       Normal    Scheduled                      pod/vault-agent-injector-d54bdc675-qwsmz    Successfully assigned vault-foo/vault-agent-injector-d54bdc675-qwsmz to ip-10-101-2-91.ec2.internal
2m37s       Normal    Pulling                        pod/vault-agent-injector-d54bdc675-qwsmz    Pulling image "hashicorp/vault-k8s:latest"
2m36s       Normal    Pulled                         pod/vault-agent-injector-d54bdc675-qwsmz    Successfully pulled image "hashicorp/vault-k8s:latest"
2m36s       Normal    Created                        pod/vault-agent-injector-d54bdc675-qwsmz    Created container sidecar-injector
2m35s       Normal    Started                        pod/vault-agent-injector-d54bdc675-qwsmz    Started container sidecar-injector
28m         Normal    Scheduled                      pod/vault-agent-injector-d54bdc675-wz9ws    Successfully assigned vault-foo/vault-agent-injector-d54bdc675-wz9ws to ip-10-101-0-87.ec2.internal
28m         Normal    Pulled                         pod/vault-agent-injector-d54bdc675-wz9ws    Container image "hashicorp/vault-k8s:latest" already present on machine
28m         Normal    Created                        pod/vault-agent-injector-d54bdc675-wz9ws    Created container sidecar-injector
28m         Normal    Started                        pod/vault-agent-injector-d54bdc675-wz9ws    Started container sidecar-injector
3m22s       Normal    Killing                        pod/vault-agent-injector-d54bdc675-wz9ws    Stopping container sidecar-injector
3m22s       Warning   Unhealthy                      pod/vault-agent-injector-d54bdc675-wz9ws    Readiness probe failed: Get https://10.101.0.73:8080/health/ready: dial tcp 10.101.0.73:8080: connect: connection refused
3m18s       Warning   Unhealthy                      pod/vault-agent-injector-d54bdc675-wz9ws    Liveness probe failed: Get https://10.101.0.73:8080/health/ready: dial tcp 10.101.0.73:8080: connect: no route to host
28m         Normal    SuccessfulCreate               replicaset/vault-agent-injector-d54bdc675   Created pod: vault-agent-injector-d54bdc675-wz9ws
2m38s       Normal    SuccessfulCreate               replicaset/vault-agent-injector-d54bdc675   Created pod: vault-agent-injector-d54bdc675-qwsmz
28m         Normal    ScalingReplicaSet              deployment/vault-agent-injector             Scaled up replica set vault-agent-injector-d54bdc675 to 1
2m38s       Normal    ScalingReplicaSet              deployment/vault-agent-injector             Scaled up replica set vault-agent-injector-d54bdc675 to 1
28m         Normal    EnsuringLoadBalancer           service/vault-ui                            Ensuring load balancer
28m         Normal    EnsuredLoadBalancer            service/vault-ui                            Ensured load balancer
26m         Normal    UpdatedLoadBalancer            service/vault-ui                            Updated load balancer with new hosts
3m24s       Normal    DeletingLoadBalancer           service/vault-ui                            Deleting load balancer
3m23s       Warning   PortNotAllocated               service/vault-ui                            Port 32476 is not allocated; repairing
3m23s       Warning   ClusterIPNotAllocated          service/vault-ui                            Cluster IP 172.20.216.143 is not allocated; repairing
3m22s       Warning   FailedToUpdateEndpointSlices   service/vault-ui                            Error updating Endpoint Slices for Service vault-foo/vault-ui: failed to update vault-ui-crtg4 EndpointSlice for Service vault-foo/vault-ui: Operation cannot be fulfilled on endpointslices.discovery.k8s.io "vault-ui-crtg4": the object has been modified; please apply your changes to the latest version and try again
3m16s       Warning   FailedToUpdateEndpoint         endpoints/vault-ui                          Failed to update endpoint vault-foo/vault-ui: Operation cannot be fulfilled on endpoints "vault-ui": the object has been modified; please apply your changes to the latest version and try again
2m52s       Normal    DeletedLoadBalancer            service/vault-ui                            Deleted load balancer
2m39s       Normal    EnsuringLoadBalancer           service/vault-ui                            Ensuring load balancer
2m36s       Normal    EnsuredLoadBalancer            service/vault-ui                            Ensured load balancer
96s         Normal    UpdatedLoadBalancer            service/vault-ui                            Updated load balancer with new hosts
28m         Normal    NoPods                         poddisruptionbudget/vault                   No matching pods found
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-0 in StatefulSet vault successful
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-1 in StatefulSet vault successful
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-2 in StatefulSet vault successful
2m40s       Normal    NoPods                         poddisruptionbudget/vault                   No matching pods found
2m38


1 Answer


There are several issues here, and they all show up in error messages like this one:

0/9 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.

You have 9 nodes, but none of them is available for scheduling. Each one is blocked by its own set of conditions. A single node can be affected by more than one issue, so the numbers can add up to more than your total number of nodes.
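The short Python sketch below shows why the counts can exceed the node total. It splits the FailedScheduling message quoted above into per-reason counts. It assumes the message keeps the "<n> <reason>, <n> <reason>" shape seen in these events. The function name parse_failed_scheduling is hypothetical and is not part of any Kubernetes tooling.

```python
import re

# The FailedScheduling message for vault-2 from the events above.
MESSAGE = (
    "0/9 nodes are available: 1 Insufficient memory, "
    "1 node(s) didn't match pod affinity/anti-affinity, "
    "1 node(s) didn't satisfy existing pods anti-affinity rules, "
    "1 node(s) had volume node affinity conflict, "
    "1 node(s) were unschedulable, "
    "2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, "
    "4 Insufficient cpu."
)


def parse_failed_scheduling(message):
    """Hypothetical helper: return (total_nodes, {reason: node_count})."""
    # The first ": " ends the header "0/9 nodes are available".
    header, _, body = message.partition(": ")
    # The number after "/" is the total number of nodes considered.
    total = int(re.match(r"\d+/(\d+)", header).group(1))
    reasons = {}
    # Split only on ", " followed by "<digits> ", so the comma inside
    # the taint reason ("..., that the pod didn't tolerate") is kept.
    for part in re.split(r", (?=\d+ )", body.rstrip(".")):
        count, _, reason = part.partition(" ")
        reasons[reason] = int(count)
    return total, reasons


total, reasons = parse_failed_scheduling(MESSAGE)
print(total, sum(reasons.values()))
```

For this message, the per-reason counts add up to 11 while the cluster has only 9 nodes. Some nodes therefore fail more than one check.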

Let's break them down one by one:

  • Insufficient memory: Run kubectl describe node <node-name> to see how much memory is allocatable on that node and how much is already requested by the pods running there (the "Allocated resources" section). Check the requests and limits of your pods. The scheduler reserves the full amount of memory a pod requests, regardless of how much memory the pod actually uses.

  • Insufficient cpu: Same as above, but for CPU requests.

  • node(s) didn't match pod affinity/anti-affinity: Check your affinity/anti-affinity rules.

  • node(s) didn't satisfy existing pods anti-affinity rules: Same as above.

  • node(s) had volume node affinity conflict: This happens when the pod cannot be scheduled because its persistent volume lives in a different Availability Zone than the node (an EBS volume can only be attached to a node in its own zone). You can fix this by creating a StorageClass for a single zone and then using that StorageClass in your PVC.

  • node(s) were unschedulable: The node is marked as unschedulable (for example, because it was cordoned). This leads to the next issue:

  • node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate: This corresponds to the node condition Ready = False. Use kubectl describe node to check taints, and kubectl taint nodes <node-name> <taint-name>- to remove them. The node controller re-applies this particular taint for as long as the node stays NotReady, so fixing the node itself is usually the real cure. See the Taints and Tolerations documentation for more details.
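The Python sketch below is a heavily simplified, hypothetical model of some of the checks in the list above. It covers the unschedulable flag, taints versus tolerations, free CPU and memory computed from requests, and the zone of an already-bound volume. It is not the real kube-scheduler code and it ignores pod affinity/anti-affinity. All class names, node names, zones and numbers are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass
class Toleration:
    key: str = ""            # empty key + operator "Exists" tolerates every taint
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


@dataclass
class Node:
    name: str
    zone: str
    allocatable_cpu_m: int       # millicores
    allocatable_mem_mi: int      # MiB
    requested_cpu_m: int = 0     # sum of requests of pods already on the node
    requested_mem_mi: int = 0
    unschedulable: bool = False  # e.g. set by kubectl cordon
    taints: list = field(default_factory=list)


@dataclass
class Pod:
    cpu_m: int
    mem_mi: int
    volume_zone: str = ""        # zone of an already-bound volume, if any
    tolerations: list = field(default_factory=list)


def tolerates(tol, taint):
    """Simplified toleration-to-taint matching."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def rejection_reasons(pod, node):
    """Hypothetical helper: list the reasons this node would be filtered out."""
    reasons = []
    if node.unschedulable:
        reasons.append("node(s) were unschedulable")
    for taint in node.taints:
        # PreferNoSchedule is only a soft preference, so it never filters a node.
        if taint.effect in ("NoSchedule", "NoExecute") and not any(
            tolerates(t, taint) for t in pod.tolerations
        ):
            reasons.append(
                f"node(s) had taint {{{taint.key}: {taint.value}}}, that the pod didn't tolerate"
            )
    # Fit is based on requests, not on actual usage.
    if node.allocatable_cpu_m - node.requested_cpu_m < pod.cpu_m:
        reasons.append("Insufficient cpu")
    if node.allocatable_mem_mi - node.requested_mem_mi < pod.mem_mi:
        reasons.append("Insufficient memory")
    if pod.volume_zone and pod.volume_zone != node.zone:
        reasons.append("node(s) had volume node affinity conflict")
    return reasons


# Hypothetical pod that only tolerates the NoExecute flavour of not-ready,
# so the NoSchedule not-ready taint below still blocks it.
vault_pod = Pod(
    cpu_m=500, mem_mi=256, volume_zone="us-east-1a",
    tolerations=[Toleration(key="node.kubernetes.io/not-ready", operator="Exists", effect="NoExecute")],
)
node = Node(
    name="node-a", zone="us-east-1b",
    allocatable_cpu_m=1930, allocatable_mem_mi=3400,
    requested_cpu_m=1800, requested_mem_mi=1000,
    taints=[Taint("node.kubernetes.io/not-ready", effect="NoSchedule")],
)
print(rejection_reasons(vault_pod, node))
```

In this made-up scenario a single node fails three checks at once (taint, CPU and volume zone). This is the same effect that makes the counts in the real message exceed the node total.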

There is also a GitHub thread about a similar issue that you may find useful.

Try checking and eliminating these issues one by one, starting with the first one listed above, as they can cause a "chain reaction" in some scenarios.
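For the volume node affinity conflict, a single-zone StorageClass could look roughly like the manifest generated below. This is a sketch that rests on several assumptions:

- the in-tree kubernetes.io/aws-ebs provisioner with gp2 volumes;
- the zone label key topology.kubernetes.io/zone;
- a hypothetical class name, vault-single-zone, and a hypothetical zone, us-east-1a.

The correct provisioner and label key depend on your EKS/Kubernetes version. Older setups use failure-domain.beta.kubernetes.io/zone, and the EBS CSI driver uses the provisioner ebs.csi.aws.com. Check the labels on your nodes with kubectl get nodes --show-labels first.

```python
import json

# Hypothetical zone: use the zone where your Vault nodes run.
ZONE = "us-east-1a"

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "vault-single-zone"},  # hypothetical name
    # Assumption: in-tree EBS provisioner; the EBS CSI driver uses "ebs.csi.aws.com".
    "provisioner": "kubernetes.io/aws-ebs",
    "parameters": {"type": "gp2"},
    # Provision the volume only after a pod using the PVC has been scheduled.
    "volumeBindingMode": "WaitForFirstConsumer",
    # Restrict dynamic provisioning to a single zone.
    "allowedTopologies": [
        {
            "matchLabelExpressions": [
                # Assumption: label key; older clusters use failure-domain.beta.kubernetes.io/zone.
                {"key": "topology.kubernetes.io/zone", "values": [ZONE]}
            ]
        }
    ],
}

if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(storage_class, indent=2))
```

Save the printed JSON to a file and apply it with kubectl apply -f <file>. PVCs that are already bound keep their existing volumes, so only newly created PVCs pick up the new class.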

