LI80538: CASSANDRA OPERATOR POD IS BEING CONSTANTLY RESTARTED DUE TO OOM

APAR status

Closed as Vendor Solution.

Error description

The Cassandra operator pod may restart constantly because the
operator container is killed for running out of memory (OOM).
In the example below, describing the Cassandra operator pod
shows that it has restarted 428 times with the reason
'OOMKilled':
----
    State:          Running
      Started:      Wed, 19 Dec 2018 09:36:42 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 19 Dec 2018 09:28:12 +0000
      Finished:     Wed, 19 Dec 2018 09:36:30 +0000
    Ready:          True
    Restart Count:  428
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     100m
      memory:  128Mi
------
This forces the Cassandra operator to constantly restart.
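
The check shown above can be scripted. The following is a minimal sketch, assuming the describe output has the same layout as the example in this APAR. The function name oom_summary and the OPERATOR_POD variable are hypothetical and are not part of API Connect or kubectl.

```shell
#!/bin/sh
# oom_summary: read `kubectl describe pod` output on stdin and print the
# termination reason and the restart count found in it.
# Hypothetical helper for illustration only.
oom_summary() {
  awk '
    $1 == "Reason:"                  { reason = $2 }
    $1 == "Restart" && $2 == "Count:" { count = $3 }
    END { printf "reason=%s restarts=%s\n", reason, count }
  '
}

# Usage (OPERATOR_POD is a placeholder for the operator pod name):
#   kubectl describe pod "$OPERATOR_POD" -n "$APIC_NAMESPACE" | oom_summary
```

If the pod has more than one container, the sketch reports the values of the last container listed, so it is best suited to a single-container operator pod.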

Local fix

The workaround is to increase the memory of the Cassandra
operator. To do so, follow these steps:
1- Get the Cassandra operator deployment name:
OPERATOR_DEPLOYMENT_NAME=$(kubectl get deployments -n $APIC_NAMESPACE | grep 'operator' | awk '{print $1}')
2- Edit the deployment and change the memory limits and
requests from 128Mi to 256Mi:
kubectl edit deployment $OPERATOR_DEPLOYMENT_NAME -n $APIC_NAMESPACE
3- After the deployment is edited, the operator pod restarts and
comes up with the new memory assignment.
This stops the Cassandra operator pod from being killed due to
OOM.
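
As a non-interactive alternative to editing the deployment by hand, the same change can probably be made with kubectl set resources. This is a sketch under assumptions, not the documented procedure: the function name bump_operator_memory and the KUBECTL override variable are hypothetical, and the command applies the new values to every container in the deployment.

```shell
#!/bin/sh
# KUBECTL can be overridden, for example to point at a different binary.
KUBECTL="${KUBECTL:-kubectl}"

# bump_operator_memory DEPLOYMENT NAMESPACE [MEMORY]
# Sets the memory request and limit of the deployment to MEMORY
# (default 256Mi, the value given in this APAR).
# Hypothetical helper for illustration only.
bump_operator_memory() {
  deployment="$1"
  namespace="$2"
  memory="${3:-256Mi}"
  "$KUBECTL" set resources deployment "$deployment" -n "$namespace" \
    --requests="memory=$memory" --limits="memory=$memory"
}

# Usage:
#   bump_operator_memory "$OPERATOR_DEPLOYMENT_NAME" "$APIC_NAMESPACE"
```

Changing the pod template this way triggers the same rollout as the manual edit, so the operator pod is expected to restart with the new memory assignment.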

Problem summary

Kubernetes 1.12 has an internal bug that causes OOM on the
Cassandra operator pod.

The memory of the Cassandra operator was increased to 256Mi,
which should be sufficient for the operator to manage a
Cassandra cluster running 3 pods. The underlying cause of the
high memory utilization is fixed in Kubernetes 1.13.

Problem conclusion

Fix targeted for release v2018.4.1.2. Customers are advised to
upgrade their Kubernetes infrastructure to version 1.13.
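
To decide whether a cluster is on the affected Kubernetes level, the server version string can be compared against 1.12. The sketch below is illustrative only: the function name is_affected_k8s is hypothetical, and it assumes the version string (for example v1.12.3) has already been obtained, such as from the output of kubectl version.

```shell
#!/bin/sh
# is_affected_k8s VERSION
# Succeeds (exit status 0) when VERSION is a Kubernetes 1.12 level,
# the release this APAR identifies as affected.
# Hypothetical helper for illustration only.
is_affected_k8s() {
  case "$1" in
    v1.12|v1.12.*|1.12|1.12.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   if is_affected_k8s "v1.12.3"; then
#     echo "Apply the memory workaround or upgrade to Kubernetes 1.13"
#   fi
```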

Temporary fix

Comments

Code change to allocate additional memory for a known
Kubernetes 1.12 issue.

APAR Information

APAR number: LI80538
Reported component name: API CONNECT ENT
Reported component ID: 5725Z2201
Reported release: 18X
Status: CLOSED ISV
PE: NoPE
HIPER: YesHIPER
Special Attention: NoSpecatt / Xsystem
Submitted date: 2019-01-14
Closed date: 2019-02-27
Last modified date: 2019-02-27

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

Product: IBM API Connect
Version: 18X
Platform: Platform Independent
Business Unit: Cloud & Data Platform
Line of Business: Automation

Document Information

Modified date:
29 September 2021
