APAR status
Closed as Vendor Solution.
Error description
The Cassandra operator pod may restart constantly because the operator is being killed for running out of memory (OOM). In the example below, describing the Cassandra operator pod shows that it has restarted 428 times with the reason 'OOMKilled':
----
    State:          Running
      Started:      Wed, 19 Dec 2018 09:36:42 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 19 Dec 2018 09:28:12 +0000
      Finished:     Wed, 19 Dec 2018 09:36:30 +0000
    Ready:          True
    Restart Count:  428
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     100m
      memory:  128Mi
------
This forces the Cassandra operator to restart constantly.
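To confirm the symptom across a namespace without describing each pod, the check could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, it inspects only the first container of each pod, and it assumes kubectl is already configured for the cluster.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the product.

# Print "<pod> <restartCount> <lastTerminationReason>" for each pod in namespace $1.
# Only the first container of each pod is inspected (an assumption of this sketch).
list_restart_reasons() {
  kubectl get pods -n "$1" -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[0].restartCount}{" "}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
}

# Keep only the pods whose last termination reason was OOMKilled,
# printing the pod name and its restart count.
oom_killed_pods() {
  list_restart_reasons "$1" | awk '$3 == "OOMKilled" {print $1, $2}'
}

# Example call:
#   oom_killed_pods "$APIC_NAMESPACE"
```

A pod that has never been terminated has no last-state reason, so it is filtered out by the awk comparison.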
Local fix
The workaround is to increase the memory available to the Cassandra operator. Follow these steps:
1- Get the Cassandra operator deployment name:
   OPERATOR_DEPLOYMENT_NAME=$(kubectl get deployments -n $APIC_NAMESPACE | grep 'operator' | awk '{print $1}')
2- Edit the deployment and change the memory limits and requests from 128Mi to 256Mi:
   kubectl edit deployment $OPERATOR_DEPLOYMENT_NAME -n $APIC_NAMESPACE
3- After the deployment is edited, the operator pod restarts and comes up with the new memory assignment.
This stops the Cassandra operator pod from being killed due to OOM.
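Where an interactive edit is inconvenient, the same change could be sketched with `kubectl set resources`. The function names below are hypothetical. The sketch assumes that APIC_NAMESPACE is set, that only one deployment name contains "operator", and that applying the same value to every container in that deployment is acceptable.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the product.

# Print the name of the first deployment in namespace $1 whose name contains "operator".
find_operator_deployment() {
  kubectl get deployments -n "$1" --no-headers | awk '$1 ~ /operator/ {print $1; exit}'
}

# Set memory requests and limits to $3 on all containers of deployment $1 in namespace $2.
set_operator_memory() {
  kubectl set resources deployment "$1" -n "$2" \
    --limits=memory="$3" --requests=memory="$3"
}

# Example calls:
#   OPERATOR_DEPLOYMENT_NAME=$(find_operator_deployment "$APIC_NAMESPACE")
#   set_operator_memory "$OPERATOR_DEPLOYMENT_NAME" "$APIC_NAMESPACE" 256Mi
```

As with the manual edit, changing the pod template triggers a rollout, so the operator pod restarts with the new memory values.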
Problem summary
Kubernetes 1.12 has an internal bug that causes OOM on the Cassandra operator pod. The memory of the Cassandra operator was increased to 256Mi, which should be sufficient for the operator to manage a Cassandra cluster running 3 pods. The underlying cause of the high memory utilization is fixed in Kubernetes 1.13.
Problem conclusion
Fix targeted for release v2018.4.1.2. Customers are advised to upgrade their Kubernetes infrastructure to version 1.13.
Temporary fix
Comments
Code change to allocate additional memory for a known Kubernetes 1.12 issue.
APAR Information
APAR number
LI80538
Reported component name
API CONNECT ENT
Reported component ID
5725Z2201
Reported release
18X
Status
CLOSED ISV
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-01-14
Closed date
2019-02-27
Last modified date
2019-02-27
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Applicable component levels
Document Information
Modified date:
29 September 2021