IBM Support

LI80538: CASSANDRA OPERATOR POD IS BEING CONSTANTLY RESTARTED DUE TO OOM

APAR status

  • Closed as Vendor Solution.

Error description

  • You may see the Cassandra operator pod being constantly
    restarted. This happens because the Cassandra operator is
    killed due to Out Of Memory (OOM).
    In the example below, a describe of the Cassandra operator pod
    shows that it has restarted 428 times with the reason
    'OOMKilled':
    ----
    State:          Running
      Started:      Wed, 19 Dec 2018 09:36:42 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 19 Dec 2018 09:28:12 +0000
      Finished:     Wed, 19 Dec 2018 09:36:30 +0000
    Ready:          True
    Restart Count:  428
    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     100m
      memory:  128Mi
    ----
    This forces the Cassandra operator into a constant restart
    loop. A quick check is sketched below.
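    A minimal sketch of how to confirm this condition, assuming
    kubectl access and that $APIC_NAMESPACE is set to the API
    Connect namespace (the pod name is a placeholder):
    ----
    # List the Cassandra operator pod and its restart count
    kubectl get pods -n $APIC_NAMESPACE | grep operator
    # Show the last termination state; 'OOMKilled' with exit code
    # 137 indicates an out-of-memory kill
    kubectl describe pod <cassandra-operator-pod> \
      -n $APIC_NAMESPACE | grep -A 6 'Last State'
    ----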
    

Local fix

  • The workaround is to increase the memory allocated to the
    Cassandra operator. To do so, follow these steps:
    1- Grab the Cassandra operator deployment name:
    OPERATOR_DEPLOYMENT_NAME=$(kubectl get deployments \
      -n $APIC_NAMESPACE | grep 'operator' | awk '{print $1}')
    2- Edit the deployment and change the memory limits and
    requests from 128Mi to 256Mi (see the sketch after these
    steps):
    kubectl edit deployment $OPERATOR_DEPLOYMENT_NAME \
      -n $APIC_NAMESPACE
    3- After editing the deployment, the operator pod restarts and
    comes up with the new memory assignment.
    This stops the Cassandra operator pod from being killed due to
    OOM.
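    A minimal sketch of the change to make in the editor opened by
    'kubectl edit' (the container name is illustrative; only the
    memory values change):
    ----
    spec:
      template:
        spec:
          containers:
          - name: cassandra-operator   # illustrative name
            resources:
              limits:
                cpu: 100m
                memory: 256Mi          # was 128Mi
              requests:
                cpu: 100m
                memory: 256Mi          # was 128Mi
    ----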
    

Problem summary

  • Kubernetes 1.12 has an internal bug which causes OOM kills of
    the Cassandra operator pod.
    
    The memory of the Cassandra operator has been bumped up to
    256Mi, which should be sufficient for the operator to manage a
    Cassandra cluster running 3 pods. The underlying cause of the
    high memory utilization is fixed in Kubernetes 1.13. A
    verification sketch is shown below.
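    A minimal verification sketch, assuming $APIC_NAMESPACE is set
    and that the operator container is the first container in the
    pod spec:
    ----
    # Confirm the new memory limit is applied to the deployment
    kubectl get deployment $OPERATOR_DEPLOYMENT_NAME \
      -n $APIC_NAMESPACE \
      -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}'
    # Confirm the restart count has stopped climbing
    kubectl get pods -n $APIC_NAMESPACE | grep operator
    ----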
    

Problem conclusion

  • Fix targeted for release v2018.4.1.2. Customers are advised to
    upgrade their Kubernetes infrastructure to version 1.13 (a
    version check is sketched below).
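    A minimal sketch for checking the current cluster version,
    assuming kubectl access to the cluster:
    ----
    # VERSION column shows the kubelet version on each node
    kubectl get nodes
    # Shows the kubectl client and API server versions
    kubectl version --short
    ----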
    

Temporary fix

Comments

  • Code change to allocate additional memory to work around a
    known Kubernetes 1.12 issue.
    

APAR Information

  • APAR number

    LI80538

  • Reported component name

    API CONNECT ENT

  • Reported component ID

    5725Z2201

  • Reported release

    18X

  • Status

    CLOSED ISV

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-01-14

  • Closed date

    2019-02-27

  • Last modified date

    2019-02-27

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSMNED","label":"IBM API Connect"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"18X","Line of Business":{"code":"LOB45","label":"Automation"}}]

Document Information

Modified date:
29 September 2021