Monday 24 March 2014

Dynamically changing resources on AIX servers generates core dump errors

Hi,
  
    We observe core dump errors whenever we dynamically increase the cores or memory through the HMC.

Example
 
    CORE FILE NAME
    /u01/app/grid/log/ecdb1/crfmond/core
    PROGRAM NAME
    osysmond.bin
    STACK EXECUTION DISABLED
               0
    COME FROM ADDRESS REGISTER
    ??
    PROCESSOR ID
      hw_fru_id: 0
      hw_cpu_id: 2

Cause

Case I:

The AIX perfstat_disk() library call has known issues, and the core dump shown above occurs inside perfstat_disk().


Case II:

The problem is caused by dynamically changing the number of CPUs. osysmond.bin does not currently support dynamic configuration changes.

Solution

Case I:

These symptoms are reported in Bug 16492284, which has been closed as a duplicate of unpublished base bug 14541391. The base bug is fixed in version 12.1.

Case II:

After any configuration change, such as adding or removing CPUs, disks, or network cards, restart the ora.crf resource:

1) Stop the crf stack

    <GRID_HOME>/bin/crsctl stop res ora.crf -init

2) Start the crf stack

    <GRID_HOME>/bin/crsctl start res ora.crf -init

The ora.crf stop/start operation above, as recommended by Oracle Support, requires neither CRS nor database downtime.
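
The two steps can be wrapped in a small shell function so they always run in the same order after a dynamic resource change. This is only a minimal sketch: the function name restart_crf and the DRY_RUN variable are hypothetical helpers, not part of Oracle Clusterware, and the Grid home must be passed in explicitly because it differs per installation.

```shell
#!/bin/sh
# Sketch: restart the ora.crf resource after a dynamic configuration change.
# restart_crf and DRY_RUN are illustrative names, not Oracle-provided tools.
# Usage: restart_crf <GRID_HOME>
# Set DRY_RUN=1 to print the commands instead of executing them.

restart_crf() {
    grid_home=$1
    if [ -z "$grid_home" ]; then
        echo "usage: restart_crf <GRID_HOME>" >&2
        return 2
    fi

    crsctl="$grid_home/bin/crsctl"

    # Stop first, then start; abort if either command fails.
    for action in stop start; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$crsctl $action res ora.crf -init"
        else
            "$crsctl" "$action" res ora.crf -init || return 1
        fi
    done
}
```

For example, running `DRY_RUN=1 restart_crf /path/to/grid_home` prints the stop and start commands without touching the cluster, which is a cheap way to check the path before running it for real.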

Reference: Oracle Support Knowledge Base