Saturday, 18 January 2014

Troubleshooting RAC Load Balancing Not Happening Properly

Hi,
  Recently we experienced a problem with RAC load balancing. One of our environments is a two-node RAC, and we observed that client sessions were not being distributed properly. Almost all of the load landed on one node, pushing its CPU usage beyond 90%, while the other node stayed at 10-15%.

We ran a series of tests and diagnosed the problem. I am sharing the steps here in the hope that they work for you as well.


Step-1:
    Check which SCAN listeners are running on each node.
    Set the Grid home in your environment (export GRID_HOME), then list the listener processes:
    $ps -ef|grep tns

   In my case, node1 is running listener_scan1, and node2 is running listener_scan2 and listener_scan3.
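If you want to pick the listener names out of the ps output automatically, something like the following rough sketch may help. It assumes each listener shows up as a tnslsnr process followed by the listener name. The helper name scan_listeners and the sample lines (user, PIDs, path) are made up for illustration and are not from my environment.

```python
import re

def scan_listeners(ps_output):
    """Return the SCAN listener names found in `ps -ef | grep tns` output."""
    names = []
    for line in ps_output.splitlines():
        # The listener name is the first argument after the tnslsnr binary.
        match = re.search(r"tnslsnr\s+(\S+)", line)
        if match and "SCAN" in match.group(1).upper():
            names.append(match.group(1))
    return names

# Hypothetical sample output; the user, PIDs and path are illustrative only.
SAMPLE_PS = """\
grid   4012     1  0 10:02 ?      00:00:01 /u01/app/grid/bin/tnslsnr LISTENER_SCAN2 -inherit
grid   4020     1  0 10:02 ?      00:00:01 /u01/app/grid/bin/tnslsnr LISTENER_SCAN3 -inherit
grid   4100     1  0 10:02 ?      00:00:03 /u01/app/grid/bin/tnslsnr LISTENER -inherit
oracle 5000  4900  0 10:05 pts/0  00:00:00 grep tns
"""

print(scan_listeners(SAMPLE_PS))
```

The local listener (LISTENER) and the grep process itself are skipped, so only the SCAN listeners on that node remain.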

 Now I tested on node1:
  $lsnrctl status listener_scan1

The important part of the output:

Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.10)(PORT=1521)))
Services Summary...
Service "orcl" has 1 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
Service "orclXDB" has 1 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
The command completed successfully


on node2

$lsnrctl status listener_scan2

Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.11)(PORT=1521)))
Services Summary...
Service "orcl" has 2 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
Service "orclXDB" has 2 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully


on node2
$lsnrctl status listener_scan3

Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN3)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.11)(PORT=1521)))
Services Summary...
Service "orcl" has 1 instance(s).
  Instance "orcl2", status READY, has 1 handler(s) for this service...
Service "orclXDB" has 1 instance(s).
  Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully

Here you can see that listener_scan1 and listener_scan3 each know only about the instance on the node where they are running, whereas listener_scan2 knows about all the instances in the cluster.
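Comparing the status output by eye gets tedious when there are several listeners. The following is a minimal sketch, assuming the status text has the same "Service ... / Instance ..." layout as shown above, that collects the instance names per service. The function name instances_by_service is my own and not part of any Oracle tool.

```python
import re

def instances_by_service(status_text):
    """Map each service name to the instance names listed under it."""
    services = {}
    current = None
    for line in status_text.splitlines():
        service = re.match(r'\s*Service "([^"]+)" has \d+ instance', line)
        if service:
            current = service.group(1)
            services[current] = []
            continue
        instance = re.match(r'\s*Instance "([^"]+)", status', line)
        if instance and current is not None:
            services[current].append(instance.group(1))
    return services

# Excerpt in the same layout as the listener_scan2 output above.
SAMPLE_STATUS = """\
Services Summary...
Service "orcl" has 2 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully
"""

print(instances_by_service(SAMPLE_STATUS))
```

You would feed it the text captured from lsnrctl status for each SCAN listener. A listener whose list is shorter than the number of instances in the cluster is the one to look at.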

This clearly indicates that scan1 and scan3 are not receiving load information from the PMON processes of all the nodes in the cluster. A SCAN listener can only hand connections to the instances it knows about, so we need to register every instance (through its PMON) with all the listeners.

In my case, I ran the following on the node1 database:

SQL> alter system register;

and the same on node2 as well.

Then I reloaded each SCAN listener on the node where it runs:

$lsnrctl reload <scan_listener_name>

After that, I could clearly see that the status of every SCAN listener displayed all the instances in the cluster, just like listener_scan2.
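The before-and-after comparison can also be written down as a small check. This is only a sketch: the function name missing_registrations is made up, and the dictionaries are typed in by hand from the "orcl" service lines in the listener status output of this post.

```python
def missing_registrations(seen, expected):
    """Return {listener: missing instance names} for listeners lacking an expected instance."""
    gaps = {}
    for listener, instances in seen.items():
        missing = sorted(set(expected) - set(instances))
        if missing:
            gaps[listener] = missing
    return gaps

EXPECTED = ["orcl1", "orcl2"]

# Instances each SCAN listener reported for service "orcl" before the fix.
before = {
    "listener_scan1": ["orcl1"],
    "listener_scan2": ["orcl1", "orcl2"],
    "listener_scan3": ["orcl2"],
}

# After registering and reloading, every listener reported both instances.
after = {name: ["orcl1", "orcl2"] for name in before}

print(missing_registrations(before, EXPECTED))
print(missing_registrations(after, EXPECTED))
```

An empty result means every listener knows about every instance, which is the state we want for the load to be spread across both nodes.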

I hope this gives you some idea of how to resolve such issues.

For any queries, please feel free to mail manojpalbabu@gmail.com