Monday 24 March 2014

Dynamic change of resources in AIX servers generates core dump errors

Hi,
  
    We are observing core dump errors whenever we increase the cores or memory dynamically through the HMC.

Example
 
    CORE FILE NAME
/u01/app/grid/log/ecdb1/crfmond/core
PROGRAM NAME
osysmond.bin
STACK EXECUTION DISABLED
           0
COME FROM ADDRESS REGISTER
??
PROCESSOR ID
  hw_fru_id: 0
  hw_cpu_id: 2

Cause

Case I:

The AIX perfstat_disk() library call has some known issues, and because of these, perfstat_disk() dumps core with the above call stack.


Case II:

The problem is caused by dynamically changing the number of CPUs; dynamic configuration changes are currently not supported in osysmond.bin.

Solution

Case I:

These symptoms are reported in Bug 16492284, which has been closed as a duplicate of unpublished base bug 14541391. The base bug has been fixed in version 12.1.

Case II:

After changing any configuration, such as adding/deleting CPUs, disks, or network cards, restart the ora.crf resource:
1) Stop the crf stack

    <GRID_HOME>/bin/crsctl stop res ora.crf -init

2) Start the crf stack

    <GRID_HOME>/bin/crsctl start res ora.crf -init

Neither CRS nor database downtime is required for the above ora.crf stop/start operation, as recommended by Oracle Support.
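To confirm that the restart took effect, a quick check along these lines can be used (a minimal sketch, using the same init resource as above):

    <GRID_HOME>/bin/crsctl stat res ora.crf -init

The resource should come back with STATE=ONLINE on the local node.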

Reference: Oracle Support Knowledge Base

Wednesday 19 March 2014

Possibly the best options for generating nmon reports in AIX 6.1

Hi,

  Here are some tips for generating nmon reports.

/usr/bin/topas_nmon -AdfPtV^ -s 30 -c 2880
/usr/bin/topas_nmon -f -t -d -A -O -L -N -P -V -T -^ -s 60 -c 1440


A--Summarizes the Async I/O (AIO server) processes.
d--Displays the I/O information of disks.
f--Specifies that the output is in spreadsheet format
P--Includes the Paging Space section in the recording file.
t--Includes the top processes in the output
V--Includes the disk volume group section.
^--Includes the Fibre Channel (FC) sections.
N--Includes the NFS sections (network statistics are recorded by default).
s--Interval, in seconds, between snapshots.
c--Number of snapshots to collect before the recording stops.
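Both recordings above cover a full day (30 s x 2880 snapshots = 60 s x 1440 snapshots = 86400 s = 24 hours). For one report per day, a root crontab entry along these lines is one option (only a sketch; the -m output-directory option and the /nmon/logs path are assumptions, adjust them to your environment):

# start a 24-hour nmon recording at midnight, writing the .nmon file to /nmon/logs
0 0 * * * /usr/bin/topas_nmon -f -t -d -A -P -V -^ -s 60 -c 1440 -m /nmon/logs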

Silent mode of Oracle Database installation using a manually created response file

######Silent-mode installation of the database, ignoring prereqs

#####Command ./runInstaller -silent -ignorePrereq  -showProgress -responseFile  /home/oracle/myresp.rsp

Copy the below contents into the myresp.rsp file, save it, and then use the above command.
---------------------------------- ---------------------------------------------
oracle.install.responseFileVersion=11.2.0
oracle.install.option=INSTALL_DB_AND_CONFIG
ORACLE_HOSTNAME=edc-chn-uat
UNIX_GROUP_NAME=oinstall
INVENTORY_LOCATION=/u01/app/oraInventory
SELECTED_LANGUAGES=en
ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1
ORACLE_BASE=/u01/app/oracle
oracle.install.db.InstallEdition=EE
oracle.install.db.EEOptionsSelection=false
oracle.install.db.DBA_GROUP=dba
oracle.install.db.OPER_GROUP=oinstall
oracle.install.db.config.starterdb.type=GENERAL_PURPOSE
oracle.install.db.config.starterdb.globalDBName=ORCL
oracle.install.db.config.starterdb.SID=orcl
oracle.install.db.config.starterdb.characterSet=AL32UTF8
oracle.install.db.config.starterdb.memoryOption=true
oracle.install.db.config.starterdb.memoryLimit=700
oracle.install.db.config.starterdb.installExampleSchemas=true
oracle.install.db.config.starterdb.enableSecuritySettings=true
oracle.install.db.config.starterdb.password.ALL=manoj123
oracle.install.db.config.starterdb.control=DB_CONTROL
oracle.install.db.config.starterdb.automatedBackup.enable=false
oracle.install.db.config.starterdb.storageType=FILE_SYSTEM_STORAGE
oracle.install.db.config.starterdb.fileSystemStorage.dataLocation=/u02/oracle/orcl/oradata
oracle.install.db.config.starterdb.fileSystemStorage.recoveryLocation=/u01/app/oracle/flash_recovery_area
SECURITY_UPDATES_VIA_MYORACLESUPPORT=false
DECLINE_SECURITY_UPDATES=true
oracle.installer.autoupdates.option=SKIP_UPDATES

It works absolutely fine, provided the database prerequisites are set before installing the DB as per Oracle's recommendations.

All of the options above come from the default template shipped with the database software, which you can also use directly. The Oracle-provided response files are present inside the dbsoft/install/ directory.
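For reference, an end-to-end run looks roughly like this (only a sketch; /path/to/dbsoft stands for wherever you unzipped the software, and the root scripts use the inventory and Oracle home locations from the response file above):

# as the oracle user, from the unzipped software directory
cd /path/to/dbsoft
./runInstaller -silent -ignorePrereq -showProgress -responseFile /home/oracle/myresp.rsp

# when the installer prompts for it, run the root scripts as root
/u01/app/oraInventory/orainstRoot.sh
/u01/app/oracle/product/11.2.0/db_1/root.sh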

Certain parameters that are to be taken care of before an 11gR2 Grid Infrastructure installation

Here are some parameters, ownership settings, and permissions that are to be taken care of before an 11gR2 Grid Infrastructure installation in AIX on Power servers.

Network Parameters settings
-------------------------
/usr/sbin/no -o udp_sendspace=65536       
/usr/sbin/no -o udp_recvspace=655360       
/usr/sbin/no -o tcp_sendspace=65536       
/usr/sbin/no -o tcp_recvspace=65536       
/usr/sbin/no -o rfc1323=1       
/usr/sbin/no -o sb_max=1310720       
/usr/sbin/no -o ipqmaxlen=512

Kernel Parameters
------------------------
vmo -p -o minperm%=3   
vmo -p -o maxperm%=90   
vmo -p -o maxclient%=90   
vmo -p -o lru_file_repage=0   
vmo -p -o strict_maxclient=1   
vmo -p -o strict_maxperm=0
ioo -p -o aio_maxreqs=65536
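To double-check that the tunables took effect, something like the following can be run afterwards (just a verification sketch; -F also lists the restricted VMM tunables on AIX 6.1):

no -a | egrep 'udp_sendspace|udp_recvspace|tcp_sendspace|tcp_recvspace|rfc1323|sb_max|ipqmaxlen'
vmo -F -a | egrep 'minperm%|maxperm%|maxclient%|lru_file_repage|strict_maxclient|strict_maxperm'
ioo -a | grep aio_maxreqs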

Adding capabilities to the grid user as suggested by oracle
---------------------------------------------------
#chuser capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE grid
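To confirm the capabilities were applied to the grid user (a quick check):

lsuser -a capabilities grid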

 Disk Parameters settings for ASM storage
---------------------------------------------
#/usr/sbin/chdev -l hdisk2 -a pv=clear   
#/usr/sbin/chdev -l hdisk3 -a pv=clear

#chdev -l hdisk2 -a  reserve_policy=no_reserve
#chdev -l hdisk3 -a  reserve_policy=no_reserve

#chown grid:asmadmin /dev/rhdisk2           
#chmod 660 /dev/rhdisk2
#chown grid:asmadmin /dev/rhdisk3           
#chmod 660 /dev/rhdisk3
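To verify the PVIDs, reserve policy, and device permissions afterwards (a verification sketch):

lspv | egrep 'hdisk2|hdisk3'              # the PVID column should show none
lsattr -El hdisk2 -a reserve_policy       # should report no_reserve
lsattr -El hdisk3 -a reserve_policy
ls -l /dev/rhdisk2 /dev/rhdisk3           # should be grid:asmadmin with mode 660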

Newly added virtual network to a RHEL-VM using RHEVM was not showing in the ifconfig command

The ifconfig command was showing only eth0 and the loopback interface.

Check the interfaces with ethtool:


[root@testvm1 ~]# ethtool eth0
Settings for eth0:
        Link detected: yes


[root@testvm1 ~]# ethtool eth1
Settings for eth1:
        Link detected: yes


But ifconfig was showing only the following:

[root@testvm1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1A:4A:28:20:1B
          inet addr:10.40.9.51  Bcast:10.40.9.255  Mask:255.255.255.0
          inet6 addr: fe80::21a:4aff:fe28:201b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:103629 errors:0 dropped:0 overruns:0 frame:0
          TX packets:122 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:10988003 (10.4 MiB)  TX bytes:17383 (16.9 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)


Solution:

  Create a new configuration file for eth1 under /etc/sysconfig/network-scripts (ifcfg-eth1), for example as sketched below.
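A minimal ifcfg-eth1 might look like this (only a sketch; BOOTPROTO=dhcp and the HWADDR value are assumptions here, so use the real MAC address of eth1 and static addressing if that is what your network needs):

cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<'EOF'
DEVICE=eth1
# hypothetical MAC address; replace with the real one reported for eth1
HWADDR=00:1A:4A:28:20:1C
ONBOOT=yes
BOOTPROTO=dhcp
EOF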

Then restart the network service:

#service network restart

Tuesday 18 March 2014

Enterprise Manager DB Control for a clustered environment (11gR2 RAC)

Here are the steps for configuring Enterprise Manager DB Control for an 11gR2 clustered environment.

Step 1: export ORACLE_UNQNAME (the DB unique name).
You can get the value from show parameter db_unique_name.

$export ORACLE_UNQNAME=RACDB

Check for the DB unique name
=======
SELECT name, db_unique_name FROM v$database;

Check whether DB Control is running on any cluster node
=================
Note: if a DB Control repository already exists for one node, or DB Control is running on any cluster node, you need to drop it.

$emctl status dbconsole  -- check for dbcontrol running on any node
$emca -deconfig dbcontrol db -repos drop  -- dropping dbcontrol running on single node
$emctl status agent  -- check the status of agent running on any node
$emca -repos drop -cluster


DB user password unlock
=====================
alter user SYSMAN identified by SYSMAN account unlock;
alter user DBSNMP identified by DBSNMP account unlock;

Check for passwd file
======================
select * from V$PWFILE_USERS;

USERNAME                       SYSDB SYSOP SYSAS
------------------------------ ----- ----- -----
SYS                            TRUE  TRUE  FALSE

db user for ASM
==============
Export the Grid home and the SID for ASM, then connect with SQL*Plus:
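For example (a sketch; the Grid home path and the ASM SID are assumptions, adjust them for your node):

export ORACLE_HOME=/u01/app/11.2.0/grid    # hypothetical Grid home
export ORACLE_SID=+ASM1                    # ASM instance on this node
export PATH=$ORACLE_HOME/bin:$PATH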

sqlplus sys/system as sysasm
create user asmsnmp identified by asmsnmp;
grant sysdba to asmsnmp;
grant sysasm to asmsnmp;
grant sysoper to asmsnmp;

SQL> select * from gv$pwfile_users;

   INST_ID USERNAME                       SYSDB SYSOP SYSAS
---------- ------------------------------ ----- ----- -----
         2 SYS                            TRUE  TRUE  TRUE
         2 ASMSNMP                        TRUE  TRUE  TRUE
         1 SYS                            TRUE  TRUE  TRUE
         1 ASMSNMP                        TRUE  TRUE  TRUE

SQL>  select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS
------------------------------ ----- ----- -----
SYS                            TRUE  TRUE  FALSE
ASMSNMP                        TRUE  FALSE FALSE

FIX:
======
alter user sys identified by SYSTEM;

Password file in ORACLE_HOME/dbs for the SYS user
=================
Password files for the RDBMS home
--------------
orapwd file=orapwRACDB1 password=SYSTEM entries=5 force=y ignorecase=y
orapwd file=orapwRACDB2 password=SYSTEM entries=5 force=y ignorecase=y

Password file for the Grid home
----------------------
Inside GRID_HOME/dbs:
orapwd file=orapw+ASM password=SYSTEM entries=5 force=y ignorecase=y

Before creating DB Control for the cluster, please check the following (a quick verification sketch follows the list):
1) valid username/password.
2) Database should be up.
3) Scan listener should be up.
4) Database service should be registered with scan listener.
5) Password file should be  configured correctly.
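A quick way to verify points 2 to 4 from any node (a sketch; RACDB is the DB unique name used above and LISTENER_SCAN1 is the default SCAN listener name):

srvctl status database -d RACDB      # instances should be running
srvctl status scan_listener          # SCAN listener(s) should be running
lsnrctl services LISTENER_SCAN1      # run from the Grid home; the DB service should show as registered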

Now configure DB Control for the cluster
--------------------------------
If you are creating it for the first time:
 $emca -config dbcontrol db -repos create -cluster

If you want to recreate DB Control (drop and create):
  $emca -config dbcontrol db -repos recreate -cluster

This will ask you a few questions, such as:
DB name
Listener port number
Cluster name
ASMSNMP password
SYS password
DBSNMP password

Keep these ready and provide the input.

It will be 100% successful.

Please feel free to ask at manojpalbabu@gmail.com





Sunday 9 March 2014

Troubleshooting Error 1017 received logging on to the standby (Error 1034 received logging on to the standby. PING[ARC2]: Heartbeat failed to connect to standby 'STBYKOL'. Error is 1034)

Hi Guys,

  Here is an interesting case. I was stuck with this problem for the last two and a half days. In fact, I had tried almost everything to resolve it, but I still could not solve it.

Then I left it, as I was exhausted, but I kept thinking about where the problem could be, and at last I got it.

Here was my case:
--------------------

  I have a 2-node RAC as the primary, and the standby was to be created on a single node.

   I changed the parameter file and created the password files and TNS entries accordingly. Then I ran RMAN duplicate target and restored the database as a standby. After doing everything, I saw that RAC node1 was able to send archive logs to dest_2, which is my standby, but unfortunately node2 was unable to log in to the standby database.

Here are the details.

Problem Statement:
----------------------------

The standby was receiving node1's log files, but node2 was unable to send log files due to the below error:

Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.

Steps for diagnosing
----------------------------
Step 1:
I checked the following query on node 2:

sql>select DEST_ID,DEST_NAME,STATUS,BINDING,ERROR from v$ARCHIVE_DEST where status<>'INACTIVE';

All the log file destinations were valid.

Sql> SELECT DESTINATION, STATUS, ERROR FROM V$ARCHIVE_DEST WHERE DEST_ID=2;

Dest_2, the archive log destination for the standby, was valid.

sql> select error_code,message from v$dataguard_status;

Error 1034 received logging on to the standby

PING[ARC2]: Heartbeat failed to connect to standby 'STBYDB'. Error is 1034.

Then I performed the below steps to resolve the problem:

1.) alter system set  log_archive_dest_state_2=defer scope=both sid='*';

  (on any primary RAC node)

   Sql>recover managed standby database cancel;    (on standby side)

2) alter system set SEC_CASE_SENSITIVE_LOGON=FALSE scope=both sid='*';

 (on RAC side)

3) Shut down the standby database (and, if possible, the primary as well),
then remove the password files on all primary RAC nodes and on the standby node, and recreate the password file on each respective server:

orapwd file=$ORACLE_HOME/dbs/orapw$ORACLE_SID password=system entries=5 force=y ignorecase=Y

 Then start the primary DB (the RAC nodes) with: srvctl start database -d primaryDB

4) alter system set LOG_ARCHIVE_DEST_2='SERVICE=STBYKOL ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=STBYDB' scope=both sid='*';

 (on any primary RAC node and on the standby DB)

5) alter system set  log_archive_dest_state_2=enable scope=both sid='*'; (on any primary RAC node)

 6) recover managed standby database using current logfile disconnect; (on the standby database)

After doing this, I observed the alert log, and it worked.

To check the database syncing:

Run the following command on both sides:

sql>select current_scn from v$database;

The values on the primary and the standby should be almost the same.
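Another way to check is to compare the archived log sequences received and applied on the standby (a sketch; run as the oracle user with the standby database environment set):

sqlplus -s / as sysdba <<'EOF'
-- highest sequence received vs highest sequence applied, per thread
select thread#, max(sequence#) last_received from v$archived_log group by thread#;
select thread#, max(sequence#) last_applied from v$archived_log where applied = 'YES' group by thread#;
EOF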


Hope it will help you guys.

If you have any query, don't forget to mail me at viewssharings.blogspot.in@gmail.com