Posts Tagged Oracle Restart
I am presently teaching 11gR2 New Features for DBAs on a Private Event, where one attendee asked how we could retrieve and possibly modify the restart attempts, ohasd is trying for a resource, governed by Oracle Restart. My first guess was that this should be possible similar as with the Grid Infrastructure clusterware utility crsctl, but the commands are not listed in the Online Documentation about Oracle Restart. So I investigated a little and that is what I found out:
$ crsctl status res ora.orcl.db -f NAME=ora.orcl.db TYPE=ora.database.type STATE=OFFLINE TARGET=ONLINE ACL=owner:oracle:--x,pgrp:dba:--x,other::r--,group:oinstall:r-x,user:oracle:rwx ACTION_FAILURE_TEMPLATE= ACTION_SCRIPT= ACTIVE_PLACEMENT=1 AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX% AUTO_START=restore CARDINALITY=1 CARDINALITY_ID=0 CHECK_INTERVAL=1 CHECK_TIMEOUT=600 CLUSTER_DATABASE=false CREATION_SEED=16 CURRENT_RCOUNT=0 DB_UNIQUE_NAME=orcl DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=database) PROPERTY(DB_UNIQUE_NAME= CONCAT(PARSE(%NAME%, ., 2), %USR_ORA_DOMAIN%, .)) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%) DEGREE=1 DESCRIPTION=Oracle Database resource ENABLED=1 FAILOVER_DELAY=0 FAILURE_COUNT=0 FAILURE_HISTORY= FAILURE_INTERVAL=60 FAILURE_THRESHOLD=1 GEN_AUDIT_FILE_DEST=/u01/app/oracle/admin/orcl/adump GEN_USR_ORA_INST_NAME=orcl HOSTING_MEMBERS= ID=ora.orcl.db INCARNATION=0 INSTANCE_FAILOVER=0 LAST_FAULT=0 LAST_RESTART=0 LAST_SERVER= LOAD=1 LOGGING_LEVEL=1 MANAGEMENT_POLICY=AUTOMATIC NLS_LANG= NOT_RESTARTING_TEMPLATE= OFFLINE_CHECK_INTERVAL=0 ORACLE_HOME=/u01/app/oracle/acfsmounts/acfs_db1 PLACEMENT=balanced PROFILE_CHANGE_TEMPLATE= RESTART_ATTEMPTS=2 ROLE=PRIMARY SCRIPT_TIMEOUT=60 SERVER_POOLS= SPFILE=+DATA/orcl/spfileorcl.ora START_DEPENDENCIES=hard(ora.DATA.dg,ora.FRA.dg) weak(type:ora.listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.DATA.dg,ora.FRA.dg) START_TIMEOUT=600 STATE_CHANGE_TEMPLATE= STATE_CHANGE_VERS=0 STATE_DETAILS= STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DATA.dg,shutdown:ora.FRA.dg) STOP_TIMEOUT=600 UPTIME_THRESHOLD=1h USR_ORA_DB_NAME=orcl USR_ORA_DOMAIN= USR_ORA_ENV= USR_ORA_FLAGS= USR_ORA_INST_NAME=orcl USR_ORA_OPEN_MODE=open USR_ORA_OPI=false USR_ORA_STOP_MODE=immediate VERSION=220.127.116.11.0
We see amongst others the highlighted Restart Attempts and the Uptime Threshold of the Database resource orcl with their default values. We can also modify them as follows:
$ crsctl modify res ora.orcl.db -attr "RESTART_ATTEMPTS=1"
Now if within 1 hour, the Database Instance fails more than 1 times, ohasd will no longer try to restart it. This can be watched in the ohasd logfile. I open a second terminal to monitor the ohasd logfile and kill the smon of the orcl instance from the first terminal. After the second kill -9 on the smon process, ohasd refuses to restart the instance, with this entry:
$ cat /u01/app/oracle/product/11.2.0/grid/log/edd2r6p0/ohasd/ohasd.log | grep attempt 2011-11-08 16:05:26.944: [ CRSPE] No Local Restart of [ora.orcl.db 1 1], restart attempt exhausted
The Uptime Threshold will start anew after the command
$ srvctl start database -d orcl
Just playing around with Data Guard & Oracle Restart in the current release 11gR2 – it’s really cool! Look at Joel Goodmans introduction into Oracle Restart, if you never heared of it before. At first, I installed Oracle Database 11gR2 Grid Infrastructure for Standalone Server to get Oracle Restart, then Oracle Database 11gR2 Software only. I did not use ASM in this setup – everything is running inside a VM on my notebook with only one single hard drive really, so ASM is of not much help here 🙂
Listener and Database must be registered manually if NETCA and DBCA have not been used to create them. I did that on the command line, therefore the following steps.
First registering the listener:
$ srvctl add listener -o /u01/app/oracle/product/11.2.0/dbhome_1
$ srvctl add database -d prima -o /u01/app/oracle/product/11.2.0/dbhome_1 -r PRIMARY
Above added the Primary to the Oracle Local Registry (OLR)
$ srvctl add database -d physt -o /u01/app/oracle/product/11.2.0/dbhome_1 -r PHYSICAL_STANDBY -s MOUNT
Above added the Physical Standby to the OLR. -s is the startmode which defaults to OPEN. My Standby shall get mounted automatically. I would choose OPEN if I am inclined to use the 11g New Feature Real-Time Query – supposed I had purchased Active Data Guard.
I will now add an Application Service to my Configuration – with the same purpose demonstrated already in this posting, but without having to use DBMS_SERVICE. Instead, Oracle Restart will take care for the start of the services in my Data Guard environment according to the assigned Database Role. The creation of a Database Trigger for that purpose is no longer necessary. Instead, the usage of Oracle Restart is recommended:
$ srvctl add service -d prima -s prod -l PRIMARY -e SELECT -m BASIC
-l is the Database Role in which this service prod is supposed to get started, -e is the Failover Type and -m the Failover Method. Next I configure Failover Retries (-z) and Failover Delays (-w):
$ srvctl modify service -d prima -s prod -z 180 -w 1
The same has to be done for the Standby Database also – the -l PRIMARY parameter takes care that the service will not be offered unless this database becomes Primary:
$ srvctl add service -d physt -s prod -l PRIMARY -e SELECT -m BASIC $ srvctl modify service -d physt -s prod -z 180 -w 1
The nect command is going to start the service actually. We need to do this only for the first time manually – again without needing DBMS_SERVICE for it:
$ srvctl start service -d prima -s prod
Now let’s investigate the properties of our new service:
$ srvctl config service -d prima -s prod
Service name: prod Service is enabled Cardinality: SINGLETON Disconnect: false Service role: PRIMARY Management policy: AUTOMATIC DTP transaction: false AQ HA notifications: false Failover type: SELECT Failover method: BASIC TAF failover retries: 180 TAF failover delay: 1 Connection Load Balancing Goal: LONG Runtime Load Balancing Goal: NONE TAF policy specification: NONE
I have prepared a connect descriptor prod in the same way as in this posting shown already. Using it to connect as system user. Now I kill the SMON process of my Primary Instance (Don’t do that at home). Then a select on v$instance in my user system session waits for the reestablishing of the connection (maximum 180 seconds with my settings above) – after less than 5 seconds, the ohasd (Oracle High Availabilty Services Daemon) recognizes this and restarts the Primary Instance. After less than 20 seconds, I see my select on v$instance come through. This was a Transparent Application Failover without another Instance or Database in the game 🙂 Really cool, isn’t it?
Now let’s do a switchover and see how Oracle Restart will take care to start the prod service on the new Primary and how the connection again is reestablished there:
$ dgmgrl sys/oracle@prima
DGMGRL for Linux: Version 18.104.22.168.0 - Production
Copyright (c) 2000, 2009, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
DGMGRL> show configuration
Configuration - myconf
Protection Mode: MaxAvailability
prima - Primary database
physt - Physical standby database
Fast-Start Failover: DISABLED
DGMGRL> switchover to physt
Performing switchover NOW, please wait...
New primary database "physt" is opening...
Operation requires shutdown of instance "prima" on database "prima"
Shutting down instance "prima"...
ORA-01109: database not open
ORACLE instance shut down.
Operation requires startup of instance "prima" on database "prima"
Starting instance "prima"...
ORACLE instance started.
Switchover succeeded, new primary is "physt"
The switchover was done as usual. Now what about the service and the session?
$ lsnrctl status LSNRCTL for Linux: Version 22.214.171.124.0 - Production on 06-SEP-2010 08:45:25 Copyright (c) 1991, 2009, Oracle. All rights reserved. Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521)) STATUS of the LISTENER ------------------------ Alias LISTENER Version TNSLSNR for Linux: Version 126.96.36.199.0 - Production Start Date 06-SEP-2010 07:40:29 Uptime 0 days 1 hr. 4 min. 57 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /u01/app/oracle/product/11.2.0/dbhome_1/ network/admin/listener.ora Listener Log File /u01/app/oracle/product/11.2.0/dbhome_1/log/diag/ tnslsnr/uhesse/listener/alert/log.xml Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=uhesse)(PORT=1521))) Services Summary... Service "logst_DGMGRL" has 1 instance(s). Instance "logst", status UNKNOWN, has 1 handler(s) for this service... Service "physt" has 1 instance(s). Instance "physt", status READY, has 1 handler(s) for this service... Service "physt_DGB" has 1 instance(s). Instance "physt", status READY, has 1 handler(s) for this service... Service "physt_DGMGRL" has 1 instance(s). Instance "physt", status UNKNOWN, has 1 handler(s) for this service... Service "prima" has 1 instance(s). Instance "prima", status READY, has 1 handler(s) for this service... Service "prima_DGB" has 1 instance(s). Instance "prima", status READY, has 1 handler(s) for this service... Service "prima_DGMGRL" has 1 instance(s). Instance "prima", status UNKNOWN, has 1 handler(s) for this service... Service "prod" has 1 instance(s). Instance "physt", status READY, has 1 handler(s) for this service... The command completed successfully
The service was started on the new Primary by Oracle Restart automatically. Now the session:
SYSTEM@prod > select instance_name from v$instance;
SYSTEM@prod > select username,FAILED_OVER,FAILOVER_METHOD,FAILOVER_TYPE from v$session where username='SYSTEM';
USERNAME FAI FAILOVER_M FAILOVER_TYPE
------------------------------ --- ---------- -------------
SYSTEM YES BASIC SELECT
The session was again “unsinkable” and survived the switchover by failing over to the new Primary. When I reboot my server, ohasd together with DMON now take care that Listener, Primary, Standby and Service all get started in the correct order.