Posts tagged with High Availability
See & modify the attributes of Oracle Restart resources with crsctl
I am currently teaching an 11gR2 New Features for DBAs course at a private event, where one attendee asked how to retrieve and possibly modify the number of restart attempts that ohasd makes for a resource governed by Oracle Restart. My first guess was that this should work much like it does with the Grid Infrastructure clusterware utility crsctl, but the commands are not listed in the online documentation about Oracle Restart. So I investigated a little, and this is what I found out:
$ crsctl status res ora.orcl.db -f
NAME=ora.orcl.db
TYPE=ora.database.type
STATE=OFFLINE
TARGET=ONLINE
ACL=owner:oracle:--x,pgrp:dba:--x,other::r--,group:oinstall:r-x,user:oracle:rwx
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CARDINALITY_ID=0
CHECK_INTERVAL=1
CHECK_TIMEOUT=600
CLUSTER_DATABASE=false
CREATION_SEED=16
CURRENT_RCOUNT=0
DB_UNIQUE_NAME=orcl
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=database) PROPERTY(DB_UNIQUE_NAME= CONCAT(PARSE(%NAME%, ., 2), %USR_ORA_DOMAIN%, .)) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%)
DEGREE=1
DESCRIPTION=Oracle Database resource
ENABLED=1
FAILOVER_DELAY=0
FAILURE_COUNT=0
FAILURE_HISTORY=
FAILURE_INTERVAL=60
FAILURE_THRESHOLD=1
GEN_AUDIT_FILE_DEST=/u01/app/oracle/admin/orcl/adump
GEN_USR_ORA_INST_NAME=orcl
HOSTING_MEMBERS=
ID=ora.orcl.db
INCARNATION=0
INSTANCE_FAILOVER=0
LAST_FAULT=0
LAST_RESTART=0
LAST_SERVER=
LOAD=1
LOGGING_LEVEL=1
MANAGEMENT_POLICY=AUTOMATIC
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORACLE_HOME=/u01/app/oracle/acfsmounts/acfs_db1
PLACEMENT=balanced
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=2
ROLE=PRIMARY
SCRIPT_TIMEOUT=60
SERVER_POOLS=
SPFILE=+DATA/orcl/spfileorcl.ora
START_DEPENDENCIES=hard(ora.DATA.dg,ora.FRA.dg) weak(type:ora.listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.DATA.dg,ora.FRA.dg)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STATE_CHANGE_VERS=0
STATE_DETAILS=
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DATA.dg,shutdown:ora.FRA.dg)
STOP_TIMEOUT=600
UPTIME_THRESHOLD=1h
USR_ORA_DB_NAME=orcl
USR_ORA_DOMAIN=
USR_ORA_ENV=
USR_ORA_FLAGS=
USR_ORA_INST_NAME=orcl
USR_ORA_OPEN_MODE=open
USR_ORA_OPI=false
USR_ORA_STOP_MODE=immediate
VERSION=11.2.0.1.0
Among other attributes, we see the Restart Attempts (RESTART_ATTEMPTS) and the Uptime Threshold (UPTIME_THRESHOLD) of the database resource orcl with their default values. We can also modify them, as follows:
$ crsctl modify res ora.orcl.db -attr "RESTART_ATTEMPTS=1"
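To double-check that the change took effect, the attribute can simply be queried again – a minimal sketch:

$ crsctl status res ora.orcl.db -f | grep RESTART_ATTEMPTS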
Now, if the database instance fails more than once within one hour, ohasd will no longer try to restart it. This can be watched in the ohasd logfile. I open a second terminal to monitor the ohasd logfile and kill the smon process of the orcl instance from the first terminal. After the second kill -9 on the smon process, ohasd refuses to restart the instance, with this entry:
$ cat /u01/app/oracle/product/11.2.0/grid/log/edd2r6p0/ohasd/ohasd.log | grep attempt
2011-11-08 16:05:26.944: [ CRSPE][2781559696] No Local Restart of [ora.orcl.db 1 1], restart attempt exhausted
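For completeness, this is roughly how the crash was provoked in the first terminal – a sketch that assumes the default background process naming for the SID orcl:

$ ps -ef | grep [o]ra_smon_orcl   # determine the PID of the SMON background process
$ kill -9 <PID>                   # the second kill within the Uptime Threshold exhausts RESTART_ATTEMPTS=1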
The Uptime Threshold will start anew after the command
$ srvctl start database -d orcl
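To return to the default behavior afterwards, the attribute can be set back to the value we saw above – a sketch:

$ crsctl modify res ora.orcl.db -attr "RESTART_ATTEMPTS=2"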
Rolling Upgrade with Transient Logical Standby

You want to upgrade to a new release with minimum downtime. You have a Physical Standby Database in place. That Physical Standby can be used for a Rolling Upgrade, becoming a Transient Logical Standby only during the upgrade operation. This posting shows a demonstration of that feature while upgrading from 11.2.0.1 to 11.2.0.2.
The same approach should also work for an upgrade from 11g to 12c – I heard success stories about it recently at the DOAG annual conference, but have not tested it myself.
My setup involves a Primary and a Physical Standby running in the protection mode Maximum Availability, both on the same server. The new script physru – available from MOS – really makes it easy:
[oracle@uhesse ~]$ ./physru sys prima physt prima physt 11.2.0.2.0
Please enter the sysdba password:

### Initialize script to either start over or resume execution
Aug 10 09:22:44 2011 [0-1] Identifying rdbms software version
Aug 10 09:22:45 2011 [0-1] database prima is at version 11.2.0.1.0
Aug 10 09:22:45 2011 [0-1] database physt is at version 11.2.0.1.0
Aug 10 09:22:46 2011 [0-1] verifying flashback database is enabled at prima and physt
Aug 10 09:22:47 2011 [0-1] verifying available flashback restore points
Aug 10 09:22:47 2011 [0-1] verifying DG Broker is disabled
Aug 10 09:22:48 2011 [0-1] looking up prior execution history
Aug 10 09:22:48 2011 [0-1] purging script execution state from database prima
Aug 10 09:22:48 2011 [0-1] purging script execution state from database physt
Aug 10 09:22:48 2011 [0-1] starting new execution of script

### Stage 1: Backup user environment in case rolling upgrade is aborted
Aug 10 09:22:49 2011 [1-1] creating restore point PRU_0000_0001 on database physt
Aug 10 09:22:49 2011 [1-1] backing up current control file on physt
Aug 10 09:22:49 2011 [1-1] created backup control file /u01/app/oracle/product/11.2.0.1/db_1/dbs/PRU_0001_physt_f.f
Aug 10 09:22:49 2011 [1-1] creating restore point PRU_0000_0001 on database prima
Aug 10 09:22:50 2011 [1-1] backing up current control file on prima
Aug 10 09:22:50 2011 [1-1] created backup control file /u01/app/oracle/product/11.2.0.1/db_1/dbs/PRU_0001_prima_f.f

NOTE: Restore point PRU_0000_0001 and backup control file PRU_0001_physt_f.f
      can be used to restore physt back to its original state as a physical
      standby, in case the rolling upgrade operation needs to be aborted
      prior to the first switchover done in Stage 4.

### Stage 2: Create transient logical standby from existing physical standby
Aug 10 09:22:51 2011 [2-1] verifying RAC is disabled at physt
Aug 10 09:22:51 2011 [2-1] verifying database roles
Aug 10 09:22:51 2011 [2-1] verifying physical standby is mounted
Aug 10 09:22:52 2011 [2-1] verifying database protection mode
Aug 10 09:22:52 2011 [2-1] verifying transient logical standby datatype support
Aug 10 09:22:54 2011 [2-2] starting media recovery on physt
Aug 10 09:23:00 2011 [2-2] confirming media recovery is running
Aug 10 09:23:01 2011 [2-2] waiting for v$dataguard_stats view to initialize
Aug 10 09:23:08 2011 [2-2] waiting for apply lag on physt to fall below 30 seconds
Aug 10 09:23:09 2011 [2-2] apply lag is now less than 30 seconds
Aug 10 09:23:09 2011 [2-2] stopping media recovery on physt
Aug 10 09:23:09 2011 [2-2] executing dbms_logstdby.build on database prima
Aug 10 09:23:19 2011 [2-2] converting physical standby into transient logical standby
Aug 10 09:23:24 2011 [2-3] opening database physt
Aug 10 09:23:29 2011 [2-4] configuring transient logical standby parameters for rolling upgrade
Aug 10 09:23:31 2011 [2-4] starting logical standby on database physt
Aug 10 09:23:39 2011 [2-4] waiting until logminer dictionary has fully loaded
Aug 10 09:28:45 2011 [2-4] dictionary load 16% complete
Aug 10 09:28:56 2011 [2-4] dictionary load 75% complete
Aug 10 09:29:18 2011 [2-4] dictionary load is complete
Aug 10 09:29:20 2011 [2-4] waiting for v$dataguard_stats view to initialize
Aug 10 09:29:20 2011 [2-4] waiting for apply lag on physt to fall below 30 seconds
Aug 10 09:29:21 2011 [2-4] apply lag is now less than 30 seconds

NOTE: Database physt is now ready to be upgraded.  This script has left the
      database open in case you want to perform any further tasks before
      upgrading the database.  Once the upgrade is complete, the database must
      opened in READ WRITE mode before this script can be called to resume
      the rolling upgrade.

NOTE: If physt was previously a RAC database that was disabled, it may be
      reverted back to a RAC database upon completion of the rdbms upgrade.
      This can be accomplished by performing the following steps:

          1) On instance physt, set the cluster_database parameter to TRUE.
             eg: SQL> alter system set cluster_database=true scope=spfile;

          2) Shutdown instance physt.
             eg: SQL> shutdown abort;

          3) Startup and open all instances for database physt.
             eg: srvctl start database -d physt
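For reference, the calling convention used above is, as far as I can infer from this invocation (please check the MOS note that ships physru for the authoritative syntax):

$ ./physru <sysdba user> <primary TNS alias> <standby TNS alias> <primary db_unique_name> <standby db_unique_name> <target version>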
That was the first step – my (for the moment) Logical Standby is now ready to be upgraded. This can be done with dbua or manually, as usual:
[oracle@uhesse ~]$ sqlplus sys/oracle@physt as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Wed Aug 10 09:31:23 2011
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select name,dbid,database_role from v$database;
NAME DBID DATABASE_ROLE
--------- ---------- ----------------
PRIMA 1990933310 LOGICAL STANDBY
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
Notice that the name and DBID of the Logical Standby are the same as on the Primary, which is unusual for a Logical Standby and is caused by the new KEEP IDENTITY clause, introduced to make the Transient Logical Standby possible. I have installed the new 11.2.0.2 software as an out-of-place upgrade into a separate Oracle Home and now switch my environment to it:
[oracle@uhesse ~]$ cp /u01/app/oracle/product/11.2.0.1/db_1/dbs/orapw* /u01/app/oracle/product/11.2.0.2/db_1/dbs/
[oracle@uhesse ~]$ cp /u01/app/oracle/product/11.2.0.1/db_1/dbs/spfile*.ora /u01/app/oracle/product/11.2.0.2/db_1/dbs/
[oracle@uhesse ~]$ export ORACLE_HOME=/u01/app/oracle/product/11.2.0.2/db_1
[oracle@uhesse ~]$ export PATH=$PATH:$ORACLE_HOME/bin
[oracle@uhesse ~]$ export ORACLE_SID=physt
[oracle@uhesse ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Wed Aug 10 09:35:21 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade
ORACLE instance started.

Total System Global Area  668082176 bytes
Fixed Size                  2229480 bytes
Variable Size             448793368 bytes
Database Buffers          213909504 bytes
Redo Buffers                3149824 bytes
Database mounted.
Database opened.
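Coming back to the KEEP IDENTITY clause mentioned above: the conversion that physru performed in Stage 2 corresponds roughly to the following statement – shown here only as a sketch, because physru takes care of it for you:

SQL> alter database recover to logical standby keep identity;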
The next step is the upgrade itself, done by catupgrd.sql – the most time-consuming part, but still no downtime at all for my end users, because prima (my „Production Database“) is still running happily. Be aware, though, that during this rolling upgrade the Primary is not synchronized with the Standby (Redo Transport is interrupted). This is reflected on the Primary:
SQL> select protection_level,protection_mode from v$database;
PROTECTION_LEVEL PROTECTION_MODE
-------------------- --------------------
RESYNCHRONIZATION MAXIMUM AVAILABILITY
Back to my Standby, which is now being upgraded:
SQL> @?/rdbms/admin/catupgrd

[... Output suppressed ...]

SQL>
SQL> Rem *********************************************************************
SQL> Rem END catupgrd.sql
SQL> Rem *********************************************************************
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

[oracle@uhesse ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Wed Aug 10 09:52:10 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area  668082176 bytes
Fixed Size                  2229480 bytes
Variable Size             452987672 bytes
Database Buffers          209715200 bytes
Redo Buffers                3149824 bytes
Database mounted.
Database opened.
SQL> @?/rdbms/admin/utlrp

TIMESTAMP
--------------------------------------------------------------------------------
COMP_TIMESTAMP UTLRP_BGN  2011-08-10 09:52:53

DOC>   The following PL/SQL block invokes UTL_RECOMP to recompile invalid
DOC>   objects in the database. Recompilation time is proportional to the
DOC>   number of invalid objects in the database, so this command may take
DOC>   a long time to execute on a database with a large number of invalid
DOC>   objects.
DOC>
DOC>   Use the following queries to track recompilation progress:
DOC>
DOC>   1. Query returning the number of invalid objects remaining. This
DOC>      number should decrease with time.
DOC>         SELECT COUNT(*) FROM obj$ WHERE status IN (4, 5, 6);
DOC>
DOC>   2. Query returning the number of objects compiled so far. This number
DOC>      should increase with time.
DOC>         SELECT COUNT(*) FROM UTL_RECOMP_COMPILED;
DOC>
DOC>   This script automatically chooses serial or parallel recompilation
DOC>   based on the number of CPUs available (parameter cpu_count) multiplied
DOC>   by the number of threads per CPU (parameter parallel_threads_per_cpu).
DOC>   On RAC, this number is added across all RAC nodes.
DOC>
DOC>   UTL_RECOMP uses DBMS_SCHEDULER to create jobs for parallel
DOC>   recompilation. Jobs are created without instance affinity so that they
DOC>   can migrate across RAC nodes. Use the following queries to verify
DOC>   whether UTL_RECOMP jobs are being created and run correctly:
DOC>
DOC>   1. Query showing jobs created by UTL_RECOMP
DOC>         SELECT job_name FROM dba_scheduler_jobs
DOC>            WHERE job_name like 'UTL_RECOMP_SLAVE_%';
DOC>
DOC>   2. Query showing UTL_RECOMP jobs that are running
DOC>         SELECT job_name FROM dba_scheduler_running_jobs
DOC>            WHERE job_name like 'UTL_RECOMP_SLAVE_%';
DOC>#

PL/SQL procedure successfully completed.

TIMESTAMP
--------------------------------------------------------------------------------
COMP_TIMESTAMP UTLRP_END  2011-08-10 09:55:56

DOC> The following query reports the number of objects that have compiled
DOC> with errors (objects that compile with errors have status set to 3 in
DOC> obj$). If the number is higher than expected, please examine the error
DOC> messages reported with each object (using SHOW ERRORS) to see if they
DOC> point to system misconfiguration or resource constraints that must be
DOC> fixed before attempting to recompile these objects.
DOC>#

OBJECTS WITH ERRORS
-------------------
                  0

DOC> The following query reports the number of errors caught during
DOC> recompilation. If this number is non-zero, please query the error
DOC> messages in the table UTL_RECOMP_ERRORS to see if any of these errors
DOC> are due to misconfiguration or resource constraints that must be
DOC> fixed before objects can compile successfully.
DOC>#

ERRORS DURING RECOMPILATION
---------------------------
                          0

Function created.

PL/SQL procedure successfully completed.

Function dropped.

PL/SQL procedure successfully completed.
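Before resuming physru, it is a good idea to confirm that the upgraded components are valid – for example with a query like this (a sketch; your component list will differ):

SQL> select comp_name, version, status from dba_registry;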
I go back to the first terminal where I called physru and run it a second time with the very same command:
[oracle@uhesse ~]$ ./physru sys prima physt prima physt 11.2.0.2.0
Please enter the sysdba password:

### Initialize script to either start over or resume execution
Aug 10 09:59:44 2011 [0-1] Identifying rdbms software version
Aug 10 09:59:45 2011 [0-1] database prima is at version 11.2.0.1.0
Aug 10 09:59:46 2011 [0-1] database physt is at version 11.2.0.2.0
Aug 10 09:59:48 2011 [0-1] verifying flashback database is enabled at prima and physt
Aug 10 09:59:48 2011 [0-1] verifying available flashback restore points
Aug 10 09:59:49 2011 [0-1] verifying DG Broker is disabled
Aug 10 09:59:50 2011 [0-1] looking up prior execution history
Aug 10 09:59:50 2011 [0-1] last completed stage [2-4] using script version 0001
Aug 10 09:59:50 2011 [0-1] resuming execution of script

### Stage 3: Validate upgraded transient logical standby
Aug 10 09:59:51 2011 [3-1] database physt is no longer in OPEN MIGRATE mode
Aug 10 09:59:51 2011 [3-1] database physt is at version 11.2.0.2.0

### Stage 4: Switch the transient logical standby to be the new primary
Aug 10 09:59:53 2011 [4-1] waiting for physt to catch up (this could take a while)
Aug 10 09:59:53 2011 [4-1] starting logical standby on database physt
Aug 10 10:00:02 2011 [4-1] waiting for v$dataguard_stats view to initialize
Aug 10 10:02:42 2011 [4-1] waiting for apply lag on physt to fall below 30 seconds
Aug 10 10:03:13 2011 [4-1] apply lag is now less than 30 seconds
Aug 10 10:03:14 2011 [4-2] switching prima to become a logical standby
Aug 10 10:03:21 2011 [4-2] prima is now a logical standby
Aug 10 10:03:21 2011 [4-3] waiting for standby physt to process end-of-redo from primary
Aug 10 10:03:24 2011 [4-4] switching physt to become the new primary
Aug 10 10:03:28 2011 [4-4] physt is now the new primary

### Stage 5: Flashback former primary to pre-upgrade restore point and convert to physical
Aug 10 10:03:34 2011 [5-1] shutting down database prima
Aug 10 10:03:48 2011 [5-1] mounting database prima
Aug 10 10:03:58 2011 [5-2] flashing back database prima to restore point PRU_0000_0001
Aug 10 10:04:00 2011 [5-3] converting prima into physical standby
Aug 10 10:04:02 2011 [5-4] shutting down database prima

NOTE: Database prima has been shutdown, and is now ready to be started
      using the newer version Oracle binary.  This script requires the
      database to be mounted (on all active instances, if RAC) before calling
      this script to resume the rolling upgrade.
During this call of the script, we have a downtime (less than 20 seconds) in which no end user can use the Production Database. It is already over – physt is now the Primary, running the new release. I go on and start the ex-Primary (prima) from the new release's Oracle Home:
[oracle@uhesse ~]$ export ORACLE_HOME=/u01/app/oracle/product/11.2.0.2/db_1
[oracle@uhesse ~]$ export PATH=$PATH:$ORACLE_HOME/bin
[oracle@uhesse ~]$ export ORACLE_SID=prima
[oracle@uhesse ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Wed Aug 10 10:06:37 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area  668082176 bytes
Fixed Size                  2229480 bytes
Variable Size             398461720 bytes
Database Buffers          264241152 bytes
Redo Buffers                3149824 bytes
Database mounted.
SQL> select name,dbid,database_role from v$database;

NAME            DBID DATABASE_ROLE
--------- ---------- ----------------
PRIMA     1990933310 PHYSICAL STANDBY
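While prima is mounted, you can look at the guaranteed restore point that physru created in Stage 1 with a query like this – a sketch:

SQL> select name, scn, guarantee_flashback_database from v$restore_point;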
The third call of the script will flash back the ex-Primary (prima) to the guaranteed restore point created during the first call, and then recover it forward to the present state of the new Primary (physt) – thereby „upgrading“ it with Redo Apply. The point is: you only need to run catupgrd.sql once during the whole operation.
[oracle@uhesse ~]$ ./physru sys prima physt prima physt 11.2.0.2.0
Please enter the sysdba password:

### Initialize script to either start over or resume execution
Aug 10 10:08:02 2011 [0-1] Identifying rdbms software version
Aug 10 10:08:02 2011 [0-1] database prima is at version 11.2.0.2.0
Aug 10 10:08:02 2011 [0-1] database physt is at version 11.2.0.2.0
Aug 10 10:08:04 2011 [0-1] verifying flashback database is enabled at prima and physt
Aug 10 10:08:04 2011 [0-1] verifying available flashback restore points
Aug 10 10:08:05 2011 [0-1] verifying DG Broker is disabled
Aug 10 10:08:05 2011 [0-1] looking up prior execution history
Aug 10 10:08:06 2011 [0-1] last completed stage [5-4] using script version 0001
Aug 10 10:08:06 2011 [0-1] resuming execution of script

### Stage 6: Run media recovery through upgrade redo
Aug 10 10:08:08 2011 [6-1] upgrade redo region identified as scn range [245621, 510586]
Aug 10 10:08:08 2011 [6-1] starting media recovery on prima
Aug 10 10:08:15 2011 [6-1] confirming media recovery is running
Aug 10 10:09:04 2011 [6-1] waiting for media recovery to initialize v$recovery_progress
Aug 10 10:09:35 2011 [6-1] monitoring media recovery's progress
Aug 10 10:09:38 2011 [6-2] last applied scn 241298 is approaching upgrade redo start scn 245621
Aug 10 10:10:14 2011 [6-3] recovery of upgrade redo at 01% - estimated complete at Aug 10 10:36:21
Aug 10 10:10:30 2011 [6-3] recovery of upgrade redo at 16% - estimated complete at Aug 10 10:15:57
Aug 10 10:10:46 2011 [6-3] recovery of upgrade redo at 25% - estimated complete at Aug 10 10:14:39
Aug 10 10:11:02 2011 [6-3] recovery of upgrade redo at 33% - estimated complete at Aug 10 10:14:16
Aug 10 10:11:18 2011 [6-3] recovery of upgrade redo at 45% - estimated complete at Aug 10 10:13:31
Aug 10 10:11:34 2011 [6-3] recovery of upgrade redo at 48% - estimated complete at Aug 10 10:13:50
Aug 10 10:11:51 2011 [6-3] recovery of upgrade redo at 56% - estimated complete at Aug 10 10:13:44
Aug 10 10:12:07 2011 [6-3] recovery of upgrade redo at 60% - estimated complete at Aug 10 10:13:51
Aug 10 10:12:24 2011 [6-3] recovery of upgrade redo at 68% - estimated complete at Aug 10 10:13:44
Aug 10 10:12:40 2011 [6-3] recovery of upgrade redo at 70% - estimated complete at Aug 10 10:14:00
Aug 10 10:12:56 2011 [6-3] recovery of upgrade redo at 74% - estimated complete at Aug 10 10:14:07
Aug 10 10:13:13 2011 [6-3] recovery of upgrade redo at 82% - estimated complete at Aug 10 10:14:00
Aug 10 10:13:29 2011 [6-3] recovery of upgrade redo at 87% - estimated complete at Aug 10 10:14:02
Aug 10 10:13:46 2011 [6-3] recovery of upgrade redo at 96% - estimated complete at Aug 10 10:13:56
Aug 10 10:14:04 2011 [6-4] media recovery has finished recovering through upgrade

### Stage 7: Switch back to the original roles prior to the rolling upgrade

NOTE: At this point, you have the option to perform a switchover
      which will restore prima back to a primary database and
      physt back to a physical standby database.  If you answer 'n'
      to the question below, prima will remain a physical standby
      database and physt will remain a primary database.

Do you want to perform a switchover? (y/n):
It may not be necessary to switch over again, especially if the two databases are on the same server – skipping the switchover would avoid another short downtime. I do it anyway:
Do you want to perform a switchover? (y/n): y
Aug 10 10:50:09 2011 [7-1] continuing
Aug 10 10:50:12 2011 [7-2] waiting for v$dataguard_stats view to initialize
Aug 10 10:50:12 2011 [7-2] waiting for apply lag on prima to fall below 30 seconds
Aug 10 10:50:13 2011 [7-2] apply lag is now less than 30 seconds
Aug 10 10:50:14 2011 [7-3] switching physt to become a physical standby
Aug 10 10:50:17 2011 [7-3] physt is now a physical standby
Aug 10 10:50:17 2011 [7-3] shutting down database physt
Aug 10 10:50:19 2011 [7-3] mounting database physt
Aug 10 10:50:35 2011 [7-4] waiting for standby prima to process end-of-redo from primary
Aug 10 10:50:37 2011 [7-5] switching prima to become the new primary
Aug 10 10:50:38 2011 [7-5] prima is now the new primary
Aug 10 10:50:38 2011 [7-5] opening database prima
Aug 10 10:50:44 2011 [7-6] starting media recovery on physt
Aug 10 10:50:50 2011 [7-6] confirming media recovery is running
### Stage 8: Statistics
script start time: 10-Aug-11 09:22:49
script finish time: 10-Aug-11 10:50:54
total script execution time: +00 01:28:05
wait time for user upgrade: +00 00:30:30
active script execution time: +00 00:57:35
transient logical creation start time: 10-Aug-11 09:22:53
transient logical creation finish time: 10-Aug-11 09:23:23
primary to logical switchover start time: 10-Aug-11 10:03:13
logical to primary switchover finish time: 10-Aug-11 10:03:30
primary services offline for: +00 00:00:17
total time former primary in physical role: +00 00:46:02
time to reach upgrade redo: +00 00:00:17
time to recover upgrade redo: +00 00:04:09
primary to physical switchover start time: 10-Aug-11 10:50:09
physical to primary switchover finish time: 10-Aug-11 10:50:43
primary services offline for: +00 00:00:34
SUCCESS: The physical rolling upgrade is complete
Some nice statistics at the end, documenting especially the very short downtime from the end users’ perspective. Now wasn’t that cool? 🙂
Keep in mind, though, that this setup was very vanilla. In particular, there can be more effort involved if the Primary contains tables with datatypes that are unsupported by Logical Standby.
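You can check for such tables on the Primary up front with a query like this (a sketch):

SQL> select distinct owner, table_name from dba_logstdby_unsupported;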
Addendum: See this nice posting by Guenadi Jilevski that gives an example of Transient Logical Standby with a RAC Primary & Standby.
No DISK_REPAIR_TIME on Exadata Cells
Starting with version 11.2.1.3.1, Exadata Cells use Pro-Active Disk Quarantine, which overrides any setting of DISK_REPAIR_TIME. This and some other topics related to ASM mirroring on Exadata Storage Servers are explained in a recent posting by my dear colleague Joel Goodman. Even if you are familiar with ASM in non-Exadata environments, you may not have used ASM redundancy yet and will therefore benefit from his explanations about it.
Addendum: Maybe the headline is a little misleading, as I just became aware. DISK_REPAIR_TIME set on an ASM Diskgroup that is built upon Exadata Storage Cells is still in use and valid. It just does not refer to the disk level (Griddisk on Exadata) but to the Cell level.
In other words: If a physical disk inside a Cell gets damaged, the Griddisks built upon this damaged disk are dropped from the ASM Diskgroups immediately, without waiting for DISK_REPAIR_TIME, due to Pro-Active Disk Quarantine. But if a whole Cell goes offline (a reboot of that Storage Server, for example), the dependent ASM disks are not dropped from the respective Diskgroups for the duration of DISK_REPAIR_TIME.
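For reference, DISK_REPAIR_TIME is a diskgroup attribute; it could be adjusted like this – a sketch using a hypothetical diskgroup named DATA and an arbitrary value:

SQL> alter diskgroup data set attribute 'disk_repair_time' = '8.5h';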
