Fast-Start Failover for Maximum Protection in #Oracle 12c

As of 12cR2, Fast-Start Failover is supported in Maximum Protection mode. Also, multiple Observers can now monitor the same Data Guard configuration simultaneously. I will show both in this article, starting with a (multitenant) primary in Maximum Protection mode with two standby databases. It is still not recommended to run the highest protection mode with only one standby, because a primary in Maximum Protection shuts down if it cannot write its redo to at least one synchronized standby. So this is my starting point:

DGMGRL> show configuration;

Configuration - myconf

  Protection Mode: MaxProtection
  Members:
  cdb1    - Primary database
    cdb1sb  - Physical standby database
    cdb1sb2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS (status updated 57 seconds ago)
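Why the second standby matters can be put into a few lines of Python. This is a hypothetical toy model, not Oracle code: the function name is made up, and it ignores details such as redo transport timeouts. It only encodes the rule that a Maximum Protection primary shuts down when no synchronized standby is left to receive its redo.

```python
def primary_survives(standbys: list[str], failed: set[str]) -> bool:
    """Toy model of the Maximum Protection rule, not Oracle code.

    The primary keeps processing transactions only while at least one of its
    synchronized standbys is still available to receive redo; otherwise it
    shuts down rather than commit without a second copy of the redo.
    """
    return any(standby not in failed for standby in standbys)


# Only one standby: losing it takes the primary down as well.
print(primary_survives(["cdb1sb"], {"cdb1sb"}))             # False
# Two standbys, as in this setup: the primary survives losing one of them.
print(primary_survives(["cdb1sb", "cdb1sb2"], {"cdb1sb"}))  # True
```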

All three databases have flashback turned on. I want to have a setup like this in the end:

[Figure: FSFO with Max Protection and 2 Observers]

This is how it’s been configured:

DGMGRL> edit database cdb1 set property faststartfailovertarget='cdb1sb,cdb1sb2';
Property "faststartfailovertarget" updated
DGMGRL> edit database cdb1sb set property faststartfailovertarget='cdb1,cdb1sb2';
Property "faststartfailovertarget" updated
DGMGRL> edit database cdb1sb2 set property faststartfailovertarget='cdb1,cdb1sb';
Property "faststartfailovertarget" updated
DGMGRL> enable fast_start failover;
Enabled.
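The three edit database commands give every database its own ordered list of candidate targets, which applies while that database is the primary. The toy Python model below is hypothetical (the dictionary and fsfo_target are not an Oracle API) and assumes that the first reachable candidate in list order becomes the target; that assumption is consistent with the targets the broker reports later in this article.

```python
from typing import Optional

# Hypothetical toy model, not Oracle code: FastStartFailoverTarget per database,
# copied from the edit database commands above; used while it is the primary.
FSFO_TARGETS = {
    "cdb1": ["cdb1sb", "cdb1sb2"],
    "cdb1sb": ["cdb1", "cdb1sb2"],
    "cdb1sb2": ["cdb1", "cdb1sb"],
}


def fsfo_target(primary: str, available: set[str]) -> Optional[str]:
    """Return the first candidate of the primary's list that is available.

    Assumption: candidates are tried in list order; None means no target left.
    """
    for candidate in FSFO_TARGETS[primary]:
        if candidate in available:
            return candidate
    return None


# Initial state: both standbys are up, so the first candidate wins.
print(fsfo_target("cdb1", {"cdb1sb", "cdb1sb2"}))  # cdb1sb
# After cdb1sb crashes, the next candidate in cdb1's list takes over.
print(fsfo_target("cdb1", {"cdb1sb2"}))            # cdb1sb2
# After the failover to cdb1sb2, the list of the new primary applies.
print(fsfo_target("cdb1sb2", {"cdb1", "cdb1sb"}))  # cdb1
```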

On host uhesse4:

[oracle@uhesse4 ~]$ dgmgrl sys/oracle@cdb1
DGMGRL for Linux: Release 12.2.0.1.0 - Production on Fri Jan 13 17:20:52 2017

Copyright (c) 1982, 2016, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "cdb1"
Connected as SYSDBA.
DGMGRL> start observer number_one;
[W000 01/13 17:21:04.85] FSFO target standby is cdb1sb
[W000 01/13 17:21:07.05] Observer trace level is set to USER
[W000 01/13 17:21:07.05] Try to connect to the primary.
[W000 01/13 17:21:07.05] Try to connect to the primary cdb1.
[W000 01/13 17:21:07.05] The standby cdb1sb is ready to be a FSFO target
[W000 01/13 17:21:09.06] Connection to the primary restored!
[W000 01/13 17:21:13.07] Disconnecting from database cdb1.

On host uhesse3:

[oracle@uhesse3 ~]$ dgmgrl sys/oracle@cdb1
DGMGRL for Linux: Release 12.2.0.1.0 - Production on Fri Jan 13 17:22:16 2017

Copyright (c) 1982, 2016, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "cdb1"
Connected as SYSDBA.
DGMGRL> start observer number_two;
[W000 01/13 17:22:32.68] FSFO target standby is cdb1sb
[W000 01/13 17:22:34.85] Observer trace level is set to USER
[W000 01/13 17:22:34.85] Try to connect to the primary.
[W000 01/13 17:22:34.85] Try to connect to the primary cdb1.
[W000 01/13 17:22:34.85] The standby cdb1sb is ready to be a FSFO target
[W000 01/13 17:22:36.86] Connection to the primary restored!
[W000 01/13 17:22:40.86] Disconnecting from database cdb1.

This is now the state of the configuration:

DGMGRL> show configuration;

Configuration - myconf

  Protection Mode: MaxProtection
  Members:
  cdb1    - Primary database
    cdb1sb  - (*) Physical standby database 
    cdb1sb2 - Physical standby database 

Fast-Start Failover: ENABLED

Configuration Status:
SUCCESS   (status updated 33 seconds ago)

DGMGRL> show fast_start failover;

Fast-Start Failover: ENABLED

  Threshold:          15 seconds
  Target:             cdb1sb
  Candidate Targets:  cdb1sb,cdb1sb2
  Observers:      (*) number_two
                      number_one
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

That protects against the failure of two components in the configuration, with automatic failover and zero data loss! For example, the first standby may fail and then the primary. We then fail over to the second standby, which has become the new fast-start failover target:

[oracle@uhesse2 ~]$ ps -ef | grep smon
oracle   15087     1  0 17:40 ?        00:00:00 ora_smon_cdb1sb
oracle   15338  9765  0 17:49 pts/2    00:00:00 grep --color=auto smon
[oracle@uhesse2 ~]$ kill -9 15087

The command above crashed the first standby. This is what the Observers report:

[W000 01/13 17:49:34.24] Failed to ping the standby.
[W000 01/13 17:49:37.25] Failed to ping the standby.
[W000 01/13 17:49:40.25] Failed to ping the standby.
[W000 01/13 17:49:43.25] Failed to ping the standby.
[W000 01/13 17:49:46.26] Failed to ping the standby.
[W000 01/13 17:49:46.26] Standby database has changed to cdb1sb2.
[W000 01/13 17:49:47.26] Try to connect to the primary.
[W000 01/13 17:49:47.26] Try to connect to the primary cdb1.
[W000 01/13 17:49:48.34] The standby cdb1sb2 is ready to be a FSFO target
[W000 01/13 17:49:53.35] Connection to the primary restored!
[W000 01/13 17:49:57.35] Disconnecting from database cdb1.

This is the state of the configuration now:

DGMGRL> show configuration;

Configuration - myconf

  Protection Mode: MaxProtection
  Members:
  cdb1    - Primary database
    Error: ORA-16778: redo transport error for one or more members

    cdb1sb2 - (*) Physical standby database 
    cdb1sb  - Physical standby database 
      Error: ORA-1034: ORACLE not available

Fast-Start Failover: ENABLED

Configuration Status:
ERROR   (status updated 14 seconds ago)

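Notice that the Fast-Start Failover indicator (*) now points to cdb1sb2. The primary keeps running despite the ORA-16778 error, because cdb1sb2 still receives its redo synchronously. Now the primary fails: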

[oracle@uhesse1 ~]$ ps -ef | grep smon
oracle   21334     1  0 17:41 ?        00:00:00 ora_smon_cdb1
oracle   22077  5043  0 17:52 pts/0    00:00:00 grep --color=auto smon
[oracle@uhesse1 ~]$ kill -9 21334

This is what the Observers report:

[W000 01/13 17:52:54.04] Primary database cannot be reached.
[W000 01/13 17:52:54.04] Fast-Start Failover threshold has not exceeded. Retry for the next 15 seconds
[W000 01/13 17:52:55.05] Try to connect to the primary.
[W000 01/13 17:52:57.13] Primary database cannot be reached.
[W000 01/13 17:52:58.13] Try to connect to the primary.
[W000 01/13 17:53:06.38] Primary database cannot be reached.
[W000 01/13 17:53:06.38] Fast-Start Failover threshold has not exceeded. Retry for the next 3 seconds
[W000 01/13 17:53:07.39] Try to connect to the primary.
[W000 01/13 17:53:09.46] Primary database cannot be reached.
[W000 01/13 17:53:09.46] Fast-Start Failover threshold has expired.
[W000 01/13 17:53:09.46] Try to connect to the standby.
[W000 01/13 17:53:09.46] Making a last connection attempt to primary database before proceeding with Fast-Start Failover.
[W000 01/13 17:53:09.46] Check if the standby is ready for failover.
[S019 01/13 17:53:09.47] Fast-Start Failover started...

17:53:09.47  Friday, January 13, 2017
Initiating Fast-Start Failover to database "cdb1sb2"...
[S019 01/13 17:53:09.47] Initiating Fast-start Failover.
Performing failover NOW, please wait...
Failover succeeded, new primary is "cdb1sb2"
17:53:23.68  Friday, January 13, 2017
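The Observer log above is consistent with a simple countdown against the 15-second threshold: at 17:53:06.38, 12.34 seconds after the primary was first unreachable, 2.66 seconds remain, which the log reports as 3. The sketch below reproduces that arithmetic from the log timestamps. The helper names are hypothetical and not part of the Observer, and rounding up is an assumption (rounding to the nearest second gives the same values here).

```python
import math
from datetime import datetime

THRESHOLD_S = 15  # "Threshold: 15 seconds" from show fast_start failover


def parse_stamp(stamp: str) -> datetime:
    """Parse the time-of-day part of an Observer log stamp, e.g. '17:52:54.04'.

    Assumes both stamps fall on the same day (no midnight crossing).
    """
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def seconds_left(first_failure: str, now: str, threshold: int = THRESHOLD_S) -> float:
    """Threshold minus the time elapsed since the primary was first unreachable."""
    elapsed = (parse_stamp(now) - parse_stamp(first_failure)).total_seconds()
    return threshold - elapsed


def retry_hint(first_failure: str, now: str) -> int:
    """Remaining seconds rounded up (the rounding rule is an assumption)."""
    return math.ceil(seconds_left(first_failure, now))


first = "17:52:54.04"
print(retry_hint(first, "17:52:54.04"))         # 15
print(retry_hint(first, "17:53:06.38"))         # 3 (15 - 12.34 = 2.66)
print(seconds_left(first, "17:53:09.46") <= 0)  # True: threshold expired
```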

After I restarted the two crashed databases, they were reinstated automatically, and the configuration now looks like this:

DGMGRL> show configuration;

Configuration - myconf

  Protection Mode: MaxProtection
  Members:
  cdb1sb2 - Primary database
    cdb1    - (*) Physical standby database 
    cdb1sb  - Physical standby database 

Fast-Start Failover: ENABLED

Configuration Status:
SUCCESS   (status updated 7 seconds ago)

DGMGRL> show fast_start failover;

Fast-Start Failover: ENABLED

  Threshold:          15 seconds
  Target:             cdb1
  Candidate Targets:  cdb1,cdb1sb
  Observers:      (*) number_two
                      number_one
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

Switching back to make cdb1 the primary again (this is optional, of course):

DGMGRL> switchover to cdb1;
Performing switchover NOW, please wait...
Operation requires a connection to database "cdb1"
Connecting ...
Connected to "cdb1"
Connected as SYSDBA.
New primary database "cdb1" is opening...
Operation requires start up of instance "cdb1sb2" on database "cdb1sb2"
Starting instance "cdb1sb2"...
ORACLE instance started.
Database mounted.
Connected to "cdb1sb2"
Switchover succeeded, new primary is "cdb1"
DGMGRL> show configuration;

Configuration - myconf

  Protection Mode: MaxProtection
  Members:
  cdb1    - Primary database
    cdb1sb  - (*) Physical standby database 
    cdb1sb2 - Physical standby database 

Fast-Start Failover: ENABLED

Configuration Status:
SUCCESS   (status updated 29 seconds ago)

I think this enhancement is really a big deal!

12c New Features, Data Guard

This entry was posted on January 13, 2017, 19:40 and is filed under TOI.

#1 by Rafael Vieira on January 13, 2017 - 20:30

Hey Uwe, how are you doing?
Sorry for my English.
I would like to know if it is possible to do a switchover to the fast-start failover target (in maximum protection or not)? If yes, does the old primary become the new fast-start failover target?
#2 by Ahmed Abdel Fattah on January 14, 2017 - 09:35

Thanks Uwe for sharing this nice new feature.

It really is a powerful addition in 12.2. Is it backported to 11gR2? Is there any workaround to get more than one observer actively working for one Data Guard configuration?
#3 by Uwe Hesse on January 18, 2017 - 10:47

Rafael Vieira, yes, switchover is allowed – actually you see me doing it in the article. Which database is going to be the new Fast-Start Failover target is a property of the new primary. You see me configuring this at the beginning of the demo 🙂
#4 by Uwe Hesse on January 18, 2017 - 10:49

Ahmed Abdel Fattah, you’re welcome 🙂 At the moment, this is a 12cR2 feature, and I don’t know whether it’s going to be backported to earlier releases.
#5 by Raul Kaubi on July 22, 2018 - 22:37

Hi

Correct me if I am wrong, but after the failover has been initiated, won’t the old primary database be useless, meaning that I need to recreate a new database from the current active database (the old standby)?

I mean, switchover is the procedure that allows us to switch back and forth between primary and standby.

Regards
Raul
#6 by Uwe Hesse on July 30, 2018 - 09:43

Raul, switchover is the way to change roles between primary and standby if there is no problem with the availability of the primary. You do it because you want to at that moment, for whatever reason. Failover is what you do if the primary is no longer available, or what the Observer can do automatically. Depending on the nature of the primary outage, this may require creating a new standby afterwards. If the ex-primary is only “stale” after the failover, it can be reinstated without the need to re-create it. You see the Observer doing that automatically in this article. This is possible because the ex-primary is not damaged in this scenario.