Posts Tagged High Availability

Drop an ASM Disk that contains a Voting Disk?

That was a question I got during my present Oracle 11gR2 RAC accelerated course in Duesseldorf: What happens if we drop an ASM Disk that contains a Voting Disk? My answer was: “I suppose that is not allowed” but my motto is “Don’t believe it, test it!” and that is what I did. That is actually one of the good things about doing a course at Oracle University: We can just check out things without affecting critical production systems here in our course environment:

[grid@host01 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   48d3710843274f88bf1eb9b3b5129a7d (ORCL:ASMDISK01) [DATA]
 2. ONLINE   354cfa8376364fd2bfaa1921534fe23b (ORCL:ASMDISK02) [DATA]
 3. ONLINE   762ad94a98554fdcbf4ba5130ac0384c (ORCL:ASMDISK03) [DATA]
Located 3 voting disk(s).

We are on 11.2.0.1 here. The Voting Disk being part of an ASM Diskgroup was an 11gR2 New Feature that I introduced in this posting already. Now let’s try to drop ASMDISK01:

[grid@host01 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.1.0 Production on Wed Jun 13 17:18:21 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> select name,group_number from v$asm_diskgroup;

NAME                           GROUP_NUMBER
------------------------------ ------------
DATA                                      1
ACFS                                      2
FRA                                       3

SQL> select name from v$asm_disk where group_number=1;

NAME
------------------------------
ASMDISK01
ASMDISK02
ASMDISK03
ASMDISK04

SQL> alter diskgroup data drop disk 'ASMDISK01';

Diskgroup altered.

It just did it without error message! We look further:

SQL> select name from v$asm_disk where group_number=1;

NAME
------------------------------
ASMDISK02
ASMDISK03
ASMDISK04

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options
[grid@host01 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   354cfa8376364fd2bfaa1921534fe23b (ORCL:ASMDISK02) [DATA]
 2. ONLINE   762ad94a98554fdcbf4ba5130ac0384c (ORCL:ASMDISK03) [DATA]
 3. ONLINE   3f0bf16b6eb64f3cbf440a3c2f0da2fd (ORCL:ASMDISK04) [DATA]
Located 3 voting disk(s).

It just moved the Voting Disk silently to another ASM Disk of that Diskgroup.  When I try to drop another ASM Disk from that Diskgroup, the command seems to be silently ignored, because 3 ASM Disks are required here to keep the 3 Voting Disks. Similar behavior with External Redundancy:

[grid@host01 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576      9788     9645                0            9645              0             N  ACFS/
MOUNTED  NORMAL  N         512   4096  1048576      7341     6431              438            2996              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576      4894     4755                0            4755              0             N  FRA/

I will move the Voting Disk to the FRA Diskgroup. It is a bug of 11.2.0.1 that the Voting_files flag is not Y for the DATA Diskgroup here, by the way.

[grid@host01 ~]$ sudo crsctl replace votedisk +FRA
Successful addition of voting disk 4d586fbecf664f8abf01d272a354fa67.
Successful deletion of voting disk 354cfa8376364fd2bfaa1921534fe23b.
Successful deletion of voting disk 762ad94a98554fdcbf4ba5130ac0384c.
Successful deletion of voting disk 3f0bf16b6eb64f3cbf440a3c2f0da2fd.
Successfully replaced voting disk group with +FRA.
CRS-4266: Voting file(s) successfully replaced
[grid@host01 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   4d586fbecf664f8abf01d272a354fa67 (ORCL:ASMDISK10) [FRA]
Located 1 voting disk(s).
[grid@host01 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.1.0 Production on Wed Jun 13 17:36:06 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup fra drop disk 'ASMDISK10';

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options
[grid@host01 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0b051cf6e6a14ff1bf31ef7bc66098e0 (ORCL:ASMDISK11) [FRA]
Located 1 voting disk(s).

Not sure whether I would dare that all in a production system, though :-)

Conclusion: We can drop ASM Disks that contain Voting Disks as long as there are enough Disks left in the Diskgroup to retain the same number of Voting Disks (each inside a separate Failure Group) afterwards. Apparently – but: “Don’t believe it, test it!”

About these ads

, , ,

2 Comments

After Test-Failover, make NEW Primary Standby again

Maybe we want to test Failover, although the Primary is working fine. After the failover succeeded, we have an OLD Primary then and a NEW Primary. There is a well documented way to convert the OLD Primary into a Standby. This procedure is called Reinstate. This posting shows how to make the NEW Primary a Standby again. The OLD Primary will keep on running as Primary – with all productive users connected there still. A special case why we may want to do that is because we tested Failover to a Snapshot Standby that has no network connection to the Primary.

The initial configuration:

DGMGRL> show configuration

Configuration - myconf

  Protection Mode: MaxPerformance
  Databases:
    prima - Primary database
    physt - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

The version is still 11.2.0.1 like in the previous posting. I will now failover to physt while prima keeps running. Attention: If there is a productive service started on the NEW Primary, make sure to stop it. Else new productive connections will access the NEW Primary! We will deliberately cause a Split Brain situation here with two Primary Databases. This may cause problems in a productive environment and is not recommended.

DGMGRL> failover to physt;
Performing failover NOW, please wait...
Error: ORA-16600: not connected to target standby database for failover

Failed.
Unable to failover
DGMGRL> exit
[oracle@uhesse ~]$ dgmgrl sys/oracle@physt
DGMGRL for Linux: Version 11.2.0.1.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> failover to physt;
Performing failover NOW, please wait...
Failover succeeded, new primary is "physt"

The error above shows that we cannot failover, connected to the (still working) Primary but must connect to the Standby first. Now there are two Primary Databases:

[oracle@uhesse ~]$ sqlplus sys/oracle@physt as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Mon Jan 16 10:25:51 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select database_role from v$database;

DATABASE_ROLE
----------------
PRIMARY

SQL> connect sys/oracle@prima as sysdba
Connected.
SQL> select database_role from v$database;

DATABASE_ROLE
----------------
PRIMARY

I want to make the NEW Primary a Standby again. Similar to a Reinstate, that needs Flashback Logs. My two Databases generate Flashback Logs, so that requirement is met.

SQL> connect sys/oracle@physt as sysdba
Connected.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area  313159680 bytes
Fixed Size            2212936 bytes
Variable Size          264244152 bytes
Database Buffers       41943040 bytes
Redo Buffers            4759552 bytes
Database mounted.
SQL> flashback database to before resetlogs;

Flashback complete.

SQL> alter database convert to physical standby;

Database altered.

This modified the controlfile and puts the Instance in NOMOUNT. We need to restart into MOUNT:

SQL> shutdown immediate
ORA-01507: database not mounted

ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area  313159680 bytes
Fixed Size            2212936 bytes
Variable Size          264244152 bytes
Database Buffers       41943040 bytes
Redo Buffers            4759552 bytes
Database mounted.
SQL> recover managed standby database using current logfile disconnect;
Media recovery complete.
SQL> select status,sequence# from v$managed_standby where process='MRP0';

STATUS          SEQUENCE#
------------ ----------
WAIT_FOR_LOG         12

We want to see here APPLYING LOG – the redo is not yet transmitted from the OLD Primary.

SQL> connect sys/oracle@prima as sysdba
Connected.
SQL> select sequence# from v$log where status='CURRENT';

 SEQUENCE#
----------
    13

SQL> alter system set log_archive_dest_2='service=physt db_unique_name=physt';

System altered.

SQL> alter system set log_archive_dest_state_2=enable;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> connect sys/oracle@physt as sysdba
Connected.
SQL> select status,sequence# from v$managed_standby where process='MRP0';

STATUS          SEQUENCE#
------------ ----------
APPLYING_LOG         15

Everything is fine now on the Database Layer: OLD Primary is still Primary, NEW Primary is again Standby, applying Redo from the OLD Primary. Only the Data Guard Broker is confused now – we need to create a new configuration:

[oracle@uhesse ~]$ dgmgrl sys/oracle@prima
DGMGRL for Linux: Version 11.2.0.1.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> show configuration;
ORA-16795: the standby database needs to be re-created

Configuration details cannot be determined by DGMGRL
[oracle@uhesse ~]$ sqlplus sys/oracle@physt as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Mon Jan 16 10:46:13 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter system set dg_broker_start=false;

System altered.

SQL> connect sys/oracle@prima as sysdba
Connected.
SQL> alter system set dg_broker_start=false;

System altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@uhesse ~]$ rm $ORACLE_HOME/dbs/dr*

The above deleted the Broker Config Files. Now we create a new Broker Configuration:

[oracle@uhesse ~]$ sqlplus sys/oracle@physt as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Mon Jan 16 10:48:22 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter system set dg_broker_start=true;

System altered.

SQL> connect sys/oracle@prima as sysdba
Connected.
SQL> alter system set dg_broker_start=true;

System altered.

[oracle@uhesse ~]$ dgmgrl sys/oracle@prima
DGMGRL for Linux: Version 11.2.0.1.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> create configuration mycon as primary database is prima connect identifier is prima;
Configuration "mycon" created with primary database "prima"
DGMGRL> add database physt as connect identifier is physt maintained as physical;
Database "physt" added
DGMGRL> enable configuration;
Enabled.
DGMGRL> show configuration;

Configuration - mycon

  Protection Mode: MaxPerformance
  Databases:
    prima - Primary database
    physt - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

That was it :-)

, , ,

1 Comment

Failover to Snapshot Standby

In a recent Data Guard course that I tought, the question came up whether it is possible to failover to a Snapshot Standby if the Primary is lost. My answer was that this is of course possible but will take a longer time, because first the Snapshot Standby need to be converted back to Physical Standby and then all the collected Redo Protocol on the Standby site needs to be applied.

However, when we tried to do it, the Broker refused to do the convert and the failover, complaining about the Primary not being reachable. We needed to do the convert without the Broker then. Today, when I wanted to reproduce the issue, it worked like a charm, though:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0    Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

This is the same version we had in the classroom. Now the configuration:

DGMGRL> show configuration

Configuration - myconf

  Protection Mode: MaxPerformance
  Databases:
    prima - Primary database
    physt - Snapshot standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

I will now shutdown the Primary like there is an emergency on the Primary Site:

[oracle@uhesse ~]$ sqlplus sys/oracle@prima as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Sat Jan 14 09:42:47 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shutdown abort
ORACLE instance shut down.

Now with the Broker, a convert fails:

[oracle@uhesse ~]$ dgmgrl sys/oracle@physt
DGMGRL for Linux: Version 11.2.0.1.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> convert database physt to physical standby;
Converting database "physt" to a Physical Standby database, please wait...
Error: ORA-01034: ORACLE not available
Error: ORA-16625: cannot reach database "prima"

Failed.
Failed to convert database "physt"

That was until this point the same as in the class. But now my failover succeeds, while it did not in the class:

DGMGRL> failover to physt;
Converting database "physt" to a Physical Standby database, please wait...
Operation requires shutdown of instance "physt" on database "physt"
Shutting down instance "physt"...
Database closed.
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "physt" on database "physt"
Starting instance "physt"...
ORACLE instance started.
Database mounted.
Continuing to convert database "physt" ...
Operation requires shutdown of instance "physt" on database "physt"
Shutting down instance "physt"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "physt" on database "physt"
Starting instance "physt"...
ORACLE instance started.
Database mounted.
Database "physt" converted successfully
Performing failover NOW, please wait...
Failover succeeded, new primary is "physt"

All is fine now. It worked as it was supposed to. But it was a little scary, though, that it wasn’t possible to do the very same in the class – not sure what was different there. My advice based upon this experience would be to test the failover to a Snapshot Standby that has no connection to the Primary. Also, consider to configure two Standby Databases and use only one as Snapshot Standby. And notice the manual convert in case:

SQL> alter database convert to physical standby;

, , ,

6 Comments

Follow

Get every new post delivered to your Inbox.

Join 2,429 other followers

%d bloggers like this: