Archiv für die Kategorie TOI

Real-Time Query and Automatic Block Media Recovery in 11gR2

Yesterday, I posted about dealing with Block Corruption in general. Special case is Block Corruption in a Data Guard Environment, where we introduced the New Feature Automatic Block Media Recovery (ABMR). Yes, another cool abbrevation from the  the Oracle Realm 🙂

To demonstrate this, I have created a Data Guard Configuration as described in my whitepaper 11g Data Guard on the command line, that you may get from the Downloads page. It’s still 11gR1 but can easily be adopted for 11gR2. Will publish an 11gR2 version of it there soon. My scenario looks like this:

SYS@prima > select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
DGMGRL> show configuration;

Configuration - myconf

 Protection Mode: MaxAvailability
 Databases:
 prima - Primary database
 physt - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> show database physt;

Database - physt

 Role:            PHYSICAL STANDBY
 Intended State:  APPLY-ON
 Transport Lag:   0 seconds
 Apply Lag:       0 seconds
 Real Time Query: ON
 Instance(s):
 physt

Database Status:
SUCCESS

The first 11g New Feature related to Block Corruption in a Data Guard Environment is less striking than ABMR and similar as restoring blocks from Flashback Logs: If we have a Physical Standby Database present, the corrupted blocks can be restored from there very fast. I corrupted the emp table of scott in the same way as in the previous posting. After using RMAN to validate as shown also already, I do the Blockrecovery

RMAN> blockrecover corruption list;

Starting recover at 17-NOV-10
using channel ORA_DISK_1
finished standby search, restored 1 blocks

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 17-NOV-10

I need to emphasize that this restore of blocks from the Physical Standby does not require Real-Time Query there – so we don’t need to purchase Active Data Guard for that. But ABMR does. First ABMR scenario: Block Corruption happens on the Primary Database. I do the same steps as shown in the previous posting to corrupt the block containing the emp table of scott. Now scott connects and selects on the emp table. After a short (about 1 sec) break, the select produces the correct result set! The alert log file of the Primary Database records the following:

Wed Nov 17 08:54:29 2010
Hex dump of (file 4, block 131) in trace file /home/oracle/prima/diag/rdbms/prima/prima/trace/prima_ora_5169.trc
Corrupt block relative dba: 0x01000083 (file 4, block 131)
Bad header found during buffer read
Data in bad block:
 type: 0 format: 2 rdba: 0xffc00000
 last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000000
 check value in block header: 0xff66
 block checksum disabled
Reading datafile '/home/oracle/prima/users01.dbf' for corruption at rdba: 0x01000083 (file 4, block 131)
Reread (file 4, block 131) found same corrupt data (no logical check)
Starting background process ABMR
Wed Nov 17 08:54:29 2010
ABMR started with pid=33, OS id=5174
Automatic block media recovery service is active.
Automatic block media recovery requested for (file# 4, block# 131)
Wed Nov 17 08:54:30 2010
Automatic block media recovery successful for (file# 4, block# 131)
Automatic block media recovery successful for (file# 4, block# 131)
WARNING: AutoBMR fixed mismatched on-disk single block ffc00000 with in-mem rdba 1000083.

 

An end user will probably not even notice the Block Corruption on the Primary Database. Second ABMR scenario: Block Corruption happens on the Physical Standby Database. I destroy the block of the emp table in the same way as on the Primary before. Scott now connects on the Physical Standby and does a select on the emp table. Again a short break – no error message but the correct result set is returned! Alert Log of the Physical Standby records:

Wed Nov 17 08:58:10 2010
Hex dump of (file 4, block 131) in trace file /home/oracle/physt/diag/rdbms/physt/physt/trace/physt_ora_5281.trc
Corrupt block relative dba: 0x01000083 (file 4, block 131)
Bad header found during buffer read
Data in bad block:
 type: 0 format: 2 rdba: 0xffc00000
 last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000000
 check value in block header: 0xff66
 block checksum disabled
Reading datafile '/home/oracle/physt/users01.dbf' for corruption at rdba: 0x01000083 (file 4, block 131)
Reread (file 4, block 131) found same corrupt data (no logical check)
Automatic block media recovery requested for (file# 4, block# 131)
Wed Nov 17 08:58:11 2010
Automatic block media recovery requested for (file# 4, block# 131)
Automatic block media recovery successful for (file# 4, block# 131)
Automatic block media recovery successful for (file# 4, block# 131)

Conclusion: We have just seen another benefit of Real-Time Query. Apart from it’s main purpose of making it possible to offload Read-Only (or even Read-Mostly) Applications to the Physical Standby Database, it is now in 11gR2 also leveraged to resolve Block Corruption fast and transparently. We call this 11gR2 New Feature Automatic Block Media Recovery (ABMR)

 

 

, , , , , , ,

14 Kommentare

Dealing with Oracle Database Block Corruption

recover

Media errors don’t always destroy files completely. Sometimes, only small parts of the file are damaged respectively corrupted. It may even not be noticed by end users or admins for a while. This article shows how to detect block corruption and recover from it.  The demo is done on 11g but the shown techniques work in the same way for 12c also. I have corrupted blocks on my demo database affecting the emp table of the user scott:

SQL> select * from scott.emp;
select * from scott.emp
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 131)
ORA-01110: data file 4: '/home/oracle/prima/users01.dbf'

This shows that not the whole tablespace is affected:

SQL> select * from scott.dept;

 DEPTNO DNAME          LOC
---------- -------------- -------------
 10 ACCOUNTING     NEW YORK
 20 RESEARCH       DALLAS
 30 SALES          CHICAGO
 40 OPERATIONS     BOSTON

SQL> select table_name,tablespace_name from dba_tables where owner='SCOTT';

TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------
DEPT                           USERS
EMP                            USERS

Whenever we get these kind of error messages, we need to check all the blocks. Typically, error messages about block corruption come up during an RMAN backup, but I like to defer that a little to show an 11g New Feature before. Checking all blocks now:

RMAN> validate check logical database;

Starting validate at 16-NOV-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=107 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00001 name=/home/oracle/prima/system01.dbf
input datafile file number=00002 name=/home/oracle/prima/sysaux01.dbf
input datafile file number=00003 name=/home/oracle/prima/undotbs01.dbf
input datafile file number=00004 name=/home/oracle/prima/users01.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              17594        38400           277491
 File Name: /home/oracle/prima/system01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              13854
 Index      0              4487
 Other      0              2465

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              20381        25600           277631
 File Name: /home/oracle/prima/sysaux01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              869
 Index      0              957
 Other      0              3393

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              541          22784           277631
 File Name: /home/oracle/prima/undotbs01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              0
 Index      0              0
 Other      0              22243

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    FAILED 0              1133         1280            271968
 File Name: /home/oracle/prima/users01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              10
 Index      0              0
 Other      1              137

validate found one or more corrupt blocks
See trace file /home/oracle/prima/diag/rdbms/prima/prima/trace/prima_ora_18316.trc for details
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
including current control file for validation
including current SPFILE in backup set
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              612
Finished validate at 16-NOV-10

We have already a couple of 11g New Features here: The syntax has changed from backup validate (since 9i) to just validate (11g) – probably to make clear that this does not perform a backup but a check of corrupted blocks instead. Before 11g, the command did not show the verbose list of checked respectively corrupted blocks like we see above.

The addition check logical will also check for logical block corruption, which is not done by default.

Checking all the blocks here is more efficient than doing an immediate recovery of the one block mentioned in the error message above. There may be many more not spotted yet. Same is true for an ordinary backup that would interrupt at the first spotted corrupted block as we will see later on.

The validate command populated the view v$database_block_corruption, that is now internally read by RMAN in order to repair all the found corrupted blocks. The next 11g New Feature here is: It will take the block out of the Flashback Logs, if present there!

RMAN> blockrecover corruption list;

Starting recover at 16-NOV-10
using channel ORA_DISK_1
searching flashback logs for block images
finished flashback log search, restored 1 blocks

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 16-NOV-10

I was so bold that I did not even take a backup before – to make sure this new feature must be used:

RMAN> list backup;

specification does not match any backup in the repository

I’m going to take a backup now, but before that, I cause again block corruption. So we will see that RMAN stops at the first noticed corrupted block. No Third-Party-Tool would recognize the block corruption, BTW, so we have another reason to actually use RMAN here. If we say backup check logical database instead of just backup database, RMAN will also check for logical block corruption during the backup.

[oracle@uhesse-pc skripte]$ rman target sys/oracle@prima

Recovery Manager: Release 11.2.0.2.0 - Production on Tue Nov 16 15:22:14 2010

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PRIMA (DBID=1967518488)

RMAN> backup database;

Starting backup at 16-NOV-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=110 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/home/oracle/prima/system01.dbf
input datafile file number=00002 name=/home/oracle/prima/sysaux01.dbf
input datafile file number=00003 name=/home/oracle/prima/undotbs01.dbf
input datafile file number=00004 name=/home/oracle/prima/users01.dbf
channel ORA_DISK_1: starting piece 1 at 16-NOV-10
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/16/2010 15:22:22
ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/prima/users01.dbf
continuing other job steps, job failed will not be re-run
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 16-NOV-10
channel ORA_DISK_1: finished piece 1 at 16-NOV-10
piece handle=/home/oracle/flashback/PRIMA/backupset/2010_11_16/o1_mf_ncsnf_TAG20101116T152221_6g54wzkb_.bkp tag=TAG20101116T152221 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/16/2010 15:22:22
ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/prima/users01.dbf

Again the same sequence as above validate check logical database & blockrecover corruption list will solve the problem. During the whole process, the users tablespace remains online and usable, except the emp table of scott.

Conclusion: We have a powerful tool with RMAN to spot and repair corrupted blocks by using intact versions of the corrupted blocks from backup (since 9i already) or even from Flashback Logs (since 11g) – which is probably faster – while keeping up the availability of the affected tablespace.

, , , ,

25 Kommentare

Real-Time Query for Read-Mostly Applications

Here is one nice little demonstration from my last 11gR2 Data Guard class that I’d like to share with the Oracle Community. It is about enabling Read-Mostly Applications to run on a Physical Standby database, leveraging the 11g New Feature Real-Time Query (part of the Active Data Guard Option). One of the main 11g New Features is the possibility to run Reports anytime on our Physical Standby while it is doing Redo Apply. I have posted about that already here. Now what if we have an Application that is mainly reading but needs to write also to a small amount? That is a problem at first sight:

DGMGRL> show configuration

Configuration - myconf

 Protection Mode: MaxAvailability
 Databases:
 prima - Primary database
 physt - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> show database physt

Database - physt

 Role:            PHYSICAL STANDBY
 Intended State:  APPLY-ON
 Transport Lag:   0 seconds
 Apply Lag:       0 seconds
 Real Time Query: ON
 Instance(s):
 physt

Database Status:
SUCCESS

This is my (11gR2) Data Guard Configuration. Now my „Read-Mostly Application“ – enter Scott:

SYS@prima > create user scott identified by tiger;

User created.
SYS@prima > alter user scott quota unlimited on users;

User altered.

SYS@prima > grant create session,create table to scott;

Grant succeeded.
SYS@prima > connect scott/tiger@prima

Connected.
SCOTT@prima>  CREATE TABLE DEPT
 (DEPTNO NUMBER(2),
 DNAME VARCHAR2(14),
 LOC VARCHAR2(13) );

INSERT INTO DEPT VALUES (10, 'ACCOUNTING', 'NEW YORK');
INSERT INTO DEPT VALUES (20, 'RESEARCH',   'DALLAS');
INSERT INTO DEPT VALUES (30, 'SALES',      'CHICAGO');
INSERT INTO DEPT VALUES (40, 'OPERATIONS', 'BOSTON');
commit;
Commit complete.

SCOTT@prima > create table flag (database_role varchar2(30), start_date date, end_date date);

Table created.

My Read-Mostly Applications inserts into the flag-table, then run a „huge“ report and inserts into the flag table again:

SCOTT@prima > @report

PL/SQL procedure successfully completed.


1 row created.


Commit complete.


 DEPTNO DNAME          LOC
---------- -------------- -------------
 10 ACCOUNTING     NEW YORK
 20 RESEARCH       DALLAS
 30 SALES          CHICAGO
 40 OPERATIONS     BOSTON


1 row created.


Commit complete.

The content of the report.sql script:

SCOTT@prima > host cat report.sql
vari c char(30)
exec select SYS_CONTEXT('USERENV','DATABASE_ROLE') into :c from dual;
insert into flag values(:c, sysdate,null);
commit;
select * from dept;
insert into flag values(:c, null, sysdate);
commit;

This is right now not possible to run on the Physical Standby because of the insert commands:

SCOTT@prima > connect scott/tiger@physt

Connected.

SCOTT@physt > @report

PL/SQL procedure successfully completed.

insert into flag values(:c, sysdate,null)
 *
ERROR at line 1:
ORA-16000: database open for read-only access

Commit complete.

 DEPTNO DNAME          LOC
---------- -------------- -------------
 10 ACCOUNTING     NEW YORK
 20 RESEARCH       DALLAS
 30 SALES          CHICAGO
 40 OPERATIONS     BOSTON

insert into flag values(:c, null, sysdate)
 *
ERROR at line 1:
ORA-16000: database open for read-only access

Commit complete.

We need to enable that in an Application-Transparent way. Therefore, we create a Dummy-user to switch to silently when Scott accesses the Physical Standby:

SYS@prima > grant create session to rmostly identified by rmostly;

Grant succeeded.

SYS@prima > grant select on scott.dept to rmostly;

Grant succeeded.

SYS@prima > grant all on scott.flag to rmostly;

Grant succeeded.

SYS@prima > grant create synonym to rmostly;

Grant succeeded.

The user rmostly now gets synonyms with the same name as the Original Applications table. The table that need to get inserts is reached via a database link that points to the Primary – modifying it there.

SYS@prima > create public database link prod connect to scott identified by tiger using 'prod';

Database link created.

SYS@prima > connect rmostly/rmostly@prima

Connected.

RMOSTLY@prima > create synonym dept for scott.dept;

Synonym created.

RMOSTLY@prima > create synonym flag for scott.flag@prod;

Synonym created.

The used connect descriptor prod should always lead to the Primary Database even after Role Changes. We will discuss that later on in more detail. Now we need a Logon Trigger that silently switches to the rmostly schema if Scott connects to the Physical Standby:

SYS@prima > CREATE or replace TRIGGER switch_schema_trigger
 AFTER LOGON ON scott.schema
 BEGIN
 IF (SYS_CONTEXT('USERENV','DATABASE_ROLE')
 IN ('PHYSICAL STANDBY'))
 THEN
 execute immediate
 'alter session set current_schema = rmostly';
 END IF;
end;
/
Trigger created.

Also, we need a connect descriptor that always connects Scott to the Physical Standby regardless of Role Changes. Now we are going to setup this part:

SYS@prima > exec dbms_service.create_service('prod','prod')

PL/SQL procedure successfully completed.

SYS@prima > exec dbms_service.create_service('rmostly','rmostly')

PL/SQL procedure successfully completed.

SYS@prima > exec dbms_service.start_service('prod')

PL/SQL procedure successfully completed.

SYS@prima > @physt
Connected.
SYS@physt > exec dbms_service.start_service('rmostly')

PL/SQL procedure successfully completed.

Please notice that the rmostly service was started on the Physical Standby. For now, the services are on the appropriate Database, but the following trigger ensures that this will be still the case after Role Changes like switchover:

SYS@prima > CREATE or replace TRIGGER service_management AFTER STARTUP ON DATABASE
DECLARE
 VROLE VARCHAR(30);
 VOPEN_MODE VARCHAR(30);
BEGIN
 SELECT DATABASE_ROLE INTO VROLE FROM V$DATABASE;
 SELECT OPEN_MODE INTO VOPEN_MODE FROM V$DATABASE;
 IF VROLE = 'PRIMARY' THEN begin
 DBMS_SERVICE.START_SERVICE ('PROD');
 DBMS_SERVICE.STOP_SERVICE ('RMOSTLY');
 end;
 ELSIF VROLE = 'PHYSICAL STANDBY' THEN begin
 IF VOPEN_MODE like 'READ ONLY%' THEN
 DBMS_SERVICE.START_SERVICE ('RMOSTLY');
 END IF;
 DBMS_SERVICE.STOP_SERVICE ('PROD');
 end;
 END IF;
END;
/
Trigger created.

The tnsnames.ora used by the Application Clients looks like this:

$ cat $ORACLE_HOME/network/admin/tnsnames.ora
PROD =
 (DESCRIPTION =
 (ADDRESS_LIST =
 (ADDRESS = (PROTOCOL = TCP)(HOST = uhesse1)(PORT = 1521))
 (ADDRESS = (PROTOCOL = TCP)(HOST = uhesse2)(PORT = 1521))
 )
 (CONNECT_DATA =
 (SERVER = DEDICATED)
 (SERVICE_NAME = prod)
 )
 )

RMOSTLY =
 (DESCRIPTION =
 (ADDRESS_LIST =
 (ADDRESS = (PROTOCOL = TCP)(HOST = uhesse1)(PORT = 1521))
 (ADDRESS = (PROTOCOL = TCP)(HOST = uhesse2)(PORT = 1521))
 )
 (CONNECT_DATA =
 (SERVER = DEDICATED)
 (SERVICE_NAME = rmostly)
 )
 )

In effect, an Application Client connecting to the Physical Standby with RMOSTLY can insert into the Primary table while selecting from the Physical Standby. We assume, that the main part of that Application is doing select, so we have still offloaded the major work from the Primary to the Physical Standby:

SCOTT@rmostly > alter session set nls_date_format='hh24:mi:ss';

Session altered.

SCOTT@rmostly > @report

PL/SQL procedure successfully completed.


1 row created.


Commit complete.


 DEPTNO DNAME          LOC
---------- -------------- -------------
 10 ACCOUNTING     NEW YORK
 20 RESEARCH       DALLAS
 30 SALES          CHICAGO
 40 OPERATIONS     BOSTON


1 row created.


Commit complete.

SCOTT@rmostly > select * from flag;

DATABASE_ROLE                  START_DA END_DATE
------------------------------ -------- --------
PRIMARY                        14:24:35
PRIMARY                                 14:24:35
PHYSICAL STANDBY               14:53:21
PHYSICAL STANDBY                        14:53:21

The SQL prompt BTW changes accordingly because of this little login.sql:

SCOTT@rmostly > host cat login.sql
set lines 800
set pages 300
column name format a50
column file_name format a50
column client_info format a10
column machine format a20
column MASTER_LINK format a10
column parameters format a10
column def_parameters format a10
SET SQLPROMPT "_USER'@'_CONNECT_IDENTIFIER > "

Conclusion: We can even offload Read-Mostly Applications to Physical Standby Databases in 11g with little effort!

, , ,

6 Kommentare