Beiträge getaggt mit flashback

Dealing with Oracle Database Block Corruption

Veröffentlicht von Uwe Hesse in TOI am November 16, 2010

recover

Media errors don’t always destroy files completely. Sometimes, only small parts of the file are damaged respectively corrupted. It may even not be noticed by end users or admins for a while. This article shows how to detect block corruption and recover from it. The demo is done on 11g but the shown techniques work in the same way for 12c also. I have corrupted blocks on my demo database affecting the emp table of the user scott:

SQL> select * from scott.emp;
select * from scott.emp
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 131)
ORA-01110: data file 4: '/home/oracle/prima/users01.dbf'

This shows that not the whole tablespace is affected:

SQL> select * from scott.dept;

 DEPTNO DNAME          LOC
---------- -------------- -------------
 10 ACCOUNTING     NEW YORK
 20 RESEARCH       DALLAS
 30 SALES          CHICAGO
 40 OPERATIONS     BOSTON

SQL> select table_name,tablespace_name from dba_tables where owner='SCOTT';

TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------
DEPT                           USERS
EMP                            USERS

Whenever we get these kind of error messages, we need to check all the blocks. Typically, error messages about block corruption come up during an RMAN backup, but I like to defer that a little to show an 11g New Feature before. Checking all blocks now:

RMAN> validate check logical database;

Starting validate at 16-NOV-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=107 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00001 name=/home/oracle/prima/system01.dbf
input datafile file number=00002 name=/home/oracle/prima/sysaux01.dbf
input datafile file number=00003 name=/home/oracle/prima/undotbs01.dbf
input datafile file number=00004 name=/home/oracle/prima/users01.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              17594        38400           277491
 File Name: /home/oracle/prima/system01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              13854
 Index      0              4487
 Other      0              2465

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              20381        25600           277631
 File Name: /home/oracle/prima/sysaux01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              869
 Index      0              957
 Other      0              3393

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              541          22784           277631
 File Name: /home/oracle/prima/undotbs01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              0
 Index      0              0
 Other      0              22243

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    FAILED 0              1133         1280            271968
 File Name: /home/oracle/prima/users01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              10
 Index      0              0
 Other      1              137

validate found one or more corrupt blocks
See trace file /home/oracle/prima/diag/rdbms/prima/prima/trace/prima_ora_18316.trc for details
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
including current control file for validation
including current SPFILE in backup set
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              612
Finished validate at 16-NOV-10

We have already a couple of 11g New Features here: The syntax has changed from backup validate (since 9i) to just validate (11g) – probably to make clear that this does not perform a backup but a check of corrupted blocks instead. Before 11g, the command did not show the verbose list of checked respectively corrupted blocks like we see above.

The addition check logical will also check for logical block corruption, which is not done by default.

Checking all the blocks here is more efficient than doing an immediate recovery of the one block mentioned in the error message above. There may be many more not spotted yet. Same is true for an ordinary backup that would interrupt at the first spotted corrupted block as we will see later on.

The validate command populated the view v$database_block_corruption, that is now internally read by RMAN in order to repair all the found corrupted blocks. The next 11g New Feature here is: It will take the block out of the Flashback Logs, if present there!

RMAN> blockrecover corruption list;

Starting recover at 16-NOV-10
using channel ORA_DISK_1
searching flashback logs for block images
finished flashback log search, restored 1 blocks

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 16-NOV-10

I was so bold that I did not even take a backup before – to make sure this new feature must be used:

RMAN> list backup;

specification does not match any backup in the repository

I’m going to take a backup now, but before that, I cause again block corruption. So we will see that RMAN stops at the first noticed corrupted block. No Third-Party-Tool would recognize the block corruption, BTW, so we have another reason to actually use RMAN here. If we say backup check logical database instead of just backup database, RMAN will also check for logical block corruption during the backup.

[oracle@uhesse-pc skripte]$ rman target sys/oracle@prima

Recovery Manager: Release 11.2.0.2.0 - Production on Tue Nov 16 15:22:14 2010

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PRIMA (DBID=1967518488)

RMAN> backup database;

Starting backup at 16-NOV-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=110 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/home/oracle/prima/system01.dbf
input datafile file number=00002 name=/home/oracle/prima/sysaux01.dbf
input datafile file number=00003 name=/home/oracle/prima/undotbs01.dbf
input datafile file number=00004 name=/home/oracle/prima/users01.dbf
channel ORA_DISK_1: starting piece 1 at 16-NOV-10
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/16/2010 15:22:22
ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/prima/users01.dbf
continuing other job steps, job failed will not be re-run
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 16-NOV-10
channel ORA_DISK_1: finished piece 1 at 16-NOV-10
piece handle=/home/oracle/flashback/PRIMA/backupset/2010_11_16/o1_mf_ncsnf_TAG20101116T152221_6g54wzkb_.bkp tag=TAG20101116T152221 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/16/2010 15:22:22
ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/prima/users01.dbf

Again the same sequence as above validate check logical database & blockrecover corruption list will solve the problem. During the whole process, the users tablespace remains online and usable, except the emp table of scott.

Conclusion: We have a powerful tool with RMAN to spot and repair corrupted blocks by using intact versions of the corrupted blocks from backup (since 9i already) or even from Flashback Logs (since 11g) – which is probably faster – while keeping up the availability of the affected tablespace.

11g New Features, Backup & Recovery, flashback, High Availability, RMAN

25 Kommentare

Using Flashback in a Data Guard Environment

Veröffentlicht von Uwe Hesse in TOI am August 6, 2010

If Logical Mistakes happen, we can address them with the Flashback techniques, introduced in Oracle Database 10g already, even if in an Data Guard Environment. In case of „Flashback Table To Timestamp“ or „Flashback Table To Before Drop“, there is nothing special to take into account regarding the Standby Database. It will simply replicate these actions accordingly.

If we do „Flashback Database“ instead, that needs a special treatment of the Standby Database. This posting is designed to show you how to do that:

DGMGRL> show configuration

Configuration

 Name:                mycf

 Enabled:             YES

 Protection Mode:     MaxAvailability

 Databases:

 prima - Primary database

 physt - Physical standby database

Fast-Start Failover: DISABLED

Current status for "mycf":

SUCCESS

This is an 11g Database, but the shown technique should work the same with 10g also. Prima & Physt are both creating Flashback Logs:

SQL> connect sys/oracle@prima as sysdba

Connected.

SQL> select database_role,flashback_on from v$database;

DATABASE_ROLE    FLASHBACK_ON

---------------- ------------------

PRIMARY          YES

SQL> connect sys/oracle@physt as sysdba

Connected.

SQL> select database_role,flashback_on from v$database;

DATABASE_ROLE    FLASHBACK_ON

---------------- ------------------

PHYSICAL STANDBY YES

I will now introduce the „Logical Mistake“ on the Primary Database:

SQL> select * from scott.dept;

 DEPTNO DNAME          LOC

---------- -------------- -------------

 10 ACCOUNTING     NEW YORK

 20 RESEARCH       DALLAS

 30 SALES          CHICAGO

 40 OPERATIONS     BOSTON

SQL> drop user scott cascade;

User dropped.

The Redo Protocol gets transmitted with SYNC to the Standby Database and is applied there with Real-Time Apply. In other words: The Logical Mistake has already reached the Standby Database. We could have configured a Delay in the Apply there to address such scenarios. But that is somewhat „old fashioned“; the modern way is to go with flashback. The background behind that is, that in case of a Disaster, hitting the Primary Site, a Delay would cause a longer Failover time. I will now flashback the Primary to get back Scott:

SQL> shutdown immediate

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> startup mount

ORACLE instance started.

Total System Global Area  313860096 bytes

Fixed Size                  1299624 bytes

Variable Size             230689624 bytes

Database Buffers           75497472 bytes

Redo Buffers                6373376 bytes

Database mounted.

SQL> flashback database to timestamp systimestamp - interval '15' minute;

Flashback complete.

SQL> alter database open resetlogs;

Database altered.

SQL> select * from scott.dept;

 DEPTNO DNAME          LOC

---------- -------------- -------------

 10 ACCOUNTING     NEW YORK

 20 RESEARCH       DALLAS

 30 SALES          CHICAGO

 40 OPERATIONS     BOSTON

There he is again! Until now, that was not different from a Flashback Database Operation without Data Guard. But now my Standby Database is no longer able to do Redo Apply, because it is „in the future of the Primary Database“. We are in step 2) of the below picture now that I added to illustrate the situation.

Now I need to put the Standby it into a time, shortly before the present time of the Primary, in order to restart the Redo Apply successfully:

DGMGRL> show configuration

Configuration

 Name:                mycf

 Enabled:             YES

 Protection Mode:     MaxAvailability

 Databases:

 prima - Primary database

 physt - Physical standby database

Fast-Start Failover: DISABLED

Current status for "mycf":

Warning: ORA-16607: one or more databases have failed

DGMGRL> show database physt statusreport

STATUS REPORT

 INSTANCE_NAME   SEVERITY ERROR_TEXT

 *      ERROR ORA-16700: the standby database has diverged
                         from the primary database

 *      ERROR ORA-16766: Redo Apply is stopped

Please notice that the show statusreport clause is a new feature of 11g. In 10g, you need to look into the Broker Logfile to retrieve that problem.

SQL> connect sys/oracle@prima as sysdba
Connected.
SQL> select resetlogs_change# from v$database;

RESETLOGS_CHANGE#

-----------------

SQL> connect sys/oracle@physt as sysdba

Connected.

SQL> flashback database to scn 294221;

Flashback complete.

I subtracted 2 from the Resetlogs Change No. above to make sure that we get the Standby close before the present time of the Primary. Now we need to restart the Redo Apply again:

DGMGRL> show configuration

Configuration

 Name:                mycf

 Enabled:             YES

 Protection Mode:     MaxAvailability

 Databases:

 prima - Primary database

 physt - Physical standby database

Fast-Start Failover: DISABLED

Current status for "mycf":

Warning: ORA-16607: one or more databases have failed

DGMGRL> edit database physt set state=apply-on;

Succeeded.

DGMGRL> show configuration

Configuration

 Name:                mycf

 Enabled:             YES

 Protection Mode:     MaxAvailability

 Databases:

 prima - Primary database

 physt - Physical standby database

Fast-Start Failover: DISABLED

Current status for "mycf":

SUCCESS

That was it!
Conclusion: Flashback is the natural counterpart of Real-Time Apply. You can address Logical Mistakes easily with it and you do not need to Delay the actualization of the Standby Database.

Data Guard, flashback, High Availability

29 Kommentare

Turning Flashback Database on & off with Instance in Status OPEN

Veröffentlicht von Uwe Hesse in TOI am Juni 25, 2010

keep_it_up

Since 11g, you can turn Flashback Database on and off without having to restart the Instance. It can stay in status OPEN, whereas in 10g, you had to go in Status MOUNT to change the Flashback mode, similar as it is still necessary to go to MOUNT in order to change the Archivelog mode.

Here is a short demo on the command line:

SQL> select * from v$version;

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

PL/SQL Release 11.2.0.1.0 - Production

CORE    11.2.0.1.0      Production

TNS for Linux: Version 11.2.0.1.0 - Production

NLSRTL Version 11.2.0.1.0 - Production

SQL> select log_mode,flashback_on from v$database;

LOG_MODE     FLASHBACK_ON

------------ ------------------

ARCHIVELOG   NO

SQL> select status from v$instance;

STATUS

------------

OPEN

I am already in Archivelog mode, but Flashback (Database) is not turned on yet. Instance is OPEN – and stays there:

SQL> alter database flashback on;

Database altered.

SQL> select log_mode,flashback_on from v$database;

LOG_MODE     FLASHBACK_ON

------------ ------------------

ARCHIVELOG   YES

SQL> alter database flashback off;

Database altered.

SQL> select log_mode,flashback_on from v$database;

LOG_MODE     FLASHBACK_ON

------------ ------------------

ARCHIVELOG   NO

Conclusion: If you are about to do a critical operation with your database that you may want to rewind in case, you can turn on Flashback Logging before that operation and turn it off afterwards without limiting the availability of your Database!

Watch me on YouTube explaining the above:

Keep in mind that this affects only your ability to do FLASHBACK DATABASE, not the other forms of Flashback like FLASHBACK QUERY or FLASHBACK TABLE because they do not need Flashback Logs and work regardless of what you see in the column FLASHBACK_ON from V$DATABASE as explained here.

11g New Features, flashback

44 Kommentare