Dealing with Oracle Database Block Corruption in 11g

When we look at physical damage of datafiles, the effects are not always so drastic that the file is completely destroyed. Sometimes, only small parts of the file are damaged resp. corrupted. It may very well not even noticed for a while. This posting is designed to show how to a) detect block corruption and b) recover from it. First requirement for a demonstration is to actually get a block corruption. The following little script damages one block of which the table emp from user scott consists:

sqlplus -s system/manager <<stop
set echo off
set heading off
set feed off
set verify off

spool rowid.sh
select min(dbms_rowid.rowid_block_number(rowid)) from scott.emp;
spool off

spool blksize.sh
select value from v\$parameter where name ='db_block_size';
spool off

spool file.sh
select file_name from dba_data_files
 where tablespace_name = (select tablespace_name from dba_tables
 where table_name = 'EMP'
 and owner = 'SCOTT');
spool off
exit
stop

FILE=`cat file.sh`
BLKSIZ=`cat blksize.sh`
START1=`cat rowid.sh`;
START2=`expr ${START1} + 1`

dd if=`echo ${FILE}` of=tab1 bs=`echo ${BLKSIZ}` count=`echo ${START1}`
dd if=`echo ${FILE}` of=tab2 bs=`echo ${BLKSIZ}` count=1
dd if=`echo ${FILE}` of=tab3 bs=`echo ${BLKSIZ}` skip=${START2}

cat tab1 > file
cat tab2 >> file
cat tab3 >> file

cp file ${FILE}

You don’t need that kind of script in production :-)

[oracle@uhesse-pc skripte]$ ./blockrec.sh

 131

8192

/home/oracle/prima/users01.dbf
131+0 records in
131+0 records out
1073152 bytes (1.1 MB) copied, 0.002112 seconds, 508 MB/s
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 2.9e-05 seconds, 282 MB/s
1149+0 records in
1149+0 records out
9412608 bytes (9.4 MB) copied, 0.02027 seconds, 464 MB/s

The block that contained all the rows of the little emp table is now damaged. The demo is done on 11.2.0.2 but the mentioned 11g New Features are already present in 11.1.0.6

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

Please notice that not the whole tablespace is affected:

SQL> select * from scott.emp;

 EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
 7369 SMITH      CLERK           7902 17-DEC-80        800                    20
 7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
 7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
 7566 JONES      MANAGER         7839 02-APR-81       2975                    20
 7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
 7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
 7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
 7788 SCOTT      ANALYST         7566 09-DEC-82       3000                    20
 7839 KING       PRESIDENT            17-NOV-81       5000                    10
 7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30
 7876 ADAMS      CLERK           7788 12-JAN-83       1100                    20
 7900 JAMES      CLERK           7698 03-DEC-81        950                    30
 7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
 7934 MILLER     CLERK           7782 23-JAN-82       1300                    10

14 rows selected.

This is still possible if the block is read from the buffer cache. Therefore:

SQL> alter system flush buffer_cache;

System altered.

SQL> select * from scott.emp;
select * from scott.emp
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 131)
ORA-01110: data file 4: '/home/oracle/prima/users01.dbf'

SQL> select * from scott.dept;

 DEPTNO DNAME          LOC
---------- -------------- -------------
 10 ACCOUNTING     NEW YORK
 20 RESEARCH       DALLAS
 30 SALES          CHICAGO
 40 OPERATIONS     BOSTON

SQL> select table_name,tablespace_name from dba_tables where owner='SCOTT';

TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------
DEPT                           USERS
EMP                            USERS

Whenever we get these kind of error messages, we need to check all the blocks. Typically, we get them during an RMAN backup, but I like to defer that a little to show an 11g New Feature before. Checking all blocks now:

RMAN> validate check logical database;

Starting validate at 16-NOV-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=107 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00001 name=/home/oracle/prima/system01.dbf
input datafile file number=00002 name=/home/oracle/prima/sysaux01.dbf
input datafile file number=00003 name=/home/oracle/prima/undotbs01.dbf
input datafile file number=00004 name=/home/oracle/prima/users01.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              17594        38400           277491
 File Name: /home/oracle/prima/system01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              13854
 Index      0              4487
 Other      0              2465

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              20381        25600           277631
 File Name: /home/oracle/prima/sysaux01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              869
 Index      0              957
 Other      0              3393

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              541          22784           277631
 File Name: /home/oracle/prima/undotbs01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              0
 Index      0              0
 Other      0              22243

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    FAILED 0              1133         1280            271968
 File Name: /home/oracle/prima/users01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              10
 Index      0              0
 Other      1              137

validate found one or more corrupt blocks
See trace file /home/oracle/prima/diag/rdbms/prima/prima/trace/prima_ora_18316.trc for details
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
including current control file for validation
including current SPFILE in backup set
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              612
Finished validate at 16-NOV-10

We have already a couple of 11g New Features here: The syntax has changed from backup validate (since 9i) to just validate (11g) – probably to make clear that this does not perform a backup but a check of corrupted blocks instead. Before 11g, the command did not show the verbose list of checked resp. corrupted blocks like we see above. The addition check logical will also check for logical block corruption, which is not done by default. Checking all the blocks here is more efficient than doing an immediate recovery of the one block mentioned in the error message above. There may be many more not spotted yet. Same is true for an ordinary backup that would interrupt at the first spotted corrupted block as we will see later on. The validate command populated the view v$database_block_corruption, that is now internally read by RMAN in order to repair all the found corrupted blocks. The next 11g New Feature here is: It will take the block out of the Flashback Logs, if present there!

RMAN> blockrecover corruption list;

Starting recover at 16-NOV-10
using channel ORA_DISK_1
searching flashback logs for block images
finished flashback log search, restored 1 blocks

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 16-NOV-10

I was so bold that I did not even take a backup before – to make sure this new feature must be used:

RMAN> list backup;

specification does not match any backup in the repository

I’m going to take a backup now, but before that, I trigger again block corruption. So we will see that RMAN stops at the first noticed corrupted block. No Third-Party-Tool would recognize the block corruption, BTW, so we have another reason to actually use RMAN here. If we say backup check logical database instead of just backup database, RMAN will also check for logical block corruption during the backup.

[oracle@uhesse-pc skripte]$ rman target sys/oracle@prima

Recovery Manager: Release 11.2.0.2.0 - Production on Tue Nov 16 15:22:14 2010

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PRIMA (DBID=1967518488)

RMAN> backup database;

Starting backup at 16-NOV-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=110 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/home/oracle/prima/system01.dbf
input datafile file number=00002 name=/home/oracle/prima/sysaux01.dbf
input datafile file number=00003 name=/home/oracle/prima/undotbs01.dbf
input datafile file number=00004 name=/home/oracle/prima/users01.dbf
channel ORA_DISK_1: starting piece 1 at 16-NOV-10
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/16/2010 15:22:22
ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/prima/users01.dbf
continuing other job steps, job failed will not be re-run
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 16-NOV-10
channel ORA_DISK_1: finished piece 1 at 16-NOV-10
piece handle=/home/oracle/flashback/PRIMA/backupset/2010_11_16/o1_mf_ncsnf_TAG20101116T152221_6g54wzkb_.bkp tag=TAG20101116T152221 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/16/2010 15:22:22
ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/prima/users01.dbf

Again the same sequence as above validate check logical database & blockrecover corruption list will solve the problem. During the whole process, the users tablespace remains online and usable, except the emp table of scott.

Conclusion: We have a powerful tool with RMAN to spot and repair corrupted blocks by using intact versions of the corrupted blocks from backup (since 9i already) or even from Flashback Logs (since 11g) – which is probably faster – while keeping up the availability of the affected tablespace.

About these ads

, , , ,

  1. #1 by jason arneil on November 16, 2010 - 17:10

    Hi Uwe,

    As you’ll be aware another great way of dealing with block corruption is to have active dataguard running, then the block corruption can be repaired from the standby, without the user seeing the error.

    jason.

  2. #2 by Uwe Hesse on November 16, 2010 - 17:22

    Jason,
    thank you for mentioning! My plan was to cover that with a separate posting :-)

  3. #3 by Savaş Külah on November 19, 2010 - 17:14

    Hi,

    If 1st block of datafile header have been corrupted, RMAN cannot perform block level recovery on it.

    Also DBVerify(dbv) utility can be used to determine datafile block corruption.

    http://savaskulah.wordpress.com/2010/06/18/how-to-create-a-block-corruption-in-linux/

    Savaş Külah

  4. #4 by Uwe Hesse on November 26, 2010 - 13:13

    Savaş,
    of course, if the Datafile Header is damaged, this is no “ordinary” Block Corruption and cannot be resolved with Blockrecovery. A restore of the whole datafile from backup is needed then, followed by a complete recovery.
    Yes, dbv is another way to detect corrupted blocks – also a legacy export with the conventional path or analyze validate structure would do that. But I consider these as minor options compared to the usage of RMAN to detect corrupted blocks – which is already in so far attractive as RMAN will be used for backups most likely anyway.

  5. #5 by Gokhan Tercan on October 2, 2012 - 18:05

    Hello,

    We performed block recovery using RMAN blockrecover corruption list;
    Recovery completed successfully. Blocks disappeared from v$database_block_corruption
    But after first backup they returned back to v$database_block_corruption as :

    862 452929 1 9898819775590 NOLOGGING
    862 452867 1 9898819775400 NOLOGGING

    When I apply Validate datafile 862; the output is:

    List of Datafiles
    =================
    File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
    —- —— ————– ———— ————— ———-
    862 OK 1 0 1 0
    File Name: +ORADATA/pmuh00/datafile/apps_ts_tx_auto.1142.786154549
    Block Type Blocks Failing Blocks Processed
    ———- ————– —————-
    Data 0 0
    Index 0 0
    Other 0 1

    This output says there is one block marked corrupt. But Status is OK.
    Do you have any idea how these both output can be exist together ?

    Oracle version : 11.2.0.1 / two node RAC with ASM /
    Corruption happened during “alter table move partition” compress.
    Thanks

  6. #6 by Adam Gorge on October 30, 2012 - 12:52

    Very Informative Article!! In case, these steps get failed to fix oracle database block corruption then you should use third party Oracle database recovery software i.e. Stellar Phoenix Oracle Recovery to fix this issue. These software repairs corrupt DBF file as well as all oracle database objects.

  7. #7 by Eduardo on May 24, 2013 - 03:19

    Hi Uwe,

    Thank for share with us your experience in oracle.

    Concerning this posting, can we have the content of the file blockrec.sh.

    Thanks

  8. #8 by Uwe Hesse on May 27, 2013 - 08:25

    You see the content of blockrec.sh displayed in the posting – I wonder whether I should have disclose it, by the way :-) Don’t do that at home!

  9. #9 by myother on July 2, 2013 - 16:43

    Reblogged this on MY DBA Notes.

  10. #10 by Uwe Hesse on July 9, 2013 - 20:12

    Keep on reblogging my postings, I like that :-)

  11. #11 by Gerrit Haase on September 18, 2013 - 18:12

    RMAN> blockrecover corruption list;

    Starting recover at 18-SEP-13
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03002: failure of recover command at 09/18/2013 18:11:13
    RMAN-05009: Block Media Recovery requires Enterprise Edition

  12. #12 by Uwe Hesse on September 19, 2013 - 08:59

    Gerrit, thank you for highlighting that Block Media Recovery is a feature of the Enterprise Edition. It is documented here: http://docs.oracle.com/cd/E11882_01/license.112/e47877/editions.htm#CJACGHEB Tend to forget to mention that because I always deal with EE :-)

  13. #13 by saikat on September 19, 2013 - 14:41

    How to repair inter-block inconsistency which are not identified by RMAN?

  14. #14 by Uwe Hesse on September 19, 2013 - 16:41

    Saikat, generally, RMAN should also be able to identify logical corruption with the VALIDATE CHECK LOGICAL DATABASE command from above. If RMAN cannot detect respectively repair it, you may try dbverify for detection and a complete recovery of the affected datafile afterwards. And you should of course call for Oracle Support :)

  15. #15 by saikat on September 20, 2013 - 07:23

    Hi, Thanks for the reply, actually recently we faced such issue where oracle support analyst mentioned RMAN does not identify inter-block inconsistency…moreoevr, the support analyst said that logical corruptions can’t be repaired using backup/restore/recovery..Only way is to use dbms_repair package…and in that case we will loose the data….Is there any other way to repair logical corruption without loosing data?

  1. Real-Time Query and Automatic Block Media Recovery in 11gR2 « The Oracle Instructor

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 2,715 other followers

%d bloggers like this: