Dealing with Oracle Database Block Corruption

Media errors don’t always destroy files completely. Sometimes only small parts of a file are damaged or corrupted, and the damage may go unnoticed by end users and admins for a while. This article shows how to detect block corruption and how to recover from it. The demo is done on 11g, but the techniques shown work the same way on 12c. My demo database has corrupted blocks that affect the emp table of the user scott:

SQL> select * from scott.emp;
select * from scott.emp
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 131)
ORA-01110: data file 4: '/home/oracle/prima/users01.dbf'

The tablespace as a whole is not affected, as a query against another table in it shows:

SQL> select * from scott.dept;

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON

SQL> select table_name,tablespace_name from dba_tables where owner='SCOTT';

TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------
DEPT                           USERS
EMP                            USERS
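
A small aside for those who script this kind of check: the file and block numbers in the ORA-01578 message above can be extracted automatically, for instance to build the dictionary query that names the segment owning the damaged block. The following is only a minimal sketch in Python, not part of the demo. The function names parse_ora_01578 and segment_lookup_sql are made up for this illustration, and the generated query assumes the standard DBA_EXTENTS columns file_id, block_id and blocks.

```python
import re

# Matches the file and block numbers in an ORA-01578 message.
ORA_01578 = re.compile(r"ORA-01578: .*\(file # (\d+), block # (\d+)\)")


def parse_ora_01578(text):
    """Return (file_no, block_no) from an ORA-01578 message, or None."""
    match = ORA_01578.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def segment_lookup_sql(file_no, block_no):
    """Build a DBA_EXTENTS query for the segment that contains the block."""
    # int() guards against anything but numbers ending up in the SQL text.
    return (
        "select owner, segment_name, segment_type from dba_extents "
        f"where file_id = {int(file_no)} "
        f"and {int(block_no)} between block_id and block_id + blocks - 1"
    )


message = "ORA-01578: ORACLE data block corrupted (file # 4, block # 131)"
location = parse_ora_01578(message)
print(location)
if location is not None:
    print(segment_lookup_sql(*location))
```

The printed statement can be pasted into SQL*Plus. If the block belongs to no segment, the query simply returns no rows.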

Whenever we get this kind of error message, we need to check all the blocks. Typically, error messages about block corruption come up during an RMAN backup, but I will defer that a little in order to show an 11g New Feature first. Checking all blocks now:

RMAN> validate check logical database;

Starting validate at 16-NOV-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=107 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00001 name=/home/oracle/prima/system01.dbf
input datafile file number=00002 name=/home/oracle/prima/sysaux01.dbf
input datafile file number=00003 name=/home/oracle/prima/undotbs01.dbf
input datafile file number=00004 name=/home/oracle/prima/users01.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              17594        38400           277491
 File Name: /home/oracle/prima/system01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              13854
 Index      0              4487
 Other      0              2465

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              20381        25600           277631
 File Name: /home/oracle/prima/sysaux01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              869
 Index      0              957
 Other      0              3393

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              541          22784           277631
 File Name: /home/oracle/prima/undotbs01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              0
 Index      0              0
 Other      0              22243

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    FAILED 0              1133         1280            271968
 File Name: /home/oracle/prima/users01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              10
 Index      0              0
 Other      1              137

validate found one or more corrupt blocks
See trace file /home/oracle/prima/diag/rdbms/prima/prima/trace/prima_ora_18316.trc for details
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
including current control file for validation
including current SPFILE in backup set
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              612
Finished validate at 16-NOV-10

We already see a couple of 11g New Features here: The syntax can be shortened from backup validate (available since 9i) to just validate (11g), probably to make clear that this does not perform a backup but checks for corrupted blocks instead. Before 11g, the command did not show the verbose list of checked and corrupted blocks that we see above.

The check logical clause makes RMAN also look for logical block corruption, which it does not do by default.

Checking all the blocks here is more efficient than immediately recovering the one block mentioned in the error message above, because there may be many more corrupted blocks that have not been spotted yet. The same is true for an ordinary backup, which would stop at the first corrupted block it encounters, as we will see later on.
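
If validate runs from a scheduled job, nobody reads the listing line by line, so it can help to scan the summary rows for datafiles whose status is FAILED. The following Python sketch does that for output laid out like the listing above. It is an illustration under assumptions, not an Oracle-supplied tool: the function name failed_datafiles is invented, and the regular expressions assume exactly the column layout shown in this article.

```python
import re

# A summary row of "List of Datafiles": file number, status, marked corrupt,
# empty blocks, blocks examined, high SCN.
SUMMARY_ROW = re.compile(r"^\s*(\d+)\s+(OK|FAILED)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")
FILE_NAME = re.compile(r"^\s*File Name:\s*(.+?)\s*$")


def failed_datafiles(validate_output):
    """Return [(file_no, file_name)] for datafiles with status FAILED."""
    failed = []
    current = None  # (file_no, status) of the most recent summary row
    for line in validate_output.splitlines():
        row = SUMMARY_ROW.match(line)
        if row:
            current = (int(row.group(1)), row.group(2))
            continue
        name = FILE_NAME.match(line)
        if name and current is not None:
            if current[1] == "FAILED":
                failed.append((current[0], name.group(1)))
            current = None
    return failed


sample = """\
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              541          22784           277631
 File Name: /home/oracle/prima/undotbs01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              0
 Index      0              0
 Other      0              22243

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    FAILED 0              1133         1280            271968
 File Name: /home/oracle/prima/users01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              10
 Index      0              0
 Other      1              137
"""

print(failed_datafiles(sample))
```

For the two files in the sample, only users01.dbf is reported, which matches what the listing above shows for file 4.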

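The validate command has populated the view v$database_block_corruption, which RMAN now reads internally in order to repair all the corrupted blocks it found. The next 11g New Feature here: RMAN takes the block from the Flashback Logs, if it is present there! The blockrecover syntax dates from before 11g; 11g still accepts it as a synonym for recover corruption list, hence the "Starting recover" in the output.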

RMAN> blockrecover corruption list;

Starting recover at 16-NOV-10
using channel ORA_DISK_1
searching flashback logs for block images
finished flashback log search, restored 1 blocks

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 16-NOV-10

I was bold enough not to take a backup beforehand, to make sure that this new feature has to be used:

RMAN> list backup;

specification does not match any backup in the repository

I’m going to take a backup now, but before that, I corrupt a block again, so that we will see RMAN stop at the first corrupted block it notices. Third-party tools that merely copy the datafiles would not recognize the block corruption, by the way, so we have another reason to actually use RMAN here. If we say backup check logical database instead of just backup database, RMAN will also check for logical block corruption during the backup.

[oracle@uhesse-pc skripte]$ rman target sys/oracle@prima

Recovery Manager: Release 11.2.0.2.0 - Production on Tue Nov 16 15:22:14 2010

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PRIMA (DBID=1967518488)

RMAN> backup database;

Starting backup at 16-NOV-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=110 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/home/oracle/prima/system01.dbf
input datafile file number=00002 name=/home/oracle/prima/sysaux01.dbf
input datafile file number=00003 name=/home/oracle/prima/undotbs01.dbf
input datafile file number=00004 name=/home/oracle/prima/users01.dbf
channel ORA_DISK_1: starting piece 1 at 16-NOV-10
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/16/2010 15:22:22
ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/prima/users01.dbf
continuing other job steps, job failed will not be re-run
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 16-NOV-10
channel ORA_DISK_1: finished piece 1 at 16-NOV-10
piece handle=/home/oracle/flashback/PRIMA/backupset/2010_11_16/o1_mf_ncsnf_TAG20101116T152221_6g54wzkb_.bkp tag=TAG20101116T152221 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================

RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/16/2010 15:22:22
ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/prima/users01.dbf

Again, the same sequence as above, validate check logical database followed by blockrecover corruption list, will solve the problem. During the whole process, the users tablespace remains online and usable, except for the emp table of scott.
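
A backup that failed this way is easy to overlook in a long log file. As a rough sketch, a few lines of Python can pull the affected datafile names out of the ORA-19566 messages. The function name files_with_corrupt_blocks is an invention for this example, and the pattern assumes the message wording shown in the output above. Because RMAN repeats the message in the error stack at the end, the sketch reports each file only once.

```python
import re

# ORA-19566 names the datafile whose corrupt block count exceeded the limit.
ORA_19566 = re.compile(r"ORA-19566: exceeded limit of (\d+) corrupt blocks for file (.+)")


def files_with_corrupt_blocks(backup_log):
    """Return the datafile names reported by ORA-19566, without repeats."""
    seen = []
    for match in ORA_19566.finditer(backup_log):
        name = match.group(2).strip()
        if name not in seen:
            seen.append(name)
    return seen


log = """\
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/16/2010 15:22:22
ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/prima/users01.dbf
continuing other job steps, job failed will not be re-run
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/16/2010 15:22:22
ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/prima/users01.dbf
"""

print(files_with_corrupt_blocks(log))
```

Any file name found this way is a signal to run the validate and block recovery sequence before the next backup attempt.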

Conclusion: With RMAN, we have a powerful tool to spot and repair corrupted blocks. It uses intact versions of the corrupted blocks from a backup (since 9i already) or even from the Flashback Logs (since 11g), which is probably faster, and it keeps the affected tablespace available throughout.

This entry was posted on November 16, 2010, 15:45 and is filed under TOI.

#1 by jason arneil on November 16, 2010 - 17:10

Hi Uwe,

As you’ll be aware another great way of dealing with block corruption is to have active dataguard running, then the block corruption can be repaired from the standby, without the user seeing the error.

jason.
#2 by Uwe Hesse on November 16, 2010 - 17:22

Jason,
thank you for mentioning that! My plan was to cover it in a separate posting 🙂
#3 by Savaş Külah on November 19, 2010 - 17:14

Hi,

If the first block of a datafile, the datafile header, has been corrupted, RMAN cannot perform block-level recovery on it.

Also, the DBVerify (dbv) utility can be used to detect datafile block corruption.

How To Create A Block Corruption In Linux?

Savaş Külah
#4 by Uwe Hesse on November 26, 2010 - 13:13

Savaş,
of course, if the Datafile Header is damaged, this is no "ordinary" Block Corruption and cannot be resolved with Blockrecovery. A restore of the whole datafile from backup is needed then, followed by a complete recovery.
Yes, dbv is another way to detect corrupted blocks. A legacy export with the conventional path or analyze validate structure would do that, too. But I consider these minor options compared to using RMAN to detect corrupted blocks, which is attractive already because RMAN will most likely be used for backups anyway.
#5 by Gokhan Tercan on October 2, 2012 - 18:05

Hello,

We performed block recovery using RMAN blockrecover corruption list;
Recovery completed successfully. Blocks disappeared from v$database_block_corruption
But after first backup they returned back to v$database_block_corruption as :

862 452929 1 9898819775590 NOLOGGING
862 452867 1 9898819775400 NOLOGGING

When I apply Validate datafile 862; the output is:

List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
862  OK     1              0            1               0
 File Name: +ORADATA/pmuh00/datafile/apps_ts_tx_auto.1142.786154549
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              0
 Index      0              0
 Other      0              1

This output says there is one block marked corrupt, but the Status is OK.
Do you have any idea how both of these outputs can exist together?

Oracle version: 11.2.0.1, two-node RAC with ASM.
The corruption happened during "alter table move partition" compress.
Thanks
#6 by Adam Gorge on October 30, 2012 - 12:52

Very Informative Article!! In case, these steps get failed to fix oracle database block corruption then you should use third party Oracle database recovery software i.e. Stellar Phoenix Oracle Recovery to fix this issue. These software repairs corrupt DBF file as well as all oracle database objects.
#7 by Eduardo on May 24, 2013 - 03:19

Hi Uwe,

Thanks for sharing your Oracle experience with us.

Concerning this posting, can we have the content of the file blockrec.sh?

Thanks
#8 by Uwe Hesse on May 27, 2013 - 08:25

You see the content of blockrec.sh displayed in the posting. I wonder whether I should have disclosed it, by the way 🙂 Don’t do that at home!
#9 by myother on July 2, 2013 - 16:43

Reblogged this on MY DBA Notes.
#10 by Uwe Hesse on July 9, 2013 - 20:12

Keep on reblogging my postings, I like that 🙂
#11 by Gerrit Haase on September 18, 2013 - 18:12

RMAN> blockrecover corruption list;

Starting recover at 18-SEP-13
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 09/18/2013 18:11:13
RMAN-05009: Block Media Recovery requires Enterprise Edition
#12 by Uwe Hesse on September 19, 2013 - 08:59

Gerrit, thank you for highlighting that Block Media Recovery is a feature of the Enterprise Edition. It is documented here: http://docs.oracle.com/cd/E11882_01/license.112/e47877/editions.htm#CJACGHEB I tend to forget to mention that because I always deal with EE 🙂
#13 by saikat on September 19, 2013 - 14:41

How can inter-block inconsistencies that are not identified by RMAN be repaired?
#14 by Uwe Hesse on September 19, 2013 - 16:41

Saikat, generally, RMAN should also be able to identify logical corruption with the VALIDATE CHECK LOGICAL DATABASE command from above. If RMAN cannot detect or repair it, you may try dbverify for detection and a complete recovery of the affected datafile afterwards. And you should of course call Oracle Support 🙂
#15 by saikat on September 20, 2013 - 07:23

Hi, thanks for the reply. We recently faced such an issue, and the Oracle Support analyst mentioned that RMAN does not identify inter-block inconsistency. Moreover, the analyst said that logical corruptions can’t be repaired using backup/restore/recovery. The only way is to use the dbms_repair package, and in that case we will lose the data. Is there any other way to repair logical corruption without losing data?
#16 by Prasad Avadhanam on April 28, 2016 - 11:49

I am just wondering on this. If the blocks are corrupted and if there are no backup for the table in question, how would RMAN help?

Regards
-Prasad
#17 by Uwe Hesse on May 20, 2016 - 08:39

Saikat, RMAN may spot logical corruption when you say backup check logical or validate check logical. If block recovery doesn’t work, still an ordinary restore and recover may work. DBMS_REPAIR doesn’t actually repair anything but it can mark blocks to skip them for select. In the end, you need a backup from before the corruption (physical or logical) to get back your data – no black magic here.
#18 by Uwe Hesse on May 20, 2016 - 08:40

Prasad Avadhanam, yes, thank you for pointing out the obvious: Without a backup you cannot do recovery 🙂
#19 by Rajesh pikku on May 31, 2016 - 16:00

Very good post. Recently I faced the same issue.
#20 by Saili on June 23, 2016 - 11:46

Hi Uwe, we have an Oracle database where datafiles are repeatedly getting corrupted. We have not been able to find the root cause, even after investigating a lot. When a datafile is corrupted, a simple media recovery followed by bringing the datafile online helps. But that is a temporary solution, and another datafile gets corrupted again after some days.
#21 by Saeed on September 13, 2017 - 12:09

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00002
channel ORA_DISK_1: reading from backup piece /backup/full_ORCL_20170831_10401_1.bck
channel ORA_DISK_1: piece handle=/backup/full_ORCL_20170831_10401_1.bck tag=2017-08-31-22-00-03/FULL
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 02:37:46
failover to previous backup

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of repair command at 09/12/2017 22:12:10
RMAN-03015: error occurred in stored script Repair Script
RMAN-06026: some targets not found – aborting restore
RMAN-06023: no backup or copy of datafile 2 found to restore
#22 by Saeed on September 13, 2017 - 12:10

This is my problem. What can I do?