Kevin Kempf's Blog

March 26, 2010

More Logical Database Corruption

Filed under: 11g, Oracle — kkempf @ 10:59 am

Days which begin at 1am with a phone call sometimes end well

When the SAN corrupts every host residing on it, it has the makings for an unpleasant day.  Luckily in my case, this didn’t affect my 11i database as it was happily living on another array.  It did affect the 11i application server and many, many ancillary application servers and databases.   A hard crash is never good, and losing archivelogs makes it worse.

Identifying the corruption

While bringing up these smaller databases, I ran logical validation via RMAN as a precaution, and much to my chagrin, I ran into my old friend again, in the system tablespace:

RMAN> run {
allocate channel d1 type disk;
backup validate check logical database;
release channel d1;
}
2> 3> 4> 5>
allocated channel: d1
channel d1: SID=28 device type=DISK

Starting backup at 25-MAR-10
channel d1: starting full datafile backup set
channel d1: specifying datafile(s) in backup set
input datafile file number=00005 name=/u01/labworks/labworksdata/data/labworks01.dbf
input datafile file number=00001 name=/u01/labworks/labworksdata/system/system01.dbf
input datafile file number=00003 name=/u01/labworks/labworksdata/system/sysaux01.dbf
input datafile file number=00006 name=/u01/labworks/labworksdata/data/labworks_indx01.dbf
input datafile file number=00002 name=/u01/labworks/labworksdata/system/undotbs01.dbf
input datafile file number=00004 name=/u01/labworks/labworksdata/data/users01.dbf
channel d1: backup set complete, elapsed time: 00:04:17
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    FAILED 0              54288        131075          5975563226245
File Name: /u01/labworks/labworksdata/system/system01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data       0              58249
Index      0              14807
Other      598            3728

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              0            52608           5975563226259
File Name: /u01/labworks/labworksdata/system/undotbs01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data       0              0
Index      0              0
Other      0              52608

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              64790        131093          5975563226243
File Name: /u01/labworks/labworksdata/system/sysaux01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data       0              21298
Index      0              18315
Other      0              26669

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    OK     0              3486         4097            475461
File Name: /u01/labworks/labworksdata/data/users01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data       0              15
Index      0              2
Other      0              593

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    OK     0              55802        393216          5975563226259
File Name: /u01/labworks/labworksdata/data/labworks01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data       0              335501
Index      0              0
Other      0              1913

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
6    OK     0              49670        65536           5975563225369
File Name: /u01/labworks/labworksdata/data/labworks_indx01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data       0              0
Index      0              13816
Other      0              2050

validate found one or more corrupt blocks
See trace file /u01/labworks/labworksdb/11.2.0/diag/rdbms/labworks/LABWORKS/trace/LABWORKS_ora_32399.trc for details
channel d1: starting full datafile backup set
channel d1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel d1: backup set complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              626
Finished backup at 25-MAR-10
released channel: d1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03008: error while performing automatic resync of recovery catalog
RMAN-20033: control file SEQUENCE# too low

RMAN> exit

Confirming the corruption from the database (fractured, all zero & corrupt!)

I knew I’d need to know what the database v$database_block_corruption view thought was up:

sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Thu Mar 25 14:16:25 2010

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Release 11.2.0.1.0 - 64bit Production

SQL> select * from v$database_block_corruption;

FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
1     113181         64                  0 FRACTURED
1      98825        509                  0 ALL ZERO
1      98824          1                  0 CORRUPT
1      98823          1                  0 ALL ZERO
1      98800         23                  0 CORRUPT

Determining affected objects

Time to see what segments were affected.  This is scary stuff, being in the system tablespace, I was already feeling like this wasn’t going to end well, but when I queried the blocks (or a good sampling of them, you can see from the report above basically blocks 113181-113245 and 98800 – 99324 were affected

SQL> select tablespace_name, segment_type, owner, segment_name
from dba_extents  where file_id=1 and 113181 between block_id and block_id + blocks -1;

no rows selected

Lather, rinse, repeat.  Many times.  I didn’t do every block, but since they were contiguous, I took a reasonable sample before forming the idea that none of these blocks happened to be tied to a segment.

Resolution Options

In Metalink/MOS Doc ID 28814.1 (Handling Block Corruptions in Oracle 7 / 8 / 8i / 9i / 10g / 11g) it states: “An error on an UNUSED Oracle block can be ignored as Oracle will create a new block image should the block need to be used so any existing problem on the block will never get read”.  Sounds great, Oracle confirms my suspicion that this is not serious.  However, I don’t really want to see corrupted every time I look at this database, nor do I want my weekly job to fail for this database every week for something which isn’t really an issue.  Surely, Oracle must have some package or procedure with which you can “reset” unused, logically corrupt blocks so they don’t stay out there as corrupt?  Nothing I could readily find.

Fill it up, Drop it

I decided to go the simpler route (again): create a massive table in the affected tablespace, which would at some point (hopefully) fill those blocks, then drop the table.

SQL> create table logical_corruption tablespace system as select * from sys.access$;

Table created.

SQL> insert into logical_corruption select * from sys.access$;

94990 rows created.
SQL> /

Repeat, until the tablespace has something like 20MB free.

rman target / nocatalog

Recovery Manager: Release 11.2.0.1.0 – Production on Thu Mar 25 14:29:50 2010

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: LABWORKS (DBID=3344015257)
using target database control file instead of recovery catalog

RMAN> backup validate check logical datafile 1;

Starting backup at 25-MAR-10
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=25 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/u01/labworks/labworksdata/system/system01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:26
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              19905        131075          5975564162526
File Name: /u01/labworks/labworksdata/system/system01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data       0              93229
Index      0              14807
Other      0              3131

channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              626
Finished backup at 25-MAR-10

RMAN> exit

Recovery Manager complete.

Finally, check with the data dictionary:

sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Thu Mar 25 14:30:45 2010

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Release 11.2.0.1.0 - 64bit Production

SQL> drop table logical_corruption;

Table dropped.

SQL> select * from v$database_block_corruption;

no rows selected

Parting Shots

I was exceptionally lucky (again) in that the affected segments (or lack of them in this case) of my logical corruption were unimportant or unallocated.  Clearly, this could have been anywhere in my database.  If it had been elsewhere, I’d be using RMAN to perform a block level recovery, and the end result might have been an incomplete database recovery.  If, however, you find yourself in the same situation as me, you can at least rid yourself of annoying OEM and RMAN status’ of logical corruption, using a similar method.

Advertisements

Blog at WordPress.com.