Corrupt database blocks III

July 7, 2013

corruption, SQL

If you don’t know what procrastination is you could start your own blog. Start a post with something like “I worked with … and decided to write three posts about it”. Procrastination is what you experience before the last post is ready; seemingly simple things take forever to get out the door.

This post is important for my own record and shows how to salvage data from a large table with several corrupt blocks. I also had a learning experience I wanted to remember, explaining the redundancy of information here. Previous post was about how to map out tables and the corrupt blocks in a temporary table (that is, not a table created as a global temporary table, but a table you’ll drop when you’re finished.)

Note 61685.1 on My Oracle Support (MOS) shows how to extract data from a corrupt table. This post builds on some of that. Before you begin; data in corrupt blocks are gone. Unless you have special tools and start reading the data files with a tool like DUDE you won’t recover those data. This is about how to recover the data that reside in blocks that have not been marked as corrupt.

One essential function gives you the ROWID from the block number and its file number: DBMS_ROWID.ROWID_CREATE:

select DBMS_ROWID.ROWID_CREATE(1,253071,89,660864,0) from DUAL;  

In the example above 1 means extended rowid; no need to change this, 253071 is the DATA_OBJECT_ID from DBA_OBJECTS, 89 is FILE# (file number) from V$DATABASE_BLOCK_CORRUPTION , and 660864 is BLOCK# (block number) from the same view. The last parameter is for the row number in that block, in this case we want the whole block and therefore starts with row 0.

This is an example of the content of V$DATABASE_BLOCK_CORRUPTION:

FILE#     BLOCK#    BLOCKS  CORRUPTION_CHANGE#  CORRUPTION_TYPE  
89        661112    8       0                   CORRUPT  
89        661048    16      0                   CORRUPT  
89        660984    16      0                   CORRUPT  
89        660920    16      0                   CORRUPT  
89        660864    8       0                   CORRUPT  

In previous post  I created a table SEGMENTS_WITH_CORRUPTION. The following query gives a report of corrupt segments with an interval by rowid using the function above:

select O.OWNER,O.OBJECT_NAME,O.OBJECT_TYPE,  
DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) ROWID_OF_FIRST_CORR,  
  DBMS_ROWID.ROWID_CREATE(1,o.DATA_OBJECT_ID,c.file_id,c.start_corrupt+c.corrupt_blocks,0) ROWID_OF_NEXT_CLEAN  
from segments_with_corruption c,DBA_OBJECTS O  
where C.OWNER=O.OWNER  
and c.segment_name=o.object_name;  

```An example of output (slightly redacted so I don't reveal the source)  

OWNER OBJECT_NAME OBJECT_TYPE ROWID_OF_FIRST_CORR ROWID_OF_NEXT_CLEAN
LUSER FUBAR TABLE AAA9yPABZAAChZ4AAA AAA9yPABZAAChaAAAA
LUSER FUBAR TABLE AAA9yPABZAAChY4AAA AAA9yPABZAAChZIAAA
LUSER FUBAR TABLE AAA9yPABZAAChX4AAA AAA9yPABZAAChYIAAA
LUSER FUBAR TABLE AAA9yPABZAAChW4AAA AAA9yPABZAAChXIAAA
LUSER FUBAR TABLE AAA9yPABZAAChWAAAA AAA9yPABZAAChWIAAA

Now, create a new table to hold the good data. Using LUSER.FUBAR as the owner.table_name for the original table:  
  
  

Create table luser.fubar_recover as
Select * from luser.fubar
Where 1=0;

  

Insert into luser.fubar_recover
select *
from luser.fubar
where rowid < ‘AAA9yPABZAAChZ4AAA’;

Insert into luser.fubar_recover
select *
from luser.fubar
where rowid >= ‘AAA9yPABZAAChaAAAA’
and rowid < ‘AAA9yPABZAAChY4AAA’;

In the first example you get all the rows in blocks up to the corrupt one. In the second you get all the rows between two corrupt blocks. In my case I had a large table with several stretches of corrupt and good blocks. I did not want to handcraft all these insert statements, I wanted one statement that could retrieve all the good data from the table.  
  
To achieve that I started with a query that generates interval of corrupt blocks in the order as they are laid out in the table. At this point I was only working on one table and one data file as the query below shows.  

select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT,
LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn
from(
select
DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT,
DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN,
C.START_CORRUPT
from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O
where C.OWNER=O.OWNER
and C.SEGMENT_NAME=O.OBJECT_NAME
and FILE_ID=89 and C.OWNER=’LUSER’ and C.SEGMENT_NAME=‘FUBAR’
order by c.start_corrupt
)
;

The output looked like this:  

FIRST_CORRUPT

FIRST_CLEAN

PREVIOUS_CORRUPT

NEXT_CORRUPT

RN

AAA9yPABZAAChWAAAA

AAA9yPABZAAChWIAAA

AAA9yPABZAAChW4AAA

1

AAA9yPABZAAChW4AAA

AAA9yPABZAAChXIAAA

AAA9yPABZAAChWAAAA

AAA9yPABZAAChX4AAA

2

AAA9yPABZAAChX4AAA

AAA9yPABZAAChYIAAA

AAA9yPABZAAChW4AAA

AAA9yPABZAAChY4AAA

3

AAA9yPABZAAChY4AAA

AAA9yPABZAAChZIAAA

AAA9yPABZAAChX4AAA

AAA9yPABZAAChZ4AAA

4

AAA9yPABZAAChZ4AAA

AAA9yPABZAAChaAAAA

AAA9yPABZAAChY4AAA

5

Oh, well. Layout of the last table could have been better, I experimented with export html-code from SQL Developer that can be pasted into the HTML-tab.  Anyway, it shows stretches of blocks that are corrupt for this file. Now I want to generate WHERE clauses that are used in the select statement that retrieves the good data:  
  
  

with INTERVALS as (
select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT,
LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn
from(
select
DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT,
DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN,
C.START_CORRUPT
from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O
where C.OWNER=O.OWNER
and C.SEGMENT_NAME=O.OBJECT_NAME
and FILE_ID=89 and C.OWNER=‘LUSER’ and C.SEGMENT_NAME=‘FUBAR’
order by c.start_corrupt
)
) select ‘where rowid < '’’ || FIRST_CORRUPT || '''' LINE
from INTERVALS
where RN=1
union
select ‘or (rowid >= '’' || FIRST_CLEAN || ''' and rowid < ''' || NEXT_CORRUPT || ‘'')’ LINE
from INTERVALS
where RN>1 and NEXT_CORRUPT is not null
union
select ‘or (rowid >= '’' || FIRST_CLEAN || ‘'')’ LINE
from INTERVALS
where NEXT_CORRUPT is null
order by rn
;

This gives the following output:  

LINE
where rowid < ‘AAA9yPABZAAChWAAAA’
or (rowid >= ‘AAA9yPABZAAChXIAAA’ and rowid < ‘AAA9yPABZAAChX4AAA’)
or (rowid >= ‘AAA9yPABZAAChYIAAA’ and rowid < ‘AAA9yPABZAAChY4AAA’)
or (rowid >= ‘AAA9yPABZAAChZIAAA’ and rowid < ‘AAA9yPABZAAChZ4AAA’)
or (rowid >= ‘AAA9yPABZAAChaAAAA)

I can now amend the insert statement by using these where clauses at the end of the select statement:  

insert into LUSER.FUBAR_RECOVER
select /*+ ROWID(T) */ *
from LUSER.FUBAR T
where rowid < ‘AAA9yPABZAAChWAAAA’
or (rowid >= ‘AAA9yPABZAAChXIAAA’ and rowid < ‘AAA9yPABZAAChX4AAA’)
or (rowid >= ‘AAA9yPABZAAChYIAAA’ and rowid < ‘AAA9yPABZAAChY4AAA’)
or (rowid >= ‘AAA9yPABZAAChZIAAA’ and rowid < ‘AAA9yPABZAAChZ4AAA’)
or (rowid >= ‘AAA9yPABZAAChaAAAA’);

  
Also note the ROWID hint that tells Oracle how to scan the table. The biggest problem with this query is that it does not work(!) Yes, I'm going to drag you through the learning experience I had when making things automated and slightly more advanced. When executing the SQL above Oracle performs a normal table scan and aborts when it reads the first corrupt block. This can be seen if you look at the execution plan:  
  

explain plan for
select /*+ ROWID(T) */ *
from LUSER.FUBAR T
where rowid < ‘AAA9yPABZAAChWAAAA’
or (rowid >= ‘AAA9yPABZAAChXIAAA’ and rowid < ‘AAA9yPABZAAChX4AAA’)
or (rowid >= ‘AAA9yPABZAAChYIAAA’ and rowid < ‘AAA9yPABZAAChY4AAA’)
or (rowid >= ‘AAA9yPABZAAChZIAAA’ and rowid < ‘AAA9yPABZAAChZ4AAA’)
or (rowid >= ‘AAA9yPABZAAChaAAAA’);
select * from table(dbms_xplan.display);

Plan hash value: 3975471795
-———————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-———————————————————————————
| 0 | SELECT STATEMENT | | 11M| 2111M| 838K (1)| 02:47:37 |
|* 1 | TABLE ACCESS FULL| FUBAR | 11M| 2111M| 838K (1)| 02:47:37 |
-———————————————————————————
Predicate Information (identified by operation id):
-————————————————–
1 - filter(ROWID<‘AAA9yPABZAAChWAAAA’ OR ROWID>=‘AAA9yPABZAAChaAAAA’
OR ROWID>=‘AAA9yPABZAAChXIAAA’ AND ROWID<‘AAA9yPABZAAChX4AAA’ OR
ROWID>=‘AAA9yPABZAAChYIAAA’ AND ROWID<‘AAA9yPABZAAChY4AAA’ OR
ROWID>=‘AAA9yPABZAAChZIAAA’ AND ROWID<‘AAA9yPABZAAChZ4AAA’)

  

explain plan for
select /*+ ROWID(T) */ *
from LUSER.FUBAR T
where rowid >= ‘AAA9yPABZAAChaAAAA’;
select * from table(dbms_xplan.display);

Plan hash value: 3328636578  
\--------------------------------------------------------------------------------------------  
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |  
\--------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT            |              |  5445K|  1012M|   836K  (1)| 02:47:18 |  
|*  1 |  TABLE ACCESS BY ROWID RANGE| FUBAR        |  5445K|  1012M|   836K  (1)| 02:47:18 |  
\--------------------------------------------------------------------------------------------  
Predicate Information (identified by operation id):  
\---------------------------------------------------  
   1 - access(ROWID>='AAA9yPABZAAChaAAAA')  

```  
This one uses TABLE ACCESS BY ROWID, but the one that fails uses normal TABLE SCAN. I therefore changed slightly the SQL to generate where clauses:  
  
```
with INTERVALS as (  
select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT,  
  LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn  
  from(  
    select   
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT,  
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN,  
    C.START_CORRUPT  
    from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O  
    where C.OWNER=O.OWNER  
    and C.SEGMENT_NAME=O.OBJECT_NAME  
    and FILE_ID=89 and C.OWNER='LUSER' and C.SEGMENT_NAME='FUBAR'  
    order by c.start_corrupt  
  )  
)  select 'select /*+ ROWID(T) */ *  
from LUSER.FUBAR T where rowid < ''' || FIRST_CORRUPT || '''' LINE   
   from INTERVALS  
   where RN=1  
   union  
   select 'union select /*+ ROWID(T) */ *  
from LUSER.FUBAR T  where rowid >= ''' || FIRST_CLEAN || ''' and rowid < ''' || NEXT_CORRUPT || '''' LINE  
   from INTERVALS  
   where RN>1 and NEXT_CORRUPT is not null  
   union  
   select 'union select /*+ ROWID(T) */ *  
from LUSER.FUBAR T where rowid >= ''' || FIRST_CLEAN || ''';' LINE  
   from INTERVALS  
   where NEXT_CORRUPT is null  
   order by RN  
  ;  

```  
The resulting SQL used to retrieve good data looks like this:  
  
```
select /*+ ROWID(T) */ *  
from LUSER.FUBAR T where rowid < 'AAA9yPABZAAChWAAAA'  
union select /*+ ROWID(T) */ *  
from LUSER.FUBAR T  where rowid >= 'AAA9yPABZAAChXIAAA' and rowid < 'AAA9yPABZAAChX4AAA'  
union select /*+ ROWID(T) */ *  
from LUSER.FUBAR T  where rowid >= 'AAA9yPABZAAChYIAAA' and rowid < 'AAA9yPABZAAChY4AAA'  
union select /*+ ROWID(T) */ *  
from LUSER.FUBAR T  where rowid >= 'AAA9yPABZAAChZIAAA' and rowid < 'AAA9yPABZAAChZ4AAA'  
union select /*+ ROWID(T) */ *  
from LUSER.FUBAR T where rowid >= 'AAA9yPABZAAChaAAAA';  

```  
This one actually runs, but it runs for a long time and it eventually failed later because TEMP ran out of space. The reason for this can also be seen in the execution plan:  
```
Plan hash value: 3469639691  
\------------------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |  
\------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |              |    11M|  2177M|       |  4680K (78)| 15:36:06 |  
|   1 |  SORT UNIQUE                  |              |    11M|  2177M|  2858M|  4680K (78)| 15:36:06 |  
|   2 |   UNION-ALL                   |              |       |       |       |            |          |  
|*  3 |    TABLE ACCESS BY ROWID RANGE| FUBAR        |  5445K|  1012M|       |   836K  (1)| 02:47:18 |  
|*  4 |    TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|       |   836K  (1)| 02:47:16 |  
|*  5 |    TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|       |   836K  (1)| 02:47:16 |  
|*  6 |    TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|       |   836K  (1)| 02:47:16 |  
|*  7 |    TABLE ACCESS BY ROWID RANGE| FUBAR        |  5445K|  1012M|       |   836K  (1)| 02:47:18 |  
\------------------------------------------------------------------------------------------------------  
Predicate Information (identified by operation id):  
\---------------------------------------------------  
   3 - access(ROWID<'AAA9yPABZAAChWAAAA')  
   4 - access(ROWID>='AAA9yPABZAAChXIAAA' AND ROWID<'AAA9yPABZAAChX4AAA')  
   5 - access(ROWID>='AAA9yPABZAAChYIAAA' AND ROWID<'AAA9yPABZAAChY4AAA')  
   6 - access(ROWID>='AAA9yPABZAAChZIAAA' AND ROWID<'AAA9yPABZAAChZ4AAA')  
   7 - access(ROWID>='AAA9yPABZAAChaAAAA')  
   

```  
Look at line 1 and 2 in the plan. I forgot to use UNION ALL instead of UNION and Oracle then goes on to sort this stuff in order to return distinct rows.  The table had over 100M rows and sorting this filled the TEMP tablespace completely. Since I know I want all rows, I changed it to UNION ALL (this is finally a good one you can use):  
  
```
select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT,  
  LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn  
  from(  
    select   
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT,  
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN,  
    C.START_CORRUPT  
    from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O  
    where C.OWNER=O.OWNER  
    and C.SEGMENT_NAME=O.OBJECT_NAME  
    and FILE_ID=89 and C.OWNER='LUSER' and C.SEGMENT_NAME='FUBAR'  
    order by c.start_corrupt  
  )  
)  select 'select /*+ ROWID(T) */ *  
from LUSER.FUBAR T where rowid < ''' || FIRST_CORRUPT || '''' LINE   
   from INTERVALS  
   where RN=1  
   union   
   select 'union all select /*+ ROWID(T) */ *  
from LUSER.FUBAR T  where rowid >= ''' || FIRST_CLEAN || ''' and rowid < ''' || NEXT_CORRUPT || '''' LINE  
   from INTERVALS  
   where RN>1 and NEXT_CORRUPT is not null  
   union all  
   select 'union all select /*+ ROWID(T) */ *  
from LUSER.FUBAR T where rowid >= ''' || FIRST_CLEAN || ''';' LINE  
   from INTERVALS  
   where NEXT_CORRUPT is null  
   order by RN  
  ;  

```The new execution plan looks like this:  
```
Plan hash value: 439408122  
\---------------------------------------------------------------------------------------------  
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |  
\---------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT             |              |    11M|  2177M|  4181K (81)| 13:56:23 |  
|   1 |  UNION-ALL                   |              |       |       |            |          |  
|*  2 |   TABLE ACCESS BY ROWID RANGE| FUBAR        |  5445K|  1012M|   836K  (1)| 02:47:18 |  
|*  3 |   TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|   836K  (1)| 02:47:16 |  
|*  4 |   TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|   836K  (1)| 02:47:16 |  
|*  5 |   TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|   836K  (1)| 02:47:16 |  
|*  6 |   TABLE ACCESS BY ROWID RANGE| FUBAR        |  5445K|  1012M|   836K  (1)| 02:47:18 |  
\---------------------------------------------------------------------------------------------  
Predicate Information (identified by operation id):  
\---------------------------------------------------  
   2 - access(ROWID<'AAA9yPABZAAChWAAAA')  
   3 - access(ROWID>='AAA9yPABZAAChXIAAA' AND ROWID<'AAA9yPABZAAChX4AAA')  
   4 - access(ROWID>='AAA9yPABZAAChYIAAA' AND ROWID<'AAA9yPABZAAChY4AAA')  
   5 - access(ROWID>='AAA9yPABZAAChZIAAA' AND ROWID<'AAA9yPABZAAChZ4AAA')  
   6 - access(ROWID>='AAA9yPABZAAChaAAAA')  
   

```**To wrap it up**:  
  
If you have a large table with several corrupt blocks and many good ones, you can save the good data by creating a table that maps the bad blocks and then create a select statements that retrieve the rows from all the good blocks.  
  
Best is of course to get your backup and recover routines in order, combine it with a decent fault management so you'll discover the corrupt blocks at an early stage, thereby making it possible to recover the blocks before they have been aged out of your backup system.  
  
If corrupt blocks happens again and again you should see if it happens only with very few tables or many. If the corruption is restricted to few out of many tables it may indicate that it has something to do with how these tables are created and used (thinking about NOLOGGING operations here). In case of frequent and seemingly random corruption I would start thinking about the underlying storage.