How to deal with a tar error?

Hello Folks,

While making a tar file, I got this error:

root@clodb:/clodbvg>/opt/freeware/bin/tar cvf -  /oradata | gzip > /clodbvg/bkp_30MAY16.tgz 

/opt/freeware/bin/tar: Removing leading `/' from member names
/oradata/
/oradata/JAVA/
/oradata/JAVA/.toc
/oradata/JAVA/Java131.rte
/oradata/JAVA/Java131.rte.tar
/oradata/Stage11i/
/oradata/lost+found/
/oradata/proddata/
/oradata/proddata/XDOD.dbf
/oradata/proddata/XDOX.dbf
/opt/freeware/bin/tar: /oradata/proddata/applsysd02.dbf: Read error at byte 4767080960, reading 10240 bytes: There is an input or output error.
/oradata/proddata/applsysd03.dbf
/oradata/proddata/applsysd04.dbf
/oradata/proddata/applsysd05.dbf
/oradata/proddata/applsysx01.dbf
/oradata/proddata/applsysx02.dbf

How do I troubleshoot this problem? When I restored the tar file, Oracle did not work, because applsysd02.dbf contains some tables that Oracle requires. Any suggestions?

root@clodb:/oradata>du -sg /oradata/proddata/applsysd02.dbf
6.74    /oradata/proddata/applsysd02.dbf

You restored from this when tar told you the backup was bad? You intentionally restored a bad backup over good files?

I don't suppose you have a backup of the backup, do you?

@filosophizer:

Can you clarify the problem?

Is the following your procedure so far?

  1. You restored the backup from the tar file created above (with the read error shown)
  2. Your Oracle DB does not work after the restore
  3. Now you want to fix the problem (the non-functional Oracle DB) that arose from the tar recovery
Read error at byte 4767080960, reading 10240 bytes: There is an input or output error.

Whatever the other issue may be: this message means you probably have a bad storage device (hard disk), which should be replaced, preferably sooner rather than later.

as always:

# oslevel -s
# rpm -q tar
# lsuser -a fsize fsize_hard root

Agent.KGB

root@clodb:/oradata>oslevel -s
6100-05-01-1016

root@clodb:/oradata>rpm -q tar
tar-1.14-2

root@clodb:/oradata>/opt/freeware/bin/tar
/opt/freeware/bin/tar: You must specify one of the `-Acdtrux' options
Try `/opt/freeware/bin/tar --help' for more information.

root@clodb:/oradata>/opt/freeware/bin/tar --version
tar (GNU tar) 1.14
Copyright (C) 2004 Free Software Foundation, Inc.
This program comes with NO WARRANTY, to the extent permitted by law.
You may redistribute it under the terms of the GNU General Public License;
see the file named COPYING for details.
Written by John Gilmore and Jay Fenlason.

root@clodb:/oradata>lsuser -a fsize fsize_hard root
root fsize=-1 fsize_hard=-1

root@clodb:/oradata>

@stomp
Yes. I want to migrate the Oracle VG (including its filesystems) from an internal disk to a SAN disk. For that reason I used tar to archive all the filesystems in the Oracle VG and then untarred them on the SAN disk. However, because of an error or bad sector under one of the main Oracle files, applsysd02.dbf, the database does not work after the untar: that file was not archived completely, so some tables are missing.

This internal disk cannot be replaced because
1- part of vios
2- storage pool -- assigned to lpar

@corona688

I am trying to make a backup of the good files, which failed because of one file applsysd02.dbf which is over bad sector/whatever the issue is.

Possible Solutions:
1- Would copying the file in question applsysd02.dbf to another filesystem and then taking backup help ?

My first question is what Oracle product is this?

Oracle databases usually have running processes which keep the dB files open all the time so you must use the defined Oracle dB shutdown command to stop those processes and close the dB files before you can run a backup. Additionally, some Oracle products also come with their own Oracle backup utilities so you don't use OS based utilities.

If Oracle processes have the dB open then that could be causing your error.

OK, first things first: this is NOT a "tar error"! This is tar complaining that it can't read a file properly. So the error is with the file somehow (undetermined: everything from a corrupt file system to some problem with the file itself, maybe some DB process still running and locking the file while tar was working on it), not with tar. It is only natural that your database stopped working once you restored such a faulty backup.

I'd first try to find out what is wrong with the file. The following command will do nothing to correct anything but might shed light upon what is wrong (don't do it while the DB is running!):

dd if=/oradata/proddata/applsysd02.dbf of=/dev/null bs=1024

If this command works, your file is OK and it might have been a temporary condition (like the DB still running). If it produces an error, you will see where exactly in the file the problem is (because you read it in 1k chunks).
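
To turn dd's record count into a position in the file, multiply the number of full records it reports by the block size. A minimal sketch, assuming a shell with 64-bit arithmetic (ksh93 or bash); the function name dd_offset is made up for illustration:

```shell
# dd_offset RECORDS BS
# Convert the "N+0 records in" count printed by dd into a byte offset.
# The first unreadable byte lies at or just after this offset.
dd_offset() {
    records=$1   # number of full records dd managed to read
    bs=$2        # block size passed to dd via bs=
    echo $(( records * bs ))
}

# Example with made-up numbers: 1000 full records of 1024 bytes each
# dd_offset 1000 1024
```

The smaller the block size, the more precisely the offset pins down the bad spot, at the price of a slower read.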

I hope this helps.

bakunin


Oracle EBS 11.5.9
Database 9i
Two tier config
Database on one server
Apps on another server

@hicksd8
The database is in no-archive-log mode, so I shut the database down before making the tar/gzip.

Yes, that's true, you can use RMAN (the Oracle backup tool), but for online backups the database has to be in archive log mode...

@Bakunin
thank you,

The database is off, and

root@clodb:/clodbvg>/opt/freeware/bin/tar cvf -  /oradata | gzip > /clodbvg/bkp_30MAY16.tgz
/opt/freeware/bin/tar: Removing leading `/' from member names
/oradata/
.........
.......
/oradata/proddata/apd01.dbf
/oradata/proddata/applsysd01.dbf
/oradata/proddata/applsysd02.dbf
/opt/freeware/bin/tar: /oradata/proddata/applsysd02.dbf: Read error at byte 4767080960, reading 10240 bytes: There is an input or output error.
/oradata/proddata/applsysd03.dbf
.....
......
oradata/proddata/zfax01.dbf
/oradata/proddata/zsad01.dbf
/oradata/proddata/zsax01.dbf
/opt/freeware/bin/tar: Error exit delayed from previous errors
root@clodb:/clodbvg>


root@clodb:/oradata>cp /oradata/proddata/applsysd02.dbf /backup/applsysd02.dbf
cp: /oradata/proddata/applsysd02.dbf: There is an input or output error.


root@clodb:/oradata>dd if=/oradata/proddata/applsysd02.dbf of=/dev/null bs=1024
dd: 0511-051 The read failed.
: There is an input or output error.
4655364+0 records in.
4655364+0 records out.
root@clodb:/oradata>


root@clodb:/oradata>lsvg oradbvg
VOLUME GROUP:       oradbvg                  VG IDENTIFIER:  00c7780e00004c000000014977a0c65e
VG STATE:           active                   PP SIZE:        128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      1439 (184192 megabytes)
MAX LVs:            256                      FREE PPs:       86 (11008 megabytes)
LVs:                4                        USED PPs:       1353 (173184 megabytes)
OPEN LVs:           4                        QUORUM:         2 (Enabled)
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     2032                     MAX PVs:        16
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      non-relocatable

root@clodb:/oradata>chvg -by oradbvg

root@clodb:/oradata>lsvg oradbvg
VOLUME GROUP:       oradbvg                  VG IDENTIFIER:  00c7780e00004c000000014977a0c65e
VG STATE:           active                   PP SIZE:        128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      1439 (184192 megabytes)
MAX LVs:            256                      FREE PPs:       86 (11008 megabytes)
LVs:                4                        USED PPs:       1353 (173184 megabytes)
OPEN LVs:           4                        QUORUM:         2 (Enabled)
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     2032                     MAX PVs:        16
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable

Question:

Follow up from here:

  1. Listing or viewing errors in Tar
gunzip < test.tgz | tar tvf -

will not list the error that occurred while making the backup (see above):

/opt/freeware/bin/tar: /oradata/proddata/applsysd02.dbf: Read error at byte 4767080960, reading 10240 bytes: There is an input or output error.

Is there a way to tell whether the tar file test.tgz contains files that had problems?

In general, you should be very careful about what you do at this point, because your disk storage is about to die. If you run another 10 or 20 tests, the disk may finally be dead beyond any affordable repair.

The most important thing to do is to create an exact 1:1 copy of the problematic disks. Once you have that, make another copy from the newly created disks. That last set is the working copy: you can experiment on it, and if the experiments fail, you can create another set of working copies.
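
If the disk itself cannot be pulled, the same idea can be tried at file level. A rough sketch, assuming your dd supports the POSIX conv=noerror,sync options; the function name salvage_copy and the 4096-byte default block size are made up for illustration:

```shell
# salvage_copy SRC DST [BS]
# Copy SRC to DST, continuing past read errors instead of aborting.
salvage_copy() {
    src=$1
    dst=$2
    bs=${3:-4096}   # assumption: use the filesystem block size if you know it
    # noerror: keep going after a failed read
    # sync:    pad every short or failed input block with NULs up to bs,
    #          so the data after the bad spot keeps its original offset
    dd if="$src" of="$dst" bs="$bs" conv=noerror,sync
}

# Usage (the target path is hypothetical):
# salvage_copy /oradata/proddata/applsysd02.dbf /backup/applsysd02.dbf 4096
```

Be aware of what this gives you: a file of the right layout with zero-filled blocks where the disk could not be read, not a repaired file. The database may well still reject those blocks. Also, sync pads a short final block, so the copy can be slightly larger than the source if its size is not a multiple of the block size.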

Maybe it's time to hire a specialist, who can handle such a situation - depends of course on how important your data is for you.

(I'm not quite sure whether AIX tar works the same way as Linux tar, but it may help.)

You can try a test run with tar (note the dash before tvf, which is different from your command):

gunzip <test.tgz  | tar -tvf -

You should see filenames scrolling by on your console. If tar stops because of an error, all files after that point may be lost.
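
For a non-interactive version of the same test run, you can check the exit codes instead of watching the console. A minimal sketch, assuming gzip and a tar that reads from stdin with "-f -"; the function name check_tgz is made up for illustration:

```shell
# check_tgz ARCHIVE
# Returns 0 if the gzip stream and the tar structure inside it are readable.
check_tgz() {
    archive=$1
    # gzip -t tests the compressed stream without extracting anything
    gzip -t "$archive" || return 1
    # List all members and throw the listing away; the exit status of the
    # pipeline is that of its last command, which here is tar
    gzip -dc "$archive" | tar -tf - > /dev/null
}

# Usage:
# check_tgz test.tgz && echo "archive structure looks OK"
```

This only tells you that the archive can be read from start to end. It says nothing about whether each source file was read completely when the archive was made.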

That's the only thing I can say to this.

Other options require additional work at setup time of your backup. That would indeed be a good change to your backup procedures: instead of tar, you can use cpio with the "CRC" format, which checksums your data, or you can create checksums manually for all backed-up files before the backup.
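
The manual checksum idea could look roughly like this. It is only a sketch, assuming cksum, mktemp and a find that supports "-exec ... {} +" are available on your system; the function names make_manifest and verify_manifest are made up for illustration:

```shell
# make_manifest DIR MANIFEST
# Record a checksum line for every regular file under DIR.
# Keep MANIFEST outside DIR, otherwise it would be listed in itself.
make_manifest() {
    # Paths are stored relative to DIR so the manifest can be checked
    # against a restore in a different location; sort by file name
    ( cd "$1" && find . -type f -exec cksum {} + | sort -k 3 ) > "$2"
}

# verify_manifest DIR MANIFEST
# Returns 0 if the files under DIR still match the recorded checksums.
verify_manifest() {
    tmp_manifest=$(mktemp) || return 2
    make_manifest "$1" "$tmp_manifest"
    diff "$2" "$tmp_manifest" > /dev/null
    rc=$?
    rm -f "$tmp_manifest"
    return $rc
}

# Usage (manifest path is hypothetical):
# make_manifest /oradata /backup/oradata.cksum       # before the backup
# verify_manifest /restore/oradata /backup/oradata.cksum   # after the restore
```

A file that cannot be read at manifest time will make cksum complain on stderr and will be missing from the manifest, which is itself a warning that the backup of that file cannot be trusted.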

I have already said it and I will repeat it here: THIS IS NOT A TAR ERROR! It is an error reading the disk; tar just reports that error.

What you need to do includes: repair the filesystem (which might well be beyond repair), get another disk, recreate the correct contents of the file somehow (maybe database methods might help, like rolling back archive logs, etc.).

Look: you have a file and some part of that file is not readable, because it is damaged. Regardless of the tool you use to read it (cat, tar, cpio, whatever), all these tools will fail for the exact same reason: some part of the file is not readable. If you give a book to someone and he complains that page 245 is missing, you are not going to "correct the person's reading"; you need to provide the missing page for the complaint to go away. For the same reason there is nothing you can do with tar or the file it produces, only with the reason for the error it reports.

I hope this helps.

bakunin

Sorry if my question was misunderstood. What I meant was: how do I check whether a tar file contains all the files without any errors or problems?

In our case, for example, the file applsysd02.dbf had problems when we made the tar:

/opt/freeware/bin/tar: /oradata/proddata/applsysd02.dbf: Read error at byte 4767080960, reading 10240 bytes: There is an input or output error.

But when checking the contents of the tar file, nothing indicates that applsysd02.dbf had problems:



telnet (clodb)


AIX Version 6
Copyright IBM Corporation, 1982, 2010.
login: root
root's Password:
*******************************************************************************
*                                                                             *
*                                                                             *
*  Welcome to AIX Version 6.1!                                                *
*                                                                             *
*                                                                             *
*  Please see the README file in /usr/lpp/bos for information pertinent to    *
*  this release of the AIX Operating System.                                  *
*                                                                             *
*                                                                             *
*******************************************************************************


root@clodb:/clodbvg>gunzip < bkp_30MAY16_oradata.tgz | tar tvf -
drwxrwsrwx 500 502       0 Jun 03 21:01:49 2010 oradata/
drwxrwsrwx 500 502       0 Jun 12 12:43:53 2005 oradata/JAVA/
-rwxrwxrwx 500 502    1560 Jun 12 12:43:53 2005 oradata/JAVA/.toc
-rwxrwxrwx 500 502 29389824 Dec 06 19:51:01 2001 oradata/JAVA/Java131.rte
........
.........
-rw-r--r-- 500 502 3460308992 May 31 01:34:33 2016 oradata/proddata/applsysd01.dbf
-rw-r--r-- 500 502 7235182592 May 31 01:34:35 2016 oradata/proddata/applsysd02.dbf
-rw-r----- 500 502 6501179392 May 31 01:34:36 2016 oradata/proddata/applsysd03.dbf
-rw-r----- 500 502 2306875392 May 31 01:34:36 2016 oradata/proddata/applsysd04.dbf
-rw-r----- 500 502 2306875392 May 31 01:34:36 2016 oradata/proddata/applsysd05.dbf
......
......
......
-rw-r----- 500 502 1572872192 Jan 09 12:52:40 2014 oradata/proddata/temp01.dbf
tar: 0511-169 A directory checksum error on media; 27 not equal to 29210.
root@clodb:/clodbvg>

My question: how can a user find out whether a tar file contains files that had errors or problems?

OK, I have indeed misunderstood the question. Sorry, my bad.

Well, there are diagnostic messages and a return code when you create the tar file. You should always check these when creating an archive non-interactively (like inside a batch job). In general it is good style to check return codes in scripts, regardless of which command it is.

This is bad style:

command1
command2
command3

This is better and safer:

if ! command1 ; then
     echo "command1 failed" >> errorlog
fi
if ! command2 ; then
     echo "command2 failed, aborting." >> errorlog
     exit 1
fi
[...]

If you take your tar-archive this way you will notice upon creation that something was rotten in the state of Denmark.
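
One catch with the "tar cvf - ... | gzip > file" form used in this thread: the exit status of a pipeline is that of its last command, so a plain check only sees gzip's status and tar's failure goes unnoticed. A sketch of one way around it in a plain POSIX shell; the function name backup_tgz and the ".status" and ".log" file names are made up for illustration:

```shell
# backup_tgz SRC ARCHIVE
# Create a gzip-compressed tar archive and fail if either tar or gzip fails.
backup_tgz() {
    src=$1
    archive=$2
    status_file="$archive.status"   # hypothetical name: holds tar's exit code
    log_file="$archive.log"         # hypothetical name: holds tar's messages
    # tar's own exit code is written to a file from inside the pipeline,
    # because $? after the pipeline only reflects gzip
    { tar cf - "$src" 2> "$log_file"; echo $? > "$status_file"; } | gzip > "$archive"
    gzip_rc=$?
    tar_rc=$(cat "$status_file")
    if [ "$tar_rc" -ne 0 ] || [ "$gzip_rc" -ne 0 ]; then
        echo "backup of $src failed (tar=$tar_rc gzip=$gzip_rc), see $log_file" >&2
        return 1
    fi
    return 0
}

# Usage:
# backup_tgz /oradata /clodbvg/bkp_30MAY16.tgz || exit 1
```

Keeping the log file next to the archive also answers the later question of whether a given archive was created cleanly: the read error would be sitting in the log.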

Now, if you only have the archive and the restored files: you can't know (at least not for sure) whether the content is corrupt. The contents of a tar archive are archived files. tar can find out whether the archive itself is damaged (by testing the checksums in its member headers), but if a wrongly read file is archived correctly, it will be "correct" from tar's POV.

I hope this helps.

bakunin

Do you really believe that the Oracle database content is corrupt? Wouldn't you detect that in the Oracle error log for day-to-day operation?

I'm not an Oracle DBA but I used to have one work for me when I was a system manager. I remember that in the situation of potentially corrupt data inside the dB he would use an Oracle utility to 'export' all the database out to a flat file (this could take a long time if the dB was huge), empty the dB of data, and then import all the data back in. He would watch for errors during the export and the export utility would take care of most things.

@hicksd8
The Database or Datafile is not corrupt (see here -- posted on Oracle Forums)

SQL> select * from v$database_block_corruption ;
 
 
no rows selected
 
 
SQL>

This is what I am thinking, export all data from applsysd02.dbf to temporary.dbf
delete applsysd02.dbf
create applsysd02.dbf in another filesystem
import data into applsysd02.dbf from temporary.dbf

But I am not sure, if this will work for Oracle Database as applsysd02.dbf contains Apps Top Tier information, so what I wanted is confirmation from DBA -- hence posted on Oracle forums.

Maybe someone here knows if this works :)

Sorry to say that, but this might not be the case. Maybe the database just isn't aware of the corrupted blocks.

No, no: an "export" dumps the contents of the whole DB (not only of a single DB file) into a flat dump file. This will dump *all* the contents: table content, constraints, views, dependencies, and whatever else there is in the DB; things that can be recreated (like indices) are stored as definitions and rebuilt on import. From such a file you rebuild the complete DB by importing it into an empty container.

Does that answer your question?

I hope this helps.

bakunin


Thanks Bakunin, I guess there is nothing that could be done at this point.