My issue is that the directory structure in "arch_all.tar.gz" is different from the directory structure in batch1, batch2, batch3, and batch4. I need to find the files that are missing from batch1, batch2, batch3, and batch4.
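# List the .xml members of the full archive
tar tf all.tar.gz | grep '\.xml$' > all-xml-file-list
# If you don't have a file list from the batches, create it
rm -f batch-file-list
for i in bat*.tar.gz
do
    tar tf "$i" | grep '\.xml$' >> batch-file-list
done
echo "get missing list"
# -F: treat each line as a literal string, -x: match whole lines only
grep -vxF -f batch-file-list all-xml-file-list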
Note that I look only for .xml files in the tar archives; you might need to modify that.
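Because the script above compares full pathnames, an entry such as arch/2013/a.xml will never match batch1/a.xml when the two directory layouts differ. A minimal sketch, assuming only the basename (the part after the last /) matters, is to strip the directories from both lists and compare the sorted results with comm. The helper name missing_basenames is hypothetical.

```shell
#!/bin/sh
# Sketch: print basenames present in list file $1 but not in list file $2,
# ignoring the directories in which the files are stored.
# missing_basenames is a hypothetical helper name.
missing_basenames() {
    tmp=$(mktemp -d) || return 1
    # Keep only the last "/"-separated field. Directory entries end in "/",
    # so their last field is empty and they are dropped.
    awk -F/ '$NF != "" { print $NF }' "$1" | sort -u > "$tmp/all"
    awk -F/ '$NF != "" { print $NF }' "$2" | sort -u > "$tmp/batch"
    # comm needs sorted input; -23 suppresses lines unique to the 2nd file
    # and lines common to both, leaving names found only in the 1st file.
    comm -23 "$tmp/all" "$tmp/batch"
    rm -rf "$tmp"
}

# Usage with the lists built by the loop above:
# missing_basenames all-xml-file-list batch-file-list
```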
I don't see that greet_sed's suggestion makes any attempt to extract just the last component of any of the pathnames in your two files. You didn't show us how your tar archives are created and you haven't bothered to tell us what operating system or shell you're using. The following awk script should work even if your archives contain directories in addition to regular files, but if your archives only contain regular files, the code could be simplified:
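awk -F/ '
# Skip directory entries (their pathnames end in "/", so the last field is empty)
$NF == "" { next }
# 1st file: remember each basename
NR == FNR {
        files[$NF]
        next
}
# 2nd file: forget every basename that also appears here
{ delete files[$NF] }
# Whatever is left was found only in the 1st file
END {   for (file in files)
                print file
}
' arch_all combined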
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
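To see what the awk approach does, here is a small self-contained run on two made-up lists (the sample pathnames are hypothetical) in which the same files live under different directories:

```shell
#!/bin/sh
# Hypothetical sample lists: the same files stored under different directories.
tmp=$(mktemp -d)
printf '%s\n' arch/2013/ arch/2013/a.xml arch/2013/b.xml arch/2014/c.xml > "$tmp/arch_all"
printf '%s\n' batch1/a.xml batch2/ batch2/sub/c.xml > "$tmp/combined"

# Remember every basename from the 1st file, delete those seen in the 2nd,
# and print whatever is left. Directory entries (ending in "/") are skipped.
missing=$(awk -F/ '
$NF == "" { next }
NR == FNR { files[$NF]; next }
{ delete files[$NF] }
END { for (file in files) print file }
' "$tmp/arch_all" "$tmp/combined")
printf '%s\n' "$missing"    # prints: b.xml

rm -rf "$tmp"
```

Only b.xml is reported: a.xml and c.xml appear in both lists, just under different directories.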
I am not a Unix expert. Can you please give me the code to print the differences between the two list files, assuming the first file is named all-xml-file-list.lst and the combined batch list is named batch-file-list.lst? Please note that the *.lst files contain the file names together with their directory structure.
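If "differences" means names missing in either direction, one possible sketch extends the same awk idea: record which file each basename was seen in and report the ones seen in only one file. The helper name list_differences and the "only in ..." labels are made up for illustration; the file names in the usage line are the ones from the question.

```shell
#!/bin/sh
# Sketch: print basenames found in only one of two list files.
# list_differences is a hypothetical helper; $1 is the full list,
# $2 the combined batch list.
list_differences() {
    awk -F/ '
    $NF == "" { next }                      # skip directory entries ("dir/")
    NR == FNR { in_all[$NF] = 1; next }     # 1st file: remember its basenames
    { in_batch[$NF] = 1 }                   # 2nd file: remember its basenames
    END {
        for (f in in_all)
            if (!(f in in_batch)) print "only in all: " f
        for (f in in_batch)
            if (!(f in in_all)) print "only in batch: " f
    }
    ' "$1" "$2"
}

# Usage with the file names from the question:
# list_differences all-xml-file-list.lst batch-file-list.lst
```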
Thanks
---------- Post updated at 05:15 AM ---------- Previous update was at 04:58 AM ----------
Please ignore my previous update. I was able to extract just the file names using your awk script. Now I am doing the comparison with grep -v -f.
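When comparing basename lists with grep -v -f, keep in mind that each line of the -f file is treated as a regular expression that may match anywhere in a line: the . in a.xml matches any character, and the pattern a.xml also matches data.xml. Adding the standard -F (fixed strings) and -x (whole-line match) options avoids such false matches. A small sketch with hypothetical sample names:

```shell
#!/bin/sh
# Hypothetical basename lists, one name per line.
tmp=$(mktemp -d)
printf '%s\n' a.xml data.xml > "$tmp/all-names"
printf '%s\n' a.xml > "$tmp/batch-names"

# Plain -v -f: the pattern "a.xml" also matches inside "data.xml", so no line
# is selected (grep then exits with status 1 and the echo runs).
grep -v -f "$tmp/batch-names" "$tmp/all-names" || echo "(no lines)"

# -F: literal strings, -x: whole-line match; data.xml is reported as missing.
missing=$(grep -vxF -f "$tmp/batch-names" "$tmp/all-names")
printf '%s\n' "$missing"

rm -rf "$tmp"
```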
Thanks
---------- Post updated at 05:22 AM ---------- Previous update was at 05:15 AM ----------
I am completely at a loss to understand your statements above. In your first post in this thread you said you had two files (one that you referred to as arch_all and one that you described as "Combined file with files from batch1 batch2 batch3 and batch4", which my script assumed was named combined). If you had given the names of those two files (in that order) as the file names on the last line of the script I provided, the output would have been exactly the output you requested, i.e., the names of the files in the 1st input file (after discarding the directories in which those files were located) that were not found in the 2nd input file (after discarding the directories in which those files were located). So, what are you now trying to do with grep -v -f that wasn't already done by the code I provided?