Hi Freinds,
I have 2 files . one is source.txt and second one is target.txt. I want to keep source.txt as baseline and compare target.txt. please find the data in 2 files and Expected output.
Source.txt
1|HYD|NAG|TRA|34.5|1234
2|CHE|ESW|DES|36.5|134
3|BAN|MEH|TRA|33.5|234
4|PUN|ABHI|TA|38.5|123
5|KIN|NAV|PRA|31.5|135
target.txt
1|HYD|NAG|TRA|34.5|1234
2|CHE|EW|DES|33.5|134
5|KIN|NV|PRA|31.5|136
Expected output should be :
Number of Extra records in Source file : 2
3|BAN|MEH|TRA|33.5|234
4|PUN|ABHI|TA|38.5|123
Number of mismatches:
KEYFIELD|COLUMN_NUMBER|SOURCE_VALUE|TARGET_VALUE
2 |3 |ESW |EW
2 |5 |36.5 |33.5
5 |3 |NAV |NV
Please help. I am Very new to Unix shell scripting .
PikK45
July 25, 2012, 2:29am
2
I guess this is already discussed!! I saw almost the same examples too!!
Hi Pikka45 , yes we have done this with the follwing code
paste -d '|' File1.txt File2.txt | awk -F '|' '{c=NF/2;for(i=1;i<=c;i++)if($i!=$(i+c))printf "line %-5s field %s\n",NR,i}'
The above code is used only for the below condition :
1)when we have equal number of records where all the keyfields (let assume first field) present in both the source.txt and target.txt
2) When we have source.txt.count < target.txt.count and all the keyfields (let assume 1st filed) are present in both the source.txt and target.txt.
3) it is not working when source.txt.count > target.txt.count and if there are mismatches .
I hope you understand the above cases. Kindly help if there is any chance to cover the 3rd senario.
---------- Post updated at 12:56 PM ---------- Previous update was at 12:10 PM ----------
Hi Friends, Can anyone look into the Thread .. Plz help..
Chirel
July 25, 2012, 4:20am
4
Hi, check this
#!/bin/bash
>extra.txt
>mismatch.txt
while read sLine; do
OFS="$IFS"
IFS="|"
sTab=( $sLine );
tLine="$(egrep "^"${sTab[0]} target.txt)"
if [ -z "$tLine" ]; then echo "$sLine" >>extra.txt; IFS="$OFS"; continue; fi
tTab=( $tLine );
for (( i = 1 ; i < ${#sTab[@]} ; i++ )); do
[ "${sTab[$i]}" = "${tTab[$i]}" ] || echo "${sTab[0]}|$i|${sTab[$i]}|${tTab[$i]}" >>mismatch.txt
done
IFS="$OFS"
done <source.txt
echo "Number of Extra records in Source file : $(cat extra.txt|wc -l)"
cat extra.txt
echo "Number of mismatches : $(cat mismatch.txt|wc -l)"
cat mismatch.txt
1 Like
@Chirel : Thank you so much it worked as expected. I have one doubt. if i use for some 50,000 records with 50 columns it is taking more time. is there any way we can reduce the timing and increase the performance ? Plz help ..
open my $fh,"<a.txt";
while(<$fh>){
my @tmp = split("[|]",$_);
my @t = @tmp[1..$#tmp];
$hash{$tmp[0]} = \@t;
}
close $fh;
while(<DATA>){
my @tmp = split("[|]",$_);
if(not exists $hash{$tmp[0]}){
print;
}
else{
my @t = @{$hash{$tmp[0]}};
my @diff;
for(my $i=0;$i<=$#t;$i++){
if($t[$i] ne $tmp[$i+1]){
push @diff, ($i+2,$t[$i],$tmp[$i+1]);
}
}
print join "|", ($tmp[0],@diff) if $#diff>=0;
print "\n";
}
}
__DATA__
1|HYD|NAG|TRA|34.5|1234
2|CHE|ESW|DES|36.5|134
3|BAN|MEH|TRA|33.5|234
4|PUN|ABHI|TA|38.5|123
5|KIN|NAV|PRA|31.5|135
a.txt
1|HYD|NAG|TRA|34.5|1234
2|CHE|EW|DES|33.5|134
5|KIN|NV|PRA|31.5|136