ynixon
March 25, 2007, 6:52pm
1
Hi,
I am trying to update a MASTER file by a NEW file which may contain fewer records.
The update should use a key (2 first fields), here is a senario:
MASTER:
a;b;0
a;c;0
a;d;0
NEW:
a;c;1
the result should be:
a;b;0
a;c;1
a;d;0
can you recommend me a way to do it?
10x
Y.N.
if you have Python, here's an alternative:
#!/usr/bin/python
new = open("new").readlines()
fi = open("file").readlines()
for li in fi:
li = li.strip().split(";")
for li2 in new:
li2 = li2.strip().split(";")
if li2[0] == li[0] and li2[1] == li[1]:
print ';'.join(li2)
else:
print ';'.join(li)
Plz give a try on this...
awk -F";" 'BEGIN {OFS=";"; i=1; while((getline line < "NEW")>0) arr[i++]=line; } { for(j=1;j<i;j++) { split(arr[j],temp,";"); if (($1==temp[1])&&($2==temp[2])) {$3=temp[3];} }print; }' MASTER
ynixon
March 26, 2007, 8:03am
4
this is working fast
thank you
ynixon
March 27, 2007, 4:13am
5
Plz give a try on this...
awk -F";" 'BEGIN {OFS=";"; i=1; while((getline line < "NEW")>0) arr[i++]=line; } { for(j=1;j<i;j++) { split(arr[j],temp,";"); if (($1==temp[1])&&($2==temp[2])) {$3=temp[3];} }print; }' MASTER
I forgot to mention that I also need to add to the result new records from NEW that does not exist in MASTER.
for example:
MASTER:
a;b;0
a;c;0
a;d;0
NEW:
a;c;1
a;e;2
the result should be:
a;b;0
a;c;1
a;d;0
a;e;2
anbu23
March 27, 2007, 4:37am
6
ynixon:
I forgot to mention that I also need to add to the result new records from NEW that does not exist in MASTER.
for example:
MASTER:
a;b;0
a;c;0
a;d;0
NEW:
a;c;1
a;e;2
the result should be:
a;b;0
a;c;1
a;d;0
a;e;2
awk -F";" ' BEGIN {OFS=";"; while( getline < "NEW" ) arr[$1";"$2]=$3; }
{ if( arr[$1";"$2] !~ /^ *$/ ) { $3=arr[$1";"$2]; print; delete arr[$1";"$2] }
else print
}
END {
for ( key in arr ) {
if( arr[key] !~ /^ *$/ ) { print key";"arr[key] }
} } ' MASTER
ynixon
March 27, 2007, 5:44am
7
It is even working faster
ynixon, here is one more solution:
cut -d";" -f1,2 new_file > temp_new_file
egrep -v -f temp_new_file master_file > temp_egrep_file
cat temp_egrep_file new_file | sort > final_file
reborg
March 27, 2007, 8:14pm
9
awk 'BEGIN{FS=OFS=";"} {arr[$1";"$] = $3 } END{ for ( i in arr ) {print i,arr}}' master newfile
#! /opt/third-party/bin/perl
open(FILE, "<", "m") || die "Unable to open file <$!>\n";
while(<FILE>) {
chomp;
@arr = split(/;/);
$j = "$arr[0];$arr[1]";
$fileHash{$j} = $arr[2];
}
close(FILE);
open(FILE, "<", "n") || die "Unable to open file <$!>\n";
while(<FILE>) {
chomp;
@arr = split(/;/);
$j = "$arr[0];$arr[1]";
if( exists $fileHash{$j} ) {
$fileHash{$j} = $arr[2];
}
else {
$fileHash{$j} = $arr[2];
}
}
close(FILE);
foreach my $key ( keys %fileHash ) {
print "$key;$fileHash{$key}\n";
}
exit 0
this should be even more faster
ynixon
March 28, 2007, 5:54am
11
thank for you all - you are great
I improved the scenario:
the MASTER table contain the last timestamp of the update (field 4)
the NEW table remains the same - without any timestamp
the output timestamp should be updated only if changed
MASTER:
a;b;0;20060328114630
a;c;0;20060328114630
a;d;0;20060328114630
NEW:
a;b;0
a;c;1
a;e;2
the result should be:
a;b;0;20060328114630
a;c;1;20070328103926
a;d;0;20060328114630
a;e;2;20070328103926
#! /opt/third-party/bin/perl
open(FILE, "<", "m") || die "Unable to open file <$!>\n";
while(<FILE>) {
chomp;
@arr = split(/;/);
$j = "$arr[0];$arr[1]";
$fileHash{$j} = "$arr[2];$arr[3]";
}
close(FILE);
open(FILE, "<", "n") || die "Unable to open file <$!>\n";
while(<FILE>) {
chomp;
@arr = split(/;/);
$j = "$arr[0];$arr[1]";
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
$var = $arr[2] . ";" . (1900 + $year) . $mon . $mday . $hour . $min . $sec;
if( exists $fileHash{$j} ) {
@val = split(/;/, $fileHash{$j});
$fileHash{$j} = $var if( $val[0] != $arr[2] );
}
else {
$fileHash{$j} = $var;
}
}
close(FILE);
foreach my $key ( keys %fileHash ) {
print "$key;$fileHash{$key}\n";
}
exit 0
anbu23
March 28, 2007, 6:24am
13
awk -v dt="20070328103926" ' BEGIN{FS=OFS=";"}
{ if( arr[$1";"$2] !~ "^"$3";") arr[$1";"$2]=$3";"( $4 !~ /^ *$/ ? $4 : dt ) }
END{ for ( i in arr ) {print i,arr}}' master new
ynixon
March 28, 2007, 6:48am
14
the timestamp is not as expected
ynixon
March 28, 2007, 6:49am
15
can you add a simple sort to the output (within the perl)
anbu23
March 28, 2007, 7:09am
16
How do you want add timestamp?
ynixon
March 28, 2007, 7:17am
17
the timestamp should be updated only if the 3rd parameter is changed
anbu23
March 28, 2007, 7:20am
18
$ cat new
a;b;0
a;c;1
a;e;2
$ cat master
a;b;0;20060328114630
a;c;0;20060328114630
a;d;0;20060328114630
$ awk -v dt="20070328103926" ' BEGIN{FS=OFS=";"}
> { if( arr[$1";"$2] !~ "^"$3";") arr[$1";"$2]=$3";"( $4 !~ /^ *$/ ? $4 : dt ) }
> END{ for ( i in arr ) {print i,arr}}' master new
a;b;0;20060328114630
a;c;1;20070328103926
a;d;0;20060328114630
a;e;2;20070328103926
ynixon
March 28, 2007, 7:21am
19
found the soludtion:
foreach my $key ( sort keys %fileHash ) {
print "$key;$fileHash{$key}\n";
}
Is it right ?
ynixon
March 28, 2007, 7:26am
20
anbu23:
$ cat new
a;b;0
a;c;1
a;e;2
$ cat master
a;b;0;20060328114630
a;c;0;20060328114630
a;d;0;20060328114630
$ awk -v dt="20070328103926" ' BEGIN{FS=OFS=";"}
> { if( arr[$1";"$2] !~ "^"$3";") arr[$1";"$2]=$3";"( $4 !~ /^ *$/ ? $4 : dt ) }
> END{ for ( i in arr ) {print i,arr}}' master new
a;b;0;20060328114630
a;c;1;20070328103926
a;d;0;20060328114630
a;e;2;20070328103926
sorry you are right , It is OK .
how can I sort the output within the awk ?
:rolleyes: what is faster " awk '{.....}' | sort " or in the awk itself ?