update a file by key

ynixon · March 25, 2007, 6:52pm

Hi,
I am trying to update a MASTER file from a NEW file, which may contain fewer records.
The update should use a key (the first two fields). Here is a scenario:
MASTER:
a;b;0
a;c;0
a;d;0

NEW:
a;c;1

the result should be:
a;b;0
a;c;1
a;d;0

Can you recommend a way to do it?

10x
Y.N.

ghostdog74 · March 25, 2007, 8:02pm

if you have Python, here's an alternative:

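#!/usr/bin/python
# Index the NEW records by their key (the first two fields).
new = {}
for line in open("new"):
    fields = line.strip().split(";")
    new[(fields[0], fields[1])] = fields
# Print each MASTER record once, replaced by the NEW record when the key matches.
for line in open("file"):
    fields = line.strip().split(";")
    print(';'.join(new.get((fields[0], fields[1]), fields)))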

dennis.jacob · March 26, 2007, 1:25am

Please give this a try:

awk -F";" 'BEGIN {OFS=";"; i=1; while((getline line < "NEW")>0) arr[i++]=line; }  { for(j=1;j<i;j++) { split(arr[j],temp,";"); if (($1==temp[1])&&($2==temp[2])) {$3=temp[3];} }print; }' MASTER

ynixon · March 26, 2007, 8:03am

this is working fast
thank you

ynixon · March 27, 2007, 4:13am

I forgot to mention that I also need to add to the result the records from NEW that do not exist in MASTER.
for example:

MASTER:
a;b;0
a;c;0
a;d;0

NEW:
a;c;1
a;e;2

the result should be:
a;b;0
a;c;1
a;d;0
a;e;2
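
A minimal Python sketch of this update-and-append behaviour, using the sample data above. The function name `merge_by_key` and the in-memory lists are illustrative assumptions, not part of the original request; real code would read the two files instead.

```python
def merge_by_key(master_lines, new_lines, sep=";"):
    """Update MASTER records from NEW records, keyed on the first two fields."""
    merged = {}
    # Dicts keep insertion order (Python 3.7+), so MASTER order is preserved
    # and records found only in NEW end up at the end.
    for line in list(master_lines) + list(new_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        # A later record with the same key overwrites the earlier one.
        merged[(fields[0], fields[1])] = fields
    return [sep.join(fields) for fields in merged.values()]


master = ["a;b;0", "a;c;0", "a;d;0"]
new = ["a;c;1", "a;e;2"]
print("\n".join(merge_by_key(master, new)))
```

Each file is read once and every key lookup is a hash lookup, so the work grows with the total number of records rather than with MASTER times NEW.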

anbu23 · March 27, 2007, 4:37am

awk -F";" ' BEGIN { OFS=";"; while ( (getline < "NEW") > 0 ) arr[$1";"$2]=$3 }
{  key=$1";"$2
   # Key also present in NEW: take its value and mark it as handled
   if ( key in arr ) { $3=arr[key]; delete arr[key] }
   print
}
END {
   # Whatever is left in arr exists only in NEW
   for ( key in arr ) print key";"arr[key]
} ' MASTER

ynixon · March 27, 2007, 5:44am

This one works even faster.

Shell_Life · March 27, 2007, 1:53pm

ynixon, here is one more solution:
# Anchor each key as ^key; so it can only match the first two fields
cut -d";" -f1,2 new_file | sed 's/^/^/; s/$/;/' > temp_new_file
egrep -v -f temp_new_file master_file > temp_egrep_file
cat temp_egrep_file new_file | sort > final_file
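
The same drop-and-append idea can be sketched in Python with exact key comparison instead of regular-expression matching, which avoids surprises when a key contains characters that are special in a regex. The function name `replace_and_sort` is a hypothetical helper, and the lists stand in for the two files.

```python
def replace_and_sort(master_lines, new_lines, sep=";"):
    """Drop MASTER records whose key appears in NEW, append NEW, then sort."""
    # Keys are the first two fields of each NEW record.
    new_keys = {tuple(line.split(sep)[:2]) for line in new_lines}
    # Keep only the MASTER records that NEW does not replace.
    kept = [line for line in master_lines
            if tuple(line.split(sep)[:2]) not in new_keys]
    # Plain lexicographic sort of the combined lines.
    return sorted(kept + list(new_lines))


master = ["a;b;0", "a;c;0", "a;d;0"]
new = ["a;c;1", "a;e;2"]
print("\n".join(replace_and_sort(master, new)))
```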

reborg · March 27, 2007, 8:14pm

awk 'BEGIN{FS=OFS=";"} {arr[$1";"$2] = $3 } END{ for ( i in arr ) {print i,arr[i]}}' master newfile

matrixmadhan · March 27, 2007, 11:17pm

#! /opt/third-party/bin/perl

open(FILE, "<", "m") || die "Unable to open file <$!>\n";

while(<FILE>) {
  chomp;
  @arr = split(/;/);
  $j = "$arr[0];$arr[1]";
  $fileHash{$j} = $arr[2];
}

close(FILE);

open(FILE, "<", "n") || die "Unable to open file <$!>\n";

while(<FILE>) {
  chomp;
  @arr = split(/;/);
  $j = "$arr[0];$arr[1]";
  # Update the value for an existing key, or add the record if the key is new.
  $fileHash{$j} = $arr[2];
}

close(FILE);

foreach my $key ( keys %fileHash ) {
  print "$key;$fileHash{$key}\n";
}

exit 0

This should be even faster.

ynixon · March 28, 2007, 5:54am

Thanks to you all, you are great.

I have extended the scenario:

The MASTER table contains the timestamp of the last update (field 4).
The NEW table remains the same, without any timestamp.
The output timestamp should be updated only if the value has changed.

MASTER:
a;b;0;20060328114630
a;c;0;20060328114630
a;d;0;20060328114630

NEW:
a;b;0
a;c;1
a;e;2

the result should be:
a;b;0;20060328114630
a;c;1;20070328103926
a;d;0;20060328114630
a;e;2;20070328103926
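
A minimal Python sketch of this rule, under the assumption that "changed" means the third field differs from the one in MASTER, and that records found only in NEW also get the current timestamp. The function name `merge_with_timestamp` and its `stamp` parameter are illustrative, not part of the original request.

```python
from datetime import datetime


def merge_with_timestamp(master_lines, new_lines, stamp=None, sep=";"):
    """Merge NEW into MASTER; restamp a record only when its value changes."""
    if stamp is None:
        # Same layout as the sample data: YYYYMMDDHHMMSS, zero-padded.
        stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    merged = {}
    for line in master_lines:
        fields = line.strip().split(sep)
        if len(fields) < 4:
            continue  # skip blank or malformed MASTER lines
        merged[(fields[0], fields[1])] = (fields[2], fields[3])
    for line in new_lines:
        fields = line.strip().split(sep)
        if len(fields) < 3:
            continue  # skip blank or malformed NEW lines
        key, value = (fields[0], fields[1]), fields[2]
        # Restamp only for a new key or a changed value.
        if key not in merged or merged[key][0] != value:
            merged[key] = (value, stamp)
    return [sep.join(key + record) for key, record in merged.items()]


master = ["a;b;0;20060328114630", "a;c;0;20060328114630", "a;d;0;20060328114630"]
new = ["a;b;0", "a;c;1", "a;e;2"]
print("\n".join(merge_with_timestamp(master, new, stamp="20070328103926")))
```

Passing `stamp` explicitly makes the run reproducible; leaving it out uses the current local time.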

matrixmadhan · March 28, 2007, 6:24am

#! /opt/third-party/bin/perl

open(FILE, "<", "m") || die "Unable to open file <$!>\n";

while(<FILE>) {
  chomp;
  @arr = split(/;/);
  $j = "$arr[0];$arr[1]";
  $fileHash{$j} = "$arr[2];$arr[3]";
}

close(FILE);

open(FILE, "<", "n") || die "Unable to open file <$!>\n";

while(<FILE>) {
  chomp;
  @arr = split(/;/);
  $j = "$arr[0];$arr[1]";
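  # Build a zero-padded YYYYMMDDHHMMSS stamp; localtime's month is 0-based, so add 1.
  my ($sec,$min,$hour,$mday,$mon,$year) = localtime(time);
  $var = $arr[2] . ";" . sprintf("%04d%02d%02d%02d%02d%02d", 1900 + $year, $mon + 1, $mday, $hour, $min, $sec);
  if( exists $fileHash{$j} ) {
    @val = split(/;/, $fileHash{$j});
    # Compare as strings so non-numeric values are handled too.
    $fileHash{$j} = $var if( $val[0] ne $arr[2] );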
  }
  else {
    $fileHash{$j} = $var;
  }
}

close(FILE);

foreach my $key ( keys %fileHash ) {
  print "$key;$fileHash{$key}\n";
}

exit 0

anbu23 · March 28, 2007, 6:24am

awk -v dt="20070328103926" ' BEGIN{FS=OFS=";"} 
{ if( arr[$1";"$2] !~ "^"$3";") arr[$1";"$2]=$3";"( $4 !~ /^ *$/ ? $4 : dt ) }
END{ for ( i in arr ) {print i,arr[i]}}' master new

ynixon · March 28, 2007, 6:48am

the timestamp is not as expected

ynixon · March 28, 2007, 6:49am

can you add a simple sort to the output (within the perl)

anbu23 · March 28, 2007, 7:09am

How do you want the timestamp to be added?

ynixon · March 28, 2007, 7:17am

the timestamp should be updated only if the 3rd parameter is changed

anbu23 · March 28, 2007, 7:20am

$ cat new
a;b;0
a;c;1
a;e;2
$ cat master
a;b;0;20060328114630
a;c;0;20060328114630
a;d;0;20060328114630
$ awk -v dt="20070328103926" ' BEGIN{FS=OFS=";"}
> { if( arr[$1";"$2] !~ "^"$3";") arr[$1";"$2]=$3";"( $4 !~ /^ *$/ ? $4 : dt ) }
> END{ for ( i in arr ) {print i,arr[i]}}' master new
a;b;0;20060328114630
a;c;1;20070328103926
a;d;0;20060328114630
a;e;2;20070328103926

ynixon · March 28, 2007, 7:21am

Found the solution:

foreach my $key ( sort keys %fileHash ) {
print "$key;$fileHash{$key}\n";
}

Is that right?

ynixon · March 28, 2007, 7:26am

Sorry, you are right, it is OK.
How can I sort the output within the awk?
Which is faster: " awk '{.....}' | sort ", or sorting in the awk itself?