Hi
I am not the best scripter in the world and have run into a issue which you might be able to guide me on...
I have two files.
File1 :
A123, valueA, valueB
B234, valueA, valueB
C345, valueA, valueB
D456, valueA, valueB
E567, valueA, valueB
F678, valueA, valueB
File2:
C345, valueX, valueY
D456, valueX, valueY
I basically want to match the lines in the file1 and file2 based on the value of the first col in both files. If these values match then replace the entry in file1 with the entry in file2
Output:
A123, valueA, valueB
B234, valueA, valueB
C345, valueX, valueY
D456, valueX, valueY
E567, valueA, valueB
F678, valueA, valueB
Please note file1 is a million lines long so loops and greps take a long time!
Thanks a mil in advance
KB
birei
January 9, 2012, 5:12am
2
Hi luckycharm,
One way using perl:
$ cat file1
A123, valueA, valueB
B234, valueA, valueB
C345, valueA, valueB
D456, valueA, valueB
E567, valueA, valueB
F678, valueA, valueB
$ cat file2
C345, valueX, valueY
D456, valueX, valueY
$ cat script.pl
use warnings;
use strict;
die qq[Usage: perl $0 <file1> <file2>\n] unless @ARGV == 2;
my %file2_key;
open my $file2, "<", pop @ARGV or die qq[ERROR: Cannot open input file\n];
while ( <$file2> ) {
my @f = split /\s*,\s*/, $_, 2;
$file2_key{ $f[0] } = $_;
}
while ( <> ) {
my @f = split /\s*,\s*/, $_, 2;
if ( exists $file2_key{ $f[0] } ) {
print $file2_key{ $f[0] };
next;
}
print;
}
$ perl script.pl file1 file2
A123, valueA, valueB
B234, valueA, valueB
C345, valueX, valueY
D456, valueX, valueY
E567, valueA, valueB
F678, valueA, valueB
Regards,
Birei
here is your code
input:
# cat file1
A123, valueA, valueB
B234, valueA, valueB
C345, valueA, valueB
D456, valueA, valueB
E567, valueA, valueB
F678, valueA, valueB
# cat file2
C345, valueX, valueY
D456, valueX, valueY
code:
while read line
do
column1=$( echo $line | awk -F' ' '{print $1}'| sed 's/^[ \t]*//g;s/[ \t]*$//g;s/,$//g' )
linematch=$( cat file2 | grep $column1 )
if [[ "$linematch" != "" ]] ; then
lineno=$( cat file1 | grep -n $column1 | cut -d ':' -f 1 )
echo "`sed "${lineno}d" file1`" > file1
input=$( cat file2 | grep -i $column1 )
echo "`sed $lineno" i\$input" file1`" > file1
fi
done < file1
output after execution:
cat file1
A123, valueA, valueB
B234, valueA, valueB
C345, valueX, valueY
D456, valueX, valueY
E567, valueA, valueB
F678, valueA, valueB
with awk
code:
awk -F, 'NR==FNR{a[$1]=$0;next;}a[$1]{$0=a[$1]}1' file2 file1
Jairaj
January 9, 2012, 11:06pm
5
Hi Agrawal,
Please explain how the above awk command works.