Replace line in file with line in another file based on matching string

Hi

I am not the best scripter in the world and have run into a issue which you might be able to guide me on...

I have two files.

File1 :

A123, valueA, valueB
B234, valueA, valueB
C345, valueA, valueB
D456, valueA, valueB
E567, valueA, valueB
F678, valueA, valueB

File2:

C345, valueX, valueY
D456, valueX, valueY

I basically want to match the lines in the file1 and file2 based on the value of the first col in both files. If these values match then replace the entry in file1 with the entry in file2

Output:

A123, valueA, valueB
B234, valueA, valueB
C345, valueX, valueY
D456, valueX, valueY
E567, valueA, valueB
F678, valueA, valueB

Please note file1 is a million lines long so loops and greps take a long time!

Thanks a mil in advance

KB

Hi luckycharm,

One way using perl:

$ cat file1
A123, valueA, valueB
B234, valueA, valueB
C345, valueA, valueB
D456, valueA, valueB
E567, valueA, valueB
F678, valueA, valueB
$ cat file2
C345, valueX, valueY
D456, valueX, valueY
$ cat script.pl
use warnings;
use strict;

die qq[Usage: perl $0 <file1> <file2>\n] unless @ARGV == 2;

my %file2_key;

open my $file2, "<", pop @ARGV or die qq[ERROR: Cannot open input file\n];

while ( <$file2> ) {
        my @f = split /\s*,\s*/, $_, 2;
        $file2_key{ $f[0] } = $_;
}

while ( <> ) {
        my @f = split /\s*,\s*/, $_, 2;
        if ( exists $file2_key{ $f[0] } ) {
                print $file2_key{ $f[0] };
                next;
        }

        print;
}
$ perl script.pl file1 file2
A123, valueA, valueB
B234, valueA, valueB
C345, valueX, valueY
D456, valueX, valueY
E567, valueA, valueB
F678, valueA, valueB

Regards,
Birei

here is your code :slight_smile:
input:

# cat file1
A123, valueA, valueB
B234, valueA, valueB
C345, valueA, valueB
D456, valueA, valueB
E567, valueA, valueB
F678, valueA, valueB
# cat file2
C345, valueX, valueY
D456, valueX, valueY

code:

while read line
do
        column1=$( echo $line | awk -F' ' '{print $1}'| sed 's/^[ \t]*//g;s/[ \t]*$//g;s/,$//g' )
        linematch=$( cat file2 | grep $column1 )
        if [[ "$linematch" != "" ]] ; then
                lineno=$( cat file1 | grep -n $column1 | cut -d ':' -f 1 )
                echo "`sed "${lineno}d" file1`" > file1
                input=$( cat file2 | grep -i $column1 )
                echo "`sed $lineno" i\$input" file1`" > file1
        fi
 
done < file1

output after execution:

 cat file1
A123, valueA, valueB
B234, valueA, valueB
C345, valueX, valueY
D456, valueX, valueY
E567, valueA, valueB
F678, valueA, valueB

with awk
code:

awk -F, 'NR==FNR{a[$1]=$0;next;}a[$1]{$0=a[$1]}1' file2 file1

Hi Agrawal,

Please explain how the above awk command works.

  1. awk first buffer the file2 data in the form of key, value pair (associative array).
    and I have used key as col1 and value as entire line.
  2. then awk will check whether array key is exist for column 1 in file1 (joining columns)
    if yes, then replace file1 current value (i.e $0) with current value in array, if not skip it.
  3. finally print the data.