Mapping two files

I have a large comma delimited file from which i have to extract a specific column, compare it against a static file and replace the column value with the matching column from the static file.

The following are the files:

Main file:
mainfile.csv

test1234,,real,,,,open,,,,EU,,,,23/03/2012 12:03:00,,,,GB,,,,1187,0,B,TRY,10000,15/07...
test1234,,real,,,,open,,,,EU,,,,23/03/2012 12:03:00,,,,GB,,,,1187,0,B,FDG,10000,15/07...
test1234,,real,,,,open,,,,EU,,,,23/03/2012 12:03:00,,,,GB,,,,1187,0,B,HGC,10000,15/07...
test1234,,real,,,,open,,,,EU,,,,23/03/2012 12:03:00,,,,GB,,,,1187,0,B,MDL,10000,15/07...

Static file.
reffile

CODE,PRODUCT
UGE72453,TRY
NGU76290,FDG
HRY82662,HGC
LQY35689,MDL

I now want to replace the bolded 26th column in the main file with the code from the static file
The main file has about 1000 lines but only has the aforementioned [products.
I am using the following code but only a

        OLDIFS=$IFS
        IFS=","
        while read pr1 pr2
        do
                for i in `cat mainfile.csv | awk 'BEGIN { FS = ","; } { print $26}'`
                do
                        if [[ "$i" == *"$pr2"* ]]; then
                                 echo $i $pr1
          #                     sed 's/$i/$pr1/' $i;
                        fi
                done
        done < reffile
        IFS=$OLDIFS

When i execute the above code, only some of the lines get their value replaced.

Can you please advise check for pr2 in the for loop and replace by the corresponding pr1 in from the while? I would appreciate if someone can even advise me on a better way of doing this

Try this:

awk -F, 'FNR==NR{C[$2]=$1;next}{$26=C[$26]}1' OFS=, reffile mainfile.csv
1 Like

Using shell script:-

IFS=","
while read pr1 pr2
do
    sed "s/$pr2/$pr1/g" mainfile.csv > tmp; mv tmp mainfile.csv;
done < reffile

Cheers bipi.

I am looking for a way to map the code to the specific column as there may be other columns with the same characters and i would not like them touched.

For eg:

test1234,,real,,TRY97837290,,open,,,,EU,,,,23/03/2012 12:03:00,,,,GB,,,,1187,0,B,TRY     ,10000,15/07...
test1234,,real,,FDG09789238,,open,,,,EU,,,,23/03/2012 12:03:00,,,,GB,,,,1187,0,B,FDG           ,10000,15/07...
test1234,,real,,HGC02749267,,open,,,,EU,,,,23/03/2012 12:03:00,,,,GB,,,,1187,0,B,HGC          ,10000,15/07...
test1234,,real,,MDL92773671,,open,,,,EU,,,,23/03/2012 12:03:00,,,,GB,,,,1187,0,B,MDL           ,10000,15/07...

I suppose the fault is mine. I wasnt clear with my initial requirement as to why i am choosing this avenue of replacing a column. Apologies

@bipinajith, yuck what happens if "TRY" appears in other fields on the line?

Chubler_XL It will mess up the data :smiley:
I posted a solution which will work based on file content initially posted. I'm sorry if that is not what the requester want :frowning:

I am getting the following error when i execute the command

awk: syntax error near line 1
awk: bailing out near line 1

Solaris? Try nawk instead of awk

Yep. Solaris bash

I used nawk and am not getting any errors now but the string is not being replaced.

Please see my updated mainfile.csv.

The field 26 has blankspaces and thus will not exactly match. Thats why i am using

[[ ]] and comparing with *

---------- Post updated at 01:37 AM ---------- Previous update was at 01:34 AM ----------

Can you kindly explain what you're doing here?

 'FNR==NR{C[$2]=$1;next}{$26=C[$26]}1'

If you don't mind the blank spaces being removed when field 26 is updated, this should work for you

nawk -F, '
FNR==NR{C[$2]=$1;next}
{sub(/ *$/,"",$26);$26=C[$26]}1' OFS=, reffile mainfile.csv

As for the explination:

FNR=Current file row number
NR=Global row number
when FNR==NR first file is being processed (ie reffile)

C[$2]=$1;next
Build a lookup array with refcode (2nd field) as index and replace value (1st field) as data

sub(/ *$/,"",$26)
replace any trailing spaces from field 26 (assumption is they are spaces and not tabs)

$26=C[$26]
lookup array with field 26 value and put data back into $26

The code is working now. Thanks a mil for the explanation. It makes a lot of sense now

---------- Post updated at 05:12 AM ---------- Previous update was at 01:52 AM ----------

I am trying to adapt the code so that the replacement is done only if the pattern in $26 matches the pattern C[$2] in the below code.

Where do i insert an IF loop like if ( $26 == C[$2] )

nawk -F, '
FNR==NR{C[$2]=$1;next}
{sub(/ *$/,"",$26);$26=C[$26]}1' OFS=, reffile mainfile.csv

Again you probably want to trim $26 before doing a compare so:

nawk -F, ' 
    FNR==NR{C[$2]=$1;next} 
    {sub(/ *$/,"",$26);if($26==C[$2])$26=C[$26]}1' OFS=, reffile mainfile.csv
1 Like

No joy there.

When i insert the if condition there, the code doesnt work. Nothing is replaced meaning nothing is matched. C[$2] will have a different meaning since NR is now 2 right?