Matching column value from 2 different file using awk and append value from different column

Hi,

I have 2 csv files.

a.csv

HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0
11_DEKET_1296_RTN-980_IDU-1,20LMG011,1+1(HSB),195.675,20LMG027-20LMG011,99.9995,202.6952012

-------------------------------------------------------------

b.csv

"10.200.6.69_NE=3148723,S=3,CP=-1,PP=16||1|",14SKA043_1352_UNSUTARA_RTN950A_IDU1-3-ISV3-1(TO_14SKA015_UNS),SURAKARTA
"10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|",20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),No_Location
"10.200.6.69_NE=3147673,S=1,CP=-1,PP=16||1|",14PWT044_1587_TELUK_RTN310-1-SHXA2-1(TO_14PWT010_PURWOKERTOSLT),BANYUMAS

both file as thousand of lines.

I need to match column 2 from both file, and if they match, append the value of column 1 from file b to file a so it becomes like this

file c

HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012,"10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|"

I have tried the following but to no avail

awk -FS=',' 'NR==FNR { a[$2]=$1; next } $2 in a {print $0,a[$13] b.csv a.csv}'

as I understand it, it should read column 2 of file b and store column 1 into the associated array of a, and then match with column 2 of file a, and if match was found, print the row of column a, and also append the array a value to the end of row. I now think I am definitely wrong :(. Sorry really newbie here. There is no error but as I tested it, it did print and append the output but it appends at the middle of the row and overriding the row value and also for all line in a.csv, it appends the same value, as if it did not match at all.

awk -FS=',' 'NR==FNR { a[$2]=$1; next } $2 in a {print $0,a[$13] }' b.csv a.csv > c.csv


HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG "10.200.6.69_NE=3147286,S=4,CP=-1,PP=16||1|",03CMS006_1315_CIANDA_RTN980_1-4-ISV3-1(IF to 03TSK004 M2 BYPASSTSK),No_Location
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG "10.200.6.69_NE=3147286,S=4,CP=-1,PP=16||1|",03CMS006_1315_CIANDA_RTN980_1-4-ISV3-1(IF to 03TSK004 M2 BYPASSTSK),No_Location
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0 "10.200.6.69_NE=3147286,S=4,CP=-1,PP=16||1|",03CMS006_1315_CIANDA_RTN980_1-4-ISV3-1(IF to 03TSK004 M2 BYPASSTSK),No_Location

Many thanks for the kind helps gents!

It's by sheer coincidence that you get anything from that code snippet. The field separator should be set to comma, but due to an incorrect syntax it isn't. The a array holds one single element only, indexed by the empty string, and is overwritten repeatedly until the first file's last line is encountered. As $13 doesn't exist in file 2 and thus is empty, that only element of a is printed. You seem to want to ignore the commas inside the double quotes, so the entire quoted string will be considered field 1? If so, they need to be replaced upfront, and then converted back after the operations.
Would this come close to what you need:

awk -F\" '{for (i=2; i<=NF; i+=2) gsub (/,/, "\001", $i)} 1' OFS=\" file2 |
awk -F',' 'NR==FNR {a[$2]=$1; next } {print $0, a[$2] }' - file1 |
awk -F\" '{for (i=2; i<=NF; i+=2) gsub ("\001", ",", $i)} 1' OFS=\" 
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG 027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012 "10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|"
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG 027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012 
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0 11_DEKET_1296_RTN-980_IDU-1,20LMG011,1+1(HSB),195.675,20LMG027-20LMG011,99.9995,202.6952012 
1 Like

Hi, Yes!, I realized this today that its the double quote causing the problem. I used your solution and got this result

HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
 "10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|"),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
 "10.200.6.69_NE=3147930,S=13,CP=-1,PP=16||1|"),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0 
"10.200.6.69_NE=3149110,S=11,CP=-1,PP=16||1|"195.675,20LMG027-20LMG011,99.9995,202.6952012

The original line was like this

HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0
11_DEKET_1296_RTN-980_IDU-1,20LMG011,1+1(HSB),195.675,20LMG027-20LMG011,99.9995,202.6952012

It seems to have appended to the beginning of '2nd,4th and 6th' line rather than at end of line 1,2 and 3

I am very sure when I vi this file, theres only 3 line. Is awk treating it as 6 line instead as it appends the array hash value to the end of the first line, it moved to beggining of 2nd line? Im confused :confused:

Your result is a bit surprising as the proposal has been tested and found to work, as you can see in post#2.
What you show in post#3 lets me suspect you have DOS line terminators (^M, 0x0D, \r, <CR>) in your file. Eliminate those and try again.

1 Like

Yes that was it!. Thank you sooo much!.:D:o. Would you be so kind to explain the command please whenever you're free :). I would really highly appreciate it!. :slight_smile:

awk -F\" '                                      # set field sep. to "
        {for (i=2; i<=NF; i+=2) \               # in every second (= inside quotes) field
         gsub (/,/, "\001", $i)                 # replace the comma with ^A (a placeholder)
        }
1                                               # print the modified line
' OFS=\" file2 |                                # set output field sep. to " (important!)
awk -F',' '                                     # set FS to comma
NR==FNR {a[$2] = $1                             # save $1 (text to append) to array indexed by $2
         next                                   # stop processing the actual line; goto next
        }
        {print $0, a[$2] }                      # print file2's line, followed by the app. txt, if exists 
' - file1 |                                     # work on input stream, i.e. stdin ("-", piped in) and file1
awk -F\" '                                      # undo comma replacements as in step 1
        {for (i=2; i<=NF; i+=2) \
                 gsub ("\001", ",", $i)
        }
1
' OFS=\" 
1 Like

Sorry quick question. Somehow, there are some rows which has double quotes. Seems like the ones with double quotes was ignored. I just noticed. Is there a way to also handle double quotes?

Sorry and many thanks! :o

WHAT? That pipe is DESIGNED to handle double quotes...

Please become accustomed to provide decent context info of your problem.
It is always helpful to carefully and detailedly phrase a request, and to support it with system info like OS and shell, related environment (variables, options), preferred tools, adequate (representative) sample input and desired output data and the logics connecting the two, and, if existent, system (error) messages verbatim, to avoid ambiguities and keep people from guessing.