awk script to perform an action similar to vlookup between two csv files in UNIX

Hi,

I am new to awk/unix and am trying to put together an awk script to perform an action similar to vlookup between the two csv files.

Here are the contents of the two files:

File 1:

Date,ParentID,Number,Area,Volume,Dimensions
2014-01-01,ABC,247,83430.33,857.84,8110.76
2014-01-01,DEF,295,2904.60,139.33,102.95
2014-01-01,ABC,146,11392.70,5017.07,800.96
2014-01-01,XYZ,465,589.76,2.55,1145.33

File 2:

ID,DATE__C,Parent__ID
PPPG000000tXZWXIA4,2014-01-01,DEF
TTTG000000tXZWYIA4,2014-01-01,ABC
UUUG000000tXZWbIAO,2014-01-01,XYZ

I am trying to do a lookup between the two files based on the date and the parent ID. If the date AND The parent ID are the same, the "ID" (first column from in file 2) needs to be copied to file 1 (as the first column). If they are not the same, then a blank should be copied to file 1.

For example, the output would be as follows:

File 1:

ID,Date,ParentID,Number,Area,Volume,Dimensions
TTTG000000tXZWYIA4,2014-01-01,ABC,247,83430.33,857.84,8110.76
PPPG000000tXZWXIA4,2014-01-01,DEF,295,2904.60,139.33,102.95
,2014-02-01,ABC,146,11392.70,5017.07,800.96
UUUG000000tXZWbIAO,2014-01-01,XYZ,465,589.76,2.55,1145.33

Can someone please assist?

Really appreciate any help!

Thank you,

Yes, what have you wrote so far?

I tried to do something like this:

awk -F, 'FNR==NR{a[$2]=$1;next}$1 in a {$7=a[$1]}1' OFS="," FS="," file2.csv file1.csv

The output is completely wrong though :frowning:

Have not written any awk scripts before and not sure how I could fix my code..

this should work, not if your headings matched in file 2 the hard-coded assign of ID would not be required:

awk -F, 'FNR==NR{a[$2,$3]=$1;next}
{
  if($1 SUBSEP $2 in a) $1=a[$1,$2] OFS $1
  else $1=OFS $1
  if(FNR==1) $1="ID"
}1' OFS="," file2.csv file1.csv

Another approach:

awk -F, 'NR==FNR {a[$2,$3]=$1; next} FNR==1 {$1="ID" FS $1} a[$1,$2] {$1=a[$1,$2] FS $1}1' OFS=, file2 file1

@Franklin52 - I considered that approach as well, however it doesn't put blank entry in field #1 when lookup fails:

1 Like

@Franklin52 and @Chubler_XL thanks for your replies!

File 1 currently does not have a ID column. This ID column has to be inserted. @Franklin52, when I try tried your approach, the outout is exactly file1 without the ID column inserted?

@Chubler_XL, when I try your approach, the output is as follows:

ID,Date,ParentID,Number,Area,Volume,Dimensions
,2014-01-01,ABC,247,83430.33,857.84,8110.76
,2014-01-01,DEF,295,2904.60,139.33,102.95
,2014-01-01,ABC,146,11392.70,5017.07,800.96
,2014-01-01,XYZ,465,589.76,2.55,1145.33

Am i missing anything?

Thanks again guys!

My guess is that file2 has some spaces at the end or is in DOS format.

try dos2unix file2.csv or sed -i 's/[[:space:]]*$//' file2.csv

1 Like

@Chubler_XL, I tried dos2unix file2.csv and sed -i 's/[[:space:]]*$//' file2.csv but still receiving the same output..

---------- Post updated at 09:46 AM ---------- Previous update was at 09:22 AM ----------

@Chubler_XL, my apologies!!! you were totally right!! the code worked once I changed the file format :slight_smile:

thanks!! :slight_smile:

---------- Post updated at 12:08 PM ---------- Previous update was at 09:46 AM ----------

@ Chubler_XL, hey guys, just noticed a small issue with the headers with the code..

ID,ParentID,Number,Area,Volume,Dimensions
TTTG000000tXZWYIA4,2014-01-01,ABC,247,83430.33,857.84,8110.76
PPPG000000tXZWXIA4,2014-01-01,DEF,295,2904.60,139.33,102.95
,2014-02-01,ABC,146,11392.70,5017.07,800.96
UUUG000000tXZWbIAO,2014-01-01,XYZ,465,589.76,2.55,1145.33

looks like it has just replaced the "Date" with "ID" which leaves the last column header blank..

Make the row 1 assignment

$1="ID" FS $1

(as Franklin52 proposed).