awk script to perform an action similar to vlookup between two csv files in UNIX

Prit_Siv · September 3, 2014, 1:09pm

Hi,

I am new to awk/unix and am trying to put together an awk script to perform an action similar to vlookup between the two csv files.

Here are the contents of the two files:

File 1:

Date,ParentID,Number,Area,Volume,Dimensions
2014-01-01,ABC,247,83430.33,857.84,8110.76
2014-01-01,DEF,295,2904.60,139.33,102.95
2014-01-01,ABC,146,11392.70,5017.07,800.96
2014-01-01,XYZ,465,589.76,2.55,1145.33

File 2:

ID,DATE__C,Parent__ID
PPPG000000tXZWXIA4,2014-01-01,DEF
TTTG000000tXZWYIA4,2014-01-01,ABC
UUUG000000tXZWbIAO,2014-01-01,XYZ

I am trying to do a lookup between the two files based on the date and the parent ID. If the date AND The parent ID are the same, the "ID" (first column from in file 2) needs to be copied to file 1 (as the first column). If they are not the same, then a blank should be copied to file 1.

For example, the output would be as follows:

File 1:

ID,Date,ParentID,Number,Area,Volume,Dimensions
TTTG000000tXZWYIA4,2014-01-01,ABC,247,83430.33,857.84,8110.76
PPPG000000tXZWXIA4,2014-01-01,DEF,295,2904.60,139.33,102.95
,2014-02-01,ABC,146,11392.70,5017.07,800.96
UUUG000000tXZWbIAO,2014-01-01,XYZ,465,589.76,2.55,1145.33

Can someone please assist?

Really appreciate any help!

Thank you,

vbe · September 3, 2014, 1:14pm

Yes, what have you wrote so far?

Prit_Siv · September 3, 2014, 1:58pm

I tried to do something like this:

awk -F, 'FNR==NR{a[$2]=$1;next}$1 in a {$7=a[$1]}1' OFS="," FS="," file2.csv file1.csv

The output is completely wrong though

Have not written any awk scripts before and not sure how I could fix my code..

Chubler_XL · September 3, 2014, 5:09pm

this should work, not if your headings matched in file 2 the hard-coded assign of ID would not be required:

awk -F, 'FNR==NR{a[$2,$3]=$1;next}
{
  if($1 SUBSEP $2 in a) $1=a[$1,$2] OFS $1
  else $1=OFS $1
  if(FNR==1) $1="ID"
}1' OFS="," file2.csv file1.csv

Franklin52 · September 3, 2014, 5:14pm

Another approach:

awk -F, 'NR==FNR {a[$2,$3]=$1; next} FNR==1 {$1="ID" FS $1} a[$1,$2] {$1=a[$1,$2] FS $1}1' OFS=, file2 file1

Chubler_XL · September 3, 2014, 5:57pm

@Franklin52 - I considered that approach as well, however it doesn't put blank entry in field #1 when lookup fails:

Prit_Siv · September 3, 2014, 9:53pm

@Franklin52 and @Chubler_XL thanks for your replies!

File 1 currently does not have a ID column. This ID column has to be inserted. @Franklin52, when I try tried your approach, the outout is exactly file1 without the ID column inserted?

@Chubler_XL, when I try your approach, the output is as follows:

ID,Date,ParentID,Number,Area,Volume,Dimensions
,2014-01-01,ABC,247,83430.33,857.84,8110.76
,2014-01-01,DEF,295,2904.60,139.33,102.95
,2014-01-01,ABC,146,11392.70,5017.07,800.96
,2014-01-01,XYZ,465,589.76,2.55,1145.33

Am i missing anything?

Thanks again guys!

Chubler_XL · September 3, 2014, 10:18pm

My guess is that file2 has some spaces at the end or is in DOS format.

try dos2unix file2.csv or sed -i 's/[[:space:]]*$//' file2.csv

Prit_Siv · September 4, 2014, 12:08pm

@Chubler_XL, I tried dos2unix file2.csv and sed -i 's/[[:space:]]*$//' file2.csv but still receiving the same output..

---------- Post updated at 09:46 AM ---------- Previous update was at 09:22 AM ----------

@Chubler_XL, my apologies!!! you were totally right!! the code worked once I changed the file format

thanks!!

---------- Post updated at 12:08 PM ---------- Previous update was at 09:46 AM ----------

@ Chubler_XL, hey guys, just noticed a small issue with the headers with the code..

ID,ParentID,Number,Area,Volume,Dimensions
TTTG000000tXZWYIA4,2014-01-01,ABC,247,83430.33,857.84,8110.76
PPPG000000tXZWXIA4,2014-01-01,DEF,295,2904.60,139.33,102.95
,2014-02-01,ABC,146,11392.70,5017.07,800.96
UUUG000000tXZWbIAO,2014-01-01,XYZ,465,589.76,2.55,1145.33

looks like it has just replaced the "Date" with "ID" which leaves the last column header blank..

RudiC · September 4, 2014, 3:25pm

Make the row 1 assignment

$1="ID" FS $1

(as Franklin52 proposed).