Compare 2 files of csv file and match column data and create a new csv file of them

Hi, I am newbie in shell script.
I need your help to solve my problem.
Firstly, I have 2 files of csv and i want to compare of the contents then the output will be written in a new csv file.

File1:

SourceFile,DateTimeOriginal
/home/intannf/foto/IMG_0713.JPG,2015:02:17 11:14:07
/home/intannf/foto/IMG_0677.JPG,2015:02:17 10:47:23
/home/intannf/foto/IMG_0739.JPG,2015:02:17 11:32:21
/home/intannf/foto/IMG_0749.JPG,2015:02:17 11:37:28
/home/intannf/foto/IMG_0759.JPG,2015:02:17 11:49:09
/home/intannf/foto/IMG_0689.JPG,2015:02:17 10:57:05
/home/intannf/foto/IMG_0709.JPG,2015:02:17 11:13:31
/home/intannf/foto/IMG_0771.JPG,2015:02:17 11:51:22
/home/intannf/foto/IMG_0766.JPG,2015:02:17 11:50:08
/home/intannf/foto/IMG_0745.JPG,2015:02:17 11:36:01

File2:

2015/02/17,01:56:53,0.,-7.77075,110.35782,6.38 
2015/02/17,10:47:23,0.,-7.77075,110.35782,6.34 
2015/02/17,11:49:09,0.,-7.77075,110.35782,6.31 
2015/02/17,11:49:53,0.,-7.77075,110.35782,6.25 
2015/02/17,11:51:22,0.,-7.77075,110.35782,6.20 
2015/02/17,11:53:25,0.,-7.77075,110.35782,6.19 
2015/02/17,11:55:11,0.,-7.77075,110.35782,6.19 
2015/02/17,11:56:35,0.,-7.77075,110.35782,6.80 

Output required:

SourceFile,AirSpeed,GPSLatitude,GPSLongitude,Altitude
/home/intannf/foto/IMG_0677.JPG,0.,-7.77075,110.35782,6.34 
/home/intannf/foto/IMG_0759.JPG,0.,-7.77075,110.35782,6.31 
/home/intannf/foto/IMG_0771.JPG,0.,-7.77075,110.35782,6.20 

Note:
The matched columns of both files are not written in the 'new' csv file i want to create.
In the file1, it has header file, while file2 doesn't have any header file. But, in the new csv file (output required), I want to create header file of each column.
and also, time and date of file1 is in same column (one column and separated by space), while time and date of file2 is in different different column (separated by comma).

Would you help me how to solve it? I am not sure how the shell script would be. But I am still learning. I really need your help. Sorry for my bad english, I hope you understand what I mean. Thank you.

Regards,
Intan

something to start with.
awk -F',' -f refrain.awk file1 file2 where refrain.awk is:

BEGIN {
  OFS=","
}
FNR==NR &&  FNR>1 {f1[$NF]=$1; next}

{
  idx=($1 " " $2)
  gsub("/", ":",idx)
  if (idx in f1)
    print f1[idx], $2,$3,$4,$5
}
1 Like

Hi, vgersh99
Thanks for your help! I have tried to run your code and i think it works. But, i don't know where the output is saved to. I just can't find new csv file of the output. Would you mind to explain it to me?
Thank you.

Intan

It's written to stdout. Redirect it to your new .csv file.

Hi, RudiC
Would you mind to explain it to me?

Intan

Read the man page of your respective shell, chapter "Redirection".

e.g.

awk -F',' -f refrain.awk file1 file2 > outfile

I have tried the code and edited some part, like this:

#! /bin/bash

awk -F, 'BEGIN {OFS=","} FNR==NR && FNR>0 {f1[$NF]=$1;next} {idx=($1 " " $2);gsub("/", ":",idx);if (idx in f1);print f1[idx],$2,$3,$4,$5}' file1.csv file2.csv > file3.csv

echo  "SourceFile,airspeed,gpslatitude,gpslongitude,altitude" > header.csv
cat header.csv file3.csv > file4.csv

But i have problems here. I think those code just rewrite the contents of file2.csv, not matching data of both. What i want is to match data of date and time of both csv file and write the output (matched data) in another csv file.

Note that i am using mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan

Would somebody mind to help me please? I really need helps. It is so urgent. Thank you.

Regards,
Intan

It looks like vgersh99 had a simple off by one error in the awk print statement's arguments that you have copied into your current script. Try the following instead:

#!/bin/bash
awk '
BEGIN {	FS = OFS = ","
	print "SourceFile", "AirSpeed", "GPSLatitude", "GPSLongitude", "Altitude"
}
FNR == NR && FNR > 1 {
	f1[$NF] = $1
	next
}
{	idx = $1 " " $2
	gsub("/",  ":", idx)
	if(idx in f1)
		print f1[idx], $3, $4, $5, $6
}' file1.csv file2.csv > file4.csv

With file1.csv containing what you said was in file1 and file2.csv containing what you said was in file2 in post #1 in this thread, this script writes the following into file4.csv :

SourceFile,AirSpeed,GPSLatitude,GPSLongitude,Altitude
/home/intannf/foto/IMG_0677.JPG,0.,-7.77075,110.35782,6.34 
/home/intannf/foto/IMG_0759.JPG,0.,-7.77075,110.35782,6.31 
/home/intannf/foto/IMG_0771.JPG,0.,-7.77075,110.35782,6.20 

which seems to be what you wanted, except there are some trailing spaces caused by copying trailing spaces from your sample file2 .

This should work with awk or gawk or mawk on most systems. If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk .

You can turn this into a 1-liner if you want to, but I prefer readable code.

1 Like

Hi, Don Cragun

Thanks for your help. I have tried to run your code and it works successfully! Thank you so much!:smiley:

Regards,
Intan