Join 2 CSVs based on 1 key

Hello,
I have 2 csv as follows:

a.csv:
name,phone,adress,car[drive]
xy,1234,asbd
yz,2134,asbdf   
tc,6789,salkdur    
b.csv:
telphone,vehicle
2134,toyota
6789,bmw
1234,honda

What is need is this:

output.csv:
name,phone,adres,car[drive]
xy,1234,asbd,honda
yz,2134,asbdf,toyota
tc,6789,salkdur,bmw

I am able to join (--right) the 2 files based on the key "phone" using "csvjoin", and then copy/pasting appropriate column and deleting the unnecessary one.

Here is what I used:

csvjoin -c 2, 1 a.csv b.csv >out.csv

But I need some code to do these in 1 shot.

Thanks in advance.

Hi,

Please try this:

awk 'NR==FNR{a[$2]=$0;next} { for(i in a) {  if ( i==$1 ) { print a " ", $2 } }}'  a.csv b.csv | sort -k2

( or )

awk 'NR==1 {print $0} NR==FNR{a[$2]=$0;next} { for(i in a) {  if ( i==$1 ) { print a " ", $2 | "sort -k2" } }}'  a.csv b.csv

Try

join -1 2 -2 1 --header --nocheck file[12]
phone name adress car vehicle
1234 xy asbd honda
2134 yz asbdf  toyota
6789 tc salkdur bmw

Make sure the files are sorted.

1 Like

You say that the input files are CSVs, but that usually means that a single character is used to separate fields in those files. Instead of that, the files you have shown us have varying numbers of spaces between fields (and in the case of one line in a.csv ), some extraneous spaces at the end of the line).

I will make the wild guess that your real input files are intended to be <tab> separated instead of separated by sequences of one or more <space>s. I further assume that you want the output to be in the same order as the sequence of lines that appear in the a.csv input file. If all of these assumptions are correct, the following seems to produce the output you want (except for the line with the trailing <tab> or <space>s in a.csv :

awk '
FNR == NR {
	car[$1] = $2
	next
}
$2 in car {
	$0 = $0 "\t" car[$2]
}
1' b.csv a.csv

which produces the output:

name  phone    adress    car
xy    1234    asbd	honda
yz    2134    asbdf   	toyota
tc    6789    salkdur	bmw

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .

Don,
The files are CSVs. I manually typed them in here, adn this is not the actual file or data, But just the structure of the file.

You are correct, I do want the preserve the order "a.csv" in "out.csv"

:slight_smile:

If you choose not to show us sample data that reflects the contents of your actual files, the likelihood of getting a suggestion that will work for you diminishes greatly. If what you are given as suggestions don't work for you, we will have to assume that you will be able to modify our suggestions to work with your real data.

We certainly know that you won't complain that our suggestions don't work with your real data if you are unwilling to tell us what your real data looks like.

Make sense. I am editing the original post with Comma Separated Values.

Thanks for your help. :slight_smile:

So maybe something more like:

awk '
BEGIN {	FS = OFS = ","
}
FNR == NR {
	car[$1] = $2
	next
}
$2 in car {
	$4 = car[$2]
}
1' b.csv a.csv

would come closer to what you need???