Comparing 2 csv files and matching content

Hello,

I have the following problem:

There are two csv files

csv-file #1:

aaa1, aaa2, ... aaan
aaa1, bbb2, ... bbbn
aaa1, ccc2, ... cccn
bbb1, bbb2, ... bbbn
...
zzz1, zzz2, ... zzzn

csv-file #2:

aaa1, matchvalue1
ccc1, matchvalue2
bbb1, matchvalue3
...

Each line of csv-file #1 shall get appended the crresponding "matchvalue" of csv-file #2,
e.g. if a line contains the reference value aaa1, this line shall be appended by matchvalue1 of csv-file #2.

Finally csv-file #2 would look like this:

aaa1, aaa2, ... aaan, matchvalue1
aaa1, bbb2, ... bbbn, matchvalue1
aaa1, ccc2, ... cccn, matchvalue1
bbb1, bbb2, ... bbbn, matchvalue3
...

In csv-file #1 the "reference values" - in my example the first element of the line, may occur in more than one line.

In the example it is the first element of each line.
How would the script look like if it may be at any other particular position of the line, e.g. always the third element of each line ?

Can anyone please help me and submit a solution ?

Thanks so much !

Regards
G.

U can start with:

awk 'NR==FNR{a[$1]=$2;next}{$0=$0","a[$1]}1' FS=',' csv-file2 csv-file1

Thanks for the reply, Klashxx.
But I get back an error message:

awk: Syntax error next to line 1
awk: Bailing out near line 1

I entered the sequence (with fitting filenames) at the prompt, so did not write it to a file and start that, right ?

Both files are in the same directory. Same error also when I add the path.

What could be the reason ?
OS is Solaris 9 and awk is - of course - on the machine.

Are there versions of awk that do not understand all command switches ?

Does anyone knwo an other solution for my problem maybe ?

Thanks in advance.

try '/bin/nawk' or '/usr/xpg4/bin/awk' instead of old/broken '/bin/awk'

With join:

 sort file2 | join -t, -11 -21 file1 -

Thanks vgersh99 !
Actually I was just asking myself if there's an alternative, when I found that "nawk", which I didn't know before.
Yes, it was on the machine :slight_smile:
And it worked perfectly.
Was about to close this thread when I found your kind hint, which crossed with my own investigations.

So, thanks to both of you, Klashxx and vgersh99. You made my day.

I think we should compile kind of "Cookbook For Solving the Most Common Issues Using Shell Tools".

I think the problem I needed a solution for is a very common one: matching various "check lists" and adding one (or more) items to each line of one file if there's a match on the other.

And those "crude" shell tools work so damned fast...

Again thanks so much for helping me

---------- Post updated at 10:12 PM ---------- Previous update was at 10:09 PM ----------

Hi once more,
how would your solution have to be modified in case the "reference column" of csvfile #1 is not the first column but e.g. the third column ?

I tried with

awk 'NR==FNR{a[$1]=$2;next}{$0=$0","a[$3]}1' FS=',' csv-file2 csv-file1

but it found only 1 hit and other lines just ended on a colon.

---------- Post updated at 10:58 PM ---------- Previous update was at 10:13 PM ----------

Hi once more,
how would your solution have to be modified in case the "reference column" of csvfile #1 is not the first column but e.g. the third column ?

I tried with

awk 'NR==FNR{a[$1]=$2;next}{$0=$0","a[$3]}1' FS=',' csv-file2 csv-file1

but it found only 1 hit and other lines just ended on a colon.

Can you post samples of the real input files and desired output?

This works for me:

> cat csv-file1
1111,1111,aaa1, aaa2, aaan
2222,2222,aaa1, bbb2, bbbn
3333,3333,aaa1, ccc2, cccn
4444,4444,bbb1, bbb2,  bbbn
5555,5555,zzz1, zzz2,  zzzn
> cat csv-file2
aaa1, matchvalue1
ccc1, matchvalue2
bbb1, matchvalue3
> nawk 'NR==FNR{a[$1]=","$2;next}{$0=$0""a[$3]}1' FS=',' csv-file2 csv-file1
1111,1111,aaa1, aaa2, aaan, matchvalue1
2222,2222,aaa1, bbb2, bbbn, matchvalue1
3333,3333,aaa1, ccc2, cccn, matchvalue1
4444,4444,bbb1, bbb2,  bbbn, matchvalue3
5555,5555,zzz1, zzz2,  zzzn