Join 4 files on first three columns

Hi,

Can someone suggest me on how to join 4 files by comparing the first three columns?

---------- Post updated at 03:56 PM ---------- Previous update was at 03:42 PM ----------

Hope it helps someone.

I was looking online for a solution and on stackoverflow, I found a solution and tried the following command

awk '{o1=$1;o2=$2;o3=$3; $1=$2=$3="";gsub(" +","");_[o1 FS o2 FS o3]=_[o1 FS o2 FS o3] FS $0} END{ for(i in _) print i,_ }' file1 file2

My input files are this way

input1:

a 1 2 3 x
b 3 4 5 y
c 6 7 8 z

input2:

a 1 2 4 y
b 3 4 6 u
c 8 9 7 z

output:

c 8 9  7z
b 3 4  5y 6u
a 1 2  3x 4y
c 6 7  8z

---------- Post updated at 03:58 PM ---------- Previous update was at 03:56 PM ----------

Can the above code be modified for four files?

please provide the sample input and output required for 4 file

Why don't you add the filenames on the command line and check for yourself? :slight_smile:

1 Like

...and why don't you use join to join files? It was meant for exactly this purpose.

Of course it is possible to solve any problem in arbitrarily complex form using tools not suited for the purpose. Still, this begs the question: is there any point to it? Using "awk" for what "join" was intended to do is like using poison ivy instead of toilet paper: less effective while being probably dangerous and perhaps cumbersome.

I hope this helps.

bakunin

Hi elixir,

Thanks for the suggestion. Will try it now.

---------- Post updated at 09:17 AM ---------- Previous update was at 09:17 AM ----------

That was a very funny comparison between poison ivy to toilet paper. I liked it :slight_smile:

I am comparing on three columns in each of the file. Join is doing it only on one column. So, I was forced to use awk. Any further suggestions are appreciated.

---------- Post updated at 09:24 AM ---------- Previous update was at 09:17 AM ----------

@raj_saini20,

Here is wat I did? I am getting my expected output. But, how do I make it tab separated or sorted? Can u enhance it?

cat 1

a 1 2 3 x
b 3 4 5 y
c 6 7 8 z

cat 2

a 1 2 4 y
b 3 4 6 u
c 8 9 7 z

cat 3

a 1 2 7 g
b 3 4 9 k
c 45 67 9 m

cat 4

a 1 2 77 m
b 3 4 7 j
c 34 56 8 p
awk '{o1=$1;o2=$2;o3=$3; $1=$2=$3="";gsub(" +","");_[o1 FS o2 FS o3]=_[o1 FS o2 FS o3] FS $0} END{ for(i in _) print i,_ }' 1 2 3 4

generated output

c 8 9  7z
b 3 4  5y 6u 9k 7j
c 45 67  9m
a 1 2  3x 4y 7g 77m
c 6 7  8z
c 34 56  8p

@elixir, thanks for a beautiful suggestion my friend. It worked.

Actually join can use arbitrary delimiters, so it is possible to use your favourite text filter (awk, sed, ed, tr, ....) to put these in (alternatively remove the respective blanks to make it one key) and remove them (alternatively put them back in) after the join.

You could have come up with this yourself, couldn't you?

bakunin

1 Like

You are right, I got you.

Merge first three columns as one key, run join and after I see the output, re split them back with desired delimiters. Thanks for another valuable thought.
I can do that. Thanks for the confidence donation. :slight_smile: