To be honest, I am really impressed and amazed at the pace I find solutions for un-solved coding mysteries in this forum.
I have a file like this
input1.txt
x y z 1 2 3
a b c 4 -3 7
k l m n 0 p
1 2 a b c 4
input2
x y z 9 0 -1
a b c 0 6 9
k l m 8 o p
1 2 a f x 9
Output
x y z 1 2 3 9 0 -1
a b c 4 -3 7 0 6 9
k l m n 0 p 8 o p
1 2 a b c 4 f x 9
The number of columns might change. To make it simple, my final output files should contain the common columns between two files and their respective varying columns side by side.
I tried using the (a[$1}=$2; next) method in awk, but it doesn't work. Can a join do this?
I tried both of them. But, the awk script works only for matching first three columns and prints 4th, 5th and 6th columns. But, what I need is to match the first three columns and print all the records no matter how many they are.
The cut command prints each record against each record which was not helpful.
What exactly is the output when you run the commands? Can you please post it. Also, what is your system?
It works on Linux with GNU awk and cut:
$ cat 1.txt
x y z 1 2 3 4 5 6 7 8 9
a b c d e f 9 7 8 9 90 1
$ cat 2.txt
x y z 2 4 5 6 7 8 1 0 0
a b c g h i 9 3 1 4 5 6
$ awk 'NR==FNR{a[$1,$2,$3]=$0; next} {printf a[$1,$2,$3]; $1=$2=$3=""; gsub(/^ */," ",$0); print $0}' 1.txt 2.txt
x y z 1 2 3 4 5 6 7 8 9 2 4 5 6 7 8 1 0 0
a b c d e f 9 7 8 9 90 1 g h i 9 3 1 4 5 6
$ cut -d" " -f4- 2.txt | paste -d" " 1.txt -
x y z 1 2 3 4 5 6 7 8 9 2 4 5 6 7 8 1 0 0
a b c d e f 9 7 8 9 90 1 g h i 9 3 1 4 5 6
cp first.txt > out.txt
for i in *.txt ; do
[[ "$i" = first.txt ]] && continue
cut -d" " -f4- "$i" | paste -d" " out.txt - > tmpFile
mv tmpFile out.txt
done
But this the most fun way of doing it -- recursive way:
#!/bin/bash
function myPaste {
if [[ $# -le 1 ]] ; then
cat $1
return
fi
arg=$1
shift
myPaste $* | cut -d" " -f4- | paste -d" " $arg -
}
myPaste $*
You would save this as myPaste.sh, make it executable and then call it:
./myPaste.sh *.txt
Note that all these above are dumb, they do not check whether the first 3 columns are the same, they just strip the first three column no matter what. The awk solution on 2 files with [$1,$2,$3] works on unsorted files also, and actually matches the first three fields. This could be extended to n files, would just need one extra loop in the END to sort the associative array.