I want to merge the columns together based on Column 1.
Also would like to know if I can merge these together if I had more than one column to match between the two files.
If file one isn't too large, then this should work
# single pass across each file, but requires the entire first file
# to be held in memory which might not be realistic.
# order is preserved based on file2
awk '
NR == FNR { cache[$1] = $0; next; }
$1 in cache {
printf( "%s", cache[$1] );
$1 = "";
print;
}
' file1 file2 >output
If file1 is large (i.e. it's not practical to cache it in memory), then this is one way. May not be the most efficent, but it should work. The order of the output is sorted by field1.
# multiple passes across the data, but memory requirement is eliminated
# order of file2 is not preserved.
(
sed 's/^/a /' file1
sed 's/^/b /' file2
) | sort -k 2n,2 -k 1,1 awk '
$1 == "a" {
x = $2;
$1 = "";
cache = $0;
next;
}
$2 == x {
$1 = $2 ="";
printf( "%s%s\n", substr( cache, 2 ), $0 );
}
'
You could do this without the seds, and depend on the number of columns to determine if an unmatched pair exists, but this works without having to know the exact layout of either file, other than the desired column to compare.