awk '{
a[$1]; if (NF > mx) mx = NF
for (i = 2; i <= NF; i++) ar[$1, i] = ar[$1, i] $i
}
END {
for (b in a) {
	printf "%s ", b
	for (c = 2; c <= mx; c++) printf "%s ", ar[b, c]
	print ""
}
}' infile
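A toy run of this merge-by-key approach (the input data is made up for illustration; note that awk's `for (b in a)` visits keys in unspecified order):

```shell
# Made-up sample: two rows share key "k", one row has key "m".
printf 'k 1 2\nk 3 4\nm 5 6\n' > infile

awk '{
  a[$1]; if (NF > mx) mx = NF
  for (i = 2; i <= NF; i++) ar[$1, i] = ar[$1, i] $i
}
END {
  for (b in a) {
    printf "%s ", b
    for (c = 2; c <= mx; c++) printf "%s ", ar[b, c]
    print ""
  }
}' infile
# The two "k" rows merge into one: column 2 becomes 13, column 3 becomes 24.
```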
Ouch! I missed the 400,000-columns note. But I don't see anything in the POSIX standard or the Single UNIX Specification that allows an implementation to limit the number of fields in a line. And if the input files are sorted, it is grossly inefficient to read the entire input file (at least 8,000,000,000 bytes) into memory rather than sorting the input file first and using your method. But, of course, you can't use the standard sort utility on a file whose lines are at least 800,000 bytes long.
All of the standard utilities that process text (including awk, the editors, grep, and sort) are only defined to work on text files, which limits lines to LINE_MAX bytes. LINE_MAX can be as small as 2,048, and I don't think I've ever used a system with LINE_MAX greater than 20,480.
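POSIX exposes this limit through getconf, so it's easy to check on any given box:

```shell
# Print this system's LINE_MAX: the longest line (including the
# terminating newline) that the standard text utilities are
# guaranteed to handle. POSIX requires it to be at least 2,048.
getconf LINE_MAX
```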
The only text-processing utilities in the standards that are required to work on files that would be text files if line lengths were unlimited are cut, fold, paste, and the shell. And for the shell, it is only the length of command lines that is unlimited (the shell built-in utilities that read and write files, such as read and printf, are only defined to work when the input or output is a text file).
It would be possible to use cut to create thousands (or tens or hundreds of thousands, depending on expected field widths after merging lines) of text files that can be processed with awk, then use cut again to strip the first field from every file except the first, and finally use paste to put the results back together. But having created this file with some lines that are at least 1.2 MB long (400,000 fields × (2 bytes per joined field + 1 byte per field separator)), there isn't much you can do with it.
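A minimal sketch of that cut/awk/paste round trip, at toy scale (the file names and the two-fields-per-chunk width are illustrative assumptions, not from the original):

```shell
#!/bin/sh
# Split a wide space-separated file into narrow chunks (key field plus
# a few data fields each), then reassemble the chunks with paste.
# In between, each narrow chunk is a legal text file that awk, sort,
# etc. can safely process.

printf 'k1 a b c d\nk2 e f g h\n' > wide.txt

# Chunk 1: key plus fields 2-3; chunk 2: key plus fields 4-5.
cut -d ' ' -f 1,2-3 wide.txt > chunk.1
cut -d ' ' -f 1,4-5 wide.txt > chunk.2

# ...each chunk would be processed with awk here...

# Drop the duplicated key field from every chunk but the first,
# then glue the pieces back together column-wise.
cut -d ' ' -f 2- chunk.2 > chunk.2.nokey
paste -d ' ' chunk.1 chunk.2.nokey
# Reconstructs the original rows of wide.txt.
```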
OSX 10.8:
2048
400000
CentOS 6.3:
2048
400000
AIX 7:
2048
400000
Solaris 10:
2048
/usr/xpg4/bin/awk: line 0 (NR=1): Record too long (LIMIT: 19999 bytes)
0
HP-UX 11i:
2048
awk: Input line 1 2 3 4 5 6 7 8 9 10 cannot be longer than 3,000 bytes.
The source line number is 1.
0