I've looked at the join command which is able to perform what I need on two rows with a common field, however if I have more than two rows I need to join all of them.
Thus I have one file with multiple rows to be joined on an index number:
where the repeated index number may have an arbitrary number of rtext's to be pivoted to columns for that index number. The number suffixes on rtext are only there for clarity.
I've considered using uniq to extracting the repeated text and using the results from that to remove that specific line in awk, then just reiterate this process in a script and use 'join', but I'm convinced there is a complete, easier solution.
updates: I have progressed to a solution with gawk... now I am having a problem with losing the order of rows as columns.
Both solutions seems to work, however there are caveats for each one given I wasn't very specific on the requirements.
new data set:
10a textile
b10b wtf
b20b omg
b20b woot
20b teasdf ha
20b tesdf2 he
30c woot1
30c woot2
radoulov solutions works, though it assumes integers. I like the simplicity so I've tried modifying this code to work for non-integer indices, and it seems to lose the row order significantly(row to column transposed order is OK though):
END {
for (i in _)
printf "%s\n", _
}
{
_[$1] = _[$1] ? _[$1] FS $0 : $0
}
This implies a problem with the string as array indices, but ivhb's solution uses the same feature.
{
if (cnt[$1] ++ == 0)
{
idx[j++] = $1
rst[$1] = $0
} else {
rst[$1] = rst[$1] FS $0
}
}
END {
for (i = 0; i < j; i ++)
print rst[idx]
}
This works OK as modified.
Final script FTW:
{
if (cnt[$1] ++ == 0)
{
idx[j++] = $1
rst[$1] = $0
} else {
vindex = $1
sub(/^[[:alnum:]_]+ \y/, "")
rst[vindex] = rst[vindex] FS $0
}
}
END {
for (i = 0; i < j; i ++)
print rst[idx]
#print idx, rst[idx]
}
The only problem remains is that the space is assumed as the FS in the regexp, is there a way to escape the FS in a regexp?