Is it possible to join all the values after sorting them based on 1st column key and replace empty rows with 0 like below ?
input
a1 0 a1 1 a1 1 a3 1 b2 1
a2 1 a4 1 a2 1 a4 1 c4 1
a3 1 d1 1 a3 1 b1 1 d1 1
a4 1 c4 1 b2 1
b1 1
b2 1
c4 1
d1 1
output
a1 0 1 1 0 0
a2 1 0 1 0 0
a3 1 0 1 1 0
a4 1 1 0 1 0
b1 1 0 0 1 0
b2 1 0 0 1 1
c4 1 0 1 0 1
d1 1 1 0 0 1
mute@clt:~/temp/quincyjones$ ./script
a1 0 1 1 0 0
a2 1 0 1 0 0
a3 1 0 1 1 0
a4 1 1 0 1 0
b1 1 0 0 1 0
b2 1 0 0 1 1
c4 1 0 1 0 1
d1 1 1 0 0 1
#!/bin/sh
awk 'BEGIN { FS=OFS="\t"; }
{
if (NF > maxNF) maxNF=NF
for (i=1;i<=NF;i+=2) {
if (length($i) < 1) continue #skip blanks
#new, map it
if (!name_to_idx[$i]) {
idx_to_name[++idx]=$i
name_to_idx[$i]=idx
}
arr[$i,i]=$(i+1) #make matrix
}
}
END {
for (i=1;i<=idx;i++) {
printf("%s",idx_to_name)
for (j=1;j<maxNF;j+=2)
printf("%s%d", OFS, arr[idx_to_name,j])
print "" #newline
}
}' file | sort -k1,1
1 Like
Yoda
January 30, 2013, 4:52pm
3
Here is another dirty approach using join
command.
Since join
command cannot accept more than 2 files at a time, we have to split the orignal file and perform the join
operation:
awk -F'\t' '{ print $1,$2 > "file1"; print $3,$4 > "file2"; print $5,$6 > "file3"; print $7,$8 > "file4"; print $9,$10 > "file5"; }' filename
join -a1 -1 1 -2 1 -o 1.1 1.2 2.2 -e "0" file1 file2 > j1
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 2.2 -e "0" j1 file3 > j2
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 1.4 2.2 -e "0" j2 file4 > j3
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 1.4 1.5 2.2 -e "0" j3 file5
Assuming the columns are sorted like in the sample, try:
awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' OFS='\t' file
2 Likes
Yoda
January 30, 2013, 6:02pm
5
Scrutinizer, I see some minor discrepancy in the o/p:
$ awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' OFS='\t' filename
a1 0 1 1 0 0
a2 1 0 1 0 0
a3 1 0 1 1 0
a4 1 1 0 1 0
b1 1 0 0 1 0
b2 1 0 1 0 1
c4 1 1 0 0 1
d1 1 1 0 0 1
EDIT: I noticed discrepancy even in my join
command o/p.
EDIT: Got it fixed after setting the field separator to tab spaces -F'\t'
EDIT: Scrutinizer, setting the field separator to tab spaces for your awk
code gives correct o/p
1 Like
OK, thanks, so:
awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' FS='\t' OFS='\t' file
1 Like
@Scrutinizer : Is it possible to apply your code even on unsorted file ?
@Neutron : Your code making all 0.* values as 0 ?
@Neutron : Is it possible to define the number of value-columns for a key column, when I run the script? In the above example (1 key and 1 value). This example (1 key 5 value cols)
a1 0 1 1 0 1 a1 1 1 1 0 1 a1 1 1 1 1 1 a1 0 0 0 0 1 a1 1 1 1 1 1 1
a2 1 0 1 0 1 a4 1 1 1 1 1 a4 0 0 1 1 1 a4 0 0 0 0 1 a4 1 1 1 1 1 1
a3 1 0 1 0 1 d1 1 1 1 0 1 d1 1 1 0 0 1 d1 0 0 0 0 1 d1 1 1 1 1 1 1
a4 1 0 1 0 1 c4 1 1 1 0 0 c4 1 1 1 0 0 c4 0 0 0 0 1 c4 1 1 1 1 1 1
b1 1 0 1 0 1
b2 1 1 1 0 1
c4 1 1 1 0 1
d1 1 1 1 0 1
---------- Post updated at 09:19 AM ---------- Previous update was at 04:50 AM ----------
With unsorted columns you could try this with two passes to the same file:
awk '
NR==FNR{
for(i=1; i<=5; i++) A[$(i*2-1),i]=$(i*2)
next
}
{
s=$1
for(i=1; i<=5; i++) s=s OFS A[$1,i]+0
print s
}
' FS='\t' OFS='\t' infile infile
Thanx. Is it possible to apply this to the 2nd example I mentioned > thanx in advance!