Ghetz
January 1, 2011, 8:58pm
1
Dear All,
I have the following input data:
w1 20 g1
w1 10 g1
w2 12 g1
w2 23 g1
w3 10 g1
w3 17 g1
w3 12.5 g1
w3 21 g1
w4 11 g1
w4 13.2 g1
w4 23 g1
w4 18 g1
First, I want to find the frequency of each word in col1 and sort col2 in ascending order within each col1 word. Second, I want to append the frequency and the sort order to each line, like this:
W Z U freq(W) Z-order
w1 10 g1 2 1
w1 20 g1 2 2
w2 12 g1 2 1
w2 23 g1 2 2
w3 10 g1 4 1
w3 12.5 g1 4 2
w3 17 g1 4 3
w3 21 g1 4 4
w4 11 g1 4 1
w4 13.2 g1 4 2
w4 18 g1 4 3
w4 23 g1 4 4
I am trying to complete the following code but am not making any headway:
awk 'NR==FNR{words[++nwords]=$1;next}
{for(i=1;i<=NF;i++)freq[$i]++}
END{for(w=1;w<=nwords;w++)
print words[w], freq[words[w]]+0}' infile
I therefore need your help.
Many thanks,
Ghetz
Try this,
sort -nk2,1 sortfile | awk 'NR==FNR{if(s!=$1){j=0;s=$1}a[$1]=++j;next} {if(p!=$1){i=0;p=$1}if(a[p]){$4=a[p]}$5=++i;print }' input_file input_file
NR==FNR is only useful when there is more than one input file. Try this, assuming column one is in sorted order:
awk 'function pr(){if(p)for(i=1;i<=n;i++){print A[i],n,i;delete A[i]};p=$1;n=0}p!=$1{pr()}{A[++n]=$0}END{pr()}' OFS='\t' infile
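To illustrate the point about NR==FNR: the test is true only while awk is reading its first file operand, so the usual pattern is to name the same file twice, collect something on the first pass and use it on the second. Below is a minimal sketch of that idiom, using a few lines of the sample data written to a temporary file. The variable names tmp, result and freq are only for illustration. Note also that when awk is given file operands it reads those files and not standard input, so anything piped into it is ignored.

```shell
# A few lines of the sample input, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 'w1 20 g1' 'w1 10 g1' 'w2 12 g1' 'w2 23 g1' 'w3 10 g1' > "$tmp"

# Pass 1 (NR==FNR is true only for the first file): count lines per col1 word.
# Pass 2 (same file again): print each line followed by that count.
result=$(awk 'NR==FNR { freq[$1]++; next } { print $0, freq[$1] }' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

This only appends the frequency; the lines come out in their original order because nothing here sorts them.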
Ghetz
January 2, 2011, 6:53pm
4
Dear pravin27
First, many thanks for your reply.
I tried your code replacing "sortfile" with "input_file":
sort -nk2,1 sortfile | awk 'NR==FNR{if(s!=$1){j=0;s=$1}a[$1]=++j;next} {if(p!=$1){i=0;p=$1}if(a[p]){$4=a[p]}$5=++i;print }' input_file input_file
but somehow the ascending sort on col2 does not work. It produces the same output as Scrutinizer's code. Is there something I am missing?
Regards,
Ghetz
Hi Ghetz,
Sorry ....
Try this,
sort -nk2,1 inputfile -o inputfile; awk 'NR==FNR{if(s!=$1){j=0;s=$1}a[$1]=++j;next} {if(p!=$1){i=0;p=$1}if(a[p]){$4=a[p]}$5=++i;print }' OFS="\t" inputfile inputfile
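For anyone adapting this: a more explicit way to ask sort for "group by col1, then col2 in ascending numeric order" is to give each key its own -k option, and the sorted stream can then be piped straight into awk without overwriting the input file. The following is a sketch under those assumptions, not a replacement for the command above. The function name emit_group and the variables tmp and sorted_ranked are made up for illustration.

```shell
# Sample input (first eight lines from the question), in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
w1 20 g1
w1 10 g1
w2 12 g1
w2 23 g1
w3 10 g1
w3 17 g1
w3 12.5 g1
w3 21 g1
EOF

# sort: key 1 is col1 compared as text, key 2 is col2 compared numerically.
# awk: buffer the lines of one col1 group, then print each buffered line
# followed by the group size (frequency) and its position within the group.
sorted_ranked=$(sort -k1,1 -k2,2n "$tmp" | awk '
  function emit_group(  i) {             # i is a local variable
    for (i = 1; i <= n; i++) print line[i], n, i
    n = 0
  }
  NR > 1 && $1 != prev { emit_group() }  # col1 changed: print finished group
  { line[++n] = $0; prev = $1 }
  END { emit_group() }
')

printf '%s\n' "$sorted_ranked"
rm -f "$tmp"
```

Because awk reads the sorted data from the pipe here, no file name is passed to awk, and the original file is left untouched.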
Ghetz
January 3, 2011, 5:11pm
6
Many thanks pravin27,
Your code works beautifully.
Regards,
Ghetz