Appending lines with word frequencies, ordering and indexing a column

Dear All,

I have the following input data:

w1	20	g1
w1	10	g1
w2	12	g1
w2	23	g1
w3	10	g1
w3	17	g1
w3	12.5	g1
w3	21	g1
w4	11	g1
w4	13.2	g1
w4	23	g1
w4	18	g1

First, I want to count how often each word occurs in col1 and sort col2 in ascending order within each group of identical col1 words. Second, I want to append the frequency and the sort rank to each line, like this:


W	Z	U	freq(W)	Z-order

w1	10	g1	2	1
w1	20	g1	2	2
w2	12	g1	2	1
w2	23	g1	2	2
w3	10	g1	4	1
w3	12.5	g1	4	2
w3	17	g1	4	3
w3	21	g1	4	4
w4	11	g1	4	1
w4	13.2	g1	4	2
w4	18	g1	4	3
w4	23	g1	4	4

I am trying to complete the following code but am not making any headway:

awk 'NR==FNR{words[++nwords]=$1;next}
{for(i=1;i<=NF;i++)freq[$i]++}
END{for(w=1;w<=nwords;w++)
print words[w], freq[words[w]]+0}' infile

I therefore need your help.

Many thanks,

Ghetz

Try this,

sort -nk2,1 sortfile | awk 'NR==FNR{if(s!=$1){j=0;s=$1}a[$1]=++j;next} {if(p!=$1){i=0;p=$1}if(a[p]){$4=a[p]}$5=++i;print }' input_file input_file 

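NR==FNR is only useful when awk is given more than one input file (or the same file twice). With a single infile it is true for every line, so the `next` skips the second block entirely. Try this, assuming column one is in sorted order: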

awk 'function pr(){if(p)for(i=1;i<=n;i++){print A[i],n,i;delete A[i]};p=$1;n=0}p!=$1{pr()}{A[++n]=$0}END{pr()}' OFS='\t' infile
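
To illustrate the NR==FNR remark, here is a small sketch that prints both counters for one file operand and for the same file given twice. The file name demo.txt and its two lines are made up for the demonstration.

```shell
# Hypothetical two-line demo file; the name demo.txt is an assumption.
tmp=$(mktemp -d)
printf 'w1\nw2\n' > "$tmp/demo.txt"

# One file operand: NR and FNR are equal on every line.
one=$(awk '{print NR, FNR, (NR==FNR ? "equal" : "differ")}' "$tmp/demo.txt")

# Same file twice: NR keeps counting, FNR restarts, so they match only in the first pass.
two=$(awk '{print NR, FNR, (NR==FNR ? "equal" : "differ")}' "$tmp/demo.txt" "$tmp/demo.txt")

printf '%s\n---\n%s\n' "$one" "$two"
rm -rf "$tmp"
```

With two operands the counters differ from the third record on, which is what lets a script collect data in the first pass and use it in the second.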

Dear pravin27

First, many thanks for your reply.

I tried your code replacing "sortfile" with "input_file":

sort -nk2,1 sortfile | awk 'NR==FNR{if(s!=$1){j=0;s=$1}a[$1]=++j;next} {if(p!=$1){i=0;p=$1}if(a[p]){$4=a[p]}$5=++i;print }' input_file input_file

but somehow the ascending sort on col2 does not work. It produces the same output as Scrutinizer's code. Is there something I am missing?

Regards,

Ghetz

Hi Ghetz,

Sorry ....

Try this,

sort -nk2,1 inputfile -o inputfile; awk 'NR==FNR{if(s!=$1){j=0;s=$1}a[$1]=++j;next} {if(p!=$1){i=0;p=$1}if(a[p]){$4=a[p]}$5=++i;print }' OFS="\t" inputfile inputfile
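
A note on why the earlier piped version left col2 unsorted: awk was given file operands, so it read those files and never looked at the sorted data arriving on standard input. If you would rather not overwrite inputfile, the following is a minimal one-pass sketch that reads the sorted stream directly. It assumes tab-separated input, and the key specification -k1,1 -k2,2n is my own substitution (not from the posts above) that asks explicitly for column 1 first and then column 2 numerically. The function name emit and the sample rows are only for illustration.

```shell
tmp=$(mktemp -d)
# A few rows from the question, tab-separated (subset chosen for brevity).
printf 'w1\t20\tg1\nw1\t10\tg1\nw3\t10\tg1\nw3\t17\tg1\nw3\t12.5\tg1\n' > "$tmp/inputfile"

# Sort by column 1, then numerically by column 2; awk reads the result from stdin.
out=$(LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n "$tmp/inputfile" |
awk -F'\t' -v OFS='\t' '
  # Print the buffered group: each line, then group size and rank.
  function emit(   i) {
    for (i = 1; i <= n; i++) print line[i], n, i
    n = 0
  }
  NR > 1 && $1 != prev { emit() }   # column 1 changed: finish previous group
  { line[++n] = $0; prev = $1 }     # buffer the current line
  END { emit() }                    # last group
')
printf '%s\n' "$out"
rm -rf "$tmp"
```

Because each group is buffered until its last line has been seen, the frequency is known before anything is printed, so a second pass over the file is not needed.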

Many thanks pravin27,

Your code works beautifully.

Regards,

Ghetz