Scripting help needed with a text file

tech_frk · July 27, 2012, 6:36pm

Hi,
I am a novice at Unix shell scripting and need some help from you guys. The scenario is as follows: I have the following text file.

order No    Company    Category
21    aaa    A
24    aaa    A
87    aaa    B
98    aaa    B
23    abc    A
45    abc    B
25    bbb    A
76    wes    A
66    wes    B
44    wes    B
35    wes    B
39    wes    B
90    esd    B
99    esd    B
109    esd    B
26    esd    B
58    esd    A
76    tre    B
75    tre    B

The desired output is as follows (Count is the number of rows per company; A and B are the number of rows in each category):

Company    Count    A    B
aaa    4    2    2
abc    2    1    1
bbb    1    1    0
wes    5    1    4
esd    5    1    4
tre    2    0    2

The output should be in a different file. Appreciate your help.

Chirel · July 27, 2012, 6:51pm

Hi,

awk '/^[0-9]/ { print $2,$3}' input-file | sort | uniq -c | awk 'BEGIN{print "Company    Count   A   B"} {if (comp && comp != $2) { printf("%-10s %5d %3d %3d\n",comp,a["A"]+a["B"],a["A"],a["B"]); comp=""; a["A"]=a["B"]=0; } comp=$2; a[$3]=$1; } END{if (comp) printf("%-10s %5d %3d %3d\n",comp,a["A"]+a["B"],a["A"],a["B"]);}'

alister · July 27, 2012, 7:36pm

That's a one-liner in name only. In the future, please use a reasonable coding style instead of such a long line. It will make your code easier for novices to understand, and members not using a 2560 pixel wide display will be able to read your post (and posts which quote your post) without having to resort to tedious horizontal scrolling.

Regards,
Alister

Chirel · July 28, 2012, 3:36am

Hi,

Alister, you are right: even if this solves the problem, it's not user friendly, so here is the readable version.

First we keep only the company name and the A/B category, and sort them:

# awk '/^[0-9]/ { print $2,$3}' input-file | sort > sorted-file

Then we process sorted-file by counting the duplicate lines and rearranging the output:

# uniq -c sorted-file | awk -f doit.awk
Company    Count   A   B
aaa            4   2   2
abc            2   1   1
bbb            1   1   0
esd            5   1   4
tre            2   0   2
wes            5   1   4

Here is the content of the file doit.awk:

BEGIN { 
  print "Company    Count   A   B"
}

{
  if (comp && comp != $2) {
    printf("%-10s %5d %3d %3d\n",comp,a["A"]+a["B"],a["A"],a["B"]);
    comp="";
    a["A"]=a["B"]=0;
  }
  comp=$2;
  a[$3]=$1;
}

END {
  if (comp) printf("%-10s %5d %3d %3d\n",comp,a["A"]+a["B"],a["A"],a["B"]);
}
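
To see why doit.awk reads the count from $1, the company from $2 and the category from $3, it may help to look at what uniq -c feeds it. This is only a small sketch: sample.txt and counted.txt are made-up file names, and the data is a shortened, hypothetical sample in the same layout as the original file.

```shell
# Hypothetical three-company sample in the same layout as the original file.
cat > sample.txt <<'EOF'
order No    Company    Category
21    aaa    A
24    aaa    A
87    aaa    B
25    bbb    A
76    tre    B
75    tre    B
EOF

# Keep company and category, sort so duplicates are adjacent, then count them.
awk '/^[0-9]/ { print $2, $3 }' sample.txt | sort | uniq -c > counted.txt

# Show the intermediate result that would be piped into doit.awk.
cat counted.txt
```

Each line of counted.txt holds a count, a company and a category, for example 2 aaa A (the leading padding depends on the uniq implementation). A company with both categories therefore spans two consecutive lines, which is why the script collects the counts in a[$3] and only prints when the company name changes.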

rangarasan · July 28, 2012, 4:43am

Hi,

Try this one. It is a shorter version, but the code is untested:

awk '/^[0-9]/ {a[$2]=a[$2]+1;b[$2" "$3]=b[$2" "$3]+1;}END{print"Company    Count   A   B"; for(i in a){printf("%-10s %5d %3d %3d\n",i,a[i],b[i" A"],b[i" B"]);}}' inputfile

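Note that for(i in a) visits the companies in no particular order. If you have GNU awk, you can get them in sorted order by sorting the indices of array a with the asorti function (asort would sort the counts and discard the company names).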
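
If GNU awk is not available, one portable alternative is to let sort order the rows after awk has printed them. The sketch below also writes the result to a separate file, as the original question asked. The file names orders.txt and summary.txt, the array names total and bycat, and the sample data are all hypothetical.

```shell
# Hypothetical sample in the same layout as the original file.
cat > orders.txt <<'EOF'
order No    Company    Category
76    wes    A
66    wes    B
21    aaa    A
24    aaa    A
87    aaa    B
25    bbb    A
EOF

{
  # Print the header first, so it is not sorted together with the data rows.
  printf '%-10s %5s %3s %3s\n' Company Count A B

  # Count rows per company and per company/category pair, then sort the rows.
  awk '/^[0-9]/ { total[$2]++; bycat[$2 " " $3]++ }
       END {
         for (c in total)
           # A missing bycat[...] entry is empty and prints as 0 with %d.
           printf("%-10s %5d %3d %3d\n", c, total[c], bycat[c " A"], bycat[c " B"])
       }' orders.txt | sort
} > summary.txt

cat summary.txt
```

With this sample, summary.txt lists aaa, bbb and wes in that order, and bbb shows 0 in the B column because it has no B rows.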
Cheers,
Ranga:-)

Chirel · July 28, 2012, 6:10am

Well done Ranga

PS, just for fun, the increments can be written with ++ :

awk '/^[0-9]/ {a[$2]++;b[$2" "$3]++;}END . . . .
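
The ++ form works without any initialisation because an awk array element that has never been set counts as 0 in a numeric context. A tiny sketch with made-up input and a hypothetical array name seen (counts.txt is also just an example file name):

```shell
# Count how often each word appears; no element is initialised beforehand.
printf 'aaa\nabc\naaa\n' |
  awk '{ seen[$1]++ }
       # "zzz" never appeared, so adding 0 turns the empty value into 0.
       END { print seen["aaa"], seen["abc"], seen["zzz"] + 0 }' > counts.txt

cat counts.txt
```

This prints 2 1 0.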

rangarasan · July 28, 2012, 6:37am

Hi,
I am unable to type the double plus sign on my mobile; that's why I wrote it that way. I should change my mobile.
Happy weekend:-)
Cheers,
Ranga:-)