Shell script - group by

baladelaware73 · May 2, 2014, 3:54pm

Hi,
I have a text file as shown below.

root 25 oracle 25  batch 30  griduser 32 admin 35
root 25 oracle 25  batch 30  griduser 32
oracle 25  batch 30  griduser 32 xuser 45 admin 35

I want to group by user name, and the output needs to be as shown below. The usernames do not need to be in any particular order, but the grouping has to be done on the username value.

root 50 oracle 75  batch 90  griduser 64 user 45 admin 70

Please help.
Thanks

---------- Post updated at 03:54 PM ---------- Previous update was at 03:52 PM ----------

There was a typo in the output above; here is the correct output:

root 50 oracle 75 batch 90 griduser 96 xuser 45 admin 70

Yoda · May 2, 2014, 4:05pm

Show us what you have tried so far to resolve this problem.

baladelaware73 · May 2, 2014, 4:25pm

RC=`awk 'END { print NR }' $FILENAME`
RC=`expr $RC`
OBJNO=4
if [ "$RC" -ge 4 ]; then
    SETVAL=1
    OBJDETAIL=""
    FIELD1=1
    FIELD2=2
    until [ $OBJNO -eq $SETVAL ]; do
        DATAVAL=`cat $FILENAME | awk '{a[$'$FIELD1']+=$'$FIELD2'}END{for(i in a)print i,",",int((int(a[i])/NR))}' `
        SETVAL=`expr $SETVAL + 1`
        FIELD1=`expr $FIELD1 + 2`
        FIELD2=`expr $FIELD2 + 2`
        OBJDETAIL="$OBJDETAIL"",""$DATAVAL"
    done
    echo $OBJDETAIL
fi

This code works when every row has the same username in a given field, but the requirement has now changed.

Yoda · May 2, 2014, 4:31pm

Please use code tags for posting code fragments or data samples.

Here is an awk approach:

awk '
        {
                for ( i = 1; i <= NF; i += 2 )
                        A[$i] += $( i + 1 )
        }
        END {
                for ( k in A )
                        printf "%s %s ", k, A[k]
                printf "\n"
        }
' file

baladelaware73 · May 5, 2014, 11:32am

Thanks, this is a great approach. It worked for me.

Klashxx · May 5, 2014, 11:52am

A Python approach:

# cat test.py
#!/usr/bin/env python
import re

text = '''root 25 oracle 25  batch 30  griduser 32 admin 35
root 25 oracle 25  batch 30  griduser 32
oracle 25  batch 30  griduser 32 xuser 45 admin 35'''

users = {}

# Each match is a (name, value) pair; the raw string keeps \s and \d intact.
for k, v in re.findall(r'([a-z]+)\s+(\d+)', text):
    users[k] = users.get(k, 0) + int(v)
print(users)

# ./test.py  
{'admin': 70, 'griduser': 96, 'xuser': 45, 'batch': 90, 'oracle': 75, 'root': 50}

baladelaware73 · May 5, 2014, 12:10pm

The awk script below sums the values correctly per username, but it does not divide each sum by the number of rows in which that username appears. I need something like
root 50 oracle 75 ....... Can anyone shed some light, please?

awk '
        {
                for ( i = 1; i <= NF; i += 2 )
                        A[$i] += $( i + 1 )
        }
        END {
                for ( k in A )
                        printf "%s %s ", k, A[k]
                printf "\n"
        }
' file

RudiC · May 5, 2014, 3:49pm

Add something like B[$i]++ alongside the A[$i] summing, and then divide by B[k] when outputting.

baladelaware73 · May 5, 2014, 4:04pm

I tried, but I got a syntax error. Can you please incorporate B($i)++ into the awk script below?

awk '
        {
                for ( i = 1; i <= NF; i += 2 )
                        A[$i] += $( i + 1 )
        }
        END {
                for ( k in A )
                        printf "%s %s ", k, A[k]
                printf "\n"
        }
' file

RudiC · May 5, 2014, 5:13pm

What's so difficult about using code tags?

MadeInGermany · May 6, 2014, 2:37am

I have written the first half for you, along with a small fix (it avoids adding a nonexistent $(NF+1)):

awk '
{
  for ( i = 2; i <= NF; i += 2 ) {
    A[$(i-1)] += $i
    B[$(i-1)]++
  }
}

The second half is left as an exercise.
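
For readers who want to check their answer, here is one possible completion. It is only a sketch and combines RudiC's hint (count each name in B, divide by B[k] on output) with the loop above. The function name avg_by_user is made up for illustration and does not come from the thread. It also prints one name per line instead of one long line, which is an assumption made so the result is easy to sort, since awk does not guarantee the order of "for (k in A)".

```shell
# avg_by_user: hypothetical helper name, not from the thread.
# Reads rows of "name value" pairs (from files or stdin) and prints
# one "name average" line per distinct name.
avg_by_user() {
  awk '
  {
    # Values sit in the even-numbered fields; the name is the field before.
    for (i = 2; i <= NF; i += 2) {
      A[$(i - 1)] += $i   # running sum per name
      B[$(i - 1)]++       # how many times the name was seen
    }
  }
  END {
    # Traversal order of "k in A" is unspecified; sort afterwards if needed.
    for (k in A)
      printf "%s %s\n", k, A[k] / B[k]
  }
  ' "$@"
}
```

It can be called as "avg_by_user file | sort". Note that this divides by the number of times each name occurs, not by NR as the original script did, so a name missing from some rows is not averaged over them. If whole numbers are wanted, wrapping the division in int() truncates the result.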