awk and sum

ranjancom2000 · January 22, 2018, 8:45am

This is my file

vol0 285GB
vol0.snapshot 15GB
vol11_root 0GB
vol12_root 47GB
vol12_root.snapshot 2GB

I need the output

vol0 285GB,vol0.snapshot 15GB,sum-300GB
vol11_root 0GB,nosnap,sum-0Gb
vol12_root 47GB,vol12_root.snapshot 2GB,49GB

I was trying to use paste -d, - -, but I am having an issue: I need to pick up only the line that has .snapshot, and if no snapshot is found I need to add nosnap instead.

For adding the two values and printing them as the sum, I have no idea how to do it.

Don_Cragun · January 22, 2018, 9:26am


What operating system are you using?

What shell are you using?

What did you try with paste ?

How are we supposed to guess when the sum is to be printed with "GB" and when it is to be printed with "Gb"?

How are we supposed to guess when "sum-" is to be included in the output and when it is to be omitted?

Are all numbers to be added given as "GB" values? Or could "KB", "MB", "TB" and/or other multipliers be present?

ranjancom2000 · January 22, 2018, 9:46am

What operating system are you using?
I am using Cygwin

What shell are you using?

shell

What did you try with paste ?

cat file | paste -d, - - (I used this command to join each pair of lines, but the issue is that some volumes don't have a snapshot line.)
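A quick way to see the problem: paste -d, - - reads two lines from standard input for every output line, so it pairs lines purely by position and never looks at their content. A minimal sketch on the sample data from the first post (the sample function is only a stand-in for the real file):

```shell
#!/bin/sh
# Stand-in for the real file: the sample data from the first post.
sample() {
  printf '%s\n' 'vol0 285GB' 'vol0.snapshot 15GB' 'vol11_root 0GB' \
    'vol12_root 47GB' 'vol12_root.snapshot 2GB'
}

# Each "-" takes the next line of stdin, so lines 1+2, 3+4, ... are glued
# together whatever they contain.
sample | paste -d, - -
```

On this input the second output line is vol11_root 0GB,vol12_root 47GB: vol11_root has no snapshot line to fill the second slot, so every pair after it is shifted by one.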

How are we supposed to guess when the sum is to be printed with "GB" and when it is to be printed with "Gb"?
All will be in GB.

How are we supposed to guess when "sum-" is to be included in the output and when it is to be omitted?

Once I have the two lines joined, e.g.
"vol0 285GB,vol0.snapshot 15GB", I use awk to sum the integers.

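For reference, the summing half of that plan could look like the sketch below. It assumes the lines are already joined with commas and that every size is a whole number of GB; sum_joined is a made-up name. awk turns a string such as "285GB" into 285 when it is used as a number, and a field like nosnap simply counts as 0.

```shell
#!/bin/sh
# sum_joined (hypothetical helper): for each comma-joined line, add up the
# leading number of the last word in every field and append ",sum-NGB".
sum_joined() {
  awk -F, '{
    total = 0
    for (i = 1; i <= NF; i++) {
      n = split($i, w, " ")   # "vol0 285GB" -> w[1]="vol0", w[2]="285GB"
      total += w[n] + 0       # "285GB" + 0 -> 285, "nosnap" + 0 -> 0
    }
    print $0 ",sum-" total "GB"
  }'
}

printf '%s\n' 'vol0 285GB,vol0.snapshot 15GB' 'vol11_root 0GB,nosnap' | sum_joined
```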
Are all numbers to be added given as "GB" values? Or could "KB", "MB", "TB" and/or other multipliers be present?

Yoda · January 22, 2018, 9:57am

Here is one approach:-

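awk '
        {
                # key = the "volN" prefix of the name: vol0, vol11, vol12
                match ( $0, /vol[0-9]*/ )
                vol = substr( $0, RSTART, RLENGTH )
                if ( ! ( vol in T ) )
                        O[++n] = vol            # remember the input order
                T[vol] += ( $NF + 0 )           # "285GB" + 0 -> 285
        }
        !/snapshot/ {
                A[vol FS "orig"] = $0
        }
        /snapshot/ {
                A[vol FS "snap"] = $0
        }
        END {
                # "for ( k in T )" would print the volumes in no particular order
                for ( i = 1; i <= n; i++ ) {
                        k = O[i]
                        snap = ( ( k FS "snap" ) in A ) ? A[k FS "snap"] : "nosnap"
                        print A[k FS "orig"], snap, "sum-" T[k] "GB"
                }
        }
' OFS=, file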
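Two awk details carry this script: match() sets RSTART and RLENGTH so that substr() can pull out the "volN" key, and $NF + 0 keeps only the leading number of a size like "285GB". A small sketch to see both on their own (key_and_size and the vol1_data line are hypothetical):

```shell
#!/bin/sh
# key_and_size (hypothetical name): print the key match() extracts and the
# number awk gets from the last field.
key_and_size() {
  awk '{
    match($0, /vol[0-9]*/)              # sets RSTART (start) and RLENGTH (length)
    key = substr($0, RSTART, RLENGTH)
    print key, $NF + 0                  # "47GB" + 0 -> 47: only the leading number counts
  }'
}

printf '%s\n' 'vol12_root 47GB' 'vol12_root.snapshot 2GB' 'vol1_data 5GB' | key_and_size
```

Because the key is only the "vol" plus digits prefix, two different names such as vol1 and vol1_data would end up in the same total.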

rdrtx1 · January 22, 2018, 9:59am

another:

awk '
{
   w=$1; sub("[.].*", "", w);
   if(! a[w]) {b[c++]=w; g=0;}
   a[w]=a[w] $0 ",";
   if ($NF ~ /[0-9]*GB$/) {l=$NF; gsub("[^0-9]", "", l); s[w]=(g+=l);}
}
END {
   for (i=0; i<c; i++) {
      print a[b[i]] ((a[b[i]] ~ /snapshot/) ? "" : "nosnap,") "sum-" s[b[i]] "GB";
   }
}
' datafile
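The key trick here is sub("[.].*", "", w), which turns "vol0.snapshot" into "vol0" while leaving "vol11_root" whole. The running total g is reset whenever a new name appears, so it assumes all lines for one volume are adjacent, as in the sample. A sketch of a variant that adds straight into the per-name total, so line order does not matter (sum_by_base and the reordered input are hypothetical):

```shell
#!/bin/sh
# sum_by_base (hypothetical name): total the sizes per base name, keeping
# the order in which names first appear.
sum_by_base() {
  awk '{
    w = $1
    sub(/[.].*/, "", w)                 # "vol0.snapshot" -> "vol0", "vol11_root" unchanged
    if (!(w in s)) order[++n] = w       # first time this base name appears
    s[w] += $NF + 0                     # add to this name own total, wherever the line is
  }
  END { for (i = 1; i <= n; i++) print order[i], s[order[i]] "GB" }'
}

printf '%s\n' 'vol0 285GB' 'vol11_root 0GB' 'vol0.snapshot 15GB' | sum_by_base
```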

RudiC · January 22, 2018, 10:14am

Or

awk '
BG      {BG = 0
         if (/snap/)    {SUM += $2
                         printf "%s,sum-%dGB\n", $0, SUM
                         SUM = 0
                         next
                        }
         else            printf "%s,sum-%dGB\n", "nosnap", SUM
         SUM = 0
        }
!BG     {printf "%s,", $0
         SUM += $2
         BG = 1
        }
END     {if (BG) printf "nosnap,sum-%dGB\n", SUM}     # last volume had no snapshot line
' file
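This one works line by line: the BG flag records that a base line has been printed and is waiting to see whether the next line is its snapshot. The same idea can be written with a variable that holds the pending line; the END block that flushes it matters when the last volume in the file has no snapshot. A sketch, assuming a snapshot line always comes right after its base line (pair_adjacent, flush and the vol13_root line are hypothetical):

```shell
#!/bin/sh
# pair_adjacent (hypothetical name): hold each base line until the next line
# shows whether it has a snapshot.
pair_adjacent() {
  awk '
    function flush() {
      # a base line is still waiting: it had no snapshot line
      if (pending != "") printf "%s,nosnap,sum-%dGB\n", pending, sum
      pending = ""
    }
    $1 ~ /\.snapshot$/ {
      # snapshot line: complete the pending base line
      printf "%s,%s,sum-%dGB\n", pending, $0, sum + $2
      pending = ""
      next
    }
    {
      # base line: finish the previous one, then hold this one
      flush()
      pending = $0
      sum = $2 + 0
    }
    END { flush() }
  '
}

printf '%s\n' 'vol0 285GB' 'vol0.snapshot 15GB' 'vol11_root 0GB' \
  'vol12_root 47GB' 'vol12_root.snapshot 2GB' 'vol13_root 5GB' | pair_adjacent
```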

ranjancom2000 · January 22, 2018, 10:15am


I can see this works great, but I still don't know how it works.

rdrtx1 · January 22, 2018, 10:44am

awk '
{
   w=$1; sub("[.].*", "", w);                                       # base name: drop everything from the first "." (vol0.snapshot -> vol0)
   if(! a[w]) {b[c++]=w; g=0;}                                      # name not seen before: store it in b[] (keeps input order) and reset the running sum g (assumes a volume's lines are adjacent)
   a[w]=a[w] $0 ",";                                                # append the whole line plus "," to this name's output string
   if ($NF ~ /[0-9]*GB$/) {l=$NF; gsub("[^0-9]", "", l); s[w]=(g+=l);}   # keep only the digits of the size (285GB -> 285) and add them to this name's sum
}
END {
   for (i=0; i<c; i++) {                                            # loop over the names in the order they were read
      print a[b[i]] ((a[b[i]] ~ /snapshot/) ? "" : "nosnap,") "sum-" s[b[i]] "GB";   # stored lines, "nosnap," if no snapshot line was seen, then the sum
   }
}
' datafile
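To watch the arrays fill up, the same main block can print its state after every input line. This is only a debugging aid (trace is a hypothetical name), not part of the solution:

```shell
#!/bin/sh
# trace (hypothetical name): rdrtx1's main block, printing w, a[w] and s[w]
# after each line instead of building the final output.
trace() {
  awk '{
    w = $1; sub(/[.].*/, "", w)
    if (!a[w]) { b[c++] = w; g = 0 }
    a[w] = a[w] $0 ","
    if ($NF ~ /[0-9]*GB$/) { l = $NF; gsub(/[^0-9]/, "", l); s[w] = (g += l) }
    printf "NR=%d w=%s a[w]=\"%s\" s[w]=%s\n", NR, w, a[w], s[w]
  }'
}

printf '%s\n' 'vol0 285GB' 'vol0.snapshot 15GB' 'vol11_root 0GB' | trace
```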

Scrutinizer · January 22, 2018, 11:21am

Another approach:

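awk '
{
  split($1,F,".")             # "vol0.snapshot" -> F[1]="vol0"
  i=F[1]
  if(!(i in A)) O[++n]=i      # remember the order the volumes appear in
  A[i]=A[i] $0 ","
  T[i]+=$2                    # "285GB" is used as the number 285
}
END {
  for(j=1;j<=n;j++) {         # "for(i in A)" would visit the volumes in no particular order
    i=O[j]
    printf "%s%ssum-%sGB\n",A[i],(A[i]~/snapshot/ ? "" : "nosnap,"),T[i]
  }
}
' file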

But you did not answer one of Don Cragun's questions in post #2.

This is quite essential, because the approaches in this thread will fail if the file can also contain KB, MB or TB values.
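If other units ever did show up, one option is to convert everything to GB before adding. A sketch, assuming 1024-based units (1 TB = 1024 GB, 1 GB = 1024 MB) and a size written as a number directly followed by its unit; to_gb, sum_any_unit and the sample values are hypothetical, and it only prints the per-volume totals:

```shell
#!/bin/sh
# sum_any_unit (hypothetical name): per-volume totals in GB, converting
# KB/MB/TB with an assumed factor of 1024.
sum_any_unit() {
  awk '
  function to_gb(v,   num, unit) {    # "1.5TB" -> 1536
    num = v + 0                       # leading number only
    unit = toupper(v)
    sub(/^[0-9.]+/, "", unit)         # what is left is the unit
    if (unit == "KB") return num / (1024 * 1024)
    if (unit == "MB") return num / 1024
    if (unit == "TB") return num * 1024
    return num                        # "GB" or no suffix
  }
  {
    w = $1; sub(/[.].*/, "", w)       # base name, as in the answers above
    if (!(w in t)) order[++n] = w
    t[w] += to_gb($NF)
  }
  END { for (i = 1; i <= n; i++) printf "%s sum-%gGB\n", order[i], t[order[i]] }'
}

printf '%s\n' 'vol0 1TB' 'vol0.snapshot 512MB' 'vol1 300GB' | sum_any_unit
```

Some tools report sizes with 1000-based units instead, so the factor should match whatever produced the file.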

ranjancom2000 · January 22, 2018, 11:51am


The values will be in GB only.
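Since every answer above relies on that, a cheap guard is to check the file before summing. A sketch (check_gb_only is a hypothetical name) that lists any line whose size is not a whole number followed by GB and exits with status 1 if it finds one:

```shell
#!/bin/sh
# check_gb_only (hypothetical name): report sizes that are not "<digits>GB";
# exit status is 1 if any were found, 0 otherwise.
check_gb_only() {
  awk '$NF !~ /^[0-9]+GB$/ { printf "line %d: unexpected size \"%s\"\n", NR, $NF; bad = 1 }
       END { exit bad }'
}

printf '%s\n' 'vol0 285GB' 'vol0.snapshot 15GB' | check_gb_only && echo "all sizes are in GB"
```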