Is there an awk solution for this?

timj123 · March 6, 2008, 7:26pm

I am writing an awk script that gathers data from certain fields. I need an awk solution because it will later become a function in the script.

I have the following data that I need output on a single line per person, but each record spans multiple lines and its lines are not adjacent. Take tom below: the record "tom" appears on 4 different lines, but I only need data from 2 of them. I will need the same info for pat, tim, and tad, or anyone else who has a record in the format below.

2008   fl01   LAC   2589   polk   doal
xx 2008q1 mx
     sect 25698541

     Sales 08 Dept group

        lead1    2008q1
        tom
        pat
        tim
        tad

        lead1  07q4   07q3   07q2   07q1   06q4   06q3   jan
        tom    0      96     0      3312   3624   0      312
        pat    0      17     0      0      30     0      30
        tim    357    03     04     25     3020   3120   20
        tad    1734   0      0      0      5213   5213   0

        lead1  feb    mar    apr    may    jun    jul    aug
        tom    0      96     0      0      0      0      0
        pat    0      17     0      0      0      0      0
        tim    357    23     5      7      8      14     70
        tad    1734   0      0      0      0      0      0

        lead1  sept   oct    nov    dec
        tom    0      0      460    92
        pat    0      0      240    0
        tim    0      21     1800   0
        tad    0      0      672    0

2008   fl01  LAC   2589    polk   doal
yy 2008q1 mx
     sect 2569852

     Sales 08 Dept group

I needed the following output:

lead1   07q4    07q1    06q4    06q3    sept    oct     nov
tim	357	25	3020	3120	0	21	1800 
tad	1734	0	5213	5213	0	0	672

Is there an awk solution to this?

Thanks in advance for this, because I think this is a difficult one.

timj123 · March 11, 2008, 9:09pm

Any help out there for this, please?

radoulov · March 12, 2008, 6:44am

awk 'NR == 1 { print "lead1   07q4    07q1    06q4    06q3    sept    oct     nov" }
$1 ~ "^("users")$" && NF > 1 { 
x[$1]++
if (x[$1] == 1)
  p[$1] = sprintf ("%s\t%s\t%s\t%s\t%s", $1, $2, $5, $6, $7)
if (x[$1] == 3) {
  printf "%s\t%s\t%s\t%s\n", p[$1], $2, $3, $4 
 }
}' users="tim|tad" file

You can add more users in the pattern: tim|tad|pat etc.
Use nawk or /usr/xpg4/bin/awk on Solaris.
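
For instance, adding pat to the pattern could look like the sketch below. The helper names sample_rows and extract are made up for this illustration, the input is a trimmed copy of the tables from the question fed on standard input, and the header line is left out to keep the output short.

```sh
# Hypothetical helper: the tables from the question, without the block headers.
sample_rows() {
cat <<'EOF'
        lead1    2008q1
        tom
        pat
        tim
        tad

        lead1  07q4   07q3   07q2   07q1   06q4   06q3   jan
        tom    0      96     0      3312   3624   0      312
        pat    0      17     0      0      30     0      30
        tim    357    03     04     25     3020   3120   20
        tad    1734   0      0      0      5213   5213   0

        lead1  feb    mar    apr    may    jun    jul    aug
        tom    0      96     0      0      0      0      0
        pat    0      17     0      0      0      0      0
        tim    357    23     5      7      8      14     70
        tad    1734   0      0      0      0      0      0

        lead1  sept   oct    nov    dec
        tom    0      0      460    92
        pat    0      0      240    0
        tim    0      21     1800   0
        tad    0      0      672    0
EOF
}

# Hypothetical wrapper: $1 is the alternation of names to report.
extract() {
  awk '$1 ~ "^("users")$" && NF > 1 {
    x[$1]++
    if (x[$1] == 1)
      p[$1] = sprintf("%s\t%s\t%s\t%s\t%s", $1, $2, $5, $6, $7)
    if (x[$1] == 3)
      printf "%s\t%s\t%s\t%s\n", p[$1], $2, $3, $4
  }' users="$1" -
}

sample_rows | extract "tim|tad|pat"
```

Rows come out in the order the names appear in the third table, so pat is printed before tim and tad.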

timj123 · March 13, 2008, 10:18am

This works GREAT, and I REALLY appreciate the help on this. But what if I wanted to sum columns 07q4 and 07q1 and then put that value at the end of the printf statement? I am having issues with that part. Can you help?

radoulov · March 13, 2008, 10:43am

awk 'NR == 1 { print "lead1   07q4    07q1    06q4    06q3    sept    oct     nov     tot" }
$1 ~ "^("users")$" && NF > 1 { 
x[$1]++
if (x[$1] == 1) {
  p[$1] = sprintf ("%s\t%s\t%s\t%s\t%s", $1, $2, $5, $6, $7)
  t[$1] = $2 + $5
}
if (x[$1] == 3) {
  printf "%s\t%s\t%s\t%s\t%d\n", p[$1], $2, $3, $4, t[$1] 
 }
}' users="tim|tad" file

timj123 · March 13, 2008, 10:57am

OK, I feel stupid now.
Thanks again so much for saving me about a week's worth of frustration.
I realize I need to look at issues like these from a different angle.

aspect_p · March 13, 2008, 12:20pm

Can you go a little further in describing the awk methods used in the script? I'm sorry to be a bother, but something like this can be a HUGE ace in my arsenal of shell scripting.

radoulov · March 14, 2008, 6:28am

Of course,
line by line.

NR == 1 { print "lead1   07q4    07q1    06q4    06q3    sept    oct     nov     tot" }

Just print the header (you may prefer to use BEGIN instead of NR == 1; in that case it is safer to pass the users variable with the -v syntax, because an assignment placed after the script is not yet in effect while BEGIN runs):

awk -v users="tim|tad"
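
The difference between the two ways of setting users can be checked with a small experiment. The function names late and early are made up for this sketch. An assignment written after the program text is only processed when awk reaches it among its file operands, so it is not yet in effect inside BEGIN, while -v sets the variable before BEGIN runs.

```sh
# Trailing assignment: users is still empty while BEGIN runs.
late() {
  echo x | awk 'BEGIN { print "in BEGIN: [" users "]" }
                      { print "in main:  [" users "]" }' users="tim|tad" -
}

# -v assignment: users is already set when BEGIN runs.
early() {
  echo x | awk -v users="tim|tad" 'BEGIN { print "in BEGIN: [" users "]" }
                                         { print "in main:  [" users "]" }'
}

late
early
```

late shows an empty value inside BEGIN and tim|tad in the main rule; early shows tim|tad in both places.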

$1 ~ "^("users")$" && NF > 1

+ for every record that matches the pattern:

the first field is one of the names given (the variable users is defined at the end: users="tim|tad"; the pipe | means alternation, tim OR tad),
AND (logical and) the number of fields is greater than 1. This is specific to the OP's input file format: we want to skip the following lines:

        lead1    2008q1
        tom
        pat
        tim
        tad
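
To see which lines this condition keeps, here is a small sketch on a cut-down input. The function name select_rows and the extra row for "timothy" are invented for the illustration; the ^( and )$ anchors are what keep a longer name from matching.

```sh
# Hypothetical helper: print the line number and text of every selected row.
select_rows() {
  awk '$1 ~ "^(" users ")$" && NF > 1 { print NR ": " $0 }' users="$1" -
}

printf '%s\n' \
  'lead1 2008q1' \
  'tom' \
  'tim' \
  'tad' \
  '' \
  'lead1 07q4 07q3' \
  'tom 0 96' \
  'tim 357 03' \
  'tad 1734 0' \
  'timothy 1 2' | select_rows 'tim|tad'
```

Only lines 8 and 9 are printed: the bare names on lines 3 and 4 fail NF > 1, and tom, lead1 and timothy fail the anchored name match.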

+ do the following:

{ 
x[$1]++
if (x[$1] == 1) {
  p[$1] = sprintf ("%s\t%s\t%s\t%s\t%s", $1, $2, $5, $6, $7)
  t[$1] = $2 + $5
}

x is an associative array, the keys are the names ($1), the value/element is their count (how many times they appear in the matched lines). So when the name appears for the first time: x[$1] == 1:

tim    357    03     04     25     3020   3120   20
or
tad    1734   0      0      0      5213   5213   0

Save fields 1, 2, 5, 6 and 7 as the value of another associative array p with key $1 (the name); sprintf is needed for formatting (values separated by tabs).

The t array holds the sum of $2 (07q4) and $5 (07q1) and uses the same key.
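
The way the three arrays evolve can be traced with a reduced sketch. The function name trace_counts and the trace format are invented here, and only $1, $2 and $5 are saved to keep the lines short.

```sh
# Hypothetical helper: show the counter, saved string and sum after every row.
trace_counts() {
  awk 'NF > 1 {
    x[$1]++                                     # occurrences of this name so far
    if (x[$1] == 1) {
      p[$1] = sprintf("%s\t%s\t%s", $1, $2, $5) # tab-joined fields kept for later
      t[$1] = $2 + $5                           # sum stored under the same key
    }
    printf "%s #%d saved=[%s] sum=%d\n", $1, x[$1], p[$1], t[$1]
  }'
}

printf '%s\n' 'tim 357 03 04 25' 'tad 1734 0 0 0' 'tim 357 23 5 7' | trace_counts
```

On the second tim row the counter is 2, so p and t keep the values saved from the first row.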

if (x[$1] == 3)

If x[$1] is 3, we are on the third data row for that name, i.e. one of the following lines:

tim    0      21     1800   0
or
tad    0      0      672    0

{
  printf "%s\t%s\t%s\t%s\t%d\n", p[$1], $2, $3, $4, t[$1] 
 }

Print the record saved earlier (p[$1], stored when x[$1] was 1) plus $2 (sept), $3 (oct), $4 (nov) and the sum.
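
One thing to keep in mind: x is never reset, so if the same name shows up again in a later record block its counter goes on to 4, 5, 6 and nothing is printed for that block. A possible variation is sketched below. It rests on an assumption, not on anything stated in the thread: that every block starts with a line whose first field is a four-digit year, as in "2008 fl01 LAC ...". The function names and the numbers in the second block are made up.

```sh
# Hypothetical input: two blocks for tim; the second block uses invented values.
two_blocks() {
cat <<'EOF'
2008 fl01 LAC 2589 polk doal
lead1 07q4 07q3 07q2 07q1 06q4 06q3 jan
tim 357 03 04 25 3020 3120 20
lead1 feb mar apr may jun jul aug
tim 357 23 5 7 8 14 70
lead1 sept oct nov dec
tim 0 21 1800 0
2008 fl01 LAC 2589 polk doal
lead1 07q4 07q3 07q2 07q1 06q4 06q3 jan
tim 1 2 3 4 5 6 7
lead1 feb mar apr may jun jul aug
tim 0 0 0 0 0 0 0
lead1 sept oct nov dec
tim 8 9 10 11
EOF
}

# Hypothetical wrapper: same logic as above, with the counters cleared per block.
per_block() {
  awk '$1 ~ /^[0-9][0-9][0-9][0-9]$/ { split("", x) }  # assumed block start: empty x
  $1 ~ "^(" users ")$" && NF > 1 {
    x[$1]++
    if (x[$1] == 1) {
      p[$1] = sprintf("%s\t%s\t%s\t%s\t%s", $1, $2, $5, $6, $7)
      t[$1] = $2 + $5
    }
    if (x[$1] == 3)
      printf "%s\t%s\t%s\t%s\t%d\n", p[$1], $2, $3, $4, t[$1]
  }' users="$1" -
}

two_blocks | per_block 'tim'
```

split("", x) is used to empty the array because it works in older awks as well; with the reset in place, one line per block is printed for tim.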