Parsing of file for Report Generation (String parsing and splitting)

umar.shaikh · February 27, 2009, 5:55am

Hey guys,

I have a file that I generate myself, and I want to create some HTML output from it.
The problem is that I am really confused about how to go about reading the file.

The file is in the following format:

TID1 Name1 ATime=xx AResult=yyy AExpected=yyy BTime=xx BResult=yyy BExpected=yyy CTime=xx CResult=yyy CExpected=yyy ....

The file can continue this way for as long as it wants. The A type (or any type, for that matter) might be repeated too.
But there will always be 3 fields for a given type: Time, Result and Expected.

I need to consolidate all of these values for all the different types (A, B, etc.) and create a tabular report for them.

Oh, yes, forgot to mention.... I'm using a shell based on zsh.

Thanks,
Umar

adderek · February 27, 2009, 5:20pm

Seems like a job for a higher-level language: Perl, Python, awk, anything like that.
If the lines can be very long, you could run into unexpected problems (such as storing too much data in a shell variable, or the line being truncated).

On the other hand, a trick like this could probably work as well: "for var in $(cat file); do .... done" or, better, use "read". You might need to play with the FS/IFS settings.

Anyway, I strongly suggest a higher-level language.

cfajohnson · February 27, 2009, 5:37pm

There is no reason either of those problems (too much data in a variable, or a truncated line) should happen. The length of a variable is limited only by available memory.

"for var in $(cat file)" is not the way to read a file. It sets var to each word in the file, not each line.

There is no need to play with IFS to read a file line by line.
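
To make the difference concrete, here is a minimal sketch that counts how many times each kind of loop runs. The two-line sample file and the variable names are made up for illustration.

```shell
# Hypothetical sample data: two lines, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'TID1 Name1 ATime=2' 'TID2 Name2 ATime=3' > "$sample"

# for + $(cat): the loop body runs once per whitespace-separated word.
words=0
for var in $(cat "$sample")
do
  words=$((words + 1))
done

# while + read: the loop body runs once per line; IFS is left untouched.
lines=0
while read -r line
do
  lines=$((lines + 1))
done < "$sample"

printf 'words=%s lines=%s\n' "$words" "$lines"
rm -f "$sample"
```

With this sample the first loop runs six times and the second one twice.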

cfajohnson · February 27, 2009, 5:48pm

What does "consolidate" mean? What do you want to do with the line?

What type of output do you want?

Is it something like this?

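printf "<table>\n"
while read -r a b line
do
  ## clear values left over from the previous line
  unset ATime AResult AExpected BTime BResult BExpected CTime CResult CExpected
  eval "$line" ## set all the variables (only safe if the file contents are trusted)
  printf "<tr><td>%s<td>%s<td>%s<td>%s<td>%s<td>%s<td>%s<td>%s<td>%s<td>%s<td>%s\n" \
    "$a" "$b" "$ATime" "$AResult" "$AExpected" "$BTime" "$BResult" "$BExpected" \
    "$CTime" "$CResult" "$CExpected"
done < "$FILE"
printf "</table>\n"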

Use POSIX shell syntax for scripting; use extensions only when it is more efficient.
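
If the file might ever contain something other than plain name=value pairs, eval will run it as shell code. A possible alternative, sketched below, is to take each field apart with parameter expansion instead. The function name split_field, the variable names and the sample line are invented for this sketch; they are not from the original post.

```shell
# Hypothetical input line, in the format described at the top of the thread.
line='TID1 Name1 ATime=2 AResult=PASS AExpected=FAIL'

# split_field is a name made up for this sketch.
# It splits one "name=value" field into job letter, field kind and value.
split_field() {
  fname=${1%%=*}               # text before the first "=", e.g. ATime
  value=${1#*=}                # text after the first "=", e.g. 2
  job=${fname%"${fname#?}"}    # first character of the name, e.g. A
  kind=${fname#?}              # rest of the name, e.g. Time
}

set -f                         # no globbing while $line is split unquoted
set -- $line
tid=$1
shift 2                        # drop the ID and the name, keep the fields
for field
do
  split_field "$field"
  printf '%s: job=%s kind=%s value=%s\n' "$tid" "$job" "$kind" "$value"
done
set +f
```

Nothing from the file is executed here, and the loop does not need to know in advance which job letters appear on a line.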

adderek · February 27, 2009, 6:06pm

Changing IFS might be needed if you choose the "for var in $(cat file)" approach mentioned above.

umar.shaikh · February 28, 2009, 2:01am

OK. I am open to using awk or Jython.

I read your code. I did think of something like that, but it does not help me.
I guess I might not have been clear enough. There could be any number of types A, B, C, ...
So I could have anything from A up to maybe M (i.e. A would have ATime, AResult, AExpected, and so on all the way to MTime, MResult, MExpected).
Hope you follow...

OK. I'll give you an example.
File:

TID1 Name1 ATime=2 AResult=PASS AExpected=PASS BTime=3 BResult=PASS BExpected=PASS CTime=3 CResult=PASS CExpected=FAIL
TID2 Name2 ATime=2 AResult=FAIL AExpected=PASS
TID3 Name3 ATime=2 AResult=PASS AExpected=PASS BTime=3 BResult=PASS BExpected=PASS CTime=3 CResult=FAIL CExpected=FAIL DTime=2 DResult=PASS DExpected=PASS

The output file would be more like this (I'm thinking along these lines as of now):

<html>
Below table values are in Time, Result and Expected Result format<br>
<table border=1>
<tr rowspan=2><td rowspan=2>TID1</td><td rowspan=2>Name1</td><td colspan=3>JobA</td><td colspan=3>JobB</td><td colspan=3>JobC</td></tr>
<tr><td>2</td><td>PASS</td><td>PASS</td><td>3</td><td>PASS</td><td>PASS</td><td>3</td><td>PASS</td><td>FAIL</td></tr>
<tr rowspan=2><td rowspan=2>TID2</td><td rowspan=2>Name2</td><td colspan=3>JobA</td></tr>
<tr><td>2</td><td>FAIL</td><td>PASS</td></tr>
<tr rowspan=2><td rowspan=2>TID3</td><td rowspan=2>Name3</td><td colspan=3>JobA</td><td colspan=3>JobB</td><td colspan=3>JobC</td><td colspan=3>JobD</td></tr>
<tr><td>2</td><td>PASS</td><td>PASS</td><td>3</td><td>PASS</td><td>PASS</td><td>3</td><td>FAIL</td><td>FAIL</td><td>2</td><td>PASS</td><td>PASS</td></tr>
</table>
</html>

adderek · February 28, 2009, 4:51am

AWK seems to be a good choice (however, I have not analyzed the example in detail). It seems like you are going to create a table with cells spanning multiple rows/columns. Think twice about whether this is what you need. It would probably be acceptable to have the same fields repeated in multiple rows instead of using rowspan. That would make your code much easier.
You might need to count the entries first and then make a second run to fill the output document. The easy way is to parse the file twice with different code. This is not optimal (low performance, too much I/O), but it might be the best option for you.
There are plenty of other ways to do this (including Perl with "split", which can be very slow).
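
The "count first, then fill" idea could look roughly like the sketch below, written in plain shell rather than awk. It assumes a flat table is acceptable: no rowspan or colspan, no per-job heading cells, and short rows padded with empty cells. The sample data and the variable names are made up.

```shell
# Hypothetical sample data in a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
TID1 Name1 ATime=2 AResult=PASS AExpected=PASS BTime=3 BResult=PASS BExpected=PASS
TID2 Name2 ATime=2 AResult=FAIL AExpected=PASS
EOF

set -f   # the unquoted $fields below should be word-split only, not globbed

# Pass 1: find the largest number of name=value fields on any line.
max=0
while read -r tid name fields
do
  set -- $fields
  if [ $# -gt "$max" ]; then max=$#; fi
done < "$data"

# Pass 2: print one row per line, padding short rows with empty cells.
report=$(
  while read -r tid name fields
  do
    set -- $fields
    printf '<tr><td>%s</td><td>%s</td>' "$tid" "$name"
    n=0
    while [ "$n" -lt "$max" ]
    do
      printf '<td>%s</td>' "${1#*=}"   # empty once the fields run out
      if [ $# -gt 0 ]; then shift; fi
      n=$((n + 1))
    done
    printf '</tr>\n'
  done < "$data"
)
set +f

printf '<table border="1">\n%s\n</table>\n' "$report"
rm -f "$data"
```

Reading the file twice costs extra I/O, as noted above, but every row ends up with the same number of cells, so the table needs no spanning attributes at all.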

If by any chance you experience problems with the number of "columns" in awk, try a different awk (mawk, nawk, gawk, ... there are plenty of them, with different limitations).

Good luck

cfajohnson · February 28, 2009, 2:20pm

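exec < "$FILE"
set -f   ## disable globbing so the unquoted $data below is only word-split
while read -r row name data
do
  printf '<tr><td rowspan="2">%s</td><td rowspan="2">%s</td>\n' "$row" "$name"
  set -- $data

  ## first row: one heading cell per job
  while [ $# -ge 3 ]
  do
    job=${1%"${1#?}"}   ## first character of the field name, e.g. A from ATime=2
    printf '<td colspan="3">Job%s</td>' "$job"
    shift 3
  done
  printf '</tr>\n'

  set -- $data

  ## second row: the Time, Result and Expected values of each job
  printf '<tr>'
  while [ $# -ge 3 ]
  do
    printf '   <td>%s</td><td>%s</td><td>%s</td>\n' "${1#*=}" "${2#*=}" "${3#*=}"
    shift 3
  done
  printf '</tr>\n'
done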

umar.shaikh · March 2, 2009, 12:38am

Great!

Thanks a LOT! I modified your code a bit as per my requirements and it works like a gem!!

Thanks again!

Regards,
Umar