Parsing of file for Report Generation (String parsing and splitting)

Hey guys,

I have a file that I generate myself, and I want to create some HTML output from it.
The problem is that I am really confused about how to go about reading the file.

The file is in the following format:

TID1 Name1 ATime=xx AResult=yyy AExpected=yyy BTime=xx BResult=yyy BExpected=yyy CTime=xx CResult=yyy CExpected=yyy ....

The file can continue this way as long as it wants. The A type (for that matter any type) might be repeated again too.
But there would always be 3 fields for a given type, Time, Result and ExpectedResult.

I need to consolidate all of these values for all the different types (A, B, etc.) and create a tabular report for them.

Oh, yes, forgot to mention.... I'm using a shell based on zsh.

Thanks,
Umar

Seems like a job for a higher-level language: Perl, Python, awk, anything like that.
If the line can be very long you could run into unexpected problems (like storing too much data in a shell variable, or the line being truncated).

On the other hand, a trick like this could probably work as well: "for var in $(cat file); do .... done" or, better, use "read". You might need to play with the FS/IFS settings.

Anyway, I strongly suggest a higher-level language.

There is no reason either problem should happen. The length of a shell variable is limited only by available memory.

That "for var in $(cat file)" loop is not the way to read a file. It sets var to each word in the file, not each line.

There is no need to play with IFS to read a file line by line.
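
To make the difference concrete, here is a minimal sketch. The sample file and the variable names (sample, tid, name, rest) are made up for illustration; the two records are taken from the example format in this thread. It counts what each loop iterates over.

```shell
# Hypothetical sample data, written to a temporary file for illustration.
sample=$(mktemp)
printf '%s\n' 'TID1 Name1 ATime=2 AResult=PASS AExpected=PASS' \
              'TID2 Name2 ATime=2 AResult=FAIL AExpected=PASS' > "$sample"

# Word by word: the unquoted command substitution is split on whitespace,
# so the loop body runs once per word, not once per line.
words=0
for var in $(cat "$sample"); do
  words=$((words + 1))
done

# Line by line: read consumes one line per iteration. The first two
# fields go to tid and name, everything else stays together in rest.
lines=0
while read -r tid name rest; do
  lines=$((lines + 1))
  last_rest=$rest
done < "$sample"

printf 'words=%s lines=%s\n' "$words" "$lines"   # prints: words=10 lines=2
rm -f "$sample"
```

With the default IFS, read already splits each line into the leading fields and leaves the remainder in the last variable, which is all this file format needs.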

What does "consolidate" mean? What do you want to do with the line?

What type of output do you want?

Is it something like this?

printf "<table>\n"
while read a b line
do
  eval "$line" ## set all the variables
  printf "<tr><td>%s<td>%s<td>%s<td>%s<td>%s<td>%s<td>%s<td>%s<td>%s<td>%s<td>%s\n" \
    "$a" "$b" "$ATime" "$AResult" "$AExpected" "$BTime" "$BResult" "$BExpected" \
    "$CTime" "$CResult" "$CExpected" 
done < "$FILE"
printf "</table>\n"

Use POSIX shell syntax for scripting; use extensions only when it is more efficient.

Changing IFS might be needed if you choose the "for" approach mentioned above.

Ok. I am open to using awk or Jython.

I read your code. I did think of something like that, but it does not help me.
I guess I might not have been clear enough. There could be any number of job types A, B, C, ...
So I could have anything ranging from A to maybe M (i.e. each type has its own three fields: ATime, AResult, AExpected, and so on all the way to MTime, MResult, MExpected).
Hope you follow...

Ok. I'll give you an example:
File:

TID1 Name1 ATime=2 AResult=PASS AExpected=PASS BTime=3 BResult=PASS BExpected=PASS CTime=3 CResult=PASS CExpected=FAIL
TID2 Name2 ATime=2 AResult=FAIL AExpected=PASS
TID3 Name3 ATime=2 AResult=PASS AExpected=PASS BTime=3 BResult=PASS BExpected=PASS CTime=3 CResult=FAIL CExpected=FAIL DTime=2 DResult=PASS DExpected=PASS

The output file would be more like this (I'm thinking along these lines as of now):

<html>
Below table values are in Time, Result and Expected Result format<br>
<table border=1>
<tr rowspan=2><td rowspan=2>TID1</td><td rowspan=2>Name1</td><td colspan=3>JobA</td><td colspan=3>JobB</td><td colspan=3>JobC</td></tr>
<tr><td>2</td><td>PASS</td><td>PASS</td><td>3</td><td>PASS</td><td>PASS</td><td>3</td><td>PASS</td><td>FAIL</td></tr>
<tr rowspan=2><td rowspan=2>TID2</td><td rowspan=2>Name2</td><td colspan=3>JobA</td></tr>
<tr><td>2</td><td>FAIL</td><td>PASS</td></tr>
<tr rowspan=2><td rowspan=2>TID3</td><td rowspan=2>Name3</td><td colspan=3>JobA</td><td colspan=3>JobB</td><td colspan=3>JobC</td><td colspan=3>JobD</td></tr>
<tr><td>2</td><td>PASS</td><td>PASS</td><td>3</td><td>PASS</td><td>PASS</td><td>3</td><td>FAIL</td><td>FAIL</td><td>2</td><td>PASS</td><td>PASS</td></tr>
</table>
</html>

awk seems to be a good choice (however, I have not analyzed the example in detail). It seems like you are going to create a table with cells spanning multiple rows/columns. Think twice about whether this is what you need. It would probably be acceptable to have the same fields repeated in multiple rows instead of using rowspan. That would make your code much easier.
You might need to count the entries first and then make a second run to fill the output document. The easy way is to parse the file twice with different code for each pass. This is not optimal (lower performance, more I/O), but it might still be the best option for you.
There are plenty of other ways to do this (including Perl with "split", which can be very slow).

If by any chance you run into problems with the number of "columns" in awk, try a different awk implementation (mawk, nawk, gawk, ...; there are plenty of them, with different limitations).
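
As a rough sketch of the simpler layout without rowspan, something like the following would print one table row per job and repeat the ID and name in each row. The function name flat_rows and the six-column layout are my own assumptions, not anything from the original post, and it assumes each job's three fields arrive in the order Time, Result, Expected. It relies on POSIX-style word splitting of $data; in native zsh mode you would probably need ${=data} instead.

```shell
# flat_rows (hypothetical name): read records on stdin and print one
# table row per job, repeating the test ID and name in every row.
flat_rows() {
  while read -r tid name data; do
    set -f            # no globbing while $data is split into words
    set -- $data      # one positional parameter per key=value field
    set +f
    while [ $# -ge 3 ]; do
      job=${1%%Time=*}   # "ATime=2" -> "A" (assumes the Time field comes first)
      printf '<tr><td>%s</td><td>%s</td><td>Job%s</td><td>%s</td><td>%s</td><td>%s</td></tr>\n' \
        "$tid" "$name" "$job" "${1#*=}" "${2#*=}" "${3#*=}"
      shift 3
    done
  done
}

# Usage with one record from the example file:
printf 'TID2 Name2 ATime=2 AResult=FAIL AExpected=PASS\n' | flat_rows
# prints: <tr><td>TID2</td><td>Name2</td><td>JobA</td><td>2</td><td>FAIL</td><td>PASS</td></tr>
```

Because every row has the same six cells, there is no need to count the jobs before writing the row, so a single pass over the file is enough.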

Good luck

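exec < "$FILE"  ## read standard input from the file
set -f          ## disable globbing so that unquoted $data is only word-split
while read row name data
do
  ## First row: ID and name span two rows, then one heading cell per job
  printf '<tr><td rowspan="2">%s</td><td rowspan="2">%s</td>\n' "$row" "$name"
  set -- $data

  while [ $# -ge 3 ]
  do
    job=${1%"${1#?}"} ## first character of the current field, e.g. A from ATime=2
    printf '<td colspan="3">Job%s</td>' "$job"
    shift 3
  done
  printf '</tr>\n'

  set -- $data

  ## Second row: Time, Result and Expected values for each job
  printf '<tr>'
  while [ $# -ge 3 ]
  do
    printf '   <td>%s</td><td>%s</td><td>%s</td>\n' "${1#*=}" "${2#*=}" "${3#*=}"
    shift 3
  done
  printf '</tr>\n'
done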
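
In case the parameter expansions in the loop look cryptic, here is a small standalone illustration. The sample field and the variable names (field, first, value, key) are made up for the example.

```shell
field='BResult=PASS'   # hypothetical sample field

# ${field#?} drops the first character ("Result=PASS"); removing that
# as a suffix from the original leaves only the first character.
first=${field%"${field#?}"}

# ${field#*=} removes the shortest prefix ending in "=", leaving the value.
value=${field#*=}

# ${field%%=*} removes the longest suffix starting at "=", leaving the key.
key=${field%%=*}

printf '%s %s %s\n' "$first" "$value" "$key"   # prints: B PASS BResult
```

These are all standard POSIX expansions, so they should behave the same way in sh, bash, ksh and zsh.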

Great!

Thanks a LOT! I modified your code a bit as per my requirements and it works like a gem!!

Thanks again!

Regards,
Umar