Merge cells in all rows of a HTML table dynamically.

Hello All,

I have visited many pages in Unix.com and could find out one solution for merging the HTML cells in the 1st row.

(Unable to post the complete URL as I should not as per website rules).

But, however I try, I couldn't achieve this merging to happen for all other rows of HTML table.

Could someone help me in this regard please....

Awaiting your response!!

---------- Post updated at 07:29 PM ---------- Previous update was at 07:27 PM ----------

awk '
        BEGIN {
		FS=","
                print "To: to@email.com"
                print "MIME-Version: 1.0"
                print "Content-Type: text/html"
                print "Subject: Email Subject"
                print "<html><body>"
                print "<table border=1 cellspacing=2 cellpadding=2>"
        }
        NR == FNR {
                if ( $0 !~ /^#/ && $0 !~ /^S/ )
                        C[$1]++
                next
        }
        !/^#/ && /^S/ {
                print "<tr>"
                for ( i = 1; i <= NF; i++ )
                        print "<td><b>" $i "</b></td>"
                print "</tr>"
        }
        !/^#/ && !/^S/ {
                print "<tr>"
                for ( i = 1; i <= NF; i++ )
                {
                        if ( i == 1 && !( $i in R ) )
                        {
                                print "<td rowspan=" C[$i] ">" $i "</td>"
                                R[$i]
                        }
                        if ( i > 1 )
                                 print "<td>" $i "</td>"
                }
                print "</tr>"
        }
        END {
                print "</table></body></html>"
        }
' inputfile inputfile | /usr/sbin/sendmail -t

This is the sample code which i have tried...

Just write link as text. This is just a security measure against spammers and we (the moderators) will change it to a link if it is a legit reference. You will be allowed to post links yourself once you cross the threshold of a certain post count.

To be honest i haven't completely understood what you want to do and what exactly the script you showed doesn't do. Please describe (as exactly as possible) what you want to do and ideally provide a meaningful example of some input and what the result you expect it to be transformed into should be.

I hope this helps.

bakunin

1 Like

Actually, I have a comma delimited .csv file with 7 columns. Rows can be any.

Using the below code, I framed the HTML part of the .csv file and am able to send it to the mail.

awk 'BEGIN{
FS=","
print  "MIME-Version: 1.0"
print  "Content-Type: text/html"
print  "Content-Disposition: inline"
print  "<HTML><BODY>""<TABLE border="1"><TH>Header 1</TH><TH>Header 2</TH><TH>Header 3</TH><TH>Header 4</TH><TH>Header 5</TH><TH>Header 6</TH><TH>Header 7</TH>" 
}
{
printf "<TR>"
for(i=1;i<=NF;i++)
{
printf "<TD>%s</TD>", $i
}
print "</TR>"
 }
END{
print "</TABLE></BODY></HTML>"
 }
' inputfile.csv > inputfile.html

Now, when I checked the mail, some of the columns for adjacent rows in the output has equal data.

Example:

My output is like -

12345,XYZ,123,PPP,01,AAAAAAAAAA,DEFGDEFG
12345,XYZ,234,ABC,02,AAAAAAAAAA,DEFGDEFG
12345,XYZ,567,ABC,05,AAAAAAAAAA,HHHHHHH
45167,XYZ,809,ABC,09,BBBBBBBBBB,HHHHHHH
45167,XYZ,222,ABC,04,BBBBBBBBBB,DEFGDEFG

With this said my sample output, I need to merge all the adjacent rows which has equal values as we do in the excel sheet before sending the data in the mail so that it looks a bit tidy.

With the code which I have pasted with my question, I was able to merge the equal values of adjacent rows in the first column alone. But, its not merging (or combining) the equal values in other rows and columns.

Where am I missing that part and is there any way I can achieve this.

I hope now my problem is clear.

OK, understood, but what is considered "equal values"? Is (in the above example) line 1 equal to line 2 because fields 6 and 7 match? Or because fields 1 and 2 match and therefore line 3 is also considered equal? Are the lines in your example all considered different because they all differ somewhere and only completely duplicated lines are "equal" as per your definition?

Please keep in mind that we do not know what you do. What is probably obvious to you isn't obvious at all for us because we lack all the "implied knowledge" you have about your work.

I hope this helps.

bakunin

1 Like

Thank you for your response!!

Considering the sample output below:

12345,XYZ,123,PPP,01,AAAAAAAAAA,DEFGDEFG
12345,XYZ,234,ABC,02,AAAAAAAAAA,DEFGDEFG
12345,XYZ,567,ABC,05,AAAAAAAAAA,HHHHHHH
45167,XYZ,809,ABC,09,BBBBBBBBBB,HHHHHHH
45167,XYZ,222,ABC,04,BBBBBBBBBB,DEFGDEFG

In the first column we have 12345 value present in first 3 rows. So for the first 3 rows, first column should be a merged value showing single merged value 12345 instead of 3. In other words equivalent to rowspan in HTML or Merge in Excel sheet.

considering last 2 rows, instead of it displaying 45167 value twice it should only display it once with 4th and 5th rows first column as a merged cell.

If you consider second column, all the values in that column for all rows is XYZ. So, instead of showing the same value XYZ 5 times, it should show XYZ as a single value with all 5 cells as a single merged cell.

Similarly for others.

12345	XYZ	123	PPP	1	AAAAAAAAAA	DEFGDEFG
12345	XYZ	234	ABC	2	AAAAAAAAAA	DEFGDEFG
12345	XYZ	567	ABC	5	AAAAAAAAAA	HHHHHHH
45167	XYZ	809	ABC	9	BBBBBBBBBB	HHHHHHH
45167	XYZ	222	ABC	4	BBBBBBBBBB	DEFGDEFG

Say this is the initial HTML format I got in mail

The same should be converted to below.

12345	XYZ	123	PPP	1	AAAAAAAAAA	DEFGDEFG
		234	ABC	2		
		567		5		HHHHHHH
45167		809		9	BBBBBBBBBB	
		222		4		DEFGDEFG

I hope this gives a clear picture.

Would

awk -F, '{for (i=1; i<=NF;i++) if ($i == T) $i = ""; else T = $i} 1' OFS=, file
12345,XYZ,123,PPP,01,AAAAAAAAAA,DEFGDEFG
,,234,ABC,02,,
,,567,,05,,HHHHHHH
45167,,809,,09,BBBBBBBBBB,
,,222,,04,,DEFGDEFG

come close to what you need? Replace the OFS with "\t" to get to your tabulated output.

1 Like

Thank you for the response!!

But the thing here is, when I convert the output to HTML format it will show me blank cells instead of merged cells.

The thing is, HTML and CSV are not hundred percent identical. In HTML you can use that rowspan and colspan clauses (to <td> ) to expand cells across row/column boundaries. There is no such thing in CSV.

In CSV you have just a succession of "fields" (delimited by comma, hence the name) forming a line (=row). These fields are always a - one! - field, not something spanning lines or several places where a field would be. Even successive lines do not necessarily share some common structure: if a line has less (or more) fields than another line, that is just fine, but these fields will not be realigned to form columns.

Bottom line: it is in fact possible to write a script (well, actually more like a program, because you need to work back and forth) so that the output is not CSV but HTML. In HTML you indeed can have these rowspan - and colspan -clauses you want to have, but this output would never translate back into proper CSV. (That is, not by just stripping down the HTML tags. It would take another equally complex program.)

I hope this helps.

bakunin

1 Like

Because one does not know in advance what the rowspan value is, one needs to store at least a bunch of lines in memory.
The following stores all the lines in memory, and prints everything in the END section. (Still it certainly uses less memory than Excel. But there is room for further optimization...)

awk '
BEGIN {
  FS=","
  print "MIME-Version: 1.0"
  print "Content-Type: text/html"
  print "Content-Disposition: inline"
  print "<HTML><BODY><TABLE border=1>"
}
NR==1 {
  nf=NF
  for (i=1; i<=nf; i++)
    printf "<TH>Header %s</TH>", i
  print ""
}
{
  for (i=1; i<=nf; i++)
    if ($i!=lastval) {
      saveval[NR,i]=$i
      lastspan=NR
    } else {
      rowspan[lastspan,i]++
    }
  split($0,lastval)
}
END {
  split("",lastspan)
  for (r=1; r<=NR; r++) {
    printf "<TR>" 
    for (i=1; i<=nf; i++)
      if ((r,i) in rowspan) {
        span=rowspan[r,i]
        printf "<TD rowspan=%s>%s</TD>", span+1, saveval[r,i] 
      } else if (!((i in span) && span--)) {
        printf "<TD>%s</TD>", saveval[r,i] 
      }
    print "</TR>"
  }
  print "</TABLE></BODY></HTML>"
}
' inputfile.csv > inputfile.html
1 Like

Thanks a ton!!! :slight_smile:

Its working perfectly!!

Hello All,

I still need further help on this one. Sometimes the output of the HTML code provided is coming out a bit weird.

Let me post the example below and the output which i was getting and what should be the actual output.

Header1,Header2,Header3,Header4,Header5,Header6,Header6
AAAA,TTTTT,AA-MMM-YYYY,XYZ,1,AA & BB,Reason1
AAAA,TTTTT,BB-MMM-YYYY,UVW,782,AB & BB,Reason1
AAAA,TTTTT,CC-MMM-YYYY,UVW,908,AC & BB,Reason1
AAAA,TTTTT,DD-MMM-YYYY,XYZ,497,AD & BB,Reason1
AAAA,TTTTT,EE-MMM-YYYY,UVW,37,AD & BD,Reason1
AAAA,TTTTT,FF-MMM-YYYY,XYZ,536,AE & BD,Reason1
AAAA,TTTTT,GG-MMM-YYYY,UVW,43,AE & BE,Reason1
AAAA,TTTTT,HH-MMM-YYYY,UVW,1099,AC & BE,Reason1
AAAA,TTTTT,II-MMM-YYYY,UVW,62,AC & DE,Reason1
AAAA,TTTTT,JJ-MMM-YYYY,UVW,54,AC & EE,Reason1
BBBB,TTTTT,AA-MMM-YYYY,UVW,603,AE & EE,Reason1
BBBB,TTTTT,FF-MMM-YYYY,UVW,603,CE & EE,Reason1
BBBB,TTTTT,GG-MMM-YYYY,UVW,553,CE & ED,Reason1
BBBB,TTTTT,JJ-MMM-YYYY,UVW,603,CC & ED,Reason1
CCCC,TTTTT,BB-MMM-YYYY,UVW,164,CC & EB,Reason1
CCCC,TTTTT,KK-MMM-YYYY,UVW,262,CC & ED,Reason1
CCCC,TTTTT,DD-MMM-YYYY,UVW,262,CC & ED,Reason1
CCCC,TTTTT,LL-MMM-YYYY,UVW,262,CC & ED,Reason1
CCCC,TTTTT,FF-MMM-YYYY,UVW,262,CC & ED,Reason1
CCCC,TTTTT,MM-MMM-YYYY,UVW,262,CC & ED,Reason1
CCCC,TTTTT,HH-MMM-YYYY,UVW,352,CA & ED,Reason1
CCCC,TTTTT,NN-MMM-YYYY,UVW,262,CC & ED,Reason1
CCCC,TTTTT,JJ-MMM-YYYY,UVW,440,CA & EG,Reason1
DDDD,TTTTT,AA-MMM-YYYY,UVW,1490,DA & EG,Reason1
DDDD,TTTTT,CC-MMM-YYYY,UVW,1490,DA & EC,Reason1
DDDD,TTTTT,EE-MMM-YYYY,UVW,1490,DA & EC,Reason1
DDDD,TTTTT,GG-MMM-YYYY,UVW,1490,DA & EC,Reason1
EEEE,TTTTT,AA-MMM-YYYY,UVW,930,DA & ET,Reason1
EEEE,TTTTT,CC-MMM-YYYY,UVW,930,DA & EG,Reason1
EEEE,TTTTT,EE-MMM-YYYY,UVW,930,DA & EG,Reason1
EEEE,TTTTT,GG-MMM-YYYY,UVW,930,DA & EG,Reason1

HTML Format of the above data is being obtained as below:

MIME-Version: 1.0
Content-Type: text/html
Content-Disposition: inline
<HTML><BODY><TABLE border=1>
<TH>Header 1</TH><TH>Header  2</TH><TH>Header 3</TH><TH>Header  4</TH><TH>Header 5</TH><TH>Header  6</TH><TH>Header 7</TH>
<TR><TD rowspan=10>AAAA</TD><TD  rowspan=31>TTTTT</TD><TD>AA-MMM-YYYY</TD><TD>XYZ</TD><TD>1</TD><TD>AA & BB</TD><TD rowspan=31>Reason1</TD></TR>
<TR><TD>BB-MMM-YYYY</TD><TD  rowspan=2>UVW</TD><TD>782</TD><TD>AB & BB</TD></TR>
<TR><TD>CC-MMM-YYYY</TD><TD>908</TD><TD>AC & BB</TD></TR>
<TR><TD>DD-MMM-YYYY</TD><TD>XYZ</TD><TD>497</TD><TD>AD & BB</TD></TR>
<TR><TD>EE-MMM-YYYY</TD><TD>37</TD><TD>AD & BD</TD></TR>
<TR><TD>FF-MMM-YYYY</TD><TD>536</TD><TD>AE & BD</TD></TR>
<TR><TD>GG-MMM-YYYY</TD><TD  rowspan=25>UVW</TD><TD>43</TD><TD>AE & BE</TD></TR>
<TR><TD>HH-MMM-YYYY</TD><TD>1099</TD><TD>AC & BE</TD></TR>
<TR><TD>II-MMM-YYYY</TD><TD>62</TD><TD>AC & DE</TD></TR>
<TR><TD>JJ-MMM-YYYY</TD><TD>54</TD><TD>AC & EE</TD></TR>
<TR><TD rowspan=4>BBBB</TD><TD>AA-MMM-YYYY</TD><TD  rowspan=2>603</TD><TD>AE & EE</TD></TR>
<TR><TD>FF-MMM-YYYY</TD><TD>AE & BD</TD></TR>
<TR><TD>GG-MMM-YYYY</TD><TD>553</TD><TD>CE & ED</TD></TR>
<TR><TD>JJ-MMM-YYYY</TD><TD>CC & ED</TD></TR>
<TR><TD rowspan=9>CCCC</TD><TD>BB-MMM-YYYY</TD><TD>CC & EB</TD></TR>
<TR><TD>KK-MMM-YYYY</TD><TD  rowspan=5>262</TD><TD rowspan=5>CC & ED</TD></TR>
<TR><TD>DD-MMM-YYYY</TD></TR>
<TR><TD>LL-MMM-YYYY</TD></TR>
<TR><TD>FF-MMM-YYYY</TD></TR>
<TR><TD>MM-MMM-YYYY</TD></TR>
<TR><TD>HH-MMM-YYYY</TD><TD>352</TD><TD>CA & ED</TD></TR>
<TR><TD>NN-MMM-YYYY</TD></TR>
<TR><TD>JJ-MMM-YYYY</TD></TR>
<TR><TD rowspan=4>DDDD</TD><TD>AA-MMM-YYYY</TD><TD  rowspan=4>1490</TD></TR>
<TR><TD>CC-MMM-YYYY</TD><TD rowspan=3>DA & EC</TD></TR>
<TR><TD>EE-MMM-YYYY</TD></TR>
<TR><TD>GG-MMM-YYYY</TD></TR>
<TR><TD rowspan=4>EEEE</TD><TD>AA-MMM-YYYY</TD><TD  rowspan=4>930</TD><TD>DA & ET</TD></TR>
<TR><TD>CC-MMM-YYYY</TD><TD rowspan=3>DA & EG</TD></TR>
<TR><TD>EE-MMM-YYYY</TD></TR>
<TR><TD>GG-MMM-YYYY</TD></TR>
</TABLE></BODY></HTML>

As you observe, the "UVW" part of the below line in the input data is getting missed out some how for next 2 rows resulting in the table showing wrong data altogether.

AAAA,TTTTT,EE-MMM-YYYY,UVW,37,AD & BD,Reason1
AAAA,TTTTT,FF-MMM-YYYY,XYZ,536,AE & BD,Reason1

Please help me!!

---------- Post updated at 12:08 PM ---------- Previous update was at 11:17 AM ----------

Also, basis checking the data i observe the below:

END {
  split("",lastspan)
  for (r=1; r<=NR; r++) {
    printf "<TR>" 
    for (i=1; i<=nf; i++)
      if ((r,i) in rowspan) {
        span=rowspan[r,i]
        printf "<TD rowspan=%s>%s</TD>", span+1, saveval[r,i] 
      } else if (!((i in span) && span--)) {
        printf "<TD>%s</TD>", saveval[r,i] 
      }
    print "</TR>"
  }
  print "</TABLE></BODY></HTML>"
}

In this part of the code, if rowspan value is assigned to a particular HTML cell, next row is being processed fine, but next after next row is having trouble. In other words, every second row after rowspan value is set is causing trouble.
Example:

<TR><TD>BB-MMM-YYYY</TD><TD  rowspan=2>UVW</TD><TD>782</TD><TD>AB & BB</TD></TR>
<TR><TD>CC-MMM-YYYY</TD><TD>908</TD><TD>AC & BB</TD></TR>

In this part rowspan is set to 2. Next row after this one is as below which is getting displayed correctly.

<TR><TD>DD-MMM-YYYY</TD><TD>XYZ</TD><TD>497</TD><TD>AD & BB</TD></TR>

But, the 2 rows after this row is having a problem.

<TR><TD>EE-MMM-YYYY</TD><TD>37</TD><TD>AD & BD</TD></TR>
<TR><TD>FF-MMM-YYYY</TD><TD>536</TD><TD>AE & BD</TD></TR>

I hope you understand.

---------- Post updated 02-01-18 at 11:56 AM ---------- Previous update was 01-31-18 at 12:08 PM ----------

Please help me!

I tried what ever I can with the knowledge i have. But couldnt identify the root cause itself where the code is going wrong.

But understood that as and when rowspan is set, post the completion of rowspan value the next row (after rowspan is complete) is coming fine. But the next row after this is not in the correct format.

Different approach, hopefully simpler:

tac file | awk '
BEGIN   {
         FS=","
#        print "MIME-Version: 1.0"
#        print "Content-Type: text/html"
#        print "Content-Disposition: inline"
         print "</TABLE></BODY></HTML>"
        }

NR > 1  {printf "<TR>" 
         for (i=1; i<=NF; i++)  {if ($i != LAST)     {printf "<TD "
                                                         if (ROWSPAN > 1) printf "rowspan=%s", ROWSPAN
                                                         printf ">%s</TD>", LAST
                                                         ROWSPAN = 0
                                                        }
                                 ROWSPAN++
                                }
         print "</TR>"
        }

        {split ($0, LAST)
        }

NR == 1 {for (i=1; i<=NF; i++) ROWSPAN++
        }

END     {for (i=1; i<=NF; i++)
         printf "<TH>%s</TH>", LAST
         print ""
         print "<HTML><BODY><TABLE border=1>"
        }

' | tac

Pls. test and report back.

EDIT: Or, still a tad simpler,

tac file | awk '
BEGIN   {
         FS=","
#        print "MIME-Version: 1.0"
#        print "Content-Type: text/html"
#        print "Content-Disposition: inline"
         print "</TABLE></BODY></HTML>"
        }

NR > 1  {printf "<TR>" 
         for (i=1; i<=NF; i++)  {if ($i != LAST)     {printf "<TD "
                                                         if (ROWSPAN > 1) printf "rowspan=%s", ROWSPAN
                                                         printf ">%s</TD>", LAST
                                                         ROWSPAN = 0
                                                        }
                                }
         print "</TR>"
        }

        {for (i=split ($0, LAST); i; i--) ROWSPAN++
        }

END     {for (i=1; i<=NF; i++)
         printf "<TH>%s</TH>", LAST
         print ""
         print "<HTML><BODY><TABLE border=1>"
        }

' | tac
1 Like

Hello RudiC,

Thank you for your response. But, when I was trying its throwing a message
"ksh: tac: not found."

---------- Post updated at 12:06 PM ---------- Previous update was at 11:17 AM ----------

I managed to use the below command instead of tac and code posted by you is giving me results as expected.

sed '1!G;h;$!d'

Code that has been tested is the below one:

tac file | awk '
BEGIN   {
         FS=","
#        print "MIME-Version: 1.0"
#        print "Content-Type: text/html"
#        print "Content-Disposition: inline"
         print "</TABLE></BODY></HTML>"
        }

NR > 1  {printf "<TR>" 
         for (i=1; i<=NF; i++)  {if ($i != LAST)     {printf "<TD "
                                                         if (ROWSPAN > 1) printf "rowspan=%s", ROWSPAN
                                                         printf ">%s</TD>", LAST
                                                         ROWSPAN = 0
                                                        }
                                 ROWSPAN++
                                }
         print "</TR>"
        }

        {split ($0, LAST)
        }

NR == 1 {for (i=1; i<=NF; i++) ROWSPAN++
        }

END     {for (i=1; i<=NF; i++)
         printf "<TH>%s</TH>", LAST
         print ""
         print "<HTML><BODY><TABLE border=1>"
        }

' | tac

I shall test for multiple combinations of input and shall update you on the complete test results by tomorrow.

Thank you once again.

Well, tac is the "reverse cat" and I thought it were generally available. If your sed snippet provides the same functionality, all should work well...

1 Like

Version without tac - a bit more intricate, of course, doing everything in memory:

awk -F, '

        {HOLD[NR] = $0
        }

END     {for (nr=NR; nr; nr--)  {nf = split(HOLD[nr], HTMP)
                                 if (nr < NR)   {LTMP = "<TR>" 
                                                 for (i=1; i<=nf; i++)  if (HTMP != LAST) {LTMP = LTMP "<TD "
                                                                                                 if (ROWSPAN > 1) LTMP = LTMP "rowspan=" ROWSPAN
                                                                                                 LTMP = LTMP ">" LAST "</TD>"
                                                                                                 ROWSPAN = 0
                                                                                                }
                                                 LTMP = LTMP "</TR>"
                                                 LINE[nr] = LTMP
                                                }

                                 for (i=split (HOLD[nr], LAST); i; i--) ROWSPAN++
                                }

         print "<HTML><BODY><TABLE border=1>"
         for (i=1; i<=nf; i++) printf "<TH>%s</TH>", LAST
         print ""
         for (i=1; i<NR; i++) print LINE
         print "</TABLE></BODY></HTML>"
        }

'  file
1 Like

Thank you so much for your time RudiC.

All the 3 methods are working perfectly!!:b:

Don't modify posts (as you did in post#11) after someone has replied to or referenced them, as it will confuse people reading the thread later on. If you feel you revealed personal or classified information, change that to meaningless placeholders like "xxxx" or "blablabla".

1 Like

Sure RudiC... I shall not do it next time.

Actually the data which I have given is live data so have to remove it. Anyways i thought sample data is already there in my first post so have removed it.

I will make sure not to do it from next time on.

---------- Post updated at 08:53 AM ---------- Previous update was at 08:25 AM ----------

I have changed the data portion of the example. I hope that is fine.
Please let me know if I still need to change any other details further.

Thanks alot for your time and help.:):b: