HTML Conversion of text file

Hi,

I have the following text file. I want to convert it into the HTML format shown below. Kindly help.

Input Text File

Header 1
=======
Name:***
Age:***
Address:***
Work Phone:***
Email:***
Mobile:***
Country:***
City:***
Pincode:***

some text here ****

Header 2
=======
Name:***
Age:***
Address:***
Work Phone:***
Email:***
Mobile:***
Country:***
City:***
Pincode:***

some text here ****


Header 3
=======
Name:***
Age:***
Address:***
Work Phone:***
Email:***
Mobile:***
Country:***
City:***
Pincode:***

some text here ****

I want to convert the above plain text file to an HTML file containing a table, with a header row holding the headers from my text file and the rest of the details in the corresponding columns. Here I have "=======" as the delimiter for my headers. The *** represents my data, which could be several lines of text.

<table border="1" cellpadding="0" width="1184">
    <tbody>
        <tr>
            <td valign="top">
                <pre>Header 1 </pre>
            </td>
            <td valign="top">
                <pre>Header 2</pre>
            </td>
            <td valign="top">
                <pre>Header 3</pre>
            </td>
            <td valign="top">
                <pre>Header 4</pre>
            </td>
        </tr>
        <tr>
            <td valign="top">
                <pre>Name:***</pre>
                <pre>Age:***</pre>
                <pre>Address:***</pre>
                <pre>Work Phone:***</pre>
                <pre>Email:***</pre>
                <pre>Mobile:***</pre>
                <pre>Country:***</pre>
                <pre>City:***</pre>
                <pre>Pin code:***</pre>
                <pre><p> </p></pre>
                <pre>some text here ****</pre>
            </td>
            <td valign="top">
                <pre>Name:***</pre>
                <pre>Age:***</pre>
                <pre>Address:***</pre>
                <pre>Work Phone:***</pre>
                <pre>Email:***</pre>
                <pre>Mobile:***</pre>
                <pre>Country:***</pre>
                <pre>City:***</pre>
                <pre>Pin code:***</pre>
                <pre><p> </p></pre>
                <pre>some text here ****</pre>
            </td>
            <td valign="top">
                <pre>Name:***</pre>
                <pre>Age:***</pre>
                <pre>Address:***</pre>
                <pre>Work Phone:***</pre>
                <pre>Email:***</pre>
                <pre>Mobile:***</pre>
                <pre>Country:***</pre>
                <pre>City:***</pre>
                <pre>Pin code:***</pre>
                <pre><p> </p></pre>
                <pre>some text here ****</pre>
            </td>
            <td valign="top">
                <pre>Name:***</pre>
                <pre>Age:***</pre>
                <pre>Address:***</pre>
                <pre>Work Phone:***</pre>
                <pre>Email:***</pre>
                <pre>Mobile:***</pre>
                <pre>Country:***</pre>
                <pre>City:***</pre>
                <pre>Pin code:***</pre>
                <pre><p> </p></pre>
                <pre>some text here ****</pre>
            </td>
        </tr>
    </tbody>
</table>

Do you already have some code/script you've started with? Where exactly are you stuck in your implementation?

If you are familiar with XSLT (Extensible Stylesheet Language Transformations), you can very easily convert your document into XHTML.

I tried out awk, but I am not able to get the proper output. :o

awk 'BEGIN{print "<table> <table border=2>"} {print "<tr>";for(i=1;i<=NF;i++)print "<td>" $i"</td>";print "</tr>"} END{print "</table>"}' inputtext.txt >htmloutput.html

akshay@nio:/tmp$ cat file
Header 1
=======
Name:***
Age:***
Address:***
Work Phone:***
Email:***
Mobile:***
Country:***
City:***
Pincode:***

some text here ****

Header 2
=======
Name:***
Age:***
Address:***
Work Phone:***
Email:***
Mobile:***
Country:***
City:***
Pincode:***

some text here ****


Header 3
=======
Name:***
Age:***
Address:***
Work Phone:***
Email:***
Mobile:***
Country:***
City:***
Pincode:***

some text here ****
akshay@nio:/tmp$ cat table.awk

function dothis( j){
	if(headerfound)
	{
		
		for(j=1;j<=i-1;j++)
		{
			c = "\t<pre>" td[j] "</pre>"
			out[ph] = ( ph in out) ? out[ph] "\n" c  : c
		}
		delete td; i = 0
	}
}
/=======/{
		h = p
		dothis()
		headerfound = 1 
		next
}

headerfound{
	td[++i] = $0
}

{
	p = $0; 
	ph = h
}
END{ 
	dothis() 
	for(i in out)
	{
		c = "\t<td><pre>" i "</pre></td>"
		header = length(header) ? header "\n" c : c

		c = "<td>\n" out[i] "\n</td>"
		other = length(other) ? other "\n" c : c
	}

	

	print "<table>\n<tbody>\n<tr>\n" header "\n</tr>\n<tr>\n" other "\n</tr>\n</tbody>\n</table>"
}
akshay@nio:/tmp$ awk -f table.awk file
<table>
<tbody>
<tr>
	<td><pre>Header 1</pre></td>
	<td><pre>Header 2</pre></td>
	<td><pre>Header 3</pre></td>
</tr>
<tr>
<td>
	<pre>Name:***</pre>
	<pre>Age:***</pre>
	<pre>Address:***</pre>
	<pre>Work Phone:***</pre>
	<pre>Email:***</pre>
	<pre>Mobile:***</pre>
	<pre>Country:***</pre>
	<pre>City:***</pre>
	<pre>Pincode:***</pre>
	<pre></pre>
	<pre>some text here ****</pre>
	<pre></pre>
</td>
<td>
	<pre>Name:***</pre>
	<pre>Age:***</pre>
	<pre>Address:***</pre>
	<pre>Work Phone:***</pre>
	<pre>Email:***</pre>
	<pre>Mobile:***</pre>
	<pre>Country:***</pre>
	<pre>City:***</pre>
	<pre>Pincode:***</pre>
	<pre></pre>
	<pre>some text here ****</pre>
	<pre></pre>
	<pre></pre>
</td>
<td>
	<pre>Name:***</pre>
	<pre>Age:***</pre>
	<pre>Address:***</pre>
	<pre>Work Phone:***</pre>
	<pre>Email:***</pre>
	<pre>Mobile:***</pre>
	<pre>Country:***</pre>
	<pre>City:***</pre>
	<pre>Pincode:***</pre>
	<pre></pre>
</td>
</tr>
</tbody>
</table>

----edit----

I have left adding attributes and styles such as border and width to you; modify the print statement as per your requirements.
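
As a small sketch of that kind of change, the opening tag could be printed with its attributes passed in as awk variables. The table_open name and its two parameters are hypothetical; the attribute names and default values are the ones from the HTML in the original question.

```shell
#!/bin/sh
# Sketch only: table_open and its parameters are illustrative names.
# usage: table_open [BORDER] [WIDTH]
table_open() {
    awk -v border="${1:-1}" -v width="${2:-1184}" 'BEGIN {
        # Print the opening tag with the attributes filled in.
        printf "<table border=\"%s\" cellpadding=\"0\" width=\"%s\">\n", border, width
    }'
}
```

In table.awk the same printf could take the place of the literal "<table>" at the start of the final print statement.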


Only one word: Awesome!!

This is an excellent piece of work Akshay. Hats off! The output is exactly what I was looking for. Thanks a ton for your timely help. I really appreciate the helping attitude of experts out here.

This is quite embarrassing :o, I am in trouble again. I need to move the "some text here ***" part into a separate row. I tried the following, but it's not working. :(

END{ 
	dothis() 
	for(i in out)
	{
		c = "\t<td><pre>" i "</pre></td>"
		header = length(header) ? header "\n" c : c


                if ( out[i] !~ /^some text here/ ) {
                        c = "<td>\n" out[i] "\n</td>"
                        other1 = length(other1) ? other1 "\n" c : c
                }

		c = "<td>\n" out[i] "\n</td>"
		other = length(other) ? other "\n" c : c
	}

	
	print "<table>\n<tbody>\n<tr>\n" header "\n</tr>\n<tr>\n" other "\n</tr>\n</tbody>\n</table>"
}

The following output is what I am trying to achieve.

BEGIN { L=0; H=0 }

/=======/ {
        H++;
        INDATA=1;
        L=0;
        next
}

# Titles in T[H]
{       T[H]=$0;        }

# Lines in D[X,line], line count in D[X,0]
INDATA {        D[H-1,++L]=$0   ;       D[H-1,0]=L      }

END {
        # Remove trailing blank lines, find some text here appendices
        for(N=0; N<H; N++)
        {
                do { D[N,0]--; } while(D[N,D[N,0]] == "");

                if(D[N,D[N,0]-1] == "")
                {
                        SOME[N]=D[N,D[N,0]];
                        D[N,0] -= 2;
                }
        }

        printf("<table>\n<tbody>\n<tr>\n");
        for(N=0; N<H; N++) print "\t<td><pre>" T[N] "</pre></td>";

        print "</tr>\n<tr>";

        for(N=0; N<H; N++)
        {
                print "<td>";
                for(M=1; M<=D[N,0]; M++) print "\t<pre>" D[N,M] "</pre>";
                print "</td>";
        }

        print "</tr>\n<tr>";

        for(N=0; N<H; N++)
        {
                print "\t<td><pre>" SOME[N] "</pre></td>"
        }

        printf("</tr>\n</tbody>\n</table>\n");
}

<table>
<tbody>
<tr>
        <td><pre>Header 1</pre></td>
        <td><pre>Header 2</pre></td>
        <td><pre>Header 3</pre></td>
</tr>
<tr>
<td>
        <pre>Name:***</pre>
        <pre>Age:***</pre>
        <pre>Address:***</pre>
        <pre>Work Phone:***</pre>
        <pre>Email:***</pre>
        <pre>Mobile:***</pre>
        <pre>Country:***</pre>
        <pre>City:***</pre>
        <pre>Pincode:***</pre>
</td>
<td>
        <pre>Name:***</pre>
        <pre>Age:***</pre>
        <pre>Address:***</pre>
        <pre>Work Phone:***</pre>
        <pre>Email:***</pre>
        <pre>Mobile:***</pre>
        <pre>Country:***</pre>
        <pre>City:***</pre>
        <pre>Pincode:***</pre>
</td>
<td>
        <pre>Name:***</pre>
        <pre>Age:***</pre>
        <pre>Address:***</pre>
        <pre>Work Phone:***</pre>
        <pre>Email:***</pre>
        <pre>Mobile:***</pre>
        <pre>Country:***</pre>
        <pre>City:***</pre>
        <pre>Pincode:***</pre>
</td>
</tr>
<tr>
        <td><pre>some text here ****</pre></td>
        <td><pre>some text here ****</pre></td>
        <td><pre></pre></td>
</tr>
</tbody>
</table>

Thanks for the really quick reply, Corona. The "some text here" paragraph has several lines, and the number of lines may vary. The above code puts only its last line into the row. The only constant is the number of lines before "some text here" (before the blank line), i.e. 9. Probably what I need is to pick up the 9 lines after each header and put them into one row, then put the rest of the text into another row.
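
One way to sketch that requirement in awk, assuming every block has exactly 9 field lines directly after the "=======" line and that everything from there up to the next title is free text. The function name txt2table, the nfields parameter, and the esc helper are made up for illustration; none of them come from the scripts posted above.

```shell
#!/bin/sh
# Sketch only: txt2table, nfields and esc() are illustrative names.
# Assumption: each block is  title / ======= / nfields field lines / free text.
# usage: txt2table FILE [NFIELDS]    (NFIELDS defaults to 9)
txt2table() {
    awk -v nfields="${2:-9}" '
    # Escape the characters that are special in HTML.
    function esc(s) {
        gsub(/&/, "\\&amp;", s); gsub(/</, "\\&lt;", s); gsub(/>/, "\\&gt;", s)
        return s
    }
    { line[NR] = $0 }                       # keep every input line
    END {
        # A line made only of "=" marks a block; its title is the line before.
        for (i = 2; i <= NR; i++)
            if (line[i] ~ /^=+$/) start[++n] = i

        for (k = 1; k <= n; k++) {
            title[k] = line[start[k] - 1]
            first = start[k] + 1
            last = (k < n) ? start[k + 1] - 2 : NR    # stop before next title

            # The first nfields lines of the block form the field cell.
            for (i = first; i <= last && i < first + nfields; i++)
                fields[k] = fields[k] "\t<pre>" esc(line[i]) "</pre>\n"

            # The remainder is free text; trim blank lines at both ends.
            while (last >= i && line[last] ~ /^[ \t]*$/) last--
            while (i <= last && line[i] ~ /^[ \t]*$/) i++
            for (; i <= last; i++)
                text[k] = text[k] (text[k] == "" ? "" : "\n") esc(line[i])
        }

        print "<table>\n<tbody>\n<tr>"
        for (k = 1; k <= n; k++) print "\t<td><pre>" esc(title[k]) "</pre></td>"
        print "</tr>\n<tr>"
        for (k = 1; k <= n; k++) printf "<td>\n%s</td>\n", fields[k]
        print "</tr>\n<tr>"
        for (k = 1; k <= n; k++) print "\t<td><pre>" text[k] "</pre></td>"
        print "</tr>\n</tbody>\n</table>"
    }' "$1"
}
```

It would be called as txt2table inputtext.txt > htmloutput.html. All lines of the free text go into a single <pre>, so a paragraph of any length keeps its line breaks inside one cell.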

Hi.

I like the flexibility of awk, but I find that the code tends to be very specific, and not as general as I would like. As General George Patton said, "I don't like to pay for the same real estate twice", meaning that I don't like to solve similar problems all over again.

So I look for tools that can help more generally. In this case, I found one that transforms text into HTML, and it knows how to recognize tables: txt2html.

However, the input is structured in blocks vertically:

a
b
c

and I think it is more useful to have them appear horizontally:

a b c

and that's an easy job (in this situation) for csplit, which creates a number of files, and then paste, which aligns them side by side. Then we can augment the pieces with sed to conform to one of the table-type formats.

Here is the script and result:

#!/usr/bin/env bash

# @(#) s3	Demonstrate transform text to table, txt2html
# t2t: http://www.scholnick.net/t2t
# txt2tags: http://txt2tags.sourceforge.net

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C csplit paste txt2html

FILE=${1-data2}

pl " Sample of input data file $FILE, sizes of split files:"
head -3 $FILE ; tail -3 $FILE

# Remove debris, split data into separate files, combine side-to-side
rm -f xx*
csplit -z -k $FILE '/^Header/' {*}
paste xx* > f1

pl " Results, txt2html (adding markup):"
( pe ; pe ) > f2
sed 's/^/| /;s/	/ | /g;s/$/|/' f1 >> f2
( pe ; pe ) >> f2
rm -f f2.html
txt2html --make_tables f2 > f2.html
ls -lgG f2.html

rm -f xx*
exit 0

producing:

./s3

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
csplit (GNU coreutils) 6.10
paste (GNU coreutils) 6.10
txt2html /usr/bin/txt2html version: 2.51

-----
 Sample of input data file data2, sizes of split files:
Header 1
=======
Name:***

Text line 1
Text line 2
175
152
140

-----
 Results, txt2html (adding markup):
-rw-r--r-- 1 1323 Jul 24 11:37 f2.html

See the attachment f2.html and man pages.

That was fairly straightforward. If one can use perl, then a host of other possibilities arise. The approach can be similar, with statements storing data in rows, but more customization can be done. I find that perl code is far more likely to be generalizable, and so can handle option processing, which is not a strength of awk (although it can be done).

Best wishes ... cheers, drl


In the future, please post representative data. Or better yet, if possible, real data. Simplified data only requires simplified programs.
