Format text file to html

Hi Experts,

Has anybody figured out how to achieve this with shell scripts or other tools? I have googled for solutions but had no luck.
I have thousands of .txt files to batch process; please see the sample text content below. What I want to achieve is to insert HTML tags to format my content and rename the files from .txt to .html.

PS: here are the tags: <p align="center"> </p> and <p align="justify"> </p>

Thanks

The tags should be inserted at the start and end of each paragraph, and the HTML files should look exactly like the second sample below.

Sample content of a .txt file:

Republic of the Country



Country Name v. Rodrigo, 19 C.N. 120494 (14875721)


Country Name v. Rodrigo

120494 C.N. 14875721


ON DIVISION OF OPINION OF THE JUDGES

OF THE DISTRICT COURT OF COUTRHR

Uniquerow


0The Quick Brown fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near 
fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river.
he lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river.

Page 120494 C.N. 14875721

1The Quick Brown fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the
fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river.
he lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river.

Page 120494 C.N. 14875722

2The Quick Brown fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river. 
fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river. 
he lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river.

Page 120494 C.N. 14875723

It should look like this:


<p align="center"> Republic of the Country </p>



<p align="center"> Country Name v. Rodrigo, 19 C.N. 120494 (14875721) </p>


<p align="center"> Country Name v. Rodrigo </p>

<p align="center"> 120494 C.N. 14875721 </p>


<p align="center"> ON DIVISION OF OPINION OF THE JUDGES </p>

<p align="justify"> OF THE DISTRICT COURT OF COUTRHR </p>

<p align="justify"> Uniquerow </p>


<p align="justify"> 0The Quick Brown fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near 
fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river.
he lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river. </p>

<p align="justify"> Page 120494 C.N. 14875721 </p>

<p align="justify"> 1The Quick Brown fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the
fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river.
he lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river. </p>

<p align="justify"> Page 120494 C.N. 14875722 </p>

<p align="justify"> 2The Quick Brown fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river. 
fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river. 
he lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river. </p>

<p align="justify"> Page 120494 C.N. 14875723 </p>

What have you done so far? Do you have an example of what your text looks like afterwards? And please wrap your example in code tags. External links are frowned upon unless they are there solely for extra information.

When to apply "center" and when "justify"?

Apply "justify" starting one paragraph before "Uniquerow" in my sample data, and to every paragraph from there down.

Apply "center" to the remaining paragraphs.

<p align="justify"> OF THE DISTRICT COURT OF COUTRHR </p>

<p align="justify"> Uniquerow </p>
.............
.............
<p align="justify"> 2The Quick Brown fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river. 
fox jumps over the lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river. 
he lazy dog near the bank of the river. The Quick Brown fox jumps over the lazy dog near the bank of the river. </p>

<p align="justify"> Page 120494 C.N. 14875723 </p>

Try

awk '
/Uniquerow/     {L=1}
                {sub(/^/, "<p align=\"" (L?"justify":"center") "\">")
                 sub (/$/, "</p>")
                 }
1
' RS= ORS="\n\n" file

Thanks so much, you are a real life saver. BTW, when I run the script I get

<p align="center"> OF THE DISTRICT COURT OF COUTRHR </p>

instead of

<p align="justify"> OF THE DISTRICT COURT OF COUTRHR </p>

Is there another way to handle this in the script?

You could replace the /Uniquerow/ match by e.g. a /COUTRHR/ match, but I assume this would not work with the other thousands of files.

It's not that easy to apply a selection / decision to a line that has already been passed. You could

  • use a cyclic buffer, modifying the to-be-printed element just before printout
  • reverse the file with tac, apply the changes, and reverse it with tac again
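
For the first option, a one-paragraph buffer is enough here, because the decision only reaches back a single paragraph. The following is a minimal sketch, not a tested drop-in: the function name tag_paragraphs and the marker argument are hypothetical, and it assumes paragraphs are separated by blank lines and that the marker text ("Uniquerow" by default) appears once per file.

```shell
# Hypothetical helper: tag_paragraphs FILE [MARKER]
# Wraps each blank-line-separated paragraph of FILE in <p> tags.
# Paragraphs are "center" until the one just before MARKER, then "justify".
tag_paragraphs() {
  awk -v marker="${2:-Uniquerow}" '
    # Print the paragraph currently held in the buffer, if there is one.
    function emit() {
      if (held != "")
        print "<p align=\"" (justify ? "justify" : "center") "\"> " held " </p>"
    }
    {
      # Marker found: the held (previous) paragraph is the first justified one.
      if (index($0, marker) > 0)
        justify = 1
      emit()
      held = $0          # delay printing by one paragraph
    }
    END { emit() }       # flush the last paragraph
  ' RS= ORS='\n\n' "$1"
}
```

Usage would be something like tag_paragraphs file > file.html. Each paragraph is printed one step late, so the alignment can still be switched when the marker shows up in the next paragraph.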

---------- Post updated at 16:39 ---------- Previous update was at 16:36 ----------

Try

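tac file | awk '
# tac also reverses the lines inside each paragraph, so the opening tag
# goes on the last field (line) and the closing tag on the first one
/Unique/        {T=NR+2}
NR == T         {L=1}
                {$NF = "<p align=\"" (L?"center":"justify") "\">" $NF
                 $1 = $1 "</p>"
                }
1
' RS= FS="\n" OFS="\n" ORS="\n\n" | tac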

Thank you for your kind help, the code is working now. One more thing: I would like to make every paragraph with the "center" tag bold,
using this tag

<strong></strong>

Where should I put this tag? The output should then look like this:

<p align="center"> <strong>Republic of the Country </strong></p>

<p align="center"> <strong>Country Name v. Rodrigo, 19 C.N. 120494 (14875721) </strong></p>

<p align="center"> <strong>Country Name v. Rodrigo </strong></p>

<p align="center"> <strong>120494 C.N. 14875721 </strong></p>

<p align="center"> <strong>ON DIVISION OF OPINION OF THE JUDGES </strong></p>

Any idea from your side?

---------- Post updated 02-07-15 at 00:02 ---------- Previous update was 01-07-15 at 23:56 ----------

However, try

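tac file | awk '
# tac also reverses the lines inside each paragraph, so the opening tags
# go on the last field (line) and the closing tags on the first one
/Unique/        {T=NR+2}
NR == T         {L=1}
                {$NF = "<p align=\"" (L?"center\"><strong>":"justify\">") $NF
                 $1 = $1 (L?"</strong>":"") "</p>"
                }
1
' RS= FS="\n" OFS="\n" ORS="\n\n" | tac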
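
Since the original question was about thousands of files, here is a rough sketch of a batch wrapper. The names convert_all and format_one are made up for illustration, and format_one is only a stand-in that reuses the first awk program from this thread; swap in whichever formatter you settle on. It assumes all files sit in one directory and writes a new .html file next to each .txt file instead of renaming in place.

```shell
# Stand-in formatter (hypothetical name): the first awk program from this thread.
format_one() {
  awk '
  /Uniquerow/     {L=1}
                  {sub(/^/, "<p align=\"" (L?"justify":"center") "\">")
                   sub(/$/, "</p>")
                  }
  1
  ' RS= ORS='\n\n' "$1"
}

# Hypothetical wrapper: convert_all DIR
# Formats every .txt file in DIR and writes the result to a .html file beside it.
convert_all() {
  for txt in "$1"/*.txt; do
    [ -e "$txt" ] || continue        # the glob matched nothing
    html="${txt%.txt}.html"          # same name, .html extension
    format_one "$txt" > "$html"
  done
}
```

Run it as convert_all /path/to/dir. The .txt originals are left untouched, so the result can be checked before anything is deleted.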

Thank you so much

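awk 'BEGIN { print "<div align=center>" }
# Escape HTML special characters; "\\&" produces a literal & in the replacement
{ gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;") }
/^$/ { print; last = "blank"; next }
# Switch to the justified block before the generic <p> rule can fire
/THE DISTRICT COURT OF COUTRHR/ { last = ""; print "</div>\n<div align=justify>\n <p>" $0; next }
last == "blank" || NR == 1 {
  last = ""
  printf " <p>"
}
1
END { print "</div>" }
' "$file"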