Create HTML <ul> <li> by parsing a text file

Hi all,
this is my first post in this forum. I'm Italian (please forgive me) :) so my English may not always be correct...

Anyway, let's get straight to the point!
I have a text file like this:

  ,,,,
  Disney: 00961-002,,,,
 ,Pippo: 00531-002,,,
 ,,Pluto: 00238-002,,
 ,,Paperino: 00177-002,,
 ,,,Paperina: 00121-002,
 ,,,,Minnie: 00193-002
 ,,,Paperone: 00434-002,
 ,,,Paperoga: 00956-002,
 ,,Gambadilegno: 00715-002,,
 ,Topolino: 000078-002,,,
 ,,Basettoni: 00000-002,,
 ,,Clarabella: 00163-002,,
 ,,Paperinik: 00511-002,,
  ,,,Orazio: 00133-002,

The number of commas on the first line indicates the number of levels. The commas before an item indicate its depth; those after the item carry no information and can be ignored. The number after ":" is not relevant and should also be ignored.

I would like to parse it with a shell script (on Debian 8) to get the following:

  <ul><li><span class="tf-nc">Disney</span>
       <ul><li><span class="tf-nc">Pippo</span>            
           <ul><li><span class="tf-nc">Pluto</span></li>
             <li><span class="tf-nc">Paperino</span>
               <ul> <li><span class="tf-nc">Paperina</span>
                   <ul><li><span class="tf-nc">Minnie</span></li>
                   </ul><li><span class="tf-nc">Paperone</span>
                 <li><span class="tf-nc">Paperoga</span>
                 </li></ul></li></li><li><span class="tf-nc">Gambadilegno</span></li>
           </ul></li><li><span class="tf-nc">Topolino</span>
           <ul><li><span class="tf-nc">Basettoni</span></li>
             <li><span class="tf-nc">Clarabella</span>
             <li><span class="tf-nc">Paperinik</span>             
              <ul><li><span class="tf-nc">Orazio</span></li>
              </ul></li></ul></li></ul></li></ul>

This HTML code was written by hand, just as an example, but in the real world the file can contain many, many items and many, many "levels".
The final result will be, in a web browser, something like the picture you can see here: albertocortesi dot it / output.jpg

Is there anyone who could give me a little help?

Hi and Welcome,

UNIX.com is not a 'script writing service'. We are here to help you write your own scripts, but you must do your own work and show your own efforts and attempts.

What have you tried? What error messages did you get? What output did you see when you made your own attempt to parse this text?

Hi Neo, thanks for the welcome!

Yes, it's pretty obvious that the forum is not a scripting service, and, I assure you, it's not what I need.

I'm completely new to scripting, and I know practically nothing, but I know other programming languages, and conceptually I know what I have to do to get the result I want.

What I don't know is where to start, for example, I have no idea how to parse the strings in the file.

I was able to loop through the lines of the file and display them on the console, but for now, nothing more.

#!/bin/bash
input=$1
while IFS= read -r line
do
  echo "$line"
done < "$input"

The next step would be to parse, string by string, to understand how many commas there are before the part to be extracted and to use this value to start creating the <ul><li> structure.
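The comma-counting step described above might be sketched with plain bash parameter expansion, roughly as follows. This is only a sketch: `parse_line` is a hypothetical helper name, and it assumes each line looks like the sample (optional leading spaces, then commas, then `Name: number`).

```shell
#!/bin/bash
# Hypothetical helper: prints "<depth> <name>" for one line of the input file.
parse_line() {
  local line=$1
  # Trim leading whitespace
  line="${line#"${line%%[![:space:]]*}"}"
  # The run of commas at the start of the line gives the depth
  local commas="${line%%[!,]*}"
  local depth=${#commas}
  # Drop the leading commas, then everything from the first ':' onward
  local name="${line:depth}"
  name="${name%%:*}"
  printf '%s %s\n' "$depth" "$name"
}

parse_line ',,Pluto: 00238-002,,'
```

For the header line (commas only) the name comes out empty, which gives a simple way to skip it.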

After several attempts I was able to split the individual strings of the file in this way:

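#!/bin/bash
# Usage: pass the text file as the first argument
input=$1
declare -a FirstStep
N=0

while IFS= read -r line
do
  # Keep everything before the first ':' (leading commas plus the name)
  Res=${line%%:*}
  # Skip the first line, which only declares the number of levels
  if [ "$N" -gt 0 ]
  then
    FirstStep[N]=$Res
  fi
  N=$((N + 1))
done < "$input"

# Print the collected entries, one per line
for i in "${FirstStep[@]}"
do
  echo "$i"
done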

But now I'm completely stuck and I don't know how to go on. I gladly accept clues or suggestions.


The first suggestion I would offer is to always format your HTML (or code) before doing any analysis.

Doing a quick cut, paste and format in Visual Studio Code:

<ul>
    <li><span class="tf-nc">Disney</span>
        <ul>
            <li><span class="tf-nc">Pippo</span>
                <ul>
                    <li><span class="tf-nc">Pluto</span></li>
                    <li><span class="tf-nc">Paperino</span>
                        <ul>
                            <li><span class="tf-nc">Paperina</span>
                                <ul>
                                    <li><span class="tf-nc">Minnie</span></li>
                                </ul>
                                <li><span class="tf-nc">Paperone</span>
                                    <li><span class="tf-nc">Paperoga</span>
                                    </li>
                        </ul>
                        </li>
                        </li>
                        <li><span class="tf-nc">Gambadilegno</span></li>
                </ul>
                </li>
                <li><span class="tf-nc">Topolino</span>
                    <ul>
                        <li><span class="tf-nc">Basettoni</span></li>
                        <li><span class="tf-nc">Clarabella</span>
                            <li><span class="tf-nc">Paperinik</span>
                                <ul>
                                    <li><span class="tf-nc">Orazio</span></li>
                                </ul>
                            </li>
                    </ul>
                    </li>
        </ul>
        </li>
</ul>

It's a lot easier to see what you are trying to do when formatted properly (easy on the eyes).

It seems to me, at a quick glance, that your HTML is not correct.

Perhaps use a tool like Visual Studio Code (it is free, works just fine, and is well supported) to edit your HTML and make sure it is correct?

In other words, it's a bit early to write a parser that generates the HTML when your model of the desired HTML is not yet correct.
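For reference, once the target markup is well-formed, the conversion itself could be sketched along these lines. Treat it as a rough sketch under assumptions, not a finished script: `to_html` is a hypothetical function name, the depth is assumed to grow by at most one level from one item to the next, and names are assumed to contain no characters that need HTML escaping.

```shell
#!/bin/bash
# Sketch: turn the comma-indented text (read from stdin) into nested <ul>/<li>.
to_html() {
  local line commas name depth cur=-1   # cur = depth of the currently open <li>
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line#"${line%%[![:space:]]*}"}"   # trim leading whitespace
    commas="${line%%[!,]*}"                   # run of leading commas
    depth=${#commas}
    name="${line:depth}"
    name="${name%%:*}"                        # drop ": number" and what follows
    [ -n "$name" ] || continue                # skip the header line and blank lines

    if [ "$depth" -gt "$cur" ]; then
      # One level deeper: open a new list inside the current item
      # (assumption: a bigger jump is treated as a single level)
      cur=$((cur + 1))
      echo '<ul>'
    else
      # Same level or shallower: close the previous item,
      # then close one list and its parent item per level we go back up
      echo '</li>'
      while [ "$cur" -gt "$depth" ]; do
        echo '</ul></li>'
        cur=$((cur - 1))
      done
    fi
    printf '<li><span class="tf-nc">%s</span>\n' "$name"
  done
  # Close whatever is still open at the end of the input
  while [ "$cur" -ge 0 ]; do
    echo '</li></ul>'
    cur=$((cur - 1))
  done
}

# Demo with a few lines taken from the sample file
to_html <<'EOF'
,,,,
Disney: 00961-002,,,,
,Pippo: 00531-002,,,
,,Pluto: 00238-002,,
,Topolino: 000078-002,,,
EOF
```

With a real file it would be called as `to_html < input.txt`. The output is not indented; a formatter such as the one in Visual Studio Code can take care of that afterwards.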

Yes Neo,
my code works, but it's really ugly! :)

It was written by hand just to give an idea. The formatted code makes the next steps clearer. Thanks!

I'm still working on the basics of scripting, so I may need some time to learn.