For example, from the first line I would like to retrieve abc, which sits between: txt>abc</a>
I have used the following command, but as you can see the number of letters in the word keeps changing: abc, abc01, abc045, cdf, Manhattan.
awk -F\/ '{print substr($4,0,3)}' list.html
So this command gives the right output only for the three-letter words. However, I want to extract the same information (abc01, abc045, cdf, Manhattan) from all the lines in the HTML file. Please help.
I just ran this, but it gives me no output, only blank lines. The HTML file has 5 lines, and when I run the command you mentioned I get just 5 blank lines.
Does the HTML actually look like the data you pasted, or did you tidy it up? Often when XML/HTML comes up, 5 "lines" later turns out to mean 5 tags, not necessarily organized into separate lines at all.
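In case the tags do share lines, here is a rough sketch that does not depend on field positions or word length. It assumes each wanted word is the plain text right before a closing </a>, with no nested tags inside it; the sample input and the function name extract_links are made up for illustration, since the real list.html was not posted in full.

```shell
# Print the text between ">" and "</a>" for every anchor,
# however many anchors share one line. (extract_links is a hypothetical name.)
extract_links() {
  awk '{
    line = $0
    # match() sets RSTART and RLENGTH for the leftmost ">text</a>" in line
    while (match(line, />[^<>]*<\/a>/)) {
      # skip the leading ">" (1 char) and drop the trailing "</a>" (4 chars)
      print substr(line, RSTART + 1, RLENGTH - 5)
      # continue searching in the rest of the line
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}

# Assumed sample data, two anchors on the first line and one on the second.
printf '%s\n' \
  '<a href="/files/abc.txt">abc</a><a href="/files/abc01.txt">abc01</a>' \
  '<a href="/files/Manhattan.txt">Manhattan</a>' | extract_links
```

On the real file this would be run as extract_links list.html. It will not catch anchor text that is itself broken across two lines; if that turns out to be the case, the actual HTML is needed to say more.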