awk -F "[<>]" '/<href=>|<href=>|<top>|<top>/ {print $3, OFS=\t}' source.txt > output.txt
I'm not quite sure how to parse the attached file. What I am trying to do is produce an output file with the link (href=), the name (after the <), and the count (<top>) in 3 separate columns.
My attempt is the script above. It creates output.txt, but the file is empty.
The desired output is:
http://geneticslab.emory.edu/tests/MM021 Autism Spectrum Disorders 61
http://geneticslab.emory.edu/tests/MM250 Brain Malformations 50
Thank you :).
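Two things in the attempt probably explain the empty file. The pattern looks for the literal text `<href=>`, which is unlikely to occur in the data, and `OFS=\t` is unquoted inside `print`, which awk normally rejects as a syntax error (the shell still creates output.txt before awk runs). Since the attached file is not shown here, the following is only a sketch under an assumed layout: each record sits on one line as `<a href="URL">Name</a><top>N</top>`. The file name `sample.txt` and its contents are hypothetical stand-ins for source.txt.

```shell
# Hypothetical sample input; the real source.txt layout is an assumption.
cat > sample.txt <<'EOF'
<a href="http://geneticslab.emory.edu/tests/MM021">Autism Spectrum Disorders</a><top>61</top>
<a href="http://geneticslab.emory.edu/tests/MM250">Brain Malformations</a><top>50</top>
EOF

# Split each line on < and >, and set the output separator to a tab up front.
awk -F'[<>]' -v OFS='\t' '
/href=/ && /<top>/ {
    # Under the assumed layout:
    #   $2 = a href="URL"   $3 = Name   $7 = count
    split($2, q, "\"")      # q[2] is the text between the double quotes
    print q[2], $3, $7
}' sample.txt > output.txt

cat output.txt
```

The field numbers depend entirely on the layout. If the real file differs, printing every field with its index (for example `awk -F'[<>]' '{for (i = 1; i <= NF; i++) print i, $i}' source.txt`) should show which positions hold the link, name, and count.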