I have a file with a few thousand records in the following format:
File: input.txt ->
<option value="14333">VISWANADH VELAMURI</option>
<option value="17020">VISWANADHA RAMA KRISHNA</option>
I want to generate a CSV file from the above file, as follows:
File: output.txt ->
14333,VISWANADH VELAMURI
17020,VISWANADHA RAMA KRISHNA
The HTML option tags need to be removed, along with the unwanted tabs and the empty lines in between. I have tried cut and awk, but I can't get the right combination. Can you please help me out? I want to upload this data into a database table.
Thanks.
If you have Python, here's an alternative:
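import re
for line in open("inputfile"):
    # Capture the value attribute and the option text; blank lines do not match.
    m = re.search(r'<option value="([^"]*)">([^<]*)</option>', line)
    if m:
        print(','.join(m.groups()))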
from command line:
#/home: python script.py > output.csv
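If any of the names could contain a comma or a quote, a plain join would produce broken CSV. Here is a minimal sketch of the same idea using Python's csv module, which quotes such fields for you. The function name options_to_csv and the exact pattern are my own choices based on the two sample records, not anything from your real data, so adjust them if your lines look different.

```python
import csv
import io
import re

# Pattern assumed from the sample records: <option value="ID">NAME</option>
OPTION_RE = re.compile(r'<option\s+value="([^"]*)"\s*>([^<]*)</option>')

def options_to_csv(lines, out):
    """Write one CSV row (value, text) per matching line to the file-like `out`."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        match = OPTION_RE.search(line)
        if match is None:
            continue  # skips blank lines and anything that is not an <option> record
        value, text = match.groups()
        writer.writerow([value.strip(), text.strip()])

# Small in-memory demo with a leading tab and an empty line in between.
sample = [
    '\t<option value="14333">VISWANADH VELAMURI</option>\n',
    '\n',
    '\t<option value="17020">VISWANADHA RAMA KRISHNA</option>\n',
]
buf = io.StringIO()
options_to_csv(sample, buf)
print(buf.getvalue(), end="")
```

For the real file you would pass open file objects instead of the sample list, for example open("input.txt") as the source and open("output.csv", "w", newline="") as the destination.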
With GNU awk/nawk:
awk 'NF>=3{printf "%s,%s\n",$2,$3}' \
FS="<option value=\"|\">|</option>" infile
#! /opt/third-party/bin/perl
use strict;
use warnings;

# Usage: perl script.pl input.txt > output.csv
open(my $fh, '<', $ARGV[0]) || die "Unable to open $ARGV[0]: $!\n";
while (my $line = <$fh>) {
    # Capture the value attribute and the option text.
    # Blank or tab-only lines do not match and are skipped.
    next unless $line =~ /<option value="([^"]*)">([^<]*)<\/option>/;
    print "$1,$2\n";
}
close($fh);
exit 0;
one more,
sed -n 's/.*value="\([^"]*\)">\([^<]*\)<.*/\1,\2/p' filename