Generate csv file

rahulrathod · January 23, 2007, 8:09am

I have a file with a few thousand records in the following format:

File: input.txt ->

<option value="14333">VISWANADH VELAMURI</option>	

<option value="17020">VISWANADHA RAMA KRISHNA</option>

I want to generate a CSV file from it in the following format:

File: output.txt ->

14333,VISWANADH VELAMURI
17020,VISWANADHA RAMA KRISHNA

The HTML option tags need to be removed, along with the unwanted tabs and the empty lines in between. I have tried cut and awk, but I can't get the right combination. Can you please help me out? I want to upload this data into a database table.

Thanks.
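The conversion requested above can also be done with a real HTML parser instead of pattern matching. Below is a minimal sketch using Python 3's standard `html.parser` module; the class `OptionCollector` and the function `parse_options` are hypothetical names chosen for illustration. Because text is only collected while an `<option>` tag is open, the tabs and blank lines between records are ignored without any extra handling.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collects (value, text) pairs from <option> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._value = None  # value attribute of the currently open <option>
        self._text = []     # text fragments seen inside that <option>

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._value = dict(attrs).get("value", "")
            self._text = []

    def handle_data(self, data):
        # Tabs and newlines between records arrive while no option is open
        if self._value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._value is not None:
            self.rows.append((self._value, "".join(self._text).strip()))
            self._value = None


def parse_options(html_text):
    """Return a list of (value, name) tuples for every <option> in html_text."""
    parser = OptionCollector()
    parser.feed(html_text)
    parser.close()
    return parser.rows


sample = (
    '<option value="14333">VISWANADH VELAMURI</option>\t\n'
    "\n"
    '<option value="17020">VISWANADHA RAMA KRISHNA</option>\n'
)
for value, name in parse_options(sample):
    print(f"{value},{name}")
```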

ghostdog74 · January 23, 2007, 8:39am

If you have Python, here's an alternative:

import re
for line in open("inputfile"):
    match = re.findall(r'<.*value="(.*)">(.*)<.*?>', line)
    if match:  # skip the empty lines between records
        print(','.join(match[0]))

From the command line:

#/home: python script.py > output.csv
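Since the output is going into a database table, one caveat is worth handling: if a name ever contained a comma, a plain `value,name` line would split into three columns on import. Here is a minimal sketch using Python's standard `csv` module, which quotes such fields automatically. The regex, the function name `options_to_csv`, and the second sample record are assumptions made for illustration, not data from the question.

```python
import csv
import io
import re

# Assumed record shape: <option value="NNN">NAME</option>, one per line
OPTION_RE = re.compile(r'<option value="([^"]*)">([^<]*)</option>')


def options_to_csv(lines):
    """Return CSV text with one row per <option> line; other lines are skipped."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        match = OPTION_RE.search(line)
        if match:
            value, name = match.groups()
            writer.writerow([value, name.strip()])
    return out.getvalue()


sample = [
    '<option value="14333">VISWANADH VELAMURI</option>\t\n',
    "\n",
    '<option value="99999">SMITH, JOHN</option>\n',  # hypothetical record with a comma
]
print(options_to_csv(sample), end="")
```

To write straight to a file instead, open it with `newline=""` and pass the file object to `csv.writer`.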

radoulov · January 23, 2007, 8:52am

With GNU awk/nawk:

awk '$0=$0{printf "%s,%s\n",$2,$3}' \
FS="<option value=\"|\">|</option>" infile
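To see why `$2` and `$3` are the wanted columns, the same field split can be reproduced with Python's `re.split`, using the awk `FS` value (after the shell has removed the backslashes) as the pattern. This only illustrates the splitting; it is not part of the awk solution.

```python
import re

# The awk FS from the answer above, after shell quoting: three alternatives
FS = r'<option value="|">|</option>'

line = '<option value="14333">VISWANADH VELAMURI</option>'
fields = re.split(FS, line)
print(fields)  # ['', '14333', 'VISWANADH VELAMURI', '']

# awk numbers fields from 1, so awk's $2 and $3 are Python's fields[1] and fields[2]
print(f"{fields[1]},{fields[2]}")
```

The `$0=$0` pattern in the awk command evaluates to the line itself, which is false for an empty line, so blank lines produce no output.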

matrixmadhan · January 23, 2007, 9:31am

#! /opt/third-party/bin/perl
use strict;
use warnings;

open(my $fh, '<', $ARGV[0]) || die "Unable to open $ARGV[0] <$!>\n";

while (my $line = <$fh>) {
  chomp $line;
  next unless $line =~ /<option/;    # skip the empty lines between records

  my @split_fields = split(/"/, $line);           # value sits between the quotes
  my @second_split = split(/>/, $line);           # name follows the first '>'
  my @further      = split(/</, $second_split[1]); # ... up to the next '<'
  print "$split_fields[1],$further[0]\n";
}

close($fh);

exit 0;
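The Perl script works by three successive splits. Tracing the same steps in Python on one sample line shows what each intermediate array holds; the variable names mirror the Perl ones, and the trailing tab in the sample is assumed from the input shown in the question.

```python
line = '<option value="14333">VISWANADH VELAMURI</option>\t'

split_fields = line.split('"')        # ['<option value=', '14333', '>VISWANADH VELAMURI</option>\t']
second_split = line.split(">")        # ['<option value="14333"', 'VISWANADH VELAMURI</option', '\t']
further = second_split[1].split("<")  # ['VISWANADH VELAMURI', '/option']

print(f"{split_fields[1]},{further[0]}")  # 14333,VISWANADH VELAMURI
```

On an empty line the first split yields no second element, which is why blank lines have to be skipped before splitting.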

matrixmadhan · January 23, 2007, 9:34am

One more, using sed:

sed -n 's/\(.*\)"\(.*\)"\(.*\)>\(.*\)<\(.*\)/\2,\4/p' filename
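The sed expression relies on greedy capture groups and backreferences: `\2` captures the value between the double quotes, and `\4` captures the name between the `>` after the value and the `<` of the closing `</option>`. Python's `re` module treats `.*` greedily as well, so the pattern can be checked there. The function name `convert` is hypothetical, and the groups are written `(...)` where sed's basic regular expressions use `\(...\)`.

```python
import re

# The sed pattern, translated from BRE groups \( \) to Python's ( )
PATTERN = re.compile(r'(.*)"(.*)"(.*)>(.*)<(.*)')


def convert(line):
    """Return 'value,name' like the sed command, or None if the line does not match."""
    result, count = PATTERN.subn(r'\2,\4', line)
    # count is 0 for lines without the expected shape, such as empty lines
    return result if count else None


print(convert('<option value="14333">VISWANADH VELAMURI</option>'))  # 14333,VISWANADH VELAMURI
print(convert(''))  # None
```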