Is there a way to read a two-column file (space-separated: a keyword, then a sentence) and write its rows to several output files, one per distinct value in the first column? The input and desired output look something like this:
input.csv:
call Call Mom.
call Call T-Mobile.
go Go home.
go Go to school.
go Go to gas station.
play Play music.
play Play Beatles.
Desired output (3 files):
call.xml
<value><tokens><token>Call</token><token>Mom</token></tokens></value>
<value><tokens><token>Call</token><token>T-Mobile</token></tokens></value>
go.xml
<value><tokens><token>Go</token><token>home</token></tokens></value>
<value><tokens><token>Go</token><token>to</token><token>school</token></tokens></value>
<value><tokens><token>Go</token><token>to</token><token>gas</token><token>station</token></tokens></value>
play.xml
<value><tokens><token>Play</token><token>music</token></tokens></value>
<value><tokens><token>Play</token><token>Beatles</token></tokens></value>
I'm stuck at finding which rows have the same value in the first column and collecting those rows into a separate list for each value. Is this possible with a shell script, or would I need Python?
rdrtx1
December 5, 2012, 3:54pm
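Try:
# Append each row to <first field>.xml; ">>" accumulates, so delete old .xml files before rerunning
while read -r f l
do
    printf '<value><tokens>' >> "$f.xml"
    # strip the trailing period, then split the rest of the line into words
    for w in ${l%[.]}
    do
        printf '<token>%s</token>' "$w" >> "$f.xml"
    done
    printf '</tokens></value>\n' >> "$f.xml"
done < input.csv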
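Since the question asks about collecting the rows that share a first field into a list, here is a sketch of that idea in bash itself. It assumes bash 4 or newer (associative arrays via `local -A`; the stock bash 3.2 on macOS lacks them) and space-separated input; `group_rows` is a hypothetical helper name. Each key maps to its accumulated `<value>` lines, and every file is written once at the end with `>`, so rerunning it overwrites instead of appending.

```shell
#!/usr/bin/env bash
# group_rows FILE: hypothetical helper, a sketch only; requires bash >= 4.
# Collects the rows for each first-field key in memory, then writes <key>.xml.
group_rows() {
    local -A groups=()          # key -> accumulated <value> lines
    local -a order=() words=()  # keys in first-seen order; words of one row
    local key rest word line
    # "|| [[ -n $key ]]" also handles a last line without a trailing newline
    while read -r key rest || [[ -n $key ]]; do
        [[ -z $key ]] && continue                # skip blank lines
        read -ra words <<< "${rest%.}"           # drop trailing period, split on spaces
        line='<value><tokens>'
        for word in "${words[@]}"; do
            line+="<token>${word}</token>"
        done
        line+='</tokens></value>'
        [[ -n ${groups[$key]+x} ]] || order+=("$key")   # remember new keys
        groups[$key]+="${line}"$'\n'
    done < "$1"
    for key in "${order[@]}"; do
        printf '%s' "${groups[$key]}" > "$key.xml"      # one write per file
    done
}
```

Usage: `group_rows input.csv` leaves call.xml, go.xml and play.xml in the current directory.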
Yoda
December 5, 2012, 4:05pm
sort input.csv | awk '{ print $1 }' | uniq | while read -r file
do
    # FL holds the current key; only rows whose first field matches it are printed
    awk -v FL="$file" '$1 == FL {
        sub(/\.$/, "")          # drop the trailing period ($0 is re-split into fields)
        printf "<value><tokens>"
        for (i = 2; i <= NF; i++)
            printf "<token>%s</token>", $i
        printf "</tokens></value>\n"
    }' input.csv > "$file.xml"
done
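The loop above rereads input.csv once per distinct key. As an alternative, awk can redirect `printf` to a file name built from the current record, so a single pass is enough and no `sort`/`uniq` step is needed. A minimal sketch, assuming space-separated input with the key in `$1`; `split_by_key` is a hypothetical name. Within one awk run, the first `>` to a given file truncates it and later writes append, so rerunning the command does not duplicate lines.

```shell
# split_by_key FILE: hypothetical helper, a sketch only.
# Writes each row of FILE to "<first field>.xml" in the current directory.
# Assumes space-separated input where the first field is the group key.
split_by_key() {
    awk '
    NF == 0 { next }                    # ignore blank lines
    {
        out = $1 ".xml"                 # output file chosen per row
        sub(/\.$/, "")                  # drop trailing period; $0 is re-split
        printf "<value><tokens>" > out  # first ">" to a file in this run truncates it
        for (i = 2; i <= NF; i++)
            printf "<token>%s</token>", $i > out
        printf "</tokens></value>\n" > out
    }' "$1"
}
```

Usage: `split_by_key input.csv`. With a very large number of distinct keys, some awk implementations run out of open files; calling `close(out)` after each row and switching to `>>` avoids that, at the cost of reopening files and no longer truncating old output.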
What if you want to add a line at the beginning and the end of each file? For example:
call.xml
<head><body>
<value><tokens><token>Call</token><token>Mom</token></tokens></value>
<value><tokens><token>Call</token><token>T-Mobile</token></tokens></value>
</body></head>
Yoda
December 6, 2012, 8:16pm
sort input.csv | awk '{ print $1 }' | uniq | while read -r file
do
    echo "<head><body>" > "$file.xml"
    # FL holds the current key; only rows whose first field matches it are printed
    awk -v FL="$file" '$1 == FL {
        sub(/\.$/, "")          # drop the trailing period ($0 is re-split into fields)
        printf "<value><tokens>"
        for (i = 2; i <= NF; i++)
            printf "<token>%s</token>", $i
        printf "</tokens></value>\n"
    }' input.csv >> "$file.xml"
    echo "</body></head>" >> "$file.xml"
done
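The header and footer can also be handled in a single awk pass: print `<head><body>` the first time a key is seen, and in an `END` block write `</body></head>` to every file that was opened, then `close()` it. A sketch under the same assumptions (space-separated input, key in `$1`); `wrap_by_key` is a hypothetical name.

```shell
# wrap_by_key FILE: hypothetical helper, a sketch only.
# One pass over FILE: writes "<head><body>" the first time a key is seen,
# then each row, and finishes every file with "</body></head>" at the end.
wrap_by_key() {
    awk '
    NF == 0 { next }                        # ignore blank lines
    {
        out = $1 ".xml"
        if (!(out in seen)) {               # first row for this key
            seen[out] = 1
            print "<head><body>" > out      # first ">" in this run truncates the file
        }
        sub(/\.$/, "")                      # drop trailing period; $0 is re-split
        printf "<value><tokens>" > out
        for (i = 2; i <= NF; i++)
            printf "<token>%s</token>", $i > out
        printf "</tokens></value>\n" > out
    }
    END {
        for (out in seen) {                 # every file opened above
            print "</body></head>" > out
            close(out)
        }
    }' "$1"
}
```

Usage: `wrap_by_key input.csv`. The order in which `END` visits the files does not matter, since each footer goes to its own file.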