Split file based on multi valued attributes

snukala · January 31, 2011, 6:42am

Hi,
I am new to shell scripting.
I have a file that has multi-valued attributes. I want to split it so that no attribute holds more than one value.
Example file:

"attr1","amv1;amv2;3","bmv1;bmv2","abc","abc1;abc2;abc3"

Please note that this is a CSV file and ; is the delimiter for multi-valued attributes.

The output should be:

"attr1","amv1","bmv1","abc","abc1"
"attr1","amv1","bmv1","abc","abc2"
"attr1","amv1","bmv1","abc","abc3"

"attr1","amv1","bmv2","abc","abc1"
"attr1","amv1","bmv2","abc","abc2"
"attr1","amv1","bmv2","abc","abc3" etc

Please help.

homeboy · January 31, 2011, 7:59am

Are you sure there are always 5 columns in your file? If not, my crude algorithm won't work, because the number of for loops has to be hard-coded. Here is my code; I hope to see a more graceful solution.

i=0
awk 'BEGIN{RS=","}{print $1}' txt | sed 's/;/ /g' >> a.txt
while read str;do
	eval array${i}=$str
	let "i += 1"
done < a.txt

for a in $array0;do
	for b in $array1;do
		for c in $array2;do
			for d in $array3;do
				for e in $array4;do
					echo $a,$b,$c,$d,$e
				done
			done
		done
	done
done

snukala · January 31, 2011, 9:19am

Hi,
Thanks for responding.
The latter part of the script, i.e. the section under the for loops, seems to be failing.
I used the following script:

#!/bin/awk -f
i=0
awk 'BEGIN{RS=",";FS="\n"}{print $1}' a.txt | sed 's/;/ /g' >> b.txt
while read str;do
    eval array${i}=$str
    i=`expr $i + 1`
    echo $i
done < b.txt
for a in $array0;do
        for b in $array1;do
                for c in $array2;do
                        for d in $array3;do
                                for e in $array4;do
                                        for f in $array5;do
                                        echo $a,$b,$c,$d,$e,$f
                                done
                        done
                done
          done
     done
done

Could you please suggest why it is failing in the for loop?

Regards

homeboy · January 31, 2011, 9:05pm

You are forcing awk to interpret this script by using "#!/bin/awk -f". The script contains for loops in shell syntax, not awk syntax, so the first line should be #!/bin/sh.
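
To illustrate the interpreter point, here is a minimal sketch of the nested-loop approach run by /bin/sh instead of awk. It is not the script from the thread: the function name split_five is made up for this example, and it assumes exactly five columns, no spaces or wildcard characters inside the values, and no commas inside the quoted fields. As a side observation, if b.txt holds five lines, only array0 to array4 are set in the script above, so a sixth loop over $array5 has nothing to iterate over.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, assumes exactly five columns.
split_five() {
  # Drop the quotes and turn ';' into spaces so each column is a word list.
  tr -d '"' < "$1" | tr ';' ' ' |
  while IFS=, read -r c1 c2 c3 c4 c5; do
    # One shell for loop per column; unquoted expansion splits on spaces.
    for a in $c1; do
      for b in $c2; do
        for c in $c3; do
          for d in $c4; do
            for e in $c5; do
              printf '"%s","%s","%s","%s","%s"\n' "$a" "$b" "$c" "$d" "$e"
            done
          done
        done
      done
    done
  done
}
```

Called as split_five a.txt, this should print one quoted row per combination of values.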

malcomex999 · February 1, 2011, 2:36am

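awk -F, -v m=\" '{
# strip the quotes from the multi valued fields before splitting them
gsub(m,"",$2);gsub(m,"",$3);gsub(m,"",$5)
c2=split($2,col2,";");c3=split($3,col3,";");c5=split($5,col5,";")
for(i=0;++i<=c2;)
 for(j=0;++j<=c3;)
  for(k=0;++k<=c5;)
    print $1,m col2[i] m,m col3[j] m,$4,m col5[k] m}' OFS="," infile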

Use nawk on Solaris.
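
The same nested-loop idea can be sketched for any number of columns by recursing over the fields instead of hard-coding which ones are multi-valued. This is only an illustration under assumptions: the names expand_mv and walk are made up here, no field may contain a comma or an embedded double quote, and the awk in use must support user-defined functions (nawk, gawk and mawk do).

```shell
#!/bin/sh
# Sketch: print one row per combination of the ';'-separated values.
expand_mv() {
  awk -F, -v q='"' '
    # Pick each value of column c in turn, then recurse into column c+1.
    # i and sep are extra parameters used as local variables.
    function walk(c, prefix,    i, sep) {
      if (c > NF) { print prefix; return }
      sep = (c > 1) ? "," : ""
      for (i = 1; i <= cnt[c]; i++)
        walk(c + 1, prefix sep q val[c, i] q)
    }
    NF {
      for (c = 1; c <= NF; c++) {
        f = $c
        gsub(q, "", f)                  # drop the surrounding quotes
        cnt[c] = split(f, parts, ";")
        if (cnt[c] == 0) cnt[c] = 1     # keep an empty field as one empty value
        for (i = 1; i <= cnt[c]; i++) val[c, i] = parts[i]
      }
      walk(1, "")
    }
  ' "$@"
}
```

It can be run as expand_mv infile, or fed from a pipe when no file name is given.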

snukala · February 1, 2011, 4:50am

Hi malcomex999 and homeboy,
Thank you very much for your support.
It is working.

Regards
Subhash Nukala

snukala · February 4, 2011, 7:22am

Hi malcomex999,
Can you please let me know how I can do the reverse?
I mean converting the following into one line:

"attr1","amv","bmv1","abc","abc1"
"attr1","amv1","bmv1","abc","abc2"
"attr1","amv1","bmv1","abc","abc3"

"attr1","amv1","bmv2","abc","abc1"
"attr1","amv1","bmv2","abc","abc2"
"attr1","amv1","bmv2","abc","abc3"

To a single line like:

"attr1","amv,av1","bmv1,bmv2","abc","abc1,abc2,abc3"

Thanks and Regards

malcomex999 · February 6, 2011, 1:16am

I don't know why you want to convert it back to a single line, but this might help, though it's crude. You may need to find a way to get rid of the trailing comma in col2, col3 and col5 (maybe using sed).

awk -F, -v m=\" 'NF{gsub(m,x,$0)   # x is unset, so this deletes the quotes
 col1=$1;col2[$2];col3[$3];col4=$4;col5[$5]}END{
 printf "%s", m col1 m OFS m
 for(c2 in col2)
   printf "%s", c2 OFS
 printf "%s", m OFS m
 for(c3 in col3)
   printf "%s", c3 OFS
 printf "%s", m col4 m OFS m
 for(c5 in col5)
   printf "%s", c5 OFS
 printf "%s", m "\n"
}' OFS=, infile