Rows into columns?

I have a file thats space delimited that looks something like this:

Joe Smith jsmith 43234 bill1;bill2;read;read2;schedule
Andy Summers asummers 11232 bill1;read
Beth McConnel bmconnel 43443 read;read2;schedule;bill
Susan Fowler sfowler 09332 bill1;read;schedule

I need to transpose/manupilate the rows to look like:

Joe Smith jsmith 43234 bill1
Joe Smith jsmith 43234 bill2
Joe Smith jsmith 43234 read
Joe Smith jsmith 43234 read2
Joe Smith jsmith 43234 schedule
Andy Summers asummers 11232 bill1
Andy Summers asummers 11232 read
Beth McConnel bmconnel 43443 read
Beth McConnel bmconnel 43443 read2
Beth McConnel bmconnel 43443 schedule
Beth McConnel bmconnel 43443 bill
Susan Fowler sfowler 09332 bill1
Susan Fowler sfowler 09332 read
Susan Fowler sfowler 09332 schedule

I guess the good thing is that the row I need to transpose will always start in column 5 (user function) and that the items I need to transpose are semi colon delmited, but its varaible, sometimes there is only 1 entry, other times there could be as many as 7.

I am able to do elementary transpostions like making 1;2;3;4 into

1
2
3
4

But I am having a hard time capturing the preceeding info and repeating it per user function. Can anyone provide any hints or suggestions?

awk '{n=split($5,a,";");for(i=1;i<=n;i++){print $1,$2,$3,$4,a}}' infile > outfile
1 Like

I'm sorry to keep clogging people's posts. I've been practicing trying to find pure shell (or close to) solutions to problems, so while this may not be the most practical solution for your needs, I'd appreciate if anyone could tell me if there's a better way I could have solved this guys problem than what I came up with:

(23:43:55\[D@DeCoBox15)
[~]$ cat input
Joe Smith jsmith 43234 bill1;bill2;read;read2;schedule
Andy Summers asummers 11232 bill1;read
Beth McConnel bmconnel 43443 read;read2;schedule;bill
Susan Fowler sfowler 09332 bill1;read;schedule

(01:16:30\[D@DeCoBox15)
[~]$ cat mysolution
#!/bin/bash
    tr ";" "\n" < $1 | while read line; do
        if [[ $(echo $line|wc -w) -gt 1 ]]; then
            array=($line);
            echo "${array[*]}";
        else
            echo "${array[0]} ${array[1]} ${array[2]} ${array[3]} $line";
        fi;
    done

(01:16:54\[D@DeCoBox15)
[~]$ ./mysolution input
Joe Smith jsmith 43234 bill1
Joe Smith jsmith 43234 bill2
Joe Smith jsmith 43234 read
Joe Smith jsmith 43234 read2
Joe Smith jsmith 43234 schedule
Andy Summers asummers 11232 bill1
Andy Summers asummers 11232 read
Beth McConnel bmconnel 43443 read
Beth McConnel bmconnel 43443 read2
Beth McConnel bmconnel 43443 schedule
Beth McConnel bmconnel 43443 bill
Susan Fowler sfowler 09332 bill1
Susan Fowler sfowler 09332 read
Susan Fowler sfowler 09332 schedule

I think I could have eliminated the tr if I changed the IFS and tweaked the way I set the names, but this is the best I could come up with.

Damn, after trying out Bartus's solution, I just feel silly now...

[~]$ time awk '{n=split($5,a,";");for(i=1;i<=n;i++){print $1,$2,$3,$4,a}}' pxe > /dev/null

real    0m0.088s
user    0m0.000s
sys     0m0.031s

(23:49:50\[D@DeCoBox15)
[~]$ time ./tst > /dev/null

real    0m2.338s
user    0m0.391s
sys     0m0.516s

This one probably won't win any performance benchmark, but it should produce the desired output.

while read line; do
firstpart=$(echo "$line" | sed 's/^\(.*[0-9]\{5\}\).*/\1/')
records=$(echo "$line" | awk '{print $NF }' | sed 's/;/ /g')
for i in $(echo $records); do
echo $firstpart $i
done
done <file

Ok...I just don't get it. Your loop involves creating multiple variables with multiple calls to sed and awk, and my loop only has wc -w in it.

Is 6 calls of wc -w really slower than 4 calls of awk and 8 calls of sed?

(01:21:43\[D@DeCoBox15)
[~]$ time ./mysolution input
Joe Smith jsmith 43234 bill1
Joe Smith jsmith 43234 bill2
Joe Smith jsmith 43234 read
Joe Smith jsmith 43234 read2
Joe Smith jsmith 43234 schedule
Andy Summers asummers 11232 bill1
Andy Summers asummers 11232 read
Beth McConnel bmconnel 43443 read
Beth McConnel bmconnel 43443 read2
Beth McConnel bmconnel 43443 schedule
Beth McConnel bmconnel 43443 bill
Susan Fowler sfowler 09332 bill1
Susan Fowler sfowler 09332 read
Susan Fowler sfowler 09332 schedule

real    0m2.092s
user    0m0.272s
sys     0m0.468s


(00:50:43\[D@DeCoBox15)
[~]$ time ./yoursolution input
Joe Smith jsmith 43234 bill1
Joe Smith jsmith 43234 bill2
Joe Smith jsmith 43234 read
Joe Smith jsmith 43234 read2
Joe Smith jsmith 43234 schedule
Andy Summers asummers 11232 bill1
Andy Summers asummers 11232 read
Beth McConnel bmconnel 43443 read
Beth McConnel bmconnel 43443 read2
Beth McConnel bmconnel 43443 schedule
Beth McConnel bmconnel 43443 bill
Susan Fowler sfowler 09332 bill1
Susan Fowler sfowler 09332 read
Susan Fowler sfowler 09332 schedule

real    0m1.622s
user    0m0.151s
sys     0m0.394s

And on top of that your solution doesn't really give correct results :stuck_out_tongue: At least from the output you are showing here.

Well I'll be a son of a something else...fixed it.

(01:18:49\[D@DeCoBox15)
[~]$ ./mysolution input > my_results

(01:19:05\[D@DeCoBox15)
[~]$ sdiff expected_results my_results
Joe Smith jsmith 43234 bill1                                    Joe Smith jsmith 43234 bill1
Joe Smith jsmith 43234 bill2                                    Joe Smith jsmith 43234 bill2
Joe Smith jsmith 43234 read                                     Joe Smith jsmith 43234 read
Joe Smith jsmith 43234 read2                                    Joe Smith jsmith 43234 read2
Joe Smith jsmith 43234 schedule                                 Joe Smith jsmith 43234 schedule
Andy Summers asummers 11232 bill1                               Andy Summers asummers 11232 bill1
Andy Summers asummers 11232 read                                Andy Summers asummers 11232 read
Beth McConnel bmconnel 43443 read                               Beth McConnel bmconnel 43443 read
Beth McConnel bmconnel 43443 read2                              Beth McConnel bmconnel 43443 read2
Beth McConnel bmconnel 43443 schedule                           Beth McConnel bmconnel 43443 schedule
Beth McConnel bmconnel 43443 bill                               Beth McConnel bmconnel 43443 bill
Susan Fowler sfowler 09332 bill1                                Susan Fowler sfowler 09332 bill1
Susan Fowler sfowler 09332 read                                 Susan Fowler sfowler 09332 read
Susan Fowler sfowler 09332 schedule                             Susan Fowler sfowler 09332 schedule

(01:19:15\[D@DeCoBox15)
[~]$
1 Like

Wow, I was definitely going down the wrong path. I never thought it could be created with a one liner. After some minor adjustments to the input file and command, it works perfect!! Thanks!! :b::b:

Just shorter :wink:

awk '{n=split($NF,a,";");for(i=0;++i<=n;){$NF=a;print}}' file