help with ksh/awk/sed script, random # of fields

axo959 · April 17, 2009, 4:41pm

Hello all, I'm working on an attendance callout script for a school district. I need to change our current layout for the vendor. Currently the data is in the form of:

studentid,period,building,

Here's a sample of some made-up records:

500,1,30,
500,2,30,
500,3,30,
500,6,30,
7899,2,31,
9021,1,33,
9021,6,33,
907711,5,40,
907711,6,40,

I need to reformat this, omitting the header row, to look like the following:

500,1,2,3,6,30,
7899,2,31,
9021,1,6,33,
907711,5,6,40,

I've done stuff like this with awk in the past when the number of fields was constant. I'm just not sure how to loop through this when the number of fields varies. The kids can have an unexcused absence in any number of periods throughout the day. Some of our buildings have over 9 periods configured for attendance purposes as well.

If anyone has some code to get me going in the right direction that would be great.

I will read the file first, then do something like this to compare the current record with the previous record to verify it's the same kid (beyond that I'm not sure how to proceed):

awk -F, '
$1 == lastid {
        # same kid as the previous record: do something
}
$1 != lastid {
        lastid=$1
        # new kid: do something
}'

Thanks,

awk · April 17, 2009, 5:52pm

awk -F, -v OFS=, '{ for (I=2; I<NF; I++)
                    {print $1, $I}
                  }' <<END |\
sort -t, -k1,1n -k2,2n -u |\
awk -F, 'NR==1{Save0=$1}
         Save0 == $1{Line=Line "," $2; next}
         Save0 != $1 {print Save0  Line;
                      Line="," $2;
                      Save0=$1;
                     }
         END{print Save0  Line;}'
500,1,30,
500,2,30,
500,3,30,
500,6,30,
7899,2,31,
9021,1,33,
9021,6,33,
907711,5,40,
907711,6,40,

Produced

500,1,2,3,6,30
7899,2,31
9021,1,6,33
907711,5,6,40
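
To follow the order of execution, it can help to run the first stage on its own. The sketch below wraps the first awk command in a function (split_pairs is a hypothetical name used only for illustration) and feeds it three of the sample records. Each record ends in a comma, so awk sees an empty last field and the loop stops before it.

```shell
# Stage 1 only: emit one "id,value" pair for each field after the id.
# split_pairs is a hypothetical helper name, not part of the original post.
split_pairs() {
    awk -F, -v OFS=, '{ for (I = 2; I < NF; I++) print $1, $I }'
}

# Three of the sample records from the thread.
printf '%s\n' '500,1,30,' '500,2,30,' '7899,2,31,' | split_pairs
```

This prints 500,1 then 500,30 then 500,2 then 500,30 then 7899,2 then 7899,31. The sort stage then drops the duplicate 500,30 and orders the pairs numerically, so the building number lands last only as long as it is larger than every period number, which holds for the sample data but is an assumption worth checking against the real file.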

axo959 · April 17, 2009, 6:00pm

Thanks a lot awk, I'll give your code a go first thing on Monday.

Franklin52 · April 18, 2009, 8:51am

Assuming your data is sorted:

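awk -F, '
# Start a new output line whenever the student id changes (input sorted by id).
NR==1{id=$1; s=$1 FS $2; b=$3; next}
id!=$1{print s FS b; id=$1; s=$1 FS $2; b=$3; next}
{s=s FS $2}
END{if (NR) print s FS b}' file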

Regards

summer_cherry · April 20, 2009, 4:38am

nawk -F"," '{
if (_[$1","$3]==""){
  _[$1","$3]=$2
  next
}
_[$1","$3]=sprintf("%s,%s",_[$1","$3],$2)
}
END{
for (i in _){
 split(i,arr,",")
 print arr[1]","_[i]","arr[2]
}
}' filename
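
One thing to watch with the array approach: a for (i in array) loop visits the keys in an unspecified order, so the output lines may not come out in the order the students appear in the input. A minimal sketch of one way around that, assuming the same studentid,period,building, layout, is to record each key the first time it is seen. The names group_periods, periods, order and part are hypothetical and chosen for illustration.

```shell
# group_periods is a hypothetical helper; it reads "id,period,building," on stdin.
group_periods() {
    awk -F, '
    NF < 3 { next }                        # skip blank or short lines
    {
        key = $1 FS $3                     # one group per student/building pair
        if (!(key in periods)) {
            order[++n] = key               # remember first-seen order of each key
            periods[key] = $2
        } else {
            periods[key] = periods[key] FS $2
        }
    }
    END {
        for (i = 1; i <= n; i++) {
            split(order[i], part, FS)
            # id, collected periods, building, plus the trailing comma
            print part[1] FS periods[order[i]] FS part[2] FS
        }
    }'
}

printf '%s\n' '500,1,30,' '500,2,30,' '7899,2,31,' | group_periods
```

Because the keys are replayed from the order array, this version does not need the input to be sorted, and it keeps the trailing comma shown in the requested layout.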

axo959 · April 21, 2009, 12:44pm

awk:

awk -F, -v OFS=, '{ for (I=2; I<NF; I++)
   {print $1, $I}
   }' <<END |\
sort -t, -k1,1n -k2,2n -u |\
awk -F, 'NR==1{Save0=$1}
   Save0 == $1{Line=Line "," $2; next}
   Save0 != $1 {print Save0  Line;
   Line="," $2;
   Save0=$1;
   }
   END{print Save0  Line;}'
500,1,30,
500,2,30,
500,3,30,
500,6,30,
7899,2,31,
9021,1,33,
9021,6,33,
907711,5,40,
907711,6,40,

Thanks awk, works great for me. I was wondering what the <<END is in the first awk command? Is it like a here doc? I'm trying to understand this better to figure out what order the code is being executed in.

thanks again

awk · April 21, 2009, 12:48pm

Ideally, I should have had the word END on a line by itself after the data. It shows the korn shell where to stop the input of data to the awk program.

The script still worked because the shell ran out of input at that point and treated the end of the file as the end of the here-document.

Sorry, I will remember to put it in future posts.
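
For reference, a minimal sketch of a properly terminated here-document, using two of the sample records. The shell reads everything after the line containing <<END up to the line that holds only the word END, and hands it to awk as standard input. Whatever follows the closing END is ordinary script again. The function name demo_heredoc is hypothetical.

```shell
# demo_heredoc is a hypothetical name used only for this illustration.
demo_heredoc() {
    # The body between <<END and the closing END becomes awk's standard input.
    awk -F, -v OFS=, '{ for (I = 2; I < NF; I++) print $1, $I }' <<END
500,1,30,
7899,2,31,
END
    echo "done"   # runs only after awk has consumed the here-document
}

demo_heredoc
```

The closing END has to start in the first column with nothing else on the line. With the delimiter left unquoted, as here, the shell also expands any $variables inside the body, which does not matter for plain numeric data like this.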

axo959 · April 21, 2009, 1:08pm

heck, np at all. i'm just a newb so the only samples i've seen are pretty much in two books i have. thanks for getting back to me so fast.