Need to parse file "x" lines at a time ... awk array?

I have files that store multiple data points for the same device "vertically" and include multiple devices. Each file repeats a consistent pattern of lines where, on each line:

Column 1 is a common number for the entire file and all devices in that file
Column 2 is a unique device number
Column 3 is a unique identifier of the data point included on that line
Column 4 is the unique data point

# cat myfile.csv
x,y1,a,name1
x,y1,b,2.5
x,y1,c,4
x,y2,a,name2
x,y2,b,3
x,y2,c,5.5
x,y3,a,name3
x,y3,b,1
x,y3,c,2

So above I have three devices (y1, y2 and y3) that each have three data points (a, b and c). One of the data points is a unique name, so I can discard $1, $2 and $3; I only want to retain $4. What I want to do is flatten the three data points into a single line per device:

name1,2.5,4
name2,3,5.5
name3,1,2

I have found a way to take a given set of lines, print $4 with awk, and join them onto a single line:

# sed -n 1,3p myfile.csv | awk -F"," '{print $4","}' | tr -d '\n'
name1,2.5,4,

But I need a loop to continue processing the next "x" lines.

Above is a simple view of what I'm trying to do. My files have 53 data points for every device, and the number of devices is "random". Therefore my loop that "ingests" 53 lines at a time and then spits them out on a single line needs to continue until the file is complete (do ; done < $1 ?). For example, one file is 312,912 lines (5,904 devices x 53 data points) and another is 318,000 lines (6,000 devices x 53 data points). Using sed I can do what I need to do on the first 53 lines of the file, but now I just need to insert it into a loop.

# sed -n 1,53p myfile.csv | awk -F"," '{print $4","}' | tr -d '\n'
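
The kind of loop I imagine is something like this (just an untested sketch; it assumes every device has exactly 53 consecutive lines, a POSIX shell such as ksh, and that $1 is the file name):

n=0
line=""
while IFS=, read -r c1 c2 c3 c4
do
    line="$line$c4,"            # collect field 4, comma separated
    n=$((n + 1))
    if [ "$n" -eq 53 ]; then
        echo "${line%,}"        # 53 fields collected: print, dropping the trailing comma
        line=""
        n=0
    fi
done < "$1"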

Any help would be greatly appreciated.

Signed,

Sleepless in Seattle

You could use 'split' to break the file into chunks of 53 lines each, flatten each chunk, and then concatenate the results back into a single file.
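
A rough sketch of that idea (untested; 'chunk_' is just an arbitrary prefix, and it assumes a paste that accepts '-' for standard input):

split -a 3 -l 53 myfile.csv chunk_      # -a 3: thousands of chunks need longer suffixes
for f in chunk_*
do
    cut -d, -f4 "$f" | paste -s -d, -   # one flattened line per 53-line chunk
done > flat.csv
rm chunk_*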

Try this:

awk -F, '{if($2==p)d=d","$4;else{if(NR>1)print d;d=$4;p=$2}}END{print d}' infile
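
Spelled out over several lines with comments, the same idea looks something like this (a functionally equivalent sketch):

awk -F, '
$2 == p { d = d "," $4; next }   # same device: append the 4th field
NR > 1  { print d }              # device changed: print the finished line
        { d = $4; p = $2 }       # start a new line for the new device
END     { print d }              # print the last device
' infile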

If the order of the lines is the same:

awk -F, '{printf("%s%s", $4, NR%3?FS:"\n")}' file
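
For the real files with 53 data points per device, the 3 would presumably become 53:

awk -F, '{printf("%s%s", $4, NR%53?FS:"\n")}' file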

That kicked out a syntax error on me. Could be the version of awk I have (Sol10).

That worked like a charm! If I interpret this a bit: you did not specify anything about 53 lines, but instead told it to keep reading lines while $2 is constant. Once $2 changes, dump the line and start over. Correct?

Use nawk or /usr/xpg4/bin/awk on Solaris.
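
For example, the printf version above would then be (same code, different interpreter):

nawk -F, '{printf("%s%s", $4, NR%3?FS:"\n")}' file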

Exactly :)

Ok, I will try that.

Scrutinizer, thank you.