Reading data from file using awk

kristinu · February 27, 2013, 12:01pm

I have a file as below. It contains two data sets separated by >.
I want to pipe each data set to another program called psxy. How
can I get the different records?

I started with the following, but it only passes the first data set:

awk 'BEGIN {RS=">"};{print $0}' p.dat

cat p.dat

1 0.1
2 1.6
3 1.3
4 1.5
>
5 0.1
6 1.6
7 1.3
8 1.5

alister · February 27, 2013, 12:10pm

Are you just trying to delete the line with ">"? If not, post what that sample data should look like after it's been prepared for psxy.

Regards,
Alister

kristinu · February 27, 2013, 12:17pm

I do not want to delete >. I also do not want to create 2 files and pass each one to psxy separately.

Suppose I have 2 files

cat p1.dat
1 0.1
2 1.6
3 1.3
4 1.5

cat p2.dat

5 0.1
6 1.6
7 1.3
8 1.5

Then calling psxy in this way

psxy p1.dat ...
psxy p2.dat ...

As I do not want to end up with a lot of files, I put the data in one file, separating the data sets with >.

alister · February 27, 2013, 12:59pm

Replace r with the record number to be extracted:

awk 'BEGIN {n=1}; n>r {exit}; $0==">" {++n; next}; n==r' r=2 file

For modern awk implementations that support a multi-character (regular expression) RS:

awk 'NR>r {exit}; NR==r {printf "%s", $0}' RS='>\n' r=2 file

If portability is a factor, use the first suggestion.
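
As a quick check of the portable form, here is a minimal sketch run against the sample data from the first post. The temporary file and the variable name second are only for illustration; they are not part of the original suggestion.

```shell
# Sample data from the question, written to a temporary file.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
1 0.1
2 1.6
3 1.3
4 1.5
>
5 0.1
6 1.6
7 1.3
8 1.5
EOF

# Portable form: count the ">" separator lines and print only data set r.
second=$(awk 'BEGIN {n=1}; n>r {exit}; $0==">" {++n; next}; n==r' r=2 "$datafile")
printf '%s\n' "$second"

rm -f "$datafile"
```

With r=2 this should print only the four lines after the > separator; changing r to 1 selects the first data set instead.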

Regards,
Alister

Don_Cragun · February 27, 2013, 4:02pm

kristinu, does psxy accept - as a filename indicating that it should read from standard input?
If it does, you can execute psxy inside an awk script, feeding it the portions of your input file between the > separator lines without creating intermediate files.

kristinu · February 27, 2013, 7:17pm

I did it like this and it works well:

awk 'NR>r {exit}; NR==r {printf "%s", $0}' RS='>\n' r=2 p.dat \
  | psxy -JX4/4 -R0.3/0.6/0/1.2 -B0.1f0.05:"":/a0.2f0.1:"y":/a0.2f0.1:."":WSne \
         -m -K > p.ps

Don, how can I do as you suggest? Currently I have to loop through the records one by one, changing the value of r.

alister · February 27, 2013, 9:01pm

Redirect the awk printf statement to a pipe. Make sure to call close() after each print or all records will be sent to the same instance of psxy.
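
A minimal sketch of that idea, assuming a stand-in consumer in place of psxy: the command string passed in cmd (cat followed by an echo marker) is hypothetical and only makes it visible where one consumer process ends and the next begins.

```shell
# Sample data from the question, written to a temporary file.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
1 0.1
2 1.6
3 1.3
4 1.5
>
5 0.1
6 1.6
7 1.3
8 1.5
EOF

# cmd is a stand-in for the real psxy command line (an assumption for this demo).
# close(cmd) ends the current consumer; the next print to cmd starts a new one.
piped=$(awk -v cmd='cat; echo "== end of data set =="' '
$0 == ">" { close(cmd); next }
{ print | cmd }
' "$datafile")
printf '%s\n' "$piped"

rm -f "$datafile"
```

The marker line should appear once after each data set. Without the close() call it would appear only once, at the very end, because a single consumer would receive every line.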

Regards,
Alister

Don_Cragun · February 28, 2013, 1:17am

kristinu,
Sorry for the delay in responding. Hopefully, alister's comments were enough to explain what I was suggesting. If not, the following should work. My script is a little more complex than your original script because the awk I'm using doesn't allow multiple characters in RS. So, you can simplify it on your system, but the following is more portable to other systems.

awk 'NR == 1 || $1 == ">" {
        if(r++) close(cmd)
        cmd = sprintf("%s %s > p%03d.ps", "psxy -JX4/4 -R0.3/0.6/0/1.2",
                "-B0.1f0.05:\"\":/a0.2f0.1:\"y\":/a0.2f0.1:.\"\":WSne -m -K", r)
        if(NR > 1) next
}
{       print | cmd
}' p.dat

If you wanted to run this on a Solaris/SunOS system, you'd need to use /usr/xpg4/bin/awk or nawk instead of awk .

Note that this script produces files p001.ps , p002.ps , etc. If you have fewer than 100 sections or more than 999 sections in p.dat , you could change the %03d in the sprintf() format string to specify a different number of digits in the output file names.
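
To see the numbered-output behaviour without psxy installed, the same control flow can be tried with a stand-in command. In this sketch, cat > p%03d.txt is a hypothetical replacement for the psxy command line, and the temporary directory is only there to keep the demo self-contained.

```shell
# Work in a scratch directory so the demo leaves nothing behind.
workdir=$(mktemp -d)
cat > "$workdir/p.dat" <<'EOF'
1 0.1
2 1.6
3 1.3
4 1.5
>
5 0.1
6 1.6
7 1.3
8 1.5
EOF

# Same structure as the script above; only the command string differs.
( cd "$workdir" && awk 'NR == 1 || $1 == ">" {
        if (r++) close(cmd)                    # finish the previous consumer
        cmd = sprintf("cat > p%03d.txt", r)    # stand-in for psxy ... > pNNN.ps
        if (NR > 1) next                       # do not pass the ">" line on
}
{       print | cmd
}' p.dat )

names=$(cd "$workdir" && echo p???.txt)
second_file=$(cat "$workdir/p002.txt")
printf '%s\n' "$names" "$second_file"

rm -rf "$workdir"
```

Each data set should land in its own numbered file, with the counter r supplying the digits for the %03d field.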

Hope this helps,
Don

kristinu · February 28, 2013, 7:45am

I want to append the psxy output to the same ps file.

Don_Cragun · February 28, 2013, 10:49am

OK. So try:

rm p.ps 2> /dev/null
awk 'NR == 1 || $1 == ">" {
        if(cmd != "") close(cmd)
        cmd = sprintf("%s %s >> p.ps", "psxy -JX4/4 -R0.3/0.6/0/1.2",
                "-B0.1f0.05:\"\":/a0.2f0.1:\"y\":/a0.2f0.1:.\"\":WSne -m -K")
        if(NR > 1) next
}
{       print | cmd
}' p.dat

Delete the rm command if you want each run to add to an existing p.ps file rather than start a new one.

alister · February 28, 2013, 6:30pm

I have not tested it, but it looks like that code will send a ">" into the pipe if the first "dataset" is empty. Also, if a "dataset" is empty, should psxy be invoked and immediately read EOF? If so, then the above is incorrect in that respect as well.

The problem statement was insufficiently precise to make those determinations, so neither case may be of concern. Just an observation.
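
The first observation can be reproduced with a small sketch. Here cat stands in for psxy (an assumption for the demo), the input deliberately starts with a > line, and the second awk program is only one possible way to guard against the empty first data set, not a correction anyone in the thread posted.

```shell
# Input whose first "data set" is empty: the file starts with a separator.
leaked=$(printf '>\n5 0.1\n6 1.6\n' | awk 'NR == 1 || $1 == ">" {
        if (cmd != "") close(cmd)
        cmd = "cat"                 # stand-in for the psxy command line
        if (NR > 1) next            # NR is 1 here, so the ">" line falls through
}
{       print | cmd
}')
printf 'original logic:\n%s\n' "$leaked"

# Hypothetical guard: always skip separator lines, and start a consumer
# only when there is a data line to send to it.
guarded=$(printf '>\n5 0.1\n6 1.6\n' | awk -v cmd=cat '
$1 == ">" {
        if (busy) {
                close(cmd)
                busy = 0
        }
        next
}
{       print | cmd
        busy = 1
}')
printf 'guarded logic:\n%s\n' "$guarded"
```

With the guard, an empty data set launches no consumer at all, which is one of the two behaviours raised above; whether that is the desired one depends on the unstated requirements.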

Regards,
Alister

kristinu · February 28, 2013, 7:42pm

I am going to go with a simple loop in bash, stepping through the data sets by record number.
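
Such a loop might look like the following sketch. It reuses the portable extraction command from earlier in the thread; the record count via awk and the tail -n 1 consumer are assumptions for illustration, with tail standing in for the real psxy call.

```shell
# Sample data from the question, written to a temporary file.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
1 0.1
2 1.6
3 1.3
4 1.5
>
5 0.1
6 1.6
7 1.3
8 1.5
EOF

# Number of data sets = number of ">" separator lines + 1.
nsets=$(awk '$0 == ">" {n++} END {print n + 1}' "$datafile")

loop_out=$(
    r=1
    while [ "$r" -le "$nsets" ]; do
        # Extract data set r and hand it to the consumer (stand-in for psxy).
        awk 'BEGIN {n=1}; n>r {exit}; $0==">" {++n; next}; n==r' r="$r" "$datafile" |
            tail -n 1
        r=$((r + 1))
    done
)
printf '%s\n' "$loop_out"

rm -f "$datafile"
```

This rereads the file once per data set, which is simple but slower than the single-pass awk approach for large inputs.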

Don_Cragun · February 28, 2013, 7:50pm

Thanks Alister,
I considered both of these issues, but the idea of graphing zero points didn't make any sense to me. So, I didn't worry about those cases. However, I should have stated my assumptions when I posted my proposal.

Don