Read n lines from a text file, getting n from within the text file

malandisa · September 12, 2013, 8:16pm

I don't even have a sample script because I don't know where to start. My data looks like this:

> sat#16 #data: 15 site:UNZA baseline:  205.9151
  0.008   -165.2465     35.8109     40.6685     21.9148    121.1446     26.4629    -18.4976     33.8722
  0.017   -165.2243     48.2201     40.6908     21.9058    120.8975     26.4179    -18.4794     33.8951
  0.025   -165.1857     41.5258     40.7293     21.9056    120.6503     26.3729    -18.4611     33.9179
  0.033   -165.1438     33.9471     40.7713     21.9072    120.4031     26.3278    -18.4427     33.9407
  0.042   -165.1106     40.8982     40.8045     21.9039    120.1559     26.2826    -18.4242     33.9635
  0.050   -165.0913     38.5970     40.8238     21.8932    119.9087     26.2375    -18.4056     33.9862
  0.058   -165.0968     43.7509     40.8183     21.8692    119.6616     26.1922    -18.3868     34.0089
  0.067   -165.0654     40.7461     40.8497     21.8649    119.4144     26.1470    -18.3680     34.0316
  0.075   -165.0817     46.2518     40.8334     21.8350    119.1672     26.1016    -18.3490     34.0542
  0.083   -165.0412     40.4513     40.8739     21.8355    118.9200     26.0563    -18.3299     34.0768
  0.092   -165.0533     44.7589     40.8617     21.8078    118.6728     26.0108    -18.3107     34.0994
  0.100   -165.0425     43.8650     40.8726     21.7924    118.4256     25.9653    -18.2913     34.1219
  0.108   -165.0378     39.0059     40.8773     21.7736    118.1784     25.9198    -18.2719     34.1444
  0.117   -165.0049     44.6828     40.9102     21.7699    117.9312     25.8741    -18.2523     34.1668
> sat#19 #data: 6 site:UNZA baseline:  235.0682
 17.900   -171.3423     62.8450     63.7259     26.5775    325.4868     15.0105     -7.3923     22.8404
 17.908   -171.5057     55.5991     63.5625     26.6247    325.4057     15.2054     -7.4669     22.8738
 17.917   -171.7347     53.3075     63.3335     26.6444    325.3236     15.4003     -7.5407     22.9067
 17.925   -172.0204     57.9098     63.0478     26.6402    325.2407     15.5952     -7.6138     22.9393
 17.933   -172.2141     50.7400     62.8541     26.6748    325.1568     15.7902     -7.6860     22.9714
 17.942   -172.4885     48.0680     62.5797     26.6751    325.0720     15.9851     -7.7575     23.0031
> sat# 1 #data: 8 site:UNZA baseline:  225.5519
  0.008   -148.7547     75.6631     76.7972     65.5288     89.3165     56.3463    -15.3538     30.6348
  0.017   -148.7668     73.5426     76.7851     65.5199     89.7731     56.3486    -15.3715     30.6350
  0.025   -148.7836     80.0087     76.7683     65.5057     90.2296     56.3488    -15.3892     30.6352
  0.033   -148.7821     76.5570     76.7698     65.5058     90.6859     56.3469    -15.4069     30.6354
  0.042   -148.7916     78.7440     76.7603     65.4952     91.1420     56.3430    -15.4246     30.6356
  0.050   -148.8006     74.6552     76.7513     65.4836     91.5978     56.3370    -15.4423     30.6359
  0.058   -148.8115     73.8469     76.7404     65.4692     92.0532     56.3289    -15.4600     30.6362
  0.067   -148.8211     75.2542     76.7308     65.4545     92.5083     56.3188    -15.4777     30.6365

and so on. I need the script to read the first line with text and use the #data value to determine how many lines to read below this text line, then read those lines of data, then read the next line with text and use its #data value to count how many lines to read below that, and so on.

To be honest, I have no idea where to start. I am not completely new to shell scripting, but this has challenged me to the core. This is not an assignment; it's for my research work. I am a senior lecturer in the Department of Physics at the University of Zambia in Lusaka, Zambia. I hope it is doable.

The read data can be directed to a gnuplot ot output to a file.

Thank you in advance

Chubler_XL · September 12, 2013, 8:38pm

Try this:

awk -F: '/#data/ { R=NR+$2} NR<=R' infile
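For anyone unpicking the one-liner, here is a commented sketch of the same idea. With `-F:` the header's second field is something like ` 15 site`, which awk treats as the number 15 in arithmetic, so `R` becomes the line number of the last line to print for that block. The wrapper name `take_counted` and the sample data are made up for illustration.

```shell
#!/bin/sh
# take_counted FILE: print each header plus as many following lines as
# its "#data:" field announces (hypothetical wrapper around the one-liner).
take_counted() {
    awk -F: '
        /#data/ { R = NR + $2 }   # $2 is e.g. " 15 site"; arithmetic keeps the leading 15
        NR <= R                   # print the header and the counted lines after it
    ' "$1"
}

# Tiny made-up sample: the first block announces 2 lines but has 3.
sample=$(mktemp)
cat > "$sample" <<'EOF'
> sat#16 #data: 2 site:UNZA baseline:  205.9151
  0.008   -165.2465
  0.017   -165.2243
  0.025   -165.1857
> sat#19 #data: 1 site:UNZA baseline:  235.0682
 17.900   -171.3423
EOF

take_counted "$sample"    # the 0.025 line is not printed
rm -f "$sample"
```

Note that the header lines are part of the output, because `NR <= R` is also true on the header line itself.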

Don_Cragun · September 12, 2013, 10:22pm

Note that there are only 14 lines following the 1st line, but the 1st line contains #data: 15 ; is a line missing in your sample, or is the XXX value in the #data: XXX segments of the headers unreliable? (Note that since the header lines all start with a ">" and the data lines only contain floating point values, the count in the header can be ignored.)

The counts in the other two header lines seem to match the following data.
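One way to check this for a whole file is to compare each header's announced count with the number of lines that actually follow it. The sketch below assumes every header starts with `>` and carries a `#data: N` field, as in the sample; `check_counts` is a made-up name.

```shell
#!/bin/sh
# check_counts FILE: report blocks whose "#data: N" disagrees with the
# number of data lines that follow (assumes headers start with ">").
check_counts() {
    awk '
        function report() {
            if (hdr && want != got)
                printf "line %d: header says %d, found %d\n", hdr, want, got
        }
        /^>/ {
            report()                         # finish the previous block
            hdr = NR; got = 0
            want = $0
            sub(/.*#data:[ \t]*/, "", want)  # keep what follows "#data:"
            want += 0                        # numeric prefix only
            next
        }
        NF { got++ }                         # count non-blank data lines
        END { report() }
    ' "$1"
}

# Usage: check_counts UNZA2250.txt
```

It prints nothing when every count matches. On the sample posted above it should flag only the first block (15 announced, 14 present).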

Once you have read the data lines, what do you want to do with them?

Chubler_XL · September 12, 2013, 10:34pm

Good points Don Cragun, my assumption was that the value in the data line was less than the number of lines and any additional lines need to be discarded.

In my solution, any processing could be done in the NR <=R block:

awk -F: '/#data/ { R=NR+$2} NR<=R { "Any processing code here"; print }' infile

Don_Cragun · September 12, 2013, 11:56pm

chubler_xl:

Good points Don Cragun, my assumption was that the value in the data line was less than the number of lines and any additional lines need to be discarded.

In my solution, any processing could be done in the NR <=R block:
awk -F: '/#data/ { R=NR+$2} NR<=R { "Any processing code here"; print }' infile

Understood.

The O.P. said the output "can be directed to a gnuplot ot (sic) output to a file.", but didn't specify any options for gnuplot and didn't specify filenames nor whether the headers should be included in the files or stripped from the files. There is no gnuplot man page in the Man Pages section of this forum, but some references I found said that it is very picky about having its input in tab separated fields (and the data lines given in the sample input have no tabs). Should all data be sent to one instantiation of gnuplot or should the data under each header be sent to a different instantiation of gnuplot?

I don't think I can do much more until we get a clarification on the requirements.

malandisa · September 13, 2013, 8:35am

Thank you very much for all your advice. I am now working on a script that uses this method and I will report back shortly to let you know the result. I want to plot the data once I have read it, and since I want each block plotted in a different color, I will direct my output to a gnuplot script. I hope that works.

I will let you know the result!

Thank you

---------- Post updated at 08:30 AM ---------- Previous update was at 07:17 AM ----------

Good afternoon,

Somehow I am not able to get this to do what I want to do.

What I am trying to do is read the value after #data, which should be the number of data lines in that block, then read that number of lines under the header and send them to a file, which I will then load in gnuplot to plot that particular data.
The final output is a plot of column 1 vs column 7, with all the blocks on the same plot. Since column 1 is not a continuous sequence, I need to plot each block of data separately and overplot them on the same plot.

Here is the intended output

Either the data for each block is sent to a separate file like this
file 1

  0.008   -165.2465     35.8109     40.6685     21.9148    121.1446     26.4629    -18.4976     33.8722   0.017   -165.2243     48.2201     40.6908     21.9058    120.8975     26.4179    -18.4794     33.8951   0.025   -165.1857     41.5258     40.7293     21.9056    120.6503     26.3729    -18.4611     33.9179   0.033   -165.1438     33.9471     40.7713     21.9072    120.4031     26.3278    -18.4427     33.9407   0.042   -165.1106     40.8982     40.8045     21.9039    120.1559     26.2826    -18.4242     33.9635   0.050   -165.0913     38.5970     40.8238     21.8932    119.9087     26.2375    -18.4056     33.9862   0.058   -165.0968     43.7509     40.8183     21.8692    119.6616     26.1922    -18.3868     34.0089   0.067   -165.0654     40.7461     40.8497     21.8649    119.4144     26.1470    -18.3680     34.0316   0.075   -165.0817     46.2518     40.8334     21.8350    119.1672     26.1016    -18.3490     34.0542   0.083   -165.0412     40.4513     40.8739     21.8355    118.9200     26.0563    -18.3299     34.0768   0.092   -165.0533     44.7589     40.8617     21.8078    118.6728     26.0108    -18.3107     34.0994   0.100   -165.0425     43.8650     40.8726     21.7924    118.4256     25.9653    -18.2913     34.1219   0.108   -165.0378     39.0059     40.8773     21.7736    118.1784     25.9198    -18.2719     34.1444   0.117   -165.0049     44.6828     40.9102     21.7699    117.9312     25.8741    -18.2523     34.1668

file 2

 17.900   -171.3423     62.8450     63.7259     26.5775    325.4868     15.0105     -7.3923     22.8404  17.908   -171.5057     55.5991     63.5625     26.6247    325.4057     15.2054     -7.4669     22.8738  17.917   -171.7347     53.3075     63.3335     26.6444    325.3236     15.4003     -7.5407     22.9067  17.925   -172.0204     57.9098     63.0478     26.6402    325.2407     15.5952     -7.6138     22.9393  17.933   -172.2141     50.7400     62.8541     26.6748    325.1568     15.7902     -7.6860     22.9714  17.942   -172.4885     48.0680     62.5797     26.6751    325.0720     15.9851     -7.7575     23.0031

file 3

 0.008   -148.7547     75.6631      76.7972     65.5288     89.3165     56.3463    -15.3538     30.6348   0.017   -148.7668     73.5426     76.7851     65.5199     89.7731     56.3486    -15.3715     30.6350   0.025   -148.7836     80.0087     76.7683     65.5057     90.2296     56.3488    -15.3892     30.6352   0.033   -148.7821     76.5570     76.7698     65.5058     90.6859     56.3469    -15.4069     30.6354   0.042   -148.7916     78.7440     76.7603     65.4952     91.1420     56.3430    -15.4246     30.6356   0.050   -148.8006     74.6552     76.7513     65.4836     91.5978     56.3370    -15.4423     30.6359   0.058   -148.8115     73.8469     76.7404     65.4692     92.0532     56.3289    -15.4600     30.6362   0.067   -148.8211     75.2542     76.7308     65.4545     92.5083     56.3188    -15.4777     30.6365

so that I can then use gnuplot to plot column 1 vs column 7 for each file on the same plot.

If instead of sending the data to a file I can send it directly to gnuplot and produce one plot with lines for each block of data then that would be the perfect result I am looking for.

For example I have put this line into a script

#! /bin/bash
#
awk -F: '/#data/ { R=NR+$2} NR<=R { print $1, $7}' UNZA2250.txt > data.out

I expected that data.out should contain the data but without the headers for each block, and that's not what is happening.
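A note on the likely cause: `-F:` still applies inside the print block, so on a data line (which contains no colon) `$1` is the entire line and `$7` is empty, and the header is printed too because `NR<=R` is already true on the header line. Below is a sketch of one way around both problems. It keeps the default whitespace field separator and pulls the count out of the header with `sub()`; the function name `cols_1_7` is made up for illustration.

```shell
#!/bin/sh
# cols_1_7 FILE: print columns 1 and 7 of the counted data lines,
# without the ">" header lines (hypothetical helper).
cols_1_7() {
    awk '
        /^>/ {
            n = $0
            sub(/.*#data:[ \t]*/, "", n)   # text after "#data:"
            left = n + 0                   # lines still to read in this block
            next                           # never print the header
        }
        left > 0 { print $1, $7; left-- }
    ' "$1"
}

# Usage: cols_1_7 UNZA2250.txt > data.out
```

Because a new header resets the counter, a count that overstates the block length does not spill into the next block.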

Thank you all for the time and assistance; it's much appreciated.

Regards

---------- Post updated at 08:35 AM ---------- Previous update was at 08:30 AM ----------

Sorry, the output data didn't come out correctly in my post above. Here is the desired output:

file 1

  0.008   -165.2465     35.8109     40.6685     21.9148    121.1446     26.4629    -18.4976     33.8722
  0.017   -165.2243     48.2201     40.6908     21.9058    120.8975     26.4179    -18.4794     33.8951
  0.025   -165.1857     41.5258     40.7293     21.9056    120.6503     26.3729    -18.4611     33.9179
  0.033   -165.1438     33.9471     40.7713     21.9072    120.4031     26.3278    -18.4427     33.9407
  0.042   -165.1106     40.8982     40.8045     21.9039    120.1559     26.2826    -18.4242     33.9635
  0.050   -165.0913     38.5970     40.8238     21.8932    119.9087     26.2375    -18.4056     33.9862
  0.058   -165.0968     43.7509     40.8183     21.8692    119.6616     26.1922    -18.3868     34.0089
  0.067   -165.0654     40.7461     40.8497     21.8649    119.4144     26.1470    -18.3680     34.0316
  0.075   -165.0817     46.2518     40.8334     21.8350    119.1672     26.1016    -18.3490     34.0542
  0.083   -165.0412     40.4513     40.8739     21.8355    118.9200     26.0563    -18.3299     34.0768
  0.092   -165.0533     44.7589     40.8617     21.8078    118.6728     26.0108    -18.3107     34.0994
  0.100   -165.0425     43.8650     40.8726     21.7924    118.4256     25.9653    -18.2913     34.1219
  0.108   -165.0378     39.0059     40.8773     21.7736    118.1784     25.9198    -18.2719     34.1444
  0.117   -165.0049     44.6828     40.9102     21.7699    117.9312     25.8741    -18.2523     34.1668

file 2

 17.900   -171.3423     62.8450     63.7259     26.5775    325.4868     15.0105     -7.3923     22.8404
 17.908   -171.5057     55.5991     63.5625     26.6247    325.4057     15.2054     -7.4669     22.8738
 17.917   -171.7347     53.3075     63.3335     26.6444    325.3236     15.4003     -7.5407     22.9067
 17.925   -172.0204     57.9098     63.0478     26.6402    325.2407     15.5952     -7.6138     22.9393
 17.933   -172.2141     50.7400     62.8541     26.6748    325.1568     15.7902     -7.6860     22.9714
 17.942   -172.4885     48.0680     62.5797     26.6751    325.0720     15.9851     -7.7575     23.0031

file 3

  0.008   -148.7547     75.6631     76.7972     65.5288     89.3165     56.3463    -15.3538     30.6348
  0.017   -148.7668     73.5426     76.7851     65.5199     89.7731     56.3486    -15.3715     30.6350
  0.025   -148.7836     80.0087     76.7683     65.5057     90.2296     56.3488    -15.3892     30.6352
  0.033   -148.7821     76.5570     76.7698     65.5058     90.6859     56.3469    -15.4069     30.6354
  0.042   -148.7916     78.7440     76.7603     65.4952     91.1420     56.3430    -15.4246     30.6356
  0.050   -148.8006     74.6552     76.7513     65.4836     91.5978     56.3370    -15.4423     30.6359
  0.058   -148.8115     73.8469     76.7404     65.4692     92.0532     56.3289    -15.4600     30.6362
  0.067   -148.8211     75.2542     76.7308     65.4545     92.5083     56.3188    -15.4777     30.6365

Don_Cragun · September 13, 2013, 2:50pm

I repeat: "Note that there are only 14 lines following the 1st line, but the 1st line contains #data: 15 ; ". The script you're using assumes that the 15 in the 1st line of your input file is correct and your output is wrong because that 15 should be 14 for the data in your input file.

If your input data is invalid, you will not get valid output. Or more commonly: GIGO (Garbage in; garbage out).

You can get the output you said you want from that input with the following awk script (although I STRONGLY suggest that you not put a space in your filenames):

awk '
$1 == ">" {
        if(f) close(f)
        f="file " ++n
        next
}
{       print > f
}' UNZA2250.txt
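A possible variant of the same script, for readers who want space-free filenames and a choice of where the files go. Two assumptions in this sketch: headers are matched with `/^>/` instead of `$1 == ">"` (so a header with no space after the `>` would still be recognised), and the function name `split_blocks` and the default prefix `block` are made up.

```shell
#!/bin/sh
# split_blocks FILE [PREFIX]: write the data lines of each ">" block to
# PREFIX1.txt, PREFIX2.txt, ... (PREFIX defaults to "block"; both names
# are assumptions for this sketch).
split_blocks() {
    awk -v prefix="${2:-block}" '
        /^>/ {
            if (f) close(f)        # avoid running out of open files
            f = prefix (++n) ".txt"
            next                   # the header itself is not copied
        }
        f { print > f }            # ignore anything before the first header
    ' "$1"
}

# Usage: split_blocks UNZA2250.txt file   # creates file1.txt, file2.txt, ...
```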

malandisa · September 13, 2013, 3:06pm

Don! Thank you.

This works, it does the job for sure, and yes, you are right: I had a mistake in the data for the first block. It's supposed to have 14 lines, and the header is supposed to be

#data: 14

I truly appreciate your assistance.

Thank you

malandisa · September 13, 2013, 5:42pm

Thank you again, everyone, especially Don, for your assistance. This has worked perfectly, and finally here is my script that plots the data the way I want:

#! /bin/bash
# 
# read each block and output to a file
awk '
$1 == ">" {
        if(f) close(f)
        f="file" ++n".txt"
        next
}
{       print > f
}' UNZA2250.txt
# count the # of block to plot
nx=`awk '{c+=gsub(s,s)}END{print c}' s='>' UNZA2250.txt`
echo $nx
# --------------------------------------
gnuplot << EOF
# set size 1.7,1.3
set terminal postscript eps enhanced color "Helvetica" 30
set output "fig.eps"
set ylabel "TEC"
set xlabel "hour"
set xrange [0:24]
set yrange [0:120]
set nokey

filename(n) = sprintf("file%d.txt", n)
# overplot one curve per block file
plot for [i=1:$nx] filename(i) using 1:4 with lines
EOF
# view the output image
gv fig.eps &
# 
# remove the text files used for plotting
rm file*.txt

The data file is attached, the output figure is what I am looking for.

If there is a better and/or easier way to achieve this, I would love to learn. Especially if there can be a way to use the awk within gnuplot without having to output the data to the files as I have done here.

Thank you again
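On the last question, one possible approach (a sketch, not something confirmed in this thread): gnuplot treats two consecutive blank lines as a separator between data sets and can select one with `index`, and a data file name beginning with `<` is read from the output of a shell command. If awk replaces each header with two blank lines, the per-block files are no longer needed. The names `split_prog`, `to_indexed` and `plot_blocks` are made up, the `using 1:4` and terminal settings are copied from the script above, and the file name is assumed to contain no spaces.

```shell
#!/bin/sh
# awk program: drop each ">" header and emit two blank lines between blocks.
# ORS ORS is "\n\n"; it is written without quote characters so the program
# can sit inside a gnuplot string.
split_prog='/^>/ { if (n++) printf ORS ORS; next } 1'

# to_indexed FILE: print FILE in gnuplot's multi-data-set layout.
to_indexed() {
    awk "$split_prog" "$1"
}

# plot_blocks FILE: plot every block as its own curve; needs gnuplot.
plot_blocks() {
    nx=$(grep -c '^>' "$1")       # number of blocks in the file
    gnuplot <<EOF
set terminal postscript eps enhanced color "Helvetica" 30
set output "fig.eps"
unset key
# "< command" makes gnuplot read the command's output instead of a file
plot for [i=0:$((nx - 1))] "< awk '$split_prog' $1" index i using 1:4 with lines
EOF
}

# Usage: plot_blocks UNZA2250.txt
```

The awk command is rerun once per block, which should be negligible for files of this size. Running `to_indexed UNZA2250.txt > blocks.dat` once and plotting `"blocks.dat"` with the same `index i` loop is an alternative if a single intermediate file is acceptable.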