formatting data file with awk or sed

lego · March 25, 2010, 7:15pm

Hi,

I have a (quite large) data file which looks like:
_____________
header part..
more header part..
x1 x2 x3 x4 x5 x6
x7 x8 x9 x10 x11 x12
x13 ...
... x59 x60
y1 y2 y3 y4...
... y100
______________

where x1, x2, ..., x60 and y1, y2, ..., y100 are 10-digit numbers, so each full line holds six numbers plus five separating spaces (6 x 10 + 5 = 65 characters).
The header spans 80 lines; the real data starts at line 81.
I would like to get output like this:
______________
x1 y1
x1 y2
x1 y3
x1 y4
...

x2 y1
x2 y2
x2 y3
...
...

x60 y98
x60 y99
x60 y100
______________

Can anybody tell me how I can do this, maybe with sed, awk, or perl?
Any help would be much appreciated!

rdcwayx · March 25, 2010, 9:16pm

awk 'NR==FNR && NR>=91 { for (i=1; i<=NF; i++) y[++j]=$i }   # pass 1: store the y values (line 91 onward)
     NR>FNR && FNR>80 && FNR<91 {                             # pass 2: x values sit on lines 81-90
         for (k=1; k<=NF; k++) for (x=1; x<=j; x++) print $k, y[x] }' urfile urfile
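
For comparison, the same pairing can be done in a single pass, adding the blank line between x blocks that the requested output shows. This is only a sketch under the layout stated in the question (x values on lines 81-90, y values from line 91 to the end of the file). The function name pair_xy and its optional arguments are made up for illustration.

```shell
# pair_xy FILE [FIRST_X_LINE] [FIRST_Y_LINE]
# Hypothetical helper. The defaults follow the layout described in the
# question: x values on lines 81-90, y values from line 91 to end of file.
pair_xy() {
    awk -v xs="${2:-81}" -v ys="${3:-91}" '
        FNR >= xs && FNR < ys { for (i = 1; i <= NF; i++) x[++nx] = $i }  # collect x values
        FNR >= ys             { for (i = 1; i <= NF; i++) y[++ny] = $i }  # collect y values
        END {
            for (i = 1; i <= nx; i++) {
                for (j = 1; j <= ny; j++) print x[i], y[j]
                if (i < nx) print ""    # blank line between x blocks
            }
        }' "$1"
}

# Example: pair_xy urfile > pairs.txt
```

Both arrays are held in memory, which should be harmless for 60 x values and 100 y values.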

durden_tyler · March 25, 2010, 9:55pm

One way to do it with Perl -

perl -ne 'chomp; if (/^\d+/ && $.<=80){$i++<=9 ? push @x,split/ /,$_ : push @y,split/ /,$_} END{foreach $i(@x){foreach $j(@y){print "$i\t$j\n"}}}' yourfile

tyler_durden


Here's the test on a dummy file with similar structure, on my system:

$ 
$ cat -n data.txt
     1  header line 1
     2  header line 2
     3  header line 3
     4  123 456
     5  901 234
     6  000 111
     7  666 777
     8  334
     9  real data line 1
    10  real data line 2
$ 
$ perl -ne 'chomp; if (/^\d+/ && $.<=8){$i++<=1 ? push @x,split/ /,$_ : push @y,split/ /,$_}
            END {foreach $i(@x){foreach $j(@y){print "$i\t$j\n"}}}' data.txt
123     000
123     111
123     666
123     777
123     334
456     000
456     111
456     666
456     777
456     334
901     000
901     111
901     666
901     777
901     334
234     000
234     111
234     666
234     777
234     334
$ 
$

Lines 4 and 5 hold the x values (123, 456, 901, 234).
Lines 6, 7 and 8 hold the y values (000, 111, 666, 777, 334).
Line 8 in my file corresponds to line 80 in yours, i.e. I am assuming your numeric block ends at line 80. If the numbers instead start at line 81, as your post says, the $.<=80 test would need to become $.>80.
I test for $i++ <= 1 because only the first two matching lines contain x values. You'd test for $i++ <= 9 because in your file the first 10 matching lines contain x values.

HTH,
tyler_durden
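
The counting idea used in the Perl one-liner can also be sketched in awk. This assumes, as in the dummy test above, that the numeric block ends at a known line and that the first few lines starting with a digit hold the x values. The function name pair_by_count and its arguments are hypothetical.

```shell
# pair_by_count FILE LAST_LINE X_LINES
# Hypothetical helper mirroring the Perl logic: among lines 1..LAST_LINE that
# start with a digit, the first X_LINES lines hold x values, the rest y values.
pair_by_count() {
    awk -v last="$2" -v nxl="$3" '
        FNR <= last && /^[0-9]/ {
            seen++                              # numeric lines seen so far
            for (i = 1; i <= NF; i++) {
                if (seen <= nxl) x[++nx] = $i   # still inside the x block
                else             y[++ny] = $i   # everything after it is y
            }
        }
        END {
            for (i = 1; i <= nx; i++)
                for (j = 1; j <= ny; j++) print x[i] "\t" y[j]
        }' "$1"
}

# Example with the dummy file above: pair_by_count data.txt 8 2
```

Lines that begin with a space would not match /^[0-9]/, so right-aligned numbers would need a looser pattern.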

lego · March 26, 2010, 6:51am

Thanks for your replies, but I couldn't get them to work with my file.
I am attaching the file I need to work with (data.txt), and I'll try to explain exactly what I want.
I have a file which has this structure:

What I need is this output file:

In the attached file, the header spans 14 lines. The X values are on lines 15 to 24 (inclusive). The Y values are on lines 25 to 41 (and also 42 to 58).
Lines 61 to 70 hold the first 60 P values, lines 71 to 80 the first E values, ...
Lines 101 to 110 hold the values of P61, P62, ....

In total, there are 60 different values for X, 100 different values for Y, and 6000 different values for P,E,D, and F.

I hope I have explained my problem clearly.
Please help me!
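
Before any reformatting, it may be worth checking that the quoted line ranges really contain the expected number of values. A rough sketch, assuming whitespace-separated values; count_values is a hypothetical helper name, and the line numbers in the usage comments are the ones quoted in this post.

```shell
# count_values FILE FIRST_LINE LAST_LINE
# Hypothetical helper: prints how many whitespace-separated values
# appear between FIRST_LINE and LAST_LINE (inclusive).
count_values() {
    awk -v first="$2" -v last="$3" '
        FNR >= first && FNR <= last { n += NF }   # NF = number of fields on the line
        END { print n + 0 }                       # "+ 0" prints 0 if no line matched
    ' "$1"
}

# With the layout described above one would expect:
#   count_values data.txt 15 24    (60 if the X block is complete)
#   count_values data.txt 25 41    (100 if the Y block is complete)
```

A count that differs from 60 or 100 would suggest the line ranges are off by one or that the fields are not separated by whitespace.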

lego · March 26, 2010, 11:29am

Can someone help me please?

lego · March 26, 2010, 10:11pm

I finally got the solution!
After googling a lot and with your help, I solved my problem.

I attach the solution here ("phoenics2gnuplot.txt"). An example input file is the "data.txt" that I attached two posts ago.
Thanks!