Reversing numbers in a datafile of rows and columns

mattings · April 8, 2009, 9:57am

Hello,

I've tried searching the forum for an answer to my question, but without any luck...

I have a datafile looking simplified as follows:
01 02 03 04 05 06
07 08 09 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24

I want to reverse it by rearranging all the numbers from last to first, so that it looks like this:
24 23 22 21 20 19
18 17 16 15 14 13
12 11 10 09 08 07
06 05 04 03 02 01

I've tried using tac:

> tac --separator=" " filename
24
23 22 21 20 18
19 17 16 15 14 12
13 11 10 09 08 06
07 05 04 03 02 01

This does not give me the result I want. Can anyone explain why?

Maybe I should use sed or awk, or something else? Can anyone help me with this one?

vgersh99 · April 8, 2009, 10:07am

nawk -f invert.awk myFile

invert.awk:

{
  for(i=1; i<=NF;i++)
    arr[FNR,i]=$i
  nf=NF
  fnr=FNR
}
END {
  for(i=fnr; i;i--)
    for(j=nf; j;j--)
      printf("%s%c", arr[i,j], (j==1)?ORS:OFS)
}

ghostdog74 · April 8, 2009, 11:22am

# tac file | while read line;do echo $line|tr " " "\n"|sort -rn|tr "\n" " ";echo; done
24 23 22 21 20 19
18 17 16 15 14 13
12 11 10 09 08 07
06 05 04 03 02 01

drl · April 8, 2009, 12:23pm

Hi.

The original tac reverses tokens separated by spaces. The last token on each line is "number-followed-by-newline". When the tokens are reversed, the last token is "24-newline", and so on.

The solutions posted are fine. Here is a compact perl solution:

#!/usr/bin/env bash

# @(#) s1       Demonstrate reversal of symbols on lines (perl).

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) tac perl
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat $FILE

echo
echo " Original results (trailing newline added):"
tac --separator=" " $FILE
echo

echo
echo " Re-written results:"
tac $FILE |
perl -n -e 'chomp;print join(" ",reverse(split(/ /))),"\n";'

exit 0

producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.11-x1, i686
Distribution        : Xandros Desktop 3.0.3 Business
GNU bash 2.05b.0
tac (coreutils) 5.2.1
perl 5.8.4

 Data file data1:
01 02 03 04 05 06
07 08 09 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24

 Original results (trailing newline added):
24
23 22 21 20 18
19 17 16 15 14 12
13 11 10 09 08 06
07 05 04 03 02 01

 Re-written results:
24 23 22 21 20 19
18 17 16 15 14 13
12 11 10 09 08 07
06 05 04 03 02 01

The heart of the operation: after the lines are reversed by tac, process them with a perl script. The easy-to-read-but-not-so-easy-to-understand perl code stacks a number of functions: split the line into an array of tokens, reverse the content of the array (and thus the tokens), join the array with a space as separator into a string, print the string with a newline at the end. The "-n" says to do that series of operations for each line ... cheers, drl

mattings · April 14, 2009, 4:15am

vgersh99:

nawk -f invert.awk myFile

invert.awk:

{
  for(i=1; i<=NF;i++)
   arr[FNR,i]=$i
  nf=NF
  fnr=FNR
}
END {
  for(i=fnr; i;i--)
   for(j=nf; j;j--)
   printf("%s%c", arr[i,j], (j==1)?ORS:OFS)
}

This code does exactly the job I was looking for. Thanks a lot!
But I noticed a little problem when I tried it on one of my datafiles.

The datafile looks simplified as:
01 02 03 04 05 06
07 08 09 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
31 32 33

When I used the code, it produced:
33 32 31
27 26 25
21 20 19
15 14 13
09 08 07
03 02 01

I want it to produce:
33 32 31 30 29 28
27 26 25 24 23 22
21 20 19 18 17 16
15 14 13 12 11 10
09 08 07 06 05 04
03 02 01

Is there an easy way around this problem by modifying this code?

reddybs · April 14, 2009, 5:11am

Hi,

# tac file | awk '{print $6,$5,$4,$3,$2,$1}'

Any loopholes in the above command?
Hope this suites this situation.

ghostdog74 · April 14, 2009, 7:30am

should be alright if the file has exactly six fields. If the fields are variable, then might not be suitable.

vgersh99 · April 14, 2009, 10:14am

I'm not sure if that's the correct 'inversion'.
Based on my previous logic the correct/corrected inversion of the matrix

01 02 03 04 05 06
07 08 09 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
31 32 33

is:

         33 32 31
30 29 28 27 26 25
24 23 22 21 20 19
18 17 16 15 14 13
12 11 10 09 08 07
06 05 04 03 02 01

modified code:

{
  for(i=1; i<=NF;i++)
    arr[FNR,i]=$i
  if (NF>nf) nf=NF
  fnr=FNR
  for(j=i;j<=nf;j++)
    arr[FNR,j]=OFS OFS
}
END {
  for(i=fnr; i;i--)
    for(j=nf; j;j--)
      printf("%s%c", arr[i,j], (j==1)?ORS:OFS)
}

mattings · April 14, 2009, 11:03am

Yes, your solution works well for the simplified dataset i provided. Thanks.

However, after I tried it on one of my real datasets (10s of MBs of numbers) it turned out I actually need a more complicated transformation of the numbers. (I hadn't realised this before I saw the result of the reversing done by this code.)

So my real case is as follows: The numbers in the datafile are a matrix of 1253 rows times 1977 columns. They are stored in a file with 6 numbers per row (a header provides the real number of columns and rows), which makes a matrix of 412864 rows times 6 columns. What I need to do is reversing the matrix the numbers represent (1253x1977), not the matrix of the datafile (412864x6).

If anyone understood my problem, I'm very happy to see a code which solves this problem!

vgersh99 · April 14, 2009, 11:24am

post a header, please.
If you 6 columns, and the real/TOTAL number of columns (per row) is 1977, 1977 / 6 = 329 (not evenly divided - remainder=3).
329 rows of 6 columns represents the REAL row of data, correct?
What happens with the remainder of 3? Are you sure it's 1977 REAL rows or is it something else?

We can translate the 6 column matrix into a REAL matrix of (1977 or whatever) and then invert it - should be easily done.

Pleas answer the above questions.

summer_cherry · April 14, 2009, 10:49pm

perl may a little bit easier

open $fh,"<","a.txt";
while(<$fh>){
  chomp;
	$arr[$.]=join " ", reverse split;
}
close $fh;
for(my $i=$#arr;$i>=1;$i--){
	print $arr[$i],"\n";
}

mattings · April 15, 2009, 3:07am

Yes, I'm sure the real rows consists of 1977 numbers. The header looks like this:

NCOLS 1977
NROWS 1253
XLLCORNER 554650.000000
YLLCORNER 8040950.000000
CELLSIZE 25
NODATA_VALUE 9999900.000000

The dataset represents a seabed image. The bottom left corner is given by the XLL and YLL numbers in UTM coordinates. The cellsize is the resolution in meters. The numbers in the datafile represents the depth below seasurface. So the whole file is an image of the seabed with 1977x1253 pixels.

My problem is that the software I'm loading it into, is not the same as the one having generated the datafile. The image is displayed starting with the first row (of 1977 numbers) from the bottom left corner and then continuing upwards, and ending in the upper right corner with the last number in the 1253th row. The numbers is, however, stored in the datafile with the first row (of 1977 numbers) starting from the UPPER left corner, and continuing downwards ending in the lower right corner. This means that the seabed image is displayed mirrored along the middle row, which I of course don't want it to be.

The data is stored such that the first 1977 numbers represents the first row, the next 1977 numbers represent the second row, etc. And as you calculated, that means that every row consist of 329 rows pluss a reminder of three (which is half a row) = 329 * 6 + 3 = 1977. I guess this makes it more complicated, but not impossible to solve? Every row in the datafile consists of 6 numbers, except the last which only has three numbers.

So, what I need is a code which can rearrange this datafile to these specifications. That means that the last 1977 numbers becomes the first 1977 numbers (in the same order), the second last 1977 numbers becomes the second first 1977 numbers, etc.

I hope I made myself understandable, and that someone can help me.

vgersh99 · April 15, 2009, 8:41am

ok, try this - this does not derive the number of the REAL columns from the header - the header is not considered any different from the body of the file:

# this assumes 12 [default] columns encoded into 6 columns in the file - this is my
# test case file from the earlier post
nawk -f mat.awk myFile

# this assumes 1977 columns encoded into 6 columns in the file
nawk -v cols=1977 -f mat.awk myFile

mat.awk:

BEGIN {
  if (cols=="") cols=12
}
{
  for(i=1; i<=NF;i++) {
    col = ((col++)%cols)+1
    if (col==1) row++
    arr[row,col]=$i

  }
  if (NF>nf) nf=NF
  for(j=i+1;j<=nf;j++)
    arr[row,j]=OFS OFS
}
END {
  for(i=row; i;i--)
    for(j=cols; j;j--)
      printf("%s%c", arr[i,j], !((j-1)%nf)?ORS:OFS)
}

mattings · April 16, 2009, 8:32am

This did not produce quite the result I'm looking for.

To make it a bit easier to understand my problem, I've included a simple example below:

Original dataset stored in rows of six columns (only three in the last row):
01 02 03 04 05 06
07 08 09 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
31 32 33 34 35 36
37 38 39 40 41 42
43 44 45 46 47 48
49 50 51 52 53 54
55 56 57 58 59 60
61 62 63 64 65 66
67 68 69 70 71 72
73 74 75

The dataset above represents a 15 column x 5 row matrix looking like this:
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75

I want a matrix looking like this:
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15

So, the task is to make the original dataset (with six columns) into the last matrix above with 15 columns.

I don't actualy think I need to store the new matrix in the same format as the original dataset (with only six columns), as long as linux allows matrixes with 1977 columns (which is what I have in the real dataset). It would be great if the code used could be generic, so that I can use it for any kind of matrix stored with six columns. (The dataset described in my previous post represents a 1977 column x 1253 row matrix.)

I can also mention that the numbers in the real dataset are both positive and negative, they have decimals, have varying length, and are space delimited.

vgersh99 · April 16, 2009, 9:20am

ok, this is a quite a different inversion - this is NOT what you've wanted from the start.
But that's ok - try this:

# this assumes 12 [default] columns encoded into 6 columns in the file - this is my
# test case file from the earlier post
nawk -f mat.awk myFile

# this assumes 1977 columns encoded into 6 columns in the file
nawk -v cols=1977 -f mat.awk myFile

# this assumes 15 columns encoded into 6 columns in the file - as in your previously given sample file
nawk -v cols=15 -f mat.awk myFile

mat.awk:

BEGIN {
  if (cols=="") cols=12
}
{
  for(i=1; i<=NF;i++) {
    col = ((col++)%cols)+1
    if (col==1) row++
    arr[row,col]=$i
  }
}
END {
  for(i=row; i;i--)
    for(j=1; j<=cols; j++)
      printf("%s%c", arr[i,j], !(j%cols)?ORS:OFS)
}

mattings · April 17, 2009, 7:48am

It still didn't work quite as I wanted it to.

Anyway, I managed to get a friend to write a python code which solved the problem. If anyone is interested the code is included:

#!/usr/bin/python
length_of_a_row=1977
input_file='infile.dat'
output_file='outfile.dat'
data=open(input_file).read() # opens the file and reads it into memory
data=data.split() #turns it into one long list of single numbers
data2=[] #we'll need this as an empty list in the for-loop
for i in xrange(len(data)/length_of_a_row):
 data2.append(data[length_of_a_row*i:length_of_a_row*(i+1)])
#reshapes it in the size of the matrix
data=[] # free a bit of memory
data2.reverse() #resort with first row at bottom
#get rid of the array shape:
data=" ".join([" ".join(x) for x in data2]) # turn it into one long string
data2=[]
data=data.split() # and split it into one long list again 
#get it into output shape again:
for i in xrange(len(data)/6):
 data2.append(data[6*i:6*(i+1)])
data2.append(data[6*(len(data)/6):len(data)]) #the last line is not full length
data=[] #free some memory
#write the whole stuff out to hard disk
out=open(output_file,'w') 
out.write("\n".join([" ".join(x) for x in data2]))
out.write('\n')
out.close()

vgersh99 · April 17, 2009, 8:03am

given mat.txt:

01 02 03 04 05 06
07 08 09 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
31 32 33 34 35 36
37 38 39 40 41 42
43 44 45 46 47 48
49 50 51 52 53 54
55 56 57 58 59 60
61 62 63 64 65 66
67 68 69 70 71 72
73 74 75

and mat.awk:

BEGIN {
  if (cols=="") cols=12
}
{
  for(i=1; i<=NF;i++) {
    col = ((col++)%cols)+1
    if (col==1) row++
    arr[row,col]=$i
  }
}
END {
  for(i=row; i;i--)
    for(j=1; j<=cols; j++)
      printf("%s%c", arr[i,j], !(j%cols)?ORS:OFS)
}

nawk -v cols=15 -f mat.awk mat.txt

produces:

61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15

Anything wrong?