I've tried searching the forum for an answer to my question, but without any luck...
I have a datafile that, simplified, looks like this:
01 02 03 04 05 06
07 08 09 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
I want to reverse it by rearranging all the numbers from last to first, so that it looks like this:
24 23 22 21 20 19
18 17 16 15 14 13
12 11 10 09 08 07
06 05 04 03 02 01
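The original tac command (with --separator=" ") reverses tokens separated by spaces. The last token on each line is "number-followed-by-newline". When the tokens are reversed, the token "24-newline" is moved along with the rest, and so on, which is why the newlines end up in the wrong places.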
The solutions posted are fine. Here is a compact perl solution:
#!/usr/bin/env bash
# @(#) s1 Demonstrate reversal of symbols on lines (perl).
echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) tac perl
set -o nounset
echo
FILE=${1-data1}
echo " Data file $FILE:"
cat $FILE
echo
echo " Original results (trailing newline added):"
tac --separator=" " $FILE
echo
echo
echo " Re-written results:"
tac $FILE |
perl -n -e 'chomp;print join(" ",reverse(split(/ /))),"\n";'
exit 0
The heart of the operation: after tac reverses the order of the lines, a perl one-liner processes each of them. The easy-to-read-but-not-so-easy-to-understand perl code stacks a number of functions: chomp strips the trailing newline, split breaks the line into an array of tokens, reverse reverses the array (and thus the order of the tokens), join glues the array back into a string with a space as separator, and print writes the string with a newline at the end. The "-n" option says to do that series of operations for each line ... cheers, drl
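For readers who do not have tac or perl at hand, the same two-step idea (reverse the line order, then reverse the tokens on each line) can be sketched in Python. This is only an illustration, not drl's method: the function name reverse_all_tokens is made up here, and unlike the perl split(/ /) it splits on any run of whitespace.

```python
def reverse_all_tokens(text):
    """Reverse the order of the lines, then the order of the tokens on each line."""
    lines = text.splitlines()
    reversed_lines = (" ".join(reversed(line.split())) for line in reversed(lines))
    return "\n".join(reversed_lines) + "\n"


# The simplified datafile from the first post.
sample = (
    "01 02 03 04 05 06\n"
    "07 08 09 10 11 12\n"
    "13 14 15 16 17 18\n"
    "19 20 21 22 23 24\n"
)

print(reverse_all_tokens(sample), end="")
```

On the sample data this prints the four lines starting with 24 and ending with 01, as requested in the first post.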
Yes, your solution works well for the simplified dataset I provided. Thanks.
However, after I tried it on one of my real datasets (tens of MB of numbers), it turned out I actually need a more complicated transformation of the numbers. (I hadn't realised this before I saw the result of the reversing done by this code.)
So my real case is as follows: the numbers in the datafile are a matrix of 1253 rows times 1977 columns. They are stored in a file with 6 numbers per line (a header provides the real number of columns and rows), which makes a matrix of 412864 rows times 6 columns. What I need to do is reverse the matrix the numbers represent (1253x1977), not the matrix of the datafile (412864x6).
If anyone understood my problem, I'd be very happy to see code that solves it!
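The sizes quoted above can be checked with a few lines of Python. This is just a worked calculation under the assumption that the header lines are not counted and that the numbers are written continuously, 6 per line; the variable names are made up for the illustration.

```python
rows, cols, per_line = 1253, 1977, 6

total = rows * cols                                   # numbers in the real matrix
file_lines = (total + per_line - 1) // per_line       # lines needed at 6 per line, rounded up
last_line = total % per_line or per_line              # how many numbers end up on the final line
full_lines_per_row, leftover = divmod(cols, per_line) # one real row measured in file lines

print(total, file_lines, last_line)   # 2477181 412864 3
print(full_lines_per_row, leftover)   # 329 3
```

So one real row is 329 full file lines plus 3 numbers, which means the real rows do not line up with the file lines: every second real row starts in the middle of a file line.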
Please post the header.
If you have 6 columns in the file, and the real/TOTAL number of columns (per row) is 1977, then 1977 / 6 = 329 (not evenly divided - remainder=3).
329 lines of 6 columns represent one REAL row of data, correct?
What happens with the remainder of 3? Are you sure the REAL row length is 1977, or is it something else?
We can translate the 6 column matrix into a REAL matrix (of 1977 columns or whatever) and then invert it - should be easily done.
The dataset represents a seabed image. The bottom left corner is given by the XLL and YLL numbers in UTM coordinates. The cellsize is the resolution in meters. The numbers in the datafile represent the depth below the sea surface. So the whole file is an image of the seabed with 1977x1253 pixels.
My problem is that the software I'm loading it into is not the same as the one that generated the datafile. The image is displayed starting with the first row (of 1977 numbers) at the bottom left corner, continuing upwards, and ending in the upper right corner with the last number in the 1253rd row. The numbers are, however, stored in the datafile with the first row (of 1977 numbers) starting at the UPPER left corner, continuing downwards, and ending in the lower right corner. This means that the seabed image is displayed mirrored about the middle row, which I of course don't want.
The data is stored such that the first 1977 numbers represent the first row, the next 1977 numbers represent the second row, etc. And as you calculated, that means that every real row consists of 329 file lines plus a remainder of three numbers (which is half a line): 329 * 6 + 3 = 1977. I guess this makes it more complicated, but not impossible to solve? Every line in the datafile consists of 6 numbers, except the last, which only has three numbers.
So, what I need is code which can rearrange this datafile to these specifications. That means that the last 1977 numbers become the first 1977 numbers (in the same order), the second-to-last 1977 numbers become the second 1977 numbers, etc.
I hope I made myself understandable, and that someone can help me.
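The rearrangement described in this post (blocks of 1977 numbers swapped end for end, order inside each block kept, output again 6 numbers per line) could be sketched in Python roughly as follows. This is a sketch under assumptions, not a tested tool for the real file: the function name flip_rows and its parameters are invented for the example, and it assumes the header lines have already been removed so that only the numbers are passed in.

```python
def flip_rows(tokens, ncols, per_line=6):
    """Reverse the order of the real rows (ncols numbers each), keeping the
    order inside every row, and return output lines of per_line numbers."""
    if len(tokens) % ncols:
        raise ValueError("number of values is not a multiple of ncols")
    # Cut the flat list into real rows.
    rows = [tokens[i:i + ncols] for i in range(0, len(tokens), ncols)]
    # Put the last row first and flatten again.
    flat = [tok for row in reversed(rows) for tok in row]
    # Re-wrap into lines of per_line numbers; the last line may be shorter.
    return [" ".join(flat[i:i + per_line]) for i in range(0, len(flat), per_line)]


# Small stand-in for the real data: 3 real rows of 5 numbers each.
demo = [str(n) for n in range(1, 16)]
for line in flip_rows(demo, ncols=5):
    print(line)
```

The tokens stay strings throughout, so negative numbers and decimals are passed through exactly as they were written in the input.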
ok, try this - this does not derive the number of the REAL columns from the header - the header is not considered any different from the body of the file:
# this assumes 12 [default] columns encoded into 6 columns in the file - this is my
# test case file from the earlier post
nawk -f mat.awk myFile
# this assumes 1977 columns encoded into 6 columns in the file
nawk -v cols=1977 -f mat.awk myFile
mat.awk:
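BEGIN {
    # number of REAL columns; defaults to the 12-column test case
    if (cols == "") cols = 12
}
{
    # walk the fields of each file line, filling the REAL matrix row by row
    for (i = 1; i <= NF; i++) {
        col = (col % cols) + 1
        if (col == 1) row++
        arr[row, col] = $i
    }
    # remember the widest file line (6 here) for the output layout
    if (NF > nf) nf = NF
}
END {
    # print the rows last to first and, within each row, the columns last to
    # first; the output line ends whenever (j-1) is a multiple of nf
    for (i = row; i; i--)
        for (j = cols; j; j--)
            printf("%s%c", arr[i, j], !((j - 1) % nf) ? ORS : OFS)
}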
So, the task is to make the original dataset (with six columns) into the last matrix above with 15 columns.
I don't actually think I need to store the new matrix in the same format as the original dataset (with only six columns), as long as Linux allows matrices with 1977 columns (which is what I have in the real dataset). It would be great if the code could be generic, so that I can use it for any kind of matrix stored with six columns. (The dataset described in my previous post represents a 1977 column x 1253 row matrix.)
I can also mention that the numbers in the real dataset are both positive and negative, they have decimals, have varying length, and are space delimited.
Anyway, I managed to get a friend to write some Python code which solved the problem. If anyone is interested, the code is included:
#!/usr/bin/python
length_of_a_row=1977
input_file='infile.dat'
output_file='outfile.dat'
data=open(input_file).read() # opens the file and reads it into memory
data=data.split() #turns it into one long list of single numbers
data2=[] #we'll need this as an empty list in the for-loop
# range and integer division (//) behave the same on Python 2 and Python 3
for i in range(len(data)//length_of_a_row):
    data2.append(data[length_of_a_row*i:length_of_a_row*(i+1)])
    #reshapes it in the size of the matrix
data=[] # free a bit of memory
data2.reverse() #resort with first row at bottom
#get rid of the array shape:
data=" ".join([" ".join(x) for x in data2]) # turn it into one long string
data2=[]
data=data.split() # and split it into one long list again
#get it into output shape again:
for i in range(len(data)//6):
    data2.append(data[6*i:6*(i+1)])
if len(data)%6: #the last line is not full length (skip it if nothing is left over)
    data2.append(data[6*(len(data)//6):len(data)])
data=[] #free some memory
#write the whole stuff out to hard disk
out=open(output_file,'w')
out.write("\n".join([" ".join(x) for x in data2]))
out.write('\n')
out.close()