Python for text manipulation

Dear All,

I am trying to write a Python script that reads a fixed number of lines from a big file and then saves those pieces into another file as columns. I think a sample file is necessary to understand the problem:

Sample Input file:

Epi. dist.(km)=       0.8100E+02
0.7466E-07  0.4942E-07  0.7133E-07  0.6010E-07  0.1123E-06  0.4819E-07  0.9435E-07  0.6491E-07  0.6051E-07  0.5202E-07
0.1647E-06  0.5267E-07  0.5724E-07  0.1031E-06  0.1143E-06  0.4776E-07  0.3594E-06  0.4832E-07  0.7363E-07  0.1009E-06
0.6381E-07  0.6170E-07  0.6429E-07  0.9720E-07  0.4749E-07  0.6851E-07  0.1585E-06  0.8395E-07  0.4581E-07  0.7321E-07


Epi. dist.(km)=       0.8200E+02
0.2102E-06  0.9821E-07  0.6322E-07  0.1665E-06  0.5524E-06  0.6590E-07  0.1916E-06  0.1292E-06  0.1104E-06  0.8035E-06
0.6358E-06  0.4661E-07  0.9605E-07  0.1492E-06  0.1093E-06  0.1931E-05  0.7120E-07  0.9829E-07  0.1230E-05  0.4093E-06
0.1910E-06  0.2032E-06  0.9052E-07  0.5392E-07  0.5534E-06  0.9899E-07  0.1620E-06  0.1220E-06  0.1220E-06  0.3248E-06


Epi. dist.(km)=       0.8300E+02
0.5564E-07  0.8314E-07  0.8364E-07  0.3975E-07  0.4601E-07  0.4936E-07  0.5480E-07  0.1290E-06  0.4882E-07  0.4571E-07
0.1229E-06  0.2128E-06  0.8432E-07  0.9233E-07  0.4091E-07  0.5957E-07  0.7815E-07  0.6687E-07  0.5402E-07  0.6230E-07
0.6264E-07  0.1231E-06  0.6061E-07  0.4451E-07  0.7700E-07  0.3504E-07  0.1151E-06  0.1150E-06  0.5685E-07  0.5579E-07

.
.
.

So the file contains blocks that each start with an "Epi. dist." line. I need to extract those blocks (without the "Epi. dist." line) and save them as columns in another file. There are 20 similar blocks in the file, and each block is 135 lines long (excluding the "Epi. dist." line).

Sample output file:

0.7466E-07 0.2102E-06 0.5564E-07
0.4942E-07 0.9821E-07 0.8314E-07
0.7133E-07 0.6322E-07 0.8364E-07
0.6010E-07 0.1665E-06 0.3975E-07
0.1123E-06 0.5524E-06 0.4601E-07
0.4819E-07 0.6590E-07 0.4936E-07
0.9435E-07 0.1916E-06 0.5480E-07
0.6491E-07 0.1292E-06 0.1290E-06
0.6051E-07 0.1104E-06 0.4882E-07
0.5202E-07 0.8035E-06 0.4571E-07
0.1647E-06 0.6358E-06 0.1229E-06
0.5267E-07 0.4661E-07 0.2128E-06
0.5724E-07 0.9605E-07 0.8432E-07
0.1031E-06 0.1492E-06 0.9233E-07
0.1143E-06 0.1093E-06 0.4091E-07
0.4776E-07 0.1931E-05 0.5957E-07
0.3594E-06 0.7120E-07 0.7815E-07
0.4832E-07 0.9829E-07 0.6687E-07
0.7363E-07 0.1230E-05 0.5402E-07
0.1009E-06 0.4093E-06 0.6230E-07
0.6381E-07 0.1910E-06 0.6264E-07
0.6170E-07 0.2032E-06 0.1231E-06
0.6429E-07 0.9052E-07 0.6061E-07
0.9720E-07 0.5392E-07 0.4451E-07
0.4749E-07 0.5534E-06 0.7700E-07
0.6851E-07 0.9899E-07 0.3504E-07
0.1585E-06 0.1620E-06 0.1151E-06
0.8395E-07 0.1220E-06 0.1150E-06
0.4581E-07 0.1220E-06 0.5685E-07
0.7321E-07 0.3248E-06 0.5579E-07

I hope I did a good job explaining my problem. Could you help me out?

Does it have to be python?

A Python solution would be preferable for me. I use another Python script to do other things with the output file, so basically I want to merge the two into one script. At the moment I am using a dirty bash and awk solution to produce the output file.

Could you please also show us what you've tried? It would be quicker to proceed from there.

C:\Temp>
C:\Temp>type epi_dist.txt
Epi. dist.(km)=       0.8100E+02
0.7466E-07  0.4942E-07  0.7133E-07  0.6010E-07  0.1123E-06  0.4819E-07  0.9435E-07  0.6491E-07  0.6051E-07  0.5202E-07
0.1647E-06  0.5267E-07  0.5724E-07  0.1031E-06  0.1143E-06  0.4776E-07  0.3594E-06  0.4832E-07  0.7363E-07  0.1009E-06
0.6381E-07  0.6170E-07  0.6429E-07  0.9720E-07  0.4749E-07  0.6851E-07  0.1585E-06  0.8395E-07  0.4581E-07  0.7321E-07


Epi. dist.(km)=       0.8200E+02
0.2102E-06  0.9821E-07  0.6322E-07  0.1665E-06  0.5524E-06  0.6590E-07  0.1916E-06  0.1292E-06  0.1104E-06  0.8035E-06
0.6358E-06  0.4661E-07  0.9605E-07  0.1492E-06  0.1093E-06  0.1931E-05  0.7120E-07  0.9829E-07  0.1230E-05  0.4093E-06
0.1910E-06  0.2032E-06  0.9052E-07  0.5392E-07  0.5534E-06  0.9899E-07  0.1620E-06  0.1220E-06  0.1220E-06  0.3248E-06


Epi. dist.(km)=       0.8300E+02
0.5564E-07  0.8314E-07  0.8364E-07  0.3975E-07  0.4601E-07  0.4936E-07  0.5480E-07  0.1290E-06  0.4882E-07  0.4571E-07
0.1229E-06  0.2128E-06  0.8432E-07  0.9233E-07  0.4091E-07  0.5957E-07  0.7815E-07  0.6687E-07  0.5402E-07  0.6230E-07
0.6264E-07  0.1231E-06  0.6061E-07  0.4451E-07  0.7700E-07  0.3504E-07  0.1151E-06  0.1150E-06  0.5685E-07  0.5579E-07


C:\Temp>
C:\Temp>type pivot_epi_dist.py
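#!python
# Pivot the "Epi. dist." blocks of epi_dist.txt into columns.
pivot = []      # one list per output row
index = -2      # becomes -1 at the first header, so index < 0 marks the first block
fin = open('epi_dist.txt', 'rt')
for line in fin:
    line = line.strip()
    if line == '':
        continue                    # skip blank separator lines
    if line[0:4] == 'Epi.':
        index += 1                  # header line: a new block starts
        row_idx = 0                 # restart at the first output row
    else:
        nums = line.split()
        for num in nums:
            if index < 0:
                pivot.append([num])           # first block creates the rows
            else:
                pivot[row_idx].append(num)    # later blocks add one column each
                row_idx += 1
fin.close()
for row in pivot:
    print(" ".join(row), end='\n')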

C:\Temp>
C:\Temp>
C:\Temp>python pivot_epi_dist.py
0.7466E-07 0.2102E-06 0.5564E-07
0.4942E-07 0.9821E-07 0.8314E-07
0.7133E-07 0.6322E-07 0.8364E-07
0.6010E-07 0.1665E-06 0.3975E-07
0.1123E-06 0.5524E-06 0.4601E-07
0.4819E-07 0.6590E-07 0.4936E-07
0.9435E-07 0.1916E-06 0.5480E-07
0.6491E-07 0.1292E-06 0.1290E-06
0.6051E-07 0.1104E-06 0.4882E-07
0.5202E-07 0.8035E-06 0.4571E-07
0.1647E-06 0.6358E-06 0.1229E-06
0.5267E-07 0.4661E-07 0.2128E-06
0.5724E-07 0.9605E-07 0.8432E-07
0.1031E-06 0.1492E-06 0.9233E-07
0.1143E-06 0.1093E-06 0.4091E-07
0.4776E-07 0.1931E-05 0.5957E-07
0.3594E-06 0.7120E-07 0.7815E-07
0.4832E-07 0.9829E-07 0.6687E-07
0.7363E-07 0.1230E-05 0.5402E-07
0.1009E-06 0.4093E-06 0.6230E-07
0.6381E-07 0.1910E-06 0.6264E-07
0.6170E-07 0.2032E-06 0.1231E-06
0.6429E-07 0.9052E-07 0.6061E-07
0.9720E-07 0.5392E-07 0.4451E-07
0.4749E-07 0.5534E-06 0.7700E-07
0.6851E-07 0.9899E-07 0.3504E-07
0.1585E-06 0.1620E-06 0.1151E-06
0.8395E-07 0.1220E-06 0.1150E-06
0.4581E-07 0.1220E-06 0.5685E-07
0.7321E-07 0.3248E-06 0.5579E-07

C:\Temp>
C:\Temp>
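
For what it's worth, the same pivot can also be written as a transposition with zip(), writing straight to a file so that it can be merged into a larger script. This is only a sketch under assumptions: the function names read_blocks, pivot_blocks and convert are made up for illustration, and zip() stops at the shortest block, so it assumes every block holds the same number of values (as in the sample). It does not use print at all, so it should run on both Python 2.7 and 3.x.

```python
def read_blocks(lines, header_prefix="Epi."):
    """Collect the values that follow each header line, one list per block."""
    blocks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank separator lines
        if line.startswith(header_prefix):
            blocks.append([])             # a header line starts a new block
        elif blocks:
            blocks[-1].extend(line.split())
    return blocks


def pivot_blocks(blocks):
    """Transpose the blocks so that each block becomes one output column."""
    # zip(*blocks) pairs up the i-th value of every block; it truncates
    # to the shortest block, so equal block lengths are assumed.
    return [" ".join(row) for row in zip(*blocks)]


def convert(in_path, out_path):
    """Read in_path, pivot its blocks, and write one row per line to out_path."""
    with open(in_path) as fin:
        rows = pivot_blocks(read_blocks(fin))
    with open(out_path, "w") as fout:
        for row in rows:
            fout.write(row + "\n")
```

Calling convert('epi_dist.txt', 'epi_dist_columns.txt') (the output file name is just an example) should then write the same three-column layout as the transcript above for the sample input.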

Thank you for your answer, durden_tyler. However, when I put your code into a file and ran it, I got this error message:

 ./convert.py
  File "./convert.py", line 30
    print(" ".join(row), end='\n')
                            ^
SyntaxError: invalid syntax

Note that durden_tyler called the script like this:

python pivot_epi_dist.py

While you call it like this:

./convert.py

That would work if the shebang were correct. You need to specify the absolute path to python, for example:

#!/usr/bin/python

if that is where python lives on your system, or

#!/usr/bin/env python
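
If it is unclear which interpreter a given shebang line ends up starting, one quick check is to have the script report it. A minimal sketch using only the standard sys module; the function name describe_interpreter is made up for illustration:

```python
import sys


def describe_interpreter():
    """Return the interpreter path and its major.minor version as text."""
    version = "%d.%d" % (sys.version_info[0], sys.version_info[1])
    return "%s (Python %s)" % (sys.executable, version)


# A single parenthesised argument prints the same way on 2.x and 3.x.
print(describe_interpreter())
```

Running this once via ./script.py and once via python script.py shows whether both invocations use the same interpreter.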

Yes, I am aware of that and adjusted the shebang for my system, but I still get the error above. Could the error be related to the Python version? I have Python 2.7.

The print() call in the code uses the function form from Python 3.x; the print statement no longer exists in those versions.

Python 2.7, however, parses print as a statement, so the end='\n' keyword argument is a syntax error there. You need the print statement form for version 2.x and lower.

There are several differences between versions up to 2.7.x and versions 3.0.0 and above.
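
As a small illustration of the difference: a print call with one parenthesised argument and no keyword arguments is accepted by both 2.x and 3.x, because 2.x simply sees a parenthesised expression after the print statement. The sketch below uses two rows from the sample output; the helper name format_row is an assumption for the example.

```python
def format_row(row):
    """Join the values of one output row with single spaces."""
    return " ".join(row)


rows = [["0.7466E-07", "0.2102E-06", "0.5564E-07"],
        ["0.4942E-07", "0.9821E-07", "0.8314E-07"]]

for row in rows:
    # No end= or sep= keywords here, so this line parses on 2.x and 3.x alike.
    # To use such keywords on 2.6/2.7, the file would need
    # "from __future__ import print_function" as its first statement.
    print(format_row(row))
```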

Yes, the issue was the "print()". In my earlier post, I used Python version 3.4 (Anaconda on Windows 7), where print() is a function. Your version is 2.7, where print is a statement. The syntax differs between version 3 and earlier versions.
More information here:

https://docs.python.org/3.0/whatsnew/3.0.html

And the fix for versions below 3 (this is on Cygwin on Windows 7):

$
$ python --version
Python 2.7.8
$
$ cat pivot_epi_dist_ver1.py
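#!python
# Pivot the "Epi. dist." blocks of epi_dist.txt into columns (Python 2.x).
pivot = []      # one list per output row
index = -2      # becomes -1 at the first header, so index < 0 marks the first block
fin = open('epi_dist.txt', 'rt')
for line in fin:
    line = line.strip()
    if line == '':
        continue                    # skip blank separator lines
    if line[0:4] == 'Epi.':
        index += 1                  # header line: a new block starts
        row_idx = 0                 # restart at the first output row
    else:
        nums = line.split()
        for num in nums:
            if index < 0:
                pivot.append([num])           # first block creates the rows
            else:
                pivot[row_idx].append(num)    # later blocks add one column each
                row_idx += 1
fin.close()
for row in pivot:
    print " ".join(row)             # print statement, as required by 2.x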
$
$
$ python pivot_epi_dist_ver1.py
0.7466E-07 0.2102E-06 0.5564E-07
0.4942E-07 0.9821E-07 0.8314E-07
0.7133E-07 0.6322E-07 0.8364E-07
0.6010E-07 0.1665E-06 0.3975E-07
0.1123E-06 0.5524E-06 0.4601E-07
0.4819E-07 0.6590E-07 0.4936E-07
0.9435E-07 0.1916E-06 0.5480E-07
0.6491E-07 0.1292E-06 0.1290E-06
0.6051E-07 0.1104E-06 0.4882E-07
0.5202E-07 0.8035E-06 0.4571E-07
0.1647E-06 0.6358E-06 0.1229E-06
0.5267E-07 0.4661E-07 0.2128E-06
0.5724E-07 0.9605E-07 0.8432E-07
0.1031E-06 0.1492E-06 0.9233E-07
0.1143E-06 0.1093E-06 0.4091E-07
0.4776E-07 0.1931E-05 0.5957E-07
0.3594E-06 0.7120E-07 0.7815E-07
0.4832E-07 0.9829E-07 0.6687E-07
0.7363E-07 0.1230E-05 0.5402E-07
0.1009E-06 0.4093E-06 0.6230E-07
0.6381E-07 0.1910E-06 0.6264E-07
0.6170E-07 0.2032E-06 0.1231E-06
0.6429E-07 0.9052E-07 0.6061E-07
0.9720E-07 0.5392E-07 0.4451E-07
0.4749E-07 0.5534E-06 0.7700E-07
0.6851E-07 0.9899E-07 0.3504E-07
0.1585E-06 0.1620E-06 0.1151E-06
0.8395E-07 0.1220E-06 0.1150E-06
0.4581E-07 0.1220E-06 0.5685E-07
0.7321E-07 0.3248E-06 0.5579E-07
$
$

Thank you again, this time it works perfectly. :)