Python: regular expression with startswith()

research3 · May 15, 2009, 9:24am

Hello everyone,

Yesterday I received a script in this forum that works well.
It appends every line that starts with ";" to the previous line.
I'd like it to do the same for a line starting with any character, unless the first character of the line is a digit.

Actual state:

#!/usr/bin/env python

import sys

data=open(sys.argv[1]).read().split("\n")
for n,items in enumerate(data):
    if items.startswith("A"):
        data[n-1]= data[n-1]+items
        data.pop(n)

print '\n'.join(data)

should be:

#!/usr/bin/env python

import sys

data=open(sys.argv[1]).read().split("\n")
for n,items in enumerate(data):
    if not items.startswith('[0-9]'):
        data[n-1]= data[n-1]+items
        data.pop(n)

print '\n'.join(data)

Can I use a regular expression with Python's startswith()?

I have searched, but so far I haven't found the right solution.

Any suggestions?

Thanks in advance!

ghostdog74 · May 15, 2009, 9:34am

Put your Python code in code tags. Python indentation is significant, so without code tags it's hard to troubleshoot. Of course you can use regular expressions, via the re module (I will leave it to you to read the docs if you are interested).
But for this case (and most cases), a regular expression is not needed. You just take the first character of the line and check it with isdigit(), e.g.:

import sys
data=open(sys.argv[1]).read().split("\n")
for n,items in enumerate(data):
    if items[0].isdigit(): # check first character is a digit, ie equivalent to [0-9]
        data[n-1]= data[n-1]+items
        data.pop(n)

In a Python string, the first character is at position 0, the second character at position 1, and so on. So if you want the third character of a string, it's string[2]. Read the Python docs and study how Python works.
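To make the indexing concrete, here is a minimal sketch in Python 3 syntax (on the Python 2 used in this thread, print is a statement). The variable names and the sample string are made up for illustration.

```python
line = "123;test;test"

first = line[0]   # "1": position 0 is the first character
third = line[2]   # "3": position 2 is the third character

# isdigit() is a string method. On a single ASCII character it is
# True for "0" through "9", much like the regex class [0-9].
starts_with_digit = line[0].isdigit()
semicolon_is_digit = ";test"[0].isdigit()

print(first, third, starts_with_digit, semicolon_is_digit)
```

Note that line[0] assumes the string is not empty; indexing an empty string raises IndexError.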

research3 · May 15, 2009, 9:52am

Thanks again for your fast reply. I'm going to look through the docs!

Something is wrong with the code or with my csv file, I will try to find out what the issue is, and I'll post my result.

Gee-Money · May 15, 2009, 10:14am

Just to clarify what ghostdog said: this "startswith" business is what's known as a "string method", and string methods (there are many) work with literal strings, not regular expressions.

In Python's standard library, regular expressions are handled by the "re" module:

import re

The "re" module is pretty involved but nifty, and in most cases it is overkill.
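For comparison, here is a rough sketch of what the regex version of the check could look like. The names STARTS_WITH_DIGIT and starts_with_digit are invented for this example; re.compile and the pattern's match() method are the standard "re" calls.

```python
import re

# Compile once. match() only matches at the start of the string,
# so no leading ^ is needed in the pattern.
STARTS_WITH_DIGIT = re.compile(r"[0-9]")

def starts_with_digit(line):
    """Regex counterpart of 'line starts with any digit'."""
    return STARTS_WITH_DIGIT.match(line) is not None

# startswith() compares literal text, so this looks for the five
# characters "[0-9]" and is False for ordinary data lines.
literal = "123;test".startswith("[0-9]")

# Since Python 2.5, startswith() also accepts a tuple of literal
# prefixes (not available on the Python 2.4 used in this thread).
via_tuple = "123;test".startswith(tuple("0123456789"))
```

For a single leading character, line[0].isdigit() does the same job with less machinery, which is why the regex is overkill here.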

You know what always helps me in my Python coding? That little $10 pocket reference book. I can't tell you how many times I found exactly what I needed in that thing, instead of sifting through docs for 30 minutes. Best $10 I ever spent.

ghostdog74 · May 15, 2009, 10:17am

Googling with keywords takes less than 30 seconds... and it's free.

research3 · May 15, 2009, 10:45am

That's right!

Unfortunately, the logic of the script has changed.

cat text

001;test;test;test
123;test;test;test
test;test;test
000;;test;test;test

./python.py text

result

123;test;test;test
test;test;test000;;test;test;test
001;test;test;test

Any idea?

ghostdog74 · May 15, 2009, 11:11am

dude, so what should the output be like?

research3 · May 15, 2009, 11:18am

Only rows that begin with a digit should start a new line!

001;test;test;test
123;test;test;test test;test;test
000;;test;test;test

ghostdog74 · May 15, 2009, 11:23am

Just use the opposite condition:

...
    if not items[0].isdigit():
...

The statement data[n-1] = data[n-1] + items appends the current line to the previous one. If you check for a digit, then every line that starts with a digit gets appended to the line before it. Obviously, this is wrong. Therefore you should check that the first character is NOT a digit. Make sense? You should also study Python if you are going to use it. See the Python docs.
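As a hedged alternative to the loop above, the same "append to previous" idea can be sketched by building a new list instead of calling pop() on the list being enumerated (removing an item during iteration shifts the following items, so the line right after a merged one is never examined). The function name merge_continuation_lines and the sep parameter are made up for this example; the sample rows are the ones from the thread. It is written in Python 3 syntax.

```python
def merge_continuation_lines(lines, sep=""):
    """Append every line that does not start with a digit to the line before it."""
    merged = []
    for line in lines:
        if not line:
            continue                   # skip blank lines
        if line[0].isdigit() or not merged:
            merged.append(line)        # starts a new record
        else:
            merged[-1] += sep + line   # continuation of the previous record
    return merged

sample = [
    "001;test;test;test",
    "123;test;test;test",
    "test;test;test",
    "000;;test;test;test",
]
for row in merge_continuation_lines(sample):
    print(row)
```

Passing sep=" " joins the pieces with a space. A first line that does not start with a digit has nothing to attach to, so this sketch keeps it as its own record; that choice is an assumption.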

research3 · May 15, 2009, 11:29am

I have already tried that, but the script displays the following error message:

#!/usr/bin/env python

import sys

data=open(sys.argv[1]).read().split("\n")
for n,items in enumerate(data):
    if not items[0].isdigit():
        data[n-1]= data[n-1]+items
        data.pop(n)

print '\n'.join(data)

./python.py text

Traceback (most recent call last):
  File "./reconf-rowdata-other-char.py", line 11, in ?
    if not items[0].isdigit():
IndexError: string index out of range

ghostdog74 · May 15, 2009, 11:40am

You did not do what I said. Put your code in code tags!
Give your script a different name, e.g. myscript.py, and not python.py.
Here's mine, and it works for me:

#!/usr/bin/env python
import sys
data=open(sys.argv[1]).read().split("\n")
data=[i for i in data if i != ''] ## add this to get rid of all blank lines
for n,items in enumerate(data):
    if not items[0].isdigit(): # first character is NOT a digit, i.e. does not match [0-9]
        data[n-1]= data[n-1]+items # append this line to the previous one
        data.pop(n)
print '\n'.join(data)

output:

# more file
001;test;test;test
123;test;test;test
test;test;test
000;;test;test;test

# ./test.py file
001;test;test;test
123;test;test;testtest;test;test
000;;test;test;test

Most probably, you have blank lines in your file.
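A small sketch of why a blank line triggers the IndexError, using a made-up two-line sample: when the file content ends with a newline, split("\n") leaves an empty string as the last element, and indexing an empty string with [0] fails.

```python
text = "001;test\n;test\n"     # file content ending in a newline

parts = text.split("\n")        # the trailing newline leaves "" at the end

try:
    parts[-1][0]                # same as ""[0]
    raised = False
except IndexError:              # "string index out of range"
    raised = True

non_blank = [p for p in parts if p != ""]   # the filter from the post above
by_lines = text.splitlines()                # alternative: no trailing ""
```

splitlines() avoids the empty element at the end, but blank lines in the middle of the file would still come through as empty strings, so the filter is the more general fix.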

research3 · May 15, 2009, 11:48am

I believe you, but I can't make it work!

$ ./test.py text
Traceback (most recent call last):
  File "./test.py", line 5, in ?
    if not items[0].isdigit(): # check first character is a digit, ie equivalent to [0-9]
IndexError: string index out of range

$ cat test.py
#!/usr/bin/env python
import sys
data=open(sys.argv[1]).read().split("\n")
for n,items in enumerate(data):
    if not items[0].isdigit(): # check first character is a digit, ie equivalent to [0-9]
        data[n-1]= data[n-1]+items
        data.pop(n)
print '\n'.join(data)

$ python -V
Python 2.4.3

research3 · May 15, 2009, 11:49am

cat text

001;test;test;test;
123;test;test;test
;test;test;test
000;;test;test;test;

ghostdog74 · May 15, 2009, 11:51am

See my code again. I added something. And for the last time, put your code in code tags!!

research3 · May 15, 2009, 11:56am

GHOSTDOG74

You are simply the best, again!

Many Thanks!