Parse "import a, b, c, d" into line-by-line expressions "import a\nimport b\nimport c\nimport d\n"

scriptus · April 5, 2014, 12:44pm

First post. I'm just getting to grips with sed.
I've learned the basic substitution commands.
But I'm a bit stuck on this problem.

I'm running through some python files to convert syntax from Gtk2 to Gtk3 notation.

Consider a simple line of python like this ..

import gtk, pango, gtksourceview2, gobject

The terms might be in any order ...

e.g.

import gtk, gobject, pango, gtksourceview2
import gtk, pango, gobject, gtksourceview2, gobject

or even some terms left out in some input files ...

import gtk, gobject

I would like to see the output from sed as single line expressions ...

import gtk
import pango
import gtksourceview2
import gobject

and from there I can run through these single line expressions again to finally convert to Gtk3.

I've read how to use the "^", "|", "&" and "$" operators in sed ...

sed -r 's/^import.(gtk$|pango$|gtksourceview2$|gobject$)/&\n/g' input.py > output.py

but I'm not there yet.
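
One way to see why that attempt leaves the line unchanged is to try the same pattern on the sample input. The sketch below uses Python's re module as a stand-in for sed -r, on the assumption that the two agree for an expression this simple; PATTERN, multi and single are names made up for the illustration.

```python
import re

# Same pattern as the sed attempt, tried with Python's re module
# (assumed to behave like sed -r for this simple expression).
PATTERN = r"^import.(gtk$|pango$|gtksourceview2$|gobject$)"

multi = "import gtk, pango, gtksourceview2, gobject"
single = "import gtk"

# Each alternative ends in "$", so a module name only matches when it is
# the last thing on the line, directly after "import" plus one character.
print(re.search(PATTERN, multi))    # no match on the comma-separated line
print(re.search(PATTERN, single))   # matches only the single-module form
```

So the expression only ever fires on lines that already hold a single module, which is why the comma-separated lines pass through untouched.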

....

Here is a compressed code (Gtk2) example

import gtk, pango, gtksourceview2, gobject
import sys, os, re, glob, subprocess, webbrowser, base64, cgi, urllib2, shutil, time, pgsc_spellcheck

This is how the final (Gtk3) output code should look.

from gi.repository import Gtk
from gi.repository import Pango
from gi.repository import GtkSource
from gi.repository import GObject

import sys
import os
import re
import glob
import subprocess
import webbrowser
import base64
import cgi
import urllib2
import shutil
import time
import pgsc_spellcheck
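
Since the conversion is driven from Python anyway, the two listings above can also be read as a small lookup table. The following is only a sketch under that reading: GI_NAMES and convert_import_line are hypothetical names, the mapping is copied from the Gtk2 and Gtk3 listings, and dropping repeated module names is an assumption about what is wanted.

```python
# Mapping taken from the Gtk2 and Gtk3 listings in this post.
GI_NAMES = {
    "gtk": "Gtk",
    "pango": "Pango",
    "gtksourceview2": "GtkSource",
    "gobject": "GObject",
}


def convert_import_line(line):
    """Return a list of one-module import statements for one input line.

    Hypothetical helper: lines that do not start with "import " are
    returned unchanged (minus the trailing newline).
    """
    text = line.rstrip("\n")
    stripped = text.lstrip()
    if not stripped.startswith("import "):
        return [text]
    indent = text[: len(text) - len(stripped)]
    out = []
    seen = set()
    for name in stripped[len("import "):].split(","):
        name = name.strip()
        if not name or name in seen:
            continue  # assumption: repeated module names are dropped
        seen.add(name)
        if name in GI_NAMES:
            out.append(indent + "from gi.repository import " + GI_NAMES[name])
        else:
            out.append(indent + "import " + name)
    return out


for statement in convert_import_line("import gtk, pango, gtksourceview2, gobject"):
    print(statement)
```

This does the split and the rename in one pass, so it is an alternative to the two-step sed approach rather than a replacement for it.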

....

So the question as in title .. how to parse the line ...

import a, b, c, d, e

... (module names in any random order and some might be missing) into line-by-line expressions?

Thanks.

bartus11 · April 5, 2014, 12:51pm

Try:

awk -F"," '/^import/{print $1;for (i=2;i<=NF;i++) print "import"$i;next}1' input.py

RudiC · April 5, 2014, 1:08pm

Would this do:

sed '/import/ s/,/\nimport/g' file
import gtk
import pango
import gtksourceview2
import gobject
import sys
import os
import re
import glob
import subprocess
import webbrowser
import base64
import cgi
import urllib2
import shutil
import time
import pgsc_spellcheck
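
For anyone analysing how that one-liner works, here is a rough Python equivalent of its two parts. It is a sketch, not what sed does internally, and split_like_sed is a made-up name.

```python
def split_like_sed(line):
    # Address part "/import/": act only on lines that contain "import".
    if "import" in line:
        # "s/,/\nimport/g": replace every comma with newline + "import".
        # The space that followed each comma stays where it was, so
        # ", pango" turns into "\nimport pango".
        return line.replace(",", "\nimport")
    return line


print(split_like_sed("import gtk, pango, gtksourceview2, gobject"))
```

Two caveats follow from this: any line that merely contains the word "import" and a comma is rewritten too (an address such as /^import / would be tighter), and the \n in the replacement is understood by GNU sed but other sed versions may treat it differently.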

scriptus · April 5, 2014, 1:52pm

@bartus11

Thanks for the suggestion but I prefer to stay with sed .. at least for now since I'm calling embedded sed from python subprocess. But I will try it out later.

@RudiC

That suggestion works great .. and in one line of code .. but I now have to analyse how it works. Don't know how I missed it in my reading about sed. I was trying overly complicated expressions.

Thank you both.

Now on to learning more sed (and awk I guess).

Akshay_Hegde · April 5, 2014, 2:00pm

$ awk '!/import/{$0 = "import"$0}1' RS=",|\n" file

import gtk
import pango
import gtksourceview2
import gobject
import sys
import os
import re
import glob
import subprocess
import webbrowser
import base64
import cgi
import urllib2
import shutil
import time
import pgsc_spellcheck

scriptus · April 5, 2014, 3:00pm

@Akshay

Thanks. I'm now beginning to understand the different features of sed vs awk.

From an article I have just read (as a new member I can't post the link yet) ..

https://davidlyness.com/post/the-functional-and-performance-differences-of-sed-awk-and-other-unix-parsing-utilities

Both sed and awk work when called through a Python subprocess call (executing the command line from Python).

Akshay_Hegde · April 5, 2014, 3:09pm


Well, you are learning. awk splits each line into fields for you, using the field separator FS; the default FS is whitespace.

$0 --> line
$1 --> first field/column
$2 --> second field/column
$NF --> last field/column in a line

$ echo "1 2 3 4 5" | awk '{print "First Field "$1; print "Last Field "$NF}'
First Field 1
Last Field 5

If your fields are separated by something other than whitespace, you have to set FS (for example with -F):

$ echo "1,2,3,4,5" | awk -F, '{print "First Field "$1; print "Last Field "$NF}'
First Field 1
Last Field 5

With awk you don't have to split each line into fields yourself, as you would in Python, for example:

for line in open("file"):
    columns = line.split("\t")
    print(columns[1])  # second column, i.e. awk's $2
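
To line the two up, here is a small sketch of the same first-field and last-field lookup in Python. first_and_last is a name made up for this example; the point is only the index shift between awk and Python.

```python
def first_and_last(line, sep=None):
    # sep=None splits on runs of whitespace, like awk's default FS;
    # pass sep="," to mimic awk -F,
    fields = line.split(sep)
    # awk counts fields from 1 ($1 .. $NF); Python lists count from 0,
    # so $1 is fields[0] and $NF is fields[-1].
    return fields[0], fields[-1]


print(first_and_last("1 2 3 4 5"))
print(first_and_last("1,2,3,4,5", sep=","))
```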

scriptus · April 7, 2014, 7:05am

Final question in this thread ...

$ sed '/import/ s/,/\nimport/g' file

$ awk -F"," '/^import/{print $1;for (i=2;i<=NF;i++) print "import"$i;next}1' input.py

$ awk '!/import/{$0 = "import"$0}1' RS=",|\n" file

How would each of the above three working solutions (they work through terminal command) be executed by embedding in a python subprocess.Popen call?

e.g. this is invalid ...

file = "input.py"
subprocess.Popen(["sed", "/import/ s/,/\nimport/g", file])

error given:
sed: -e expression #1, char 13: unterminated `s' command

The awk arguments are harder to figure out to place in subprocess.Popen.
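
A likely cause of that error: inside an ordinary Python string, "\n" is turned into a real line break before sed ever sees it, so the s command is cut in half. A raw string (or "\\n") keeps the backslash. The sketch below builds the argument lists for all three commands without a shell; the helper names are made up, and the quotes that the terminal versions needed around the awk program and around the comma in -F"," are dropped because they were only there for the shell.

```python
import subprocess

# Raw strings keep the backslash, so sed and awk receive the two
# characters "\" and "n" instead of a literal line break.
SED_SCRIPT = r"/import/ s/,/\nimport/g"


def sed_args(path):
    return ["sed", SED_SCRIPT, path]


def awk_fs_args(path):
    # Each list element is passed to awk as one argument, untouched.
    program = '/^import/{print $1;for (i=2;i<=NF;i++) print "import"$i;next}1'
    return ["awk", "-F,", program, path]


def awk_rs_args(path):
    program = '!/import/{$0 = "import"$0}1'
    # awk itself expands the \n in a command-line variable assignment.
    return ["awk", program, r"RS=,|\n", path]


def run(args):
    # No shell is involved, so no shell quoting is needed.
    return subprocess.check_output(args, universal_newlines=True)
```

Usage would be along the lines of print(run(sed_args("input.py"))). Note that a multi-character RS is treated as a regular expression by gawk; other awk versions may only look at the first character.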

Akshay_Hegde · April 7, 2014, 7:44am

[akshay@aix tmp]$ cat f
import gtk, pango, gtksourceview2, gobject
import sys, os, re, glob, subprocess, webbrowser, base64, cgi, urllib2, shutil, time, pgsc_spellcheck

[akshay@aix tmp]$ cat test.py
#!/usr/bin/env python
from subprocess import Popen, PIPE
args = ["awk '!/import/{$0 = \"import\"$0}1' RS=',|\n' f"]
p = Popen(args,shell=True, stdout=PIPE, stderr=PIPE)
out, err = p.communicate()
print "return: ", p.returncode
print out.rstrip(), err.rstrip()

[akshay@aix tmp]$ ./test.py 
return:  0
import gtk
import pango
import gtksourceview2
import gobject
import sys
import os
import re
import glob
import subprocess
import webbrowser
import base64
import cgi
import urllib2
import shutil
import time
import pgsc_spellcheck

scriptus · April 7, 2014, 8:41am

Thanks again ..

I'll mark this as solved.

Akshay_Hegde · April 7, 2014, 10:00am

Glad to know that the problem is solved.