Delete new lines based on search criteria

jayarkay · January 19, 2010, 12:05am

Hi all!

A bit of background: I am trying to create a script that formats SQL statements. I have gotten so far as to add new lines based on certain match criteria like commas, keywords etc. In the process, I end up adding newlines where I don't want.

For example: substr(colName, 1, 10) becomes:
substr(colName,
1,
10) - I don't want this particular comma to be replaced with a newline.

With that in mind, my questions are:

Is there a way I can ignore the commas within braces - ( )
If not, how can I delete the newlines on lines starting with numbers, i.e. look for numbers at the beginning of a line and remove the newline before it.

In other words, if my input is:
substr(colName,
1,
10)

my output should be: substr(colName, 1, 10).

Any help is really appreciated! Thanks in advance.
Jay.

---------- Post updated 01-19-10 at 12:05 AM ---------- Previous update was 01-18-10 at 11:13 PM ----------

Another thing I forgot to mention: can I make the script intelligent enough to recognize negative numbers as well? Another example along the lines of the substr above is:

add_months(somedate, -6)
is becoming
add_months(somedate,
-6).

Don't want this to happen either.

skmdu · January 19, 2010, 12:30am

Hope I got your question correctly, But I used the different approach to solve this issue ( i,e I didnt check if the new line appears before the line starts with the number ). It works for me. You can try this out.

$cat inputfile
substr(colName,
1,
10)
add_months(somedate,
-6)

$ sed '
 :loop
 $!N
 s/\(.*,\)\n\(.*,\)/\1\2/
 /\(.*\)\n\(.*)\)/{
 s/\(.*\)\n\(.*)\)/\1\2/
 p;d;
 }
 t loop' inputfile
substr(colName,1,10)
add_months(somedate,-6)

daptal · January 19, 2010, 12:32am

awk '{if ($0~ /\(/ && $0 ~ /\)/) {print $0} else {sub(","," \n",$0);print }}' infile

HTH,
PL

jayarkay · January 19, 2010, 1:36am

Thanks for your response, daptal. There seems to be a bit of a problem though. I am on a solaris machine, by the way. When I run your awk statement, I get an error:

awk: syntax error near line 1
awk: illegal statement near line 1

When I use nawk or gawk though, the command does run successfully, but does the opposite of what I was expecting, i.e. adds an extra newline instead of the comma. My output in these two cases was:

substr(colName

1

10)
add_months(somedate

-6)

Thanks!

---------- Post updated at 01:36 AM ---------- Previous update was at 01:26 AM ----------

skmdu:

Hope I got your question correctly, But I used the different approach to solve this issue ( i,e I didnt check if the new line appears before the line starts with the number ). It works for me. You can try this out.
$cat inputfile
substr(colName,
1,
10)
add_months(somedate,
-6)
$ sed '
 :loop
 $!N
 s/$.*,$\n$.*,$/\1\2/
 /$.*$\n$.*)$/{
 s/$.*$\n$.*)$/\1\2/
 p;d;
 }
 t loop' inputfile
substr(colName,1,10)
add_months(somedate,-6)

Thanks for your response, skmdu! This code works when my input is just the examples that I had quoted in my original post. There is a bit of a problem, though - and it is entirely because I didn't give you the entire picture. Here's a more complete input:

My input (if you notice, its all in one line):

SELECT  C.column1,  A.column4,  (SUBSTR(A.column8, 1, 2)),  (SUBSTR(A.column9, 3, 1)),  CASE WHEN (SUBSTR(A.column9, 3, 1) = 'V') THEN B.column1 ELSE SUBSTR(A.column9, 1, 2) END,  B.column2 FROM  TABLE1 A,  TABLE2 B,  TABLE3 C,  TABLE4 D,  TABLE5 E WHERE (((((A.column3 = C.column3) AND (A.column4 = B.column4)) AND (A.column5 = D.column5)) AND (A.column6 = E.column6)) AND (A.column6 IN ('1', '2')));

My output so far (without your sed statement):

SELECT
  C.column1,
  A.column4,
  (SUBSTR(A.column8,
 1,
 2)),
  (SUBSTR(A.column9,
 3,
 1)),
  CASE
WHEN (SUBSTR(A.column9,
 3,
 1) = 'V') THEN B.column1
ELSE SUBSTR(A.column9,
 1,
 2)
END,
  B.column2
FROM
  TABLE1 A,
  TABLE2 B,
  TABLE3 C,
  TABLE4 D,
  TABLE5 E
WHERE (((((A.column3 = C.column3)
AND (A.column4 = B.column4))
AND (A.column5 = D.column5))
AND (A.column6 = E.column6))
AND (A.column6 IN ('1',
 '2')));

Output when your sed statement is included in my script that produced the above output:

SELECT
  C.column1,
  A.column4,  (SUBSTR(A.column8, 1, 2)),  (SUBSTR(A.column9, 3, 1)),
  CASE
WHEN (SUBSTR(A.column9, 3, 1) = 'V') THEN B.column1
ELSE SUBSTR(A.column9, 1, 2)
END,
  B.column2
FROM
  TABLE1 A,
  TABLE2 B,  TABLE3 C,  TABLE4 D,
  TABLE5 E
WHERE (((((A.column3 = C.column3) AND (A.column4 = B.column4))
AND (A.column5 = D.column5)) AND (A.column6 = E.column6))
AND (A.column6 IN ('1', '2')));,

Notice how some lines are joined because of extra commas or braces being present on a specific line. Also, an additional comma was added at the end, after the semi-colon.

Again, I realize it was because I had not given you much to work with to begin with. Thanks again for your help so far! Really, really appreciate it.

Jay.