sed to remove newline chars based on pattern mis-match

chill3chee · July 7, 2016, 11:29am

Greetings Experts,
I am in AIX; I have a file generated through awk after processing the input files. Now I need to replace or remove the new-line characters on all lines that doesn't have a ; which is the last character on the line. I tried to use sed 's/\n/ /g' After checking through the forums got to know that this sed will not work as it will remove the new-line character during reading the line (my assumption). As per the post in unix - How can I replace a newline (\n) using sed? - Stack Overflow, tried to use

sed -e ':a' -e 'N' -e '$!ba' -e ' /;/! s/\n/ /g'

and ended with compatibility issues. As per my understanding on sed, sed '/;/! s#\n# #g' might resolve, but I am facing compatibility issues; After some search replaced ! with b as sed '/;/b s#\n# #g' and this script too faced combatibility issues.

Sample file contents:

Table1@Table2@SELECT COL1,
COL2,COL3, 
COL4,COL5 FROM
TABLE1 INNER JOIN TABLE2 ON
COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1,
COL2, COL3,
............

Expected output:

Table1@Table2@SELECT COL1, COL2,COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2,COL3, ......;

I need the new-lines to be eliminated on all rows that doesn't contain ; so that I can have them in a sinlge cell of the excel sheet and then split them based on @ delimited and do a lookup to get the third column.

cat output_file > excel_lookup_ready.xlsx

I have done it through awk as (not replace with space; but remove newline)

awk -F "@" { if ($0 ~ /;/) {print $0 > output_file_awk.txt} else { printf $0 > output_file_awk.txt }}' output_file.txt 
cat output_file_awk.txt > excel_lookup_ready.xlsx

and the final file has the contents as what I need;
I am not able to achieve it through sed Can you please help me..
Thank you for your time..

RudiC · July 7, 2016, 1:26pm

Why do you insist on sed and aren't happy with the awk solution?

Try

sed ':L; /;$/bX; N; bL; :X; s/\n/ /g' file
Table1@Table2@SELECT COL1, COL2,COL3,  COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2, COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;

chill3chee · July 7, 2016, 1:42pm

Learning awk and sed through the valuable posts in the forum. Just curious on how to achieve this through sed . Thank you RudiC, request you to please explain the code..

RudiC · July 7, 2016, 1:59pm

It's appending N ext lines and branching back to label L until it finds the ; , then branches to X , replaces \n with spaces and prints the resulting line.

MadeInGermany · July 7, 2016, 2:39pm

Your sed needs to substitute all \n where the preceding characters is not a ;

sed -e ':a' -e 'N' -e '$!ba' -e 's/\([^;]\)\n/\1/g' file

The file must fit into memory.
The following does an early replacement; only the output line must fit into memory

sed -e ':a' -e '/;$/b' -e '$b' -e 'N; s/\n//; ba' file

bakunin · July 7, 2016, 5:00pm

You might - especially as a beginner - want to make life easier for you and write real sed programs instead of one-liners only experts can decipher. It is like starting to learn the double toe-loop when you are skating for the first time in your life.

sed is basically a rule-based language: you describe types of lines (through regexps) and the actions which should be done once such a line is encountered. Now, start describing what types of lines you have and what you want to do once you encounter such a line. Your secription probably looks like this:

1) If a line doesn't end with a ";" char store the line and continue with the next.
2) If the line does end with a ";" char change the newlines in the lines stored so far to spaces (thus making one big line of the collected lines), output that and clear the storage, then continue with the next line.

Now, just in case the last line doesn't end with a ";" (probably it should, but safe is better than sorry) there might be a third rule:

3) if the line is the last line treat it like rule2 but instead of continuing just exit.

Now let us start with the implementation: we start with rule 2, because this way we will immediately see some effect. In the following i will always use your sample input, with an added ";" as the last character in the last line:

sed '/;$/ {
             s/\n/ /g
             p
          }' /path/to/file
COL1=COL21 AND COL2=COL22;
............;

We might want to allow for trailing white space at the end of ";" lines so we slightly modify ("<b>" and "<t>" are literal blank and tab characters):

sed '/;[<b><t>]*$/ {
             s/\n/ /g
             p
          }' /path/to/file
COL1=COL21 AND COL2=COL22;
............;

We didn't "recall" anything yet because we have nothing stored, but we will do that now as we implement rule 1. There is a usable buffer in sed, which is called "hold space", in opposition to the "pattern space", which is what you have read in and are working on. You cannot directly modify the hold space, but you can add the current pattern space to it, replace it with the pattern space or add its contents to the pattern space.

We are going to implement rule 1 by adding the content of the pattern space to the hold space. In our handling of the ";"-endling lines (rule 2) we will pull that hold space backu into the pattern space. See the sed man page for the commands:

sed '/;[<b><t>]*$/ !{
             H
             d
          }
     /;[<b><t>]*$/ {
             H
             d
             x
             s/\n/ /g
             p
          }' /path/to/file
Table1@Table2@SELECT COL1, COL2,COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2,COL3, ......;

The sequence "H->d->x" is a little trick: add the line to the hold space, then clear it (from the pattern space), finally exchange hold space and pattern space, thus having back everything in the pattern space and clearing the hold space at the same time.

Now, the last line (rule 3). You probably can do that yourself already. Only now we drop the final ";" i have added to the sample input and restore it to its original as you posted it. The last transformation means: replace an optional ";" (along with eventually trailing white space) with a single ";". This way we remove trailing whitespace and at the same time make sure there is a ";" at the end. I have also added the trailing-white space removal to the lines ending in ";":

sed '/;[<b><t>]*$/ !{
             H
             d
          }
     /;[<b><t>]*$/ {
             H
             d
             x
             s/\n/ /g
             s/[<b><t>]*//
             p
          }
     $$ {
             H
             d
             x
             s/\n/ /g
             s/;*[<b><t>]*$/;/
             p
          }' /path/to/file
Table1@Table2@SELECT COL1, COL2,COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2,COL3, ......;

I hope this helps.

bakunin

Aia · July 8, 2016, 12:02am

An alternative in Perl:

perl -ple 'BEGIN{$\=$/=";\n"} s/\n/ /g' chill3chee.file