Joining multiple lines that are backslashed

CDM · December 21, 2009, 9:22pm

I have the following input:

1 2 3 \
4 5

1 2 3 4 5 6 7 \
8 \
9 10

And I want to end up with the following:

1 2 3 4 5
1 2 3 4 5 6 7 8 9 10

In other words, how can I join multiple records/lines (two or more) where the backslash character has been used to continue a record across lines, using standard UNIX tools such as sed, awk, grep, sh, etc.?

The actual application in this instance is joining backslashed multiple lines from the sudoers file.

CDM

daptal · December 21, 2009, 10:28pm

perl -e '
while (<>) {
    chomp;
    if (s/\\$//) {      # line ends with a backslash: drop it and keep joining
        print;
    } else {
        print "$_\n";
    }
}' abc.txt

HTH,
PL

jaduks · December 21, 2009, 11:14pm

Using awk:

$ awk '
{
    # Keep appending lines while the record ends with a backslash;
    # stop if getline hits end of file
    while ($0 ~ /\\$/ && (getline record) > 0) {
        sub(/\\$/, "")
        $0 = $0 record
    }
    print $0
}
' input.txt

rdcwayx · December 22, 2009, 12:12am

awk '{if ($0~/\\$/) {sub(/\\$/,""); printf "%s", $0} else {print $0}}' abc.txt

Scrutinizer · December 22, 2009, 3:23am

Technically there should be a space where the join took place, so IMO the \ and \n should be replaced with a space instead of just deleted:

sed ':a;N;s/\\\n/ /;ba' infile

CDM · December 22, 2009, 5:05pm

This seems to work fine (although only with gsed and not the sed that comes with Solaris 10).

Some of the lines that follow a backslash contain leading spaces/tabs, and the backslashes are also preceded by a space. This is what I've found works, although it doesn't look particularly pretty:

# sed 's/^[ 	]*//g' /etc/sudoers | \
gsed ':a;N;s/\\\n/ /;ba' | \
sed 's/,[ 	]/,/g'

(In both bracket expressions there is a space and a tab between the brackets.)

Can I get all of this into a single sed statement?

Also, could you explain what's happening in the gsed statement you pointed out?

CDM

jim_mcnamara · December 22, 2009, 5:14pm

This will fare better on Solaris I think -

sed -e :a -e '/\\$/N; s/\\\n//; ta'

a\
b
becomes ab - no space.
a \
b
becomes a b - one space
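
The two cases above can be reproduced with a quick check. This is only a sketch, assuming a sed that accepts the one-liner exactly as written (GNU sed does); the function name `join_backslashed` is made up for illustration.

```shell
# Hypothetical helper wrapping the one-liner above; reads stdin or files.
join_backslashed() {
    sed -e :a -e '/\\$/N; s/\\\n//; ta' "$@"
}

# First pair:  "a\"  then "b"  (no space before the backslash)
# Second pair: "a \" then "b"  (one space before the backslash)
printf 'a\\\nb\na \\\nb\n' | join_backslashed
```

With GNU sed this prints `ab` and then `a b`, since only the backslash and the newline are deleted and any space in front of the backslash survives.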

Scrutinizer · December 22, 2009, 5:31pm

This should get rid of spaces around the backslash:

sed ':a;N;s/[ \t]*\\\n[ \t]*/ /;ba' infile

It creates a label a ( :a )
It then appends the next input line to the pattern space ( N )
After that, any spaces or tabs before a \ that is followed by a linefeed (\n), plus any spaces or tabs after that linefeed, get replaced by a single space
Then it branches (b) back to label a.
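
For anyone stuck without gsed, the same label / N / substitute / branch steps can be spelled out one command per `-e`. This is a sketch rather than a tested Solaris recipe, and it differs from the one-liner above in two ways: it only pulls in the next line when the current one ends in a backslash, and it uses `[[:blank:]]` instead of `[ \t]`. The name `join_continued` is hypothetical, and it assumes nothing follows the backslash on a continued line.

```shell
# Hypothetical helper: join backslash-continued lines with a single space.
#   :a                      define label a
#   /\\$/{ ... }            only act when the pattern space ends in a backslash
#   N                       append the next input line (after a newline)
#   s/...\\\n.../ /         replace blanks + backslash + newline + blanks with one space
#   ba                      branch back to label a and test again
join_continued() {
    sed -e ':a' \
        -e '/\\$/{' \
        -e 'N' \
        -e 's/[[:blank:]]*\\\n[[:blank:]]*/ /' \
        -e 'ba' \
        -e '}' "$@"
}

# The sample input from the first post:
printf '1 2 3 \\\n4 5\n1 2 3 4 5 6 7 \\\n8 \\\n9 10\n' | join_continued
```

Because the loop is entered only for lines ending in a backslash, the rest of the file is passed through one line at a time instead of being collected into a single pattern space.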

CDM · December 22, 2009, 7:22pm

This works great. Thanks

OK, you've given me pretty much exactly what I asked for but this has only highlighted the fact that I mis-stated exactly what I wanted. Ultimately, I'm trying to generate a list of hosts that are represented in the sudoers file. This sed statement (which actually only works if I use gsed on my Solaris 10 host) now leaves me with something like this:

Host_Alias ALIAS1=host1,host2,host3

Host_Alias ALIAS2=host1,host3,host4,host5
Host_Alias ALIAS3=host1,host5,host6

etc.

OK, I can remove the blank lines with grep or awk, chop off everything up to the = character with another sed statement, and use tr, for example, to replace all the , characters with newlines. I'm sure this would all get me what I want.

But ... could it ALL be done in a single command? List every host that's represented in the sudoers file?

CDM

---------- Post updated at 11:22 AM ---------- Previous update was at 11:11 AM ----------

This is the shortest pipeline I can come up with:

# gsed ':a;N;s/[ \t]*\\\n[ \t]*//;ba' /etc/sudoers | \
awk -F= '/^Host_Alias/ {print $NF}' | \
tr ',' '\n' | \
sort | \
uniq

Scrutinizer · December 22, 2009, 7:38pm

Alternatively:

sed ':a;N;s/[ \t]*\\\n[ \t]*//;ba' /etc/sudoers|sed -n 's/^Host_Alias.*=//;s/,/\n/gp'|sort -u

CDM · December 22, 2009, 7:42pm

This doesn't filter out all the non Host_Alias entries in the file (sorry, that's me not stating the problem correctly again).

What's the significance of the -n and the p after the g?

CDM

Scrutinizer · December 22, 2009, 7:49pm

How about this?

sed ':a;N;s/[ \t]*\\\n[ \t]*//;ba' /etc/sudoers|sed -n '/^Host_Alias/{s/.*=//;s/,/\n/gp}'|sort -u

-n means do not print the pattern space automatically
p (here a flag on the s command) means print the pattern space if a substitution was made
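
A small illustration of how `-n` and the `p` flag interact, using two made-up sample lines (not taken from a real sudoers file). It is only a sketch, and it replaces commas with spaces rather than newlines so that it does not depend on how a given sed treats `\n` in the replacement.

```shell
# p as a flag on s: the line is printed only if the substitution happened.
with_flag=$(printf 'ALIAS1=host1,host2\nALIAS2=host12\n' | sed -n 's/,/ /gp')

# p as a separate command: every line is printed, substituted or not.
with_command=$(printf 'ALIAS1=host1,host2\nALIAS2=host12\n' | sed -n 's/,/ /g;p')

printf '%s\n---\n%s\n' "$with_flag" "$with_command"
```

The first form prints only the `ALIAS1` line, because the `ALIAS2` line contains no comma and so never triggers the `p` flag. The second form prints both lines.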

CDM · December 22, 2009, 8:02pm

Hmmm. OK, this works fine (but, again, only if I use gsed) but it does not capture entries that have only a single host listed like this:

Host_Alias ALIAS8=host12

I suspect this is because the command assumes a comma is present on the line?

CDM

Scrutinizer · December 22, 2009, 8:46pm

I can do this:

sed ':a;N;s/[ \t]*\\\n[ \t]*//;ba' /etc/sudoers |
sed '/^Host_Alias/!d;{s/.*=//;s/,/\n/g;s/$/\n/p}'|sort -u

But that does not make it any easier, and it prints a blank line.

Or this:

awk -F '[, \t \\\]*' '/Host_Alias/,!/\\/{sub(/^Host_Alias.*=/,"");print}' /etc/sudoers |
grep -o '\w\+' |sort -u

But I don't see how this is an improvement really on what you created yourself (yawn).
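
On the "all in a single command" question, one possible shape is a single awk program feeding sort. This is a rough sketch, not something verified against a real sudoers file. It assumes one alias definition per Host_Alias line (no `:`-separated definitions), nothing after a continuation backslash, and no comments inside the alias. The function name `list_sudoers_hosts` is hypothetical.

```shell
# Hypothetical helper: print every host named in Host_Alias lines, one per line.
list_sudoers_hosts() {
    awk '
    {
        cur = $0
        if (cont) sub(/^[ \t]+/, "", cur)    # strip indent of a continued line
        line = buf cur
        if (sub(/[ \t]*\\$/, "", line)) {    # trailing backslash: keep joining
            buf = line; cont = 1; next
        }
        buf = ""; cont = 0
    }
    line ~ /^Host_Alias/ {
        sub(/^[^=]*=[ \t]*/, "", line)       # drop everything up to the "="
        sub(/[ \t]+$/, "", line)             # drop trailing blanks
        n = split(line, hosts, /[ \t]*,[ \t]*/)
        for (i = 1; i <= n; i++)
            if (hosts[i] != "") print hosts[i]
    }' "$@" | LC_ALL=C sort -u
}
```

It would be called as `list_sudoers_hosts /etc/sudoers`. Since the hosts are split on commas inside awk rather than printed only after a substitution, an alias with a single host is listed as well, and only standard awk and sort are needed.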