Joining lines in a file - help!

I'm looking for a way to join lines in a file; e.g. consider the following:

R|This is line 1
R|This is 
line 2
R|This is line 3
R|This is line 4
R|This is 
line 5

What I want to end up with is:

R|This is line 1
R|This is line 2
R|This is line 3
R|This is line 4
R|This is line 5

So the 'real' lines start with R|, but some of them contain a line break. I want to get rid of the line break whenever the next line doesn't start with R|.

Does that make sense?
Thanks in advance.

PS: I've been trying with sed and awk... but it's not very intuitive.

One way:

awk 'NR==1{printf "%s",$0;next}/^R/{print ""}{printf "%s",$0}END{print ""}' file

nice... thank you very much

now i'll try and figure out how that works!!
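
For anyone else trying to work it out, here is a rough unrolled sketch of the same idea with one comment per rule. The function name join_lines is made up for illustration. It uses printf "%s" so that a % in the data is not treated as a format specifier, and an END rule so the last output line is terminated.

```shell
# join_lines is a hypothetical wrapper name; it reads files or stdin.
join_lines() {
  awk '
    NR == 1 { printf "%s", $0; next }   # first line: print it, no newline yet
    /^R/    { print "" }                # a new R line: finish the previous output line
            { printf "%s", $0 }         # print the current line, still without a newline
    END     { if (NR) print "" }        # terminate the last output line
  ' "$@"
}
```

Called as join_lines file, it glues each continuation line directly onto the R line before it, with no separator added.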

---------- Post updated at 08:38 AM ---------- Previous update was at 07:57 AM ----------

I've been playing around with this a little and was trying to fix a problem: there should be a space where the newline was removed

so instead of

R|This isline 5

I want

R|This is line 5

Changing it to this:
awk 'NR==1{printf "%s",$0;next}/^R/{print ""}{printf " %s",$0}' file4

nearly worked!

Another way:

gawk '/^R/&&s{print s ; s=""}{s=s$0}END{printf "%s", s}' file.txt

Output:
R|This is line 1
R|This is line 2
R|This is line 3
R|This is line 4
R|This is line 5

BR

thanks, but that still leaves me the problem of spaces; so

R|This is 
Line1

changes to

R|This isLine1

but I want

R|This is Line1

any ideas??

I guess I could use sed to put a space in front of every line that doesn't begin with R, which would solve it... but can anyone think of a neater way?

I don't see this problem in the output above, but you can modify the code as below:

gawk '/^R/&&s{print s ; s=""}{s=s" "$0}END{printf "%s", s}' file.txt

I tried that... but that also gives a space at the beginning of the line
i.e.
. R|This is Line1
instead of
R|This is Line1

To eliminate the space at the beginning, use this code:

gawk '/^R/&&s{sub(/^[ ]/,"",s) ; print s ; s=""}{s=s" "$0}END{sub(/^[ ]/,"",s);printf "%s", s}' file.txt

xargs -n 4 < infile

huaihaizi3:

What if "This is Line 1" contains more than 4 words?
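
To make the concern concrete, here is a small sketch with made-up input where the first record has six words instead of four. The function name xargs_demo and the sample text are only for illustration.

```shell
# Hypothetical demo: xargs -n 4 regroups every four whitespace-separated
# words, regardless of where the R| records actually begin.
xargs_demo() {
  printf 'R|This is line 1 and more\nR|This is \nline 2\n' | xargs -n 4
}
```

With this input the second output line begins with "and more" rather than with R|, so the grouping only works when every record happens to have exactly four words.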

---------- Post updated at 18:01 ---------- Previous update was at 17:49 ----------

Or you can even do the following:

gawk '/^R/&&s{print s ; s=""}{s=s" "$0; sub(/^[ ]/,"",s) }END{printf "%s", s}' file.txt

great stuff that works a treat

I was going to put a space at the beginning of each line that doesn't start with R| with the following:
sed -i '/^R|/!s/^/ /' file4

and then run the original solution, but yours is much cleaner.
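
For comparison, a sketch of one more way to handle the spacing, assuming the goal is exactly one space at each join. Instead of adding a space and stripping it afterwards, it trims blanks on both sides of the join and inserts a single separator. The names join_records, buf and have are made up for this example, and it only uses plain awk features.

```shell
# join_records is a hypothetical name; it reads files or stdin.
join_records() {
  awk '
    /^R\|/ {                       # a line starting with R| opens a new record
      if (have) print buf          # flush the record collected so far
      buf = $0; have = 1
      next
    }
    {                              # continuation line
      sub(/[ \t]+$/, "", buf)      # drop trailing blanks on what we have
      sub(/^[ \t]+/, "")           # drop leading blanks on the continuation
      buf = (have ? buf " " : "") $0
      have = 1
    }
    END { if (have) print buf }    # last record, printed with its newline
  ' "$@"
}
```

Because the last record goes out through print, the output always ends with a newline.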

# cat file
R|This is line 1$
R|This is   $
line 2$
R|This is line 3$
R|This is line 4$
R|This is  $
line 5$
R|This is $
line 6$
R|This is$
line 7$
# awk '{if(!/^ *line/){x=$0;s1=gensub("(.*is).*","\\1",x);s=gensub("is(.*)","\\1",x);if(s~/ +/)s=" ";if(/line/)print x}\
else{xx=xx?xxFS$0:$0;print s1 s xx}}' file
R|This is line 1
R|This is line 2
R|This is line 3
R|This is line 4
R|This is line 5
R|This is line 6
R|This is line 7

regards
ygemici

with sed

 sed '
/[^[:digit:]]$/ {
N
s/\n//g
}' filename

My initial data was just an example; the data could be:

R|This is line 1 and so
it continues|next filesd
and more| still
R|This is line2

Ahmad's gawk line works for this as well,

i.e. it gives the result

R|This is line 1 and so it continues|next filesd and more| still
R|This is line2

Just one point, though: the last line didn't have a newline after it, so I just ran
sed -i '$a\' outputfile

to solve it.

(I'm running the script on multiple files and then joining them afterwards, so I need a newline at the end of each file.)
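
In case the sed -i form isn't available, here is a sketch of the same fix using only tail and printf. The function name ensure_trailing_newline is made up, and it assumes the files are ordinary text files.

```shell
# Hypothetical helper: append a newline to each file that lacks one.
ensure_trailing_newline() {
  for f in "$@"; do
    # $(...) strips a trailing newline, so the result is empty when the
    # last byte already is a newline (or when the file is empty).
    if [ -n "$(tail -c 1 "$f")" ]; then
      printf '\n' >> "$f"
    fi
  done
}
```

Running it twice on the same file is harmless, since the second pass finds the newline already there.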

Please don't post just example data. This is a forum to help people who are really seeking help. If you are looking for an answer, please give the exact data so we can help you out. Thanks.

After having some issues around spaces and format I decided to try a basic approach, so I wrote an easy-to-read script. It may not be optimal, but it does the job and is easy to maintain.

So, the code joins the lines into one (so that they all start with R|). It then performs some sentiment analysis on each line and appends |positive, |negative or |neutral at the end. It needs gawk, because IGNORECASE and \y are gawk extensions.

BEGIN {
        FS = "|"
        OFS = "|"
        IGNORECASE = 1
}
{
  full_line = $0
  eof = 1
  while (eof > 0) {
        eof = getline next_line
        if (eof <= 0)
        {
                sentiment(full_line)
        } else if (next_line !~ /^R\|/)
        {
          if ((full_line ~ / $/) || (next_line ~ /^ /)) {
                full_line = full_line next_line
          } else {
                full_line = full_line " " next_line
          }
        } else if (next_line ~ /^R\|/)
        {
                sentiment(full_line)
                full_line = next_line
        }
  }

}

function sentiment(LINE) {
        if (match(LINE, /(no[t]*|wasn[']*t|isn[']*t|ain[']*t) (that |very )*\y(good|great|best|excellent|brilliant|amazing|magnificent|fantastic)\y/))
        {
                print LINE, "negative"
        }
        else if (match(LINE, /\y(shit|crap|awful|worst|stupid|idiotic|arsehole|bastard|shite|disgusting|nightmare|hate|hated|angry)(s)*\y/))
        {
                print LINE, "negative"
        }
        else if (match(LINE, /\y(good|great|best|positive|excellent|brilliant|amazing|magnificent|fantastic)\y/))
        {
                print LINE, "positive"
        }
        else
        {
                print LINE, "neutral"
        }

}