Combining lines into one line

scriptor · April 4, 2018, 4:05am

Hi
Below is a snippet of the input file.
I want every line that comes after a line starting with 1 to be joined onto that line.
For example, if a line starting with 1 is followed by two lines starting with 2, all three should be combined into one line, separated by |.
Input file content:

1,8091012,BATCH_1430903_01,21,T,2,808738,,,,21121:87:01,
2,A,79020:04:25,IC,,
2,D,89014:03:40,u,7239021674,
1,5390021,BATCH_1330903_02,21,T,2,306738,,,,21121:15:01,
2,A,6190:05:09,IC,,
1,9650123,BATCH_1110903_02,21,T,3,904428,,,,21121:34:11,
2,A,9270:00:22,IC,,
1,9992013,BATCH_1000903_06,21,T,7,50128,,,,21121:99:15,
2,A,7790:05:24,IC,,

Required O/P

1,8091012,BATCH_1430903_01,21,T,2,808738,,,,21121:87:01,|2,A,79020:04:25,IC,,|2,D,89014:03:40,u,7239021674,
1,5390021,BATCH_1330903_02,21,T,2,306738,,,,21121:15:01,|2,A,6190:05:09,IC,,
1,9650123,BATCH_1110903_02,21,T,3,904428,,,,21121:34:11,|2,A,9270:00:22,IC,,
1,9992013,BATCH_1000903_06,21,T,7,50128,,,,21121:99:15,|2,A,7790:05:24,IC,,

RudiC · April 4, 2018, 4:20am

Please show some learning curve and present an attempt of your own - with 170 posts in here some basic understanding could be expected, no?

scriptor · April 4, 2018, 5:46am

hi Rudic

Earlier I tried

paste -sd "|" file

but this combined all the lines into a single line, as below:

 
 1,8091012,BATCH_1430903_01,21,T,2,808738,,,,21121:87:01,|2,A,79020:04:25,IC,,|2,D,89014:03:40,u,7239021674,|1,5390021,BATCH_1330903_02,21,T,2,306738,,,,21121:15:01,|2,A,6190:05:09,IC,,|1,9650123,BATCH_1110903_02,21,T,3,904428,,,,21121:34:11,|2,A,9270:00:22,IC,,|1,9992013,BATCH_1000903_06,21,T,7,50128,,,,21121:99:15,|2,A,7790:05:24,IC,,
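scriptor's paste attempt is closer than it looks: paste -s joins every line of the file, so what is missing is a second step that breaks the joined line again wherever a record starts. A minimal sketch, assuming (as in the sample) that the data never contains a | of its own and that every record line begins with 1,; the file name sample.txt is hypothetical:

```shell
# Write part of the thread's sample input to a hypothetical file
cat > sample.txt <<'EOF'
1,8091012,BATCH_1430903_01,21,T,2,808738,,,,21121:87:01,
2,A,79020:04:25,IC,,
2,D,89014:03:40,u,7239021674,
1,5390021,BATCH_1330903_02,21,T,2,306738,,,,21121:15:01,
2,A,6190:05:09,IC,,
EOF

# Step 1: paste -s -d '|' joins ALL lines with "|" (scriptor's attempt)
# Step 2: awk turns every "|1," back into a line break followed by "1,"
result=$(paste -s -d '|' sample.txt | awk '{ gsub(/[|]1,/, "\n1,"); print }')
printf '%s\n' "$result"
```

Unlike the sed and awk answers in this thread, this holds the whole file in a single record at one point, so it may be less suitable for very large files.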

RudiC · April 4, 2018, 6:49am

One possible solution:

sed -n '/^1/ {1h; 1! {x; s/\n//g; p; }}; /^2/ {s/^/|/; H; }; $ {x; s/\n//g; p; } ' file
1,8091012,BATCH_1430903_01,21,T,2,808738,,,,21121:87:01,|2,A,79020:04:25,IC,,|2,D,89014:03:40,u,7239021674,
1,5390021,BATCH_1330903_02,21,T,2,306738,,,,21121:15:01,|2,A,6190:05:09,IC,,
1,9650123,BATCH_1110903_02,21,T,3,904428,,,,21121:34:11,|2,A,9270:00:22,IC,,
1,9992013,BATCH_1000903_06,21,T,7,50128,,,,21121:99:15,|2,A,7790:05:24,IC,,
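For readers who find the one-liner dense: here is the same sed program with one command per line and a comment on each step. It is intended to behave like RudiC's version; the file name sample.txt and the reduced input are assumptions for the demo, and GNU sed is assumed (for the comments inside the script):

```shell
# Hypothetical sample file (a subset of the thread's input)
printf '%s\n' \
  '1,5390021,BATCH_1330903_02,21,T,2,306738,,,,21121:15:01,' \
  '2,A,6190:05:09,IC,,' \
  '1,9650123,BATCH_1110903_02,21,T,3,904428,,,,21121:34:11,' \
  '2,A,9270:00:22,IC,,' > sample.txt

result=$(sed -n '
  # A line starting with 1 begins a new record
  /^1/ {
    # First input line: just save it in the hold space
    1h
    # Any later 1-line: swap the finished record into the pattern space
    # (the new 1-line goes to the hold space), remove newlines, print
    1!{
      x
      s/\n//g
      p
    }
  }
  # A line starting with 2: prefix "|" and append it to the hold space
  /^2/ {
    s/^/|/
    H
  }
  # Last line: swap the last record in, remove newlines, print
  $ {
    x
    s/\n//g
    p
  }
' sample.txt)
printf '%s\n' "$result"
```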

RavinderSingh13 · April 4, 2018, 8:29am

Hello scriptor,

Could you please try the following and let me know if it helps.

awk '{printf("%s%s",$0~/^1/ && FNR>1?ORS:"",$0)} END{print ""}' Input_file

Thanks,
R. Singh

scriptor · April 5, 2018, 2:40am

Hi Ravinder,

Yes, it works.
Can you please explain this command to me?

RavinderSingh13 · April 5, 2018, 3:11am

Hello scriptor,

The following explanation may help.

printf("%s%s",$0~/^1/ && FNR>1?ORS:"",$0) means: the format %s%s tells printf that 2 strings will be passed to it.
For the 1st string, the condition $0~/^1/ && FNR>1 is checked: does the line start with 1, and is its line number (FNR) greater than 1?
If the condition is TRUE, the value after ? is used (ORS, the output record separator, i.e. a newline); if not, the value after : is used (the empty string "").
For the 2nd string, $0 (the current line) is simply printed.
Because no newline is printed after each line, END{print ""} adds the final one.

Thanks,
R. Singh
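Ravinder's one-liner written out as a commented awk program. The variable sep is introduced here only for readability (it is not in the original), and the short input lines and the file name sample.txt are made up:

```shell
printf '%s\n' '1,a' '2,b' '2,c' '1,d' '2,e' > sample.txt   # made-up input

result=$(awk '
{
  # First %s: a newline (ORS) if this line starts a new record,
  #           i.e. it starts with 1 and is not the very first line; else ""
  # Second %s: the line itself, with no newline after it
  sep = ($0 ~ /^1/ && FNR > 1) ? ORS : ""
  printf("%s%s", sep, $0)
}
# The last record still lacks its closing newline
END { print "" }
' sample.txt)
printf '%s\n' "$result"
```

As the output shows, this joins the lines without a | between them.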

RudiC · April 5, 2018, 3:59am

This minor extension of RavinderSingh13's proposal also prints the requested | separator:

awk '{printf("%s%s%s",/^2/?"|":"", /^1/ && FNR>1?ORS:"",$0)} END{print ""}' file

EDIT: or even

awk '{printf("%s%s",/^2/?"|":/^1/ && FNR>1?ORS:"",$0)} END{print ""}' file

EDIT: or even

awk '{printf("%s%s",/^1/?DL:"|",$0); DL=ORS} END{print ""}' file
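The last variant relies on DL being an awk variable that has not been assigned yet, which evaluates to the empty string. A commented sketch of the same program on made-up input (sample.txt is a hypothetical file name):

```shell
printf '%s\n' '1,a' '2,b' '2,c' '1,d' '2,e' > sample.txt   # made-up input

result=$(awk '
{
  # Lines starting with 1 are preceded by DL, all other lines by "|".
  # DL is still unset (empty) on the first line, so nothing precedes it.
  printf("%s%s", /^1/ ? DL : "|", $0)
  # From here on DL holds ORS, the output record separator (a newline)
  DL = ORS
}
# Close the last record with a newline
END { print "" }
' sample.txt)
printf '%s\n' "$result"
```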

scriptor · April 5, 2018, 7:22am

Hi Rudic

your suggestion works

 
 sed -n '/^1/ {1h; 1! {x; s/\n//g; p; }}; /^2/ {s/^/|/; H; }; $ {x; s/\n//g; p; } ' file

In this, can you please explain how the parts below work?

/^1/

--> this will print lines starting with 1. only this part I understand

1h;

1!

{x; s/\n//g; p; }
/^2/

{s/^/|/; H; };

{x; s/\n//g; p;

However, I googled and found the following, but I am still scratching my head trying to understand it.

(h) function copies the contents of the pattern space into a holding area.
(g) function copies the contents of the holding area into the pattern space,
destroying the previous contents of the pattern space.
(H) function appends the contents of the pattern space to the contents of the holding area.
(x) function interchanges the contents of the pattern space and the holding area.

Also, I would be grateful if you could suggest how I should learn this, so that I too can build similar one-liners.
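The commands listed above can be watched in isolation on a three-line input. A minimal sketch, assuming a POSIX or GNU sed; the input lines a, b, c are made up for illustration:

```shell
# Three made-up input lines
input='a
b
c'

# h copies the pattern space to the hold space (overwriting it);
# g copies the hold space back, overwriting the pattern space.
# Here line 1 is saved, and line 3 is replaced by that saved copy.
r1=$(printf '%s\n' "$input" | sed '1h;3g')
printf '%s\n' "$r1"        # prints a, b, a

# x swaps the two spaces, so each cycle prints what was saved in the
# previous cycle: first the (empty) initial hold space, then a, then b
r2=$(printf '%s\n' "$input" | sed -n 'x;p')
printf '%s\n' "$r2"        # prints an empty line, a, b
```

The one-line lag of x is what lets RudiC's script print the finished record only when the next line starting with 1 arrives.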


Hi Ravinder,

In your syntax

awk '{printf("%s%s",$0~/^1/ && FNR>1?ORS:"",$0)} END{print ""}'

what does the following do?

$0~

On the net I found that ~ represents your home folder,

but I guess that doesn't fit in this case.
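Inside an awk program, ~ is awk's regex match operator (and !~ its negation), so $0 ~ /^1/ is true when the current line matches ^1. The ~ for the home directory is a shell feature and plays no role inside the quoted awk program. A small sketch on made-up input:

```shell
input='1,a
2,b
10,c'

# $0 ~ /^1/ : true when the whole line ($0) matches the regex ^1
r1=$(printf '%s\n' "$input" | awk '$0 ~ /^1/')
# A bare /^1/ used as a pattern is shorthand for $0 ~ /^1/
r2=$(printf '%s\n' "$input" | awk '/^1/')
# ~ works on any expression, e.g. only the first comma-separated field
r3=$(printf '%s\n' "$input" | awk -F, '$1 ~ /^1$/')
printf '%s\n' "$r1" "$r2" "$r3"
```

The first two select both 1,a and 10,c; only the field-based test excludes 10,c. That is also why requiring a comma after the 1 (/^1,/) is a safer choice.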

MadeInGermany · April 5, 2018, 7:42am

Another one-liner with sed (all versions)

sed -e ':L' -e '$!N;/\n1,/{P;D;}' -e 's/\n/|/;tL' file

More readable as a multi-liner:

sed '
  :L
  $!N
  /\n1,/{
    P;D
  }
  s/\n/|/
  tL
' file

scriptor · April 5, 2018, 7:54am

Thanks, everyone.
Could you guys help me understand this?
I have been scratching my head since this morning trying to understand RudiC's one-liner,
but I keep failing.

RudiC · April 5, 2018, 8:20am

Thank you.

Explaining sed's intricate operation in depth exceeds my language capabilities, and probably the space provided in here as well; on top of that, there are many texts on the topic in them there internet sites... once you're finished reading sed's man and/or info pages, the principal sources of information.
In short, sed has a pattern space and a hold space; all commands operate on the former, while the latter can only be copied to/from, appended to/from, or exchanged with it. Commands can be restricted by addresses (or ranges of addresses), which themselves can be regex matches (important: man regex!) like /pattern/ (/^1/ matches a "1" as the first character of a line) or line numbers (1 is the first line of the input stream). Don't mix up the two! The most powerful sed command is s (substitute): s/\n//g globally replaces the regex \n (an escape sequence interpreted as a <new line> char) with the empty string, effectively removing it.

L1 - reading - exercise - reading - exercise - goto L1
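The two kinds of address RudiC warns not to mix up, side by side. A sketch on made-up input where the first line deliberately does not start with 1:

```shell
input='2,x
1,a
2,b'

# Regex address: every line whose first character is 1
r1=$(printf '%s\n' "$input" | sed -n '/^1/p')     # 1,a
# Line-number address: the first input line, whatever it contains
r2=$(printf '%s\n' "$input" | sed -n '1p')        # 2,x
# Address range: from the first line matching ^1 through the last line ($)
r3=$(printf '%s\n' "$input" | sed -n '/^1/,$p')   # 1,a and 2,b
printf '%s\n' "$r1" "$r2" "$r3"
```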

scriptor · April 5, 2018, 8:48am

Thanks a lot, RudiC, for the valuable suggestion.

However, if you could just explain your syntax in detail, I would be very thankful.

sed -n '/^1/ {1h; 1! {x; s/\n//g; p; }}; /^2/ {s/^/|/; H; }; $ {x; s/\n//g; p; } ' file

RudiC · April 5, 2018, 8:54am

How about you try to explain it as far as you get, and I / we will jump in to fill the gaps and / or correct misperceptions. It might be worthwhile to sketch the pattern and hold spaces and their contents on a slip of paper?

MadeInGermany · April 5, 2018, 9:36am

scriptor:

Hi Rudic
...
/^1/ 
--> this will print lines starting with 1. only this part I understand
Not print; it selects lines that start with a 1 (I suggest also requiring a following comma)
/^1,/
for running the following command, in this case a complete { code block }.

As you found out, it uses the hold space with the commands h H x
The hold space is a bit cumbersome because line 1 and the last line $ need special treatment. (Indeed my solution without the hold space does not need to handle these border cases.)
A bit odd is /^2/{ ... }, which selects lines that start with a 2; shouldn't it rather be "not /^1,/", i.e. /^1,/!{ ... }?

MadeInGermany · April 5, 2018, 3:14pm

RudiC's idea (with hold space), optimized, as a multi-liner with comments

# sed -n : no default print
sed -n '
# if line starts with 1, then
  /^1,/ {
# exchange with hold space
    x
# substitute embedded newlines with | and print if successful (nothing is printed on line 1)
    s/\n/|/gp
# jump to :L
    bL
  }
# else append to hold space
  H
  :L
# if last line then
  $ {
# exchange with hold space (or copy from hold space with g)
    x
# substitute embedded newlines with | and print
    s/\n/|/g; p
  }
' file

And my idea (pattern space only), as a multi-liner with comments

sed '
  :L
# append next line
# (Unix sed: N in the last line exits without default print, needs $b;N or $!N)
  $!N
# if the next line begins with 1, then
  /\n1,/ {
# print and delete the current line; D jumps to next cycle
    P;D
  }
# substitute embedded newline with |
  s/\n/|/
# if successful then jump to :L
  tL
# default print happens here
' file

scriptor · April 6, 2018, 3:02am

Hi Rudic

From your syntax

sed -n '/^1/ {1h; 1! {x; s/\n//g; p; }}; /^2/ {s/^/|/; H; }; $ {x; s/\n//g; p; } ' file

below is what I still don't understand:

{1h; 1!  ---->

I am not able to understand this part.
What do

1h

and

1!

mean here, and how do they work?
Below is what I do understand.

 
/^2/ {s/^/|/; H; ----> this part selects lines starting with 2 and prepends a pipe "|" at the start.
H puts this in the holding area.

s/\n//g ---> this part replaces the newline char with an empty string

Now my confusion is this: if the file is as below

 
 cat v
a\nb\nc\nd 
a b c d 
\n
xyx 
 xyz

then why is the sed command not replacing the newline chars in this case when I run

sed 's/\n//g' v

Maybe my question seems silly to you... but this is the only way I can clear my doubt, as I have no other means.

RudiC · April 6, 2018, 3:26am

scriptor:

. . .
what does

1h  -  as said before, you have addresses and commands; here: on the first line, copy the pattern space to the hold space. Multiple addresses are possible.

and

1!  -  ! means negation; one of sed's intricacies: perform the following command on every line except the first.
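The effect of 1h and 1! can be seen by collecting three made-up lines a, b, c in the hold space. With plain H, the hold space (which starts out empty) gets a newline before the first line; 1h;1!H avoids that, which is the same job 1h does for the very first record in RudiC's script:

```shell
input='a
b
c'

# H on every line: the hold space starts empty, so it ends up as "\na\nb\nc"
r1=$(printf '%s\n' "$input" | sed -n 'H;${x;s/\n/,/g;p;}')
printf '%s\n' "$r1"        # ,a,b,c

# 1h overwrites the empty hold space with line 1; 1!H appends every later line
r2=$(printf '%s\n' "$input" | sed -n '1h;1!H;${x;s/\n/,/g;p;}')
printf '%s\n' "$r2"        # a,b,c

# 1! on its own: run the command (here p) on every line except line 1
r3=$(printf '%s\n' "$input" | sed -n '1!p')
printf '%s\n' "$r3"        # b and c
```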

. . .

/^2/ {s/^/|/; H; ----> this part selects lines starting with 2 and prepends a pipe "|" at the start.
H puts this in the holding area     ---  YES

. . . now my confusion is, if below is the file

 
 cat v
a\nb\nc\nd  - these don't seem to be <new line> chars but combinations / sequences of literal "\" char and "n" char.
a b c d     -  <new line> is the ASCII char 0x0A (LF), see an ASCII table.
\n
xyx 
 xyz

. . .
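RudiC's point about file v can be checked directly: a line containing backslash-n sequences has no newline character in it, so s/\n//g has nothing to remove. A sketch using a made-up line like the first line of v (single quotes keep the backslashes literal, and printf '%s' does not interpret them):

```shell
line='a\nb\nc\nd'

# \n in the regex means a real newline; a single input line contains none,
# so nothing changes
r1=$(printf '%s\n' "$line" | sed 's/\n//g')
# Escaping the backslash (\\n) matches the literal two-character sequence
r2=$(printf '%s\n' "$line" | sed 's/\\n//g')
printf '%s\n' "$r1" "$r2"   # a\nb\nc\nd, then abcd
```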

MadeInGermany · April 6, 2018, 3:33am

A \n is an embedded newline character (not the two consecutive characters \ and n).
sed loops over the lines in the input file, so the sed code only has one line in its "pattern space".
The \n character is only created by an append command like N, H, or G.
Normally an RE can handle only one line, and ^ and $ mark the beginning and end of the line.
For example a line

line1

The RE can see

^line1$

After an append there is a \n between the two lines, like this

line1\nline2

An RE can see it like

^line1\nline2$
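The same idea as a runnable sketch on two made-up lines, assuming GNU sed (which, like POSIX sed, accepts \n in a regex to match an embedded newline):

```shell
input='line1
line2'

# Each cycle holds one line, so there is no newline for s to replace
r1=$(printf '%s\n' "$input" | sed 's/\n/|/')
# N appends the next line after an embedded newline, which s can replace
r2=$(printf '%s\n' "$input" | sed 'N;s/\n/|/')
# ^ and $ still anchor at the start and end of the whole pattern space
r3=$(printf '%s\n' "$input" | sed -n 'N;/^line1\nline2$/p')
printf '%s\n' "$r1" "$r2" "$r3"
```

r1 comes back unchanged, r2 is line1|line2, and r3 prints both lines because the regex spans the embedded newline.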

RudiC · April 9, 2018, 4:36am

Post #10 shows a very interesting approach by MadeInGermany, both in method and especially in script length. Here's a revised version of mine, stripped down to minimum length:

sed -n '/^1/bL; {s/^/|/;H;}; ${:L;x;s/\n//gp;}' file
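The stripped-down script spread out with comments. The file name sample.txt and the input are made up, and GNU sed is assumed (branching to a label inside a { } block, comments inside the script). One thing worth knowing: because p is now a flag of s, a record is printed only if it contains a newline, so a 1-line that is not followed by any 2-line would be dropped. The thread's sample data never has that case, but the last two commands below show it:

```shell
printf '%s\n' '1,a' '2,b' '2,c' '1,d' '2,e' > sample.txt   # made-up input

script='
  # A line starting with 1 jumps straight to :L (skips the "|" prefixing)
  /^1/bL
  # Any other line: prefix "|" and append it to the hold space
  s/^/|/
  H
  $ {
    # Reached on the last line, or via bL for every line starting with 1
    :L
    # Swap: the collected record comes in, the current line is saved
    x
    # Remove the newlines; the p flag prints only if a newline was removed
    s/\n//gp
  }
'
result=$(sed -n "$script" sample.txt)
printf '%s\n' "$result"

# Edge case: the lone record 1,a contains no newline, so it is never printed
lone=$(printf '%s\n' '1,a' '1,b' '2,c' | sed -n "$script")
printf '%s\n' "$lone"
```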