How to replace a CRLF char from a variable length file in the middle of a string in UNIX?

My sample file is variable length, without any field delimiters. Each record has a minimum length of 18 chars, and the 'CRLF' potentially falls between chars 12-14. How do I replace it with a space? I still want to keep the end of record; I just want to remove these newline chars in the middle of the data.

sample data: in this data, the record type starting with '1234' continues on the next line due to a newline char in the middle of the record.
1123xxsdfdsfsfdsfds
1234ddfxxy
fffrrr
1123dfdffdfdxxxxxxxxx
1234ydfyyy
zmkn

------ Post updated at 02:42 PM ------

the output I want is :

1123xxsdfdsfsfdsfds
1234ddfxxyfffrrr
1123dfdffdfdxxxxxxxxx
1234ydfyyyzmkn

Thank you!

Hi chandrath...
Welcome to these forums.

Which OS are you using?
Which shell have you got?
Any preferred tools?
And finally, what are your coding attempts at solving your problem?

On top of what wisecracker already said, a few additional questions:

Didn't you say you want a space introduced between the partial lines? Your sample output doesn't have such.
And, your "completed" lines (2 and 4) don't have the required minimum length of 18 chars.
Is it always one single line to be added to an incomplete one, or can there be more?

Try putting the following code into a command file for sed:

/^1234/ {
         N
         s/\n/ /
        }

How to use it:

sed -f commandFile inputFile > outputFile

Once a line starting with "1234" is read, the N command appends the next input line to sed's pattern space, separated by a newline. That newline is then replaced with a space.

HTH

P.S. It was noted you wanted a space when you wanted to take out the CRLF but this was missing in your example output. If you don't want a space, take out the space after the second slash in the s command.
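
For a quick test without creating a command file, the same three commands can also be passed inline as separate -e fragments. This is only a sketch: the function name join_after_1234 is made up for illustration, and like the command file it always joins the line that follows a "1234" line, whatever that line is.

```shell
# Hypothetical wrapper name; reads the named files, or standard input if none are given.
join_after_1234() {
    # Same commands as the command file: on a line starting with 1234,
    # append the next line (N), then turn the embedded newline into a space.
    sed -e '/^1234/{' -e 'N' -e 's/\n/ /' -e '}' "$@"
}
```

Usage would be something like: join_after_1234 inputFile > outputFile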

@wbport, thank you! The 1234 record may not always contain a newline char. The updated sample input file:

1123xxsdfdsfsfdsfds
1234ddfxxy
fffrrr
1123dfdffdfdxxxxxxxxx
1234ydfyyy
zmkn
1234asdafxxfrrrfrrr
1123werwetrretttrretertre

output file:

1123xxsdfdsfsfdsfds
1234ddfxxyfffrrr
1123dfdffdfdxxxxxxxxx
1234ydfyyyzmkn
1234asdafxxfrrrfrrr
1123werwetrretttrretertre

------ Post updated at 10:01 PM ------

Thank you for all inputs. Sorry for the confusion about the problem example. Here I am reposting the problem.

My sample file is variable length, without any field delimiters. Each record has a minimum length of 18 chars, and the 'CRLF' is potentially (not always) between chars 11-15.
How do I replace this type of newline char with a space? I still want to keep the end of record, but want to replace these newline chars in the middle of the data with a space.
In the example below, records #2 and #4 (record type beginning with '1234') have a 'CRLF' char between chars 11-15 of the record, which I want to replace with a space.
My OS: Unix AIX 7.2
Updated input:

1123xxsdfdsfsfdsfdssa
1234ddfxxyff
frrrdds
1123dfdffdfdxxxxxxxxxas
1234ydfyyyzm
knsaaass
1234asdafxxfrrrfrrrsaa
1123werwetrretttrretertre

Expected output:

1123xxsdfdsfsfdsfdssa
1234ddfxxyff frrrdds
1123dfdffdfdxxxxxxxxxas
1234ydfyyyzm knsaaass
1234asdafxxfrrrfrrrsaa
1123werwetrretttrretertre

What I tried:

sed '/^.\{15\}$/!N;s/./ /11' filename

But it just puts a space at col 11; it does not remove the CRLF between cols 11-15.

Thank you!

I do not see any CR (0x0D, \r), just LF (0x0A, \n):

 od -c example.file
0000000   1   1   2   3   x   x   s   d   f   d   s   f   s   f   d   s
0000020   f   d   s   s   a  \n   1   2   3   4   d   d   f   x   x   y
0000040   f   f  \n   f   r   r   r   d   d   s  \n   1   1   2   3   d
0000060   f   d   f   f   d   f   d   x   x   x   x   x   x   x   x   x
0000100   a   s  \n   1   2   3   4   y   d   f   y   y   y   z   m  \n
0000120   k   n   s   a   a   a   s   s  \n   1   2   3   4   a   s   d
0000140   a   f   x   x   f   r   r   r   f   r   r   r   s   a   a  \n
0000160   1   1   2   3   w   e   r   w   e   t   r   r   e   t   t   t
0000200   r   r   e   t   e   r   t   r   e  \n
0000212
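
If reading the od dump by eye is impractical on a larger file, a rough check like the one below can count the CR bytes and, if there are any, strip them. It is a sketch only: the function names count_cr_bytes and strip_cr are made up here, and both read from standard input.

```shell
# Hypothetical helper: print how many CR (0x0D) bytes are on standard input.
count_cr_bytes() {
    # tr -cd deletes every byte that is NOT a CR, wc -c counts what is left;
    # the last tr removes the padding some wc implementations add.
    tr -cd '\r' | wc -c | tr -d ' '
}

# Hypothetical helper: copy standard input to standard output without CR bytes.
strip_cr() {
    tr -d '\r'
}
```

For example, count_cr_bytes < example.file should print 0 for the file dumped above, since it contains only LF line endings.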

In case this is useful.

perl -pe 'if((length) < 18 && not $seen) {s/\n/ /; ++$seen} else { undef $seen }' example.file

1123xxsdfdsfsfdsfdssa
1234ddfxxyff frrrdds
1123dfdffdfdxxxxxxxxxas
1234ydfyyyzm knsaaass
1234asdafxxfrrrfrrrsaa
1123werwetrretttrretertre

I see: if your 1234 line is followed by a 1123 line, you don't want to combine the lines. For the following I assume your data will never contain a tilde character "~". If it does, substitute some other "never happens" character or string of characters in the sed command file.

Here I change the newline to a tilde. If the characters after that are 1123, change the tilde back to a newline. If not, delete the tilde and the line following 1234 will remain appended.

HTH

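# Join a 1234 line with the line after it, unless that line starts with 1123.
# Note: \n in the replacement text (third command) works in GNU sed; other
# seds may need a backslash followed by a literal newline there instead.
/^1234/ {
         N
         s/\n/~/
         s/~1123/\n1123/
         s/~//
        }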
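
A possible variant of the same idea, sketched here as an alternative and not as a replacement for the script above: test the pattern space for "newline followed by 1123" directly, so no placeholder character is needed and \n appears only on the pattern side of the s command. The function name join_1234 is made up. It puts a space at the join (as the restated question asks), uses $!N so a 1234 line at the very end of the file is left alone, and still assumes a complete 1234 line is never followed directly by another 1234 line.

```shell
# Hypothetical wrapper name; reads the named files, or standard input if none are given.
join_1234() {
    sed '/^1234/{
$!N
/\n1123/!s/\n/ /
}' "$@"
}
```

Usage would be something like: join_1234 inputFile > outputFile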

Hope this helps...

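awk '{x[NR]=$0} END {                # read the whole file into x[], indexed by line number
     for (i=1;i<=NR;i++) {
         if (length(x[i]) >= 18)     # complete record: print it as is
            print x[i]
         else                        # short record: join it with the following line
            print x[i],x[++i]
     }
}' file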
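
If the file is large, the same length test can be applied line by line instead of loading everything into an array first. The following is just a sketch under the same assumption (any line shorter than 18 chars is the first half of a split record, and exactly one line completes it); the function name join_short and the variable names are made up.

```shell
# Hypothetical wrapper name; reads the named files, or standard input if none are given.
join_short() {
    awk '
        pending         { print held " " $0; pending = 0; next }  # finish the held record
        length($0) < 18 { held = $0; pending = 1; next }          # too short: hold it for now
                        { print }                                 # complete record
        END             { if (pending) print held }               # short last line, nothing to join
    ' "$@"
}
```

Usage would be something like: join_short file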