Remove \n <newline> character inside the records.

machomaddy · February 1, 2012, 6:31am

Hi,
In my file, I have '\n' (newline) characters inside single records. Because of this, a single record is spread over several lines and looks like multiple records, as in the file below.

File 1
====
1,nmae,lctn,da\n
t
2,ghjik,o\n
ut,de\n
fk
 
Expected output after the \n characters are removed:
 
File 2
=====
1,nmae,lctn,dat
2,ghjik,out,defk

I tried

 perl -e 'while (<>) { if (! /\,$/ ) { chomp; } print ;}'

But this works only if the broken line ends with the delimiter. Does anyone have a suggestion for this?

birei · February 1, 2012, 7:17am

Hi machomaddy,

Try it with sed. Here is a test:

$ cat infile
1,nmae,lctn,da
t
2,ghjik,o
ut,de
fk
$ cat script.sed
## First line.
1 {
        ## If last one, print and quit.
        $ {
                p
                q
        }

        ## Else, save in hold space and read next line.
        h
        b
}

## Lines not beginning with number.
/^[0-9]/! {
        ## Append to hold space.
        H

        ## If last line, get content of hold space, remove newlines
        ## and print.
        $ {
                x
                s/\n//g
                p
        }
        b
}

## Lines beginning with number.
/^[0-9]/ {
        ## Exchange content with hold space. Save this line there and
        ## get past lines. Remove newlines and print.
        x
        s/\n//g
        p

        ## If last line, don't save current line, print it instead.
        $ {
                x
                p
        }
        b
}
$ sed -n -f script.sed infile
1,nmae,lctn,dat
2,ghjik,out,defk

Regards,
Birei

Scrutinizer · February 1, 2012, 7:49am

Try:

awk '/^[0-9]*,/{if(p)print p; p=$0; next}{p=p$0}END{print p}'  infile

durden_tyler · February 1, 2012, 8:47am

$
$ cat -n f56
     1  1,nmae,lctn,da
     2  t
     3  2,ghjik,o
     4  ut,de
     5  fk
$
$ perl -ne 'chomp; print "\n" if /^\d/ && $.>1; print; END{print "\n"}' f56
1,nmae,lctn,dat
2,ghjik,out,defk
$
$

tyler_durden

machomaddy · February 2, 2012, 7:07am

Thanks, the code works fine :). But out of curiosity, how would this be handled if the first column consisted of letters rather than digits? In that case all the records would be merged into a single record, correct? Sorry if I am wrong.

Scrutinizer · February 2, 2012, 7:27am

Yes. To reconstruct the records, we relied on a characteristic of the data: in this case, the first field of every record is a number. If that were not the case, it would be more difficult, and we would need some other distinguishing feature.
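
One other characteristic that could be used is the number of fields per record. The following is only a rough sketch, not a general solution. It assumes that every record has exactly 4 comma-separated fields (as in the sample), that no field contains a comma, and that a newline never falls inside the first field. The function name join_records is made up for this illustration.

```shell
# Hypothetical helper: rebuild records split by stray newlines.
# Assumptions: each record has exactly nf comma-separated fields
# (default 4), fields contain no commas, and a newline never falls
# inside the first field of a record.
join_records() {
    awk -v nf="${1:-4}" '
        {
            # Count the commas on this physical line ($0 is left unchanged).
            seps = gsub(/,/, "&")
            # If appending this line would give more separators than one
            # record can hold, the buffered record is complete: print it.
            if (have && count + seps > nf - 1) {
                print rec
                rec = ""; count = 0
            }
            rec = rec $0
            count += seps
            have = 1
        }
        END { if (have) print rec }
    '
}

# Sample input with a non-numeric first column.
printf '%s\n' 'a,nmae,lctn,da' 't' 'b,ghjik,o' 'ut,de' 'fk' | join_records 4
```

With the sample above this should print a,nmae,lctn,dat and b,ghjik,out,defk on two lines. If the first field itself can be broken by a newline, this approach fails as well, because the broken piece contains no comma and is glued onto the previous record.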