How to remove new line character and append new line character in a file?

sasikari · October 26, 2010, 1:14pm

Hi Experts,

I have data coming in 4 columns and there are new line characters \n in between the data. I need to remove the new line characters in the middle of the row and keep the \n character at the end of the line.

The file is comma (,) separated.

Eg:

ID,Client ,SNo,Rank
37,Airtel \n Private \n limited,100,999\n
38,Vodaphone India \n Private Limited,200,888\n
39,Dell Limited,300,777\n
40,HP India Ltd\n,400,666\n

Delete red colour new line characters

Output should be :

ID,Client ,SNo,Rank
37,Airtel Private  limited,100,999\n
38,Vodaphone India Private Limited,200,888\n
39,Dell Limited,300,777\n
40,HP India Ltd,400,666\n

My requirement is: don't delete the \n that comes after every 3rd comma (the end of the row), and delete any new line character (\n) in between.

Please provide a solution to achieve this.

Regards
Kari

ctsgnb · October 26, 2010, 1:20pm

sed 's: *\\n *: :g;s: *, *:,:g' infile

This would also remove any spaces adjacent to a comma.

or

sed 's:\\n: :g;s:  *: :g' infile

s substitute
: separator
\\n pattern to replace (the extra \ escapes the backslash, so the literal two-character text \n is matched)
<space> replacement
g the substitution is applied globally, to every occurrence on the line

The second substitution squeezes any run of spaces into one, so a line never contains more than one consecutive space.

# cat in
ID,Client ,SNo,Rank
37,Airtel \n Private \n limited,100,999\n
38,Vodaphone India \n Private Limited,200,888\n
39,Dell Limited,300,777\n
40,HP India Ltd\n,400,666\n
# sed 's:\\n: :g;s:  *: :g' in
ID,Client ,SNo,Rank
37,Airtel Private limited,100,999
38,Vodaphone India Private Limited,200,888
39,Dell Limited,300,777
40,HP India Ltd ,400,666
#

... which could be combined into one substitution

sed 's: *\\n *: :g' in

A \n preceded or followed by any number of spaces (even 0) is replaced by a single space.

sasikari · October 27, 2010, 12:09am

Hi,
Thanks a lot for the reply.

I also get data like the following:

ID,Client ,SNo,Rank
37,Airtel 
 Private  
limited,100,999
38,Vodaphone India 
 Private Limited,200,888
39,Dell Limited,300,777
40,HP India Ltd
,400,666
41,Orange

 India 
Private 


Limited,500,555

In the last row there are even more new lines.

There are new line characters in the middle of the rows, so when I view the data in a text pad I can see the line breaks in between, but they should appear only at the end of each row. (This happens because the source data is entered in an Excel document and there are multiple Alt + Enters in particular fields.)

Output should be

ID,Client ,SNo,Rank
37,Airtel  Private  limited,100,999
38,Vodaphone India Private Limited,200,888
39,Dell Limited,300,777
40,HP India Ltd,400,666
41,Orange India Private Limited,500,555

The requirement is to delete the new line characters that occur before a row's third comma.
That is: take the first 3 commas and delete any new line characters in between, then take the next 3 commas and delete any new line in between (but don't delete the new line that comes after the 3 commas), and so on until the end of the file.

Please provide a solution for this.

Regards
Kari

k_manimuthu · October 27, 2010, 4:14am

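use strict;
use warnings;

my $file_name = "unix.txt";

# Slurp the whole file so the regex can match across line breaks.
open(my $fin, '<', $file_name) or die "Cannot open $file_name: $!";
my $file = do { local $/; <$fin> };
close($fin);

# A record is three comma-terminated fields followed by a final word.
# [^,]+ also matches newlines, so a broken record is captured whole.
$file =~ s{((?:[^,]+,){3}\w+)}{clean_up($1)}ge;

open(my $fout, '>', "Output_$file_name") or die "Cannot write Output_$file_name: $!";
print $fout $file;
close($fout);

# Flatten one record: newlines to spaces, squeeze spaces, trim.
sub clean_up
{
	my ($text) = @_;
	$text =~ s{\n}{ }g;
	$text =~ s{  +}{ }g;
	$text =~ s{ ,}{,}g;
	$text =~ s{^ +}{}g;
	return "$text\n";
}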

ctsgnb · October 27, 2010, 5:34am

echo `/usr/xpg4/bin/grep -vE "^[[:blank:]]*$" inputfile | sed 's|^\([0-9]\)|:\1|;s|\([0-9]\)$|\1:|' | tr '\n' ' '`| sed 's|: *:|:|g' | awk -F: '{print$0}' RS=:

# cat in
ID,Client ,SNo,Rank
37,Airtel
Private
limited,100,999
38,Vodaphone India
Private Limited,200,888
39,Dell Limited,300,777
40,HP India Ltd
,400,666
41,Orange

India
Private


Limited,500,555

# echo `/usr/xpg4/bin/grep -vE "^[[:blank:]]*$" in | sed 's|^\([0-9]\)|:\1|;s|\([0-9]\)$|\1:|' | tr '\n' ' '`| sed 's|: *:|:|g' | awk -F: '{print$0}' RS=:
ID,Client ,SNo,Rank
37,Airtel Private limited,100,999
38,Vodaphone India Private Limited,200,888
39,Dell Limited,300,777
40,HP India Ltd ,400,666
41,Orange India Private Limited,500,555


#

sasikari · October 27, 2010, 8:28am

Thanks a lot, it is working perfectly.

My actual source file has 128 columns. Could you please let me know what exactly needs to change in this code to handle a file with 128 columns?

Could you also explain this code? (Sorry, I couldn't understand it, but it works perfectly for my scenario.)

Please provide a solution for this.

Thanks a ton for the help

Regards,
Kari

ctsgnb · October 27, 2010, 9:43am

/usr/xpg4/bin/grep -vE "^[[:blank:]]*$" inputfile
    remove blank lines
| sed 's|^\([0-9]\)|:\1|;s|\([0-9]\)$|\1:|'
    add ":" at the beginning of lines starting with a digit and at the end of lines ending with a digit
| tr '\n' ' '
    put everything on one line
| sed 's|: *:|:|g'
    replace the pattern "two colons separated by any number of spaces (even 0)" with a single colon
| awk -F: '{print$0}' RS=:
    print the result using colon as field separator and record separator
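
To make these stages easier to follow, here is a minimal sketch that splits the pipeline into two helper functions so the intermediate result can be inspected. The function names and the sample rows are made up for illustration. It also assumes a plain `grep` that supports `-E` and `[[:blank:]]` (instead of `/usr/xpg4/bin/grep`), and it passes `RS` with `-v`, which should behave the same as the trailing `RS=:` in the one-liner.

```shell
# Stages 1 and 2: drop blank lines, then put ":" before lines that start
# with a digit and after lines that end with a digit.
mark_records() {
    grep -vE '^[[:blank:]]*$' |
        sed 's|^\([0-9]\)|:\1|;s|\([0-9]\)$|\1:|'
}

# Stages 3 to 5: join all lines with spaces, merge each "end: :start"
# pair into a single ":", then let awk print one record per ":" chunk.
split_records() {
    { tr '\n' ' '; echo; } |
        sed 's|: *:|:|g' |
        awk -v RS=: '{ print $0 }'
}

# Hypothetical sample: one record broken over three lines plus a blank line.
printf '%s\n' 'ID,Client ,SNo,Rank' '37,Airtel' 'Private' '' \
    'limited,100,999' '39,Dell Limited,300,777' | mark_records
```

The call at the end prints the marked lines, for example `:37,Airtel` and `limited,100,999:`; piping that output into `split_records` yields the rejoined rows. Because ":" is used as the marker, this approach presumably breaks if the data itself contains colons.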

sasikari · October 27, 2010, 1:05pm

Thanks for the code provided by k_manimuthu

Thanks a lot for the details CTSGNB

I am trying to use the code below to handle a file with 128 columns; it should basically delete the new line characters that occur within the first 127 commas of each row.

Code:

echo `/usr/xpg4/bin/grep -vE "^[:blank:]*$" in | sed 's|^\([0-9]\)|:\1|;s|\([0-9]\)$|\127:|' | tr '\n' ' '`|
  sed 's|: *:|:|g' | awk -F: '{print$0}' RS=:

Please correct me if I am doing something wrong.

When I execute the above code I get the error below.

Error

ksh: 21555 Memory fault(coredump)

Where exactly do we specify the delimiter used in the file? Suppose my file is pipe (|) delimited, where do I change the code?

Kindly tell me how to use the above code.

Thanks in Advance .

Regards,
Kari

ctsgnb · October 27, 2010, 6:09pm

Nope, s|\([0-9]\)$|\127:| just says: if a line ends with a digit between 0 and 9, append '27:' to it (\127 is read as \1 followed by the literal text 27).

So lines like
.....0
.....1
.....2
.....3
.....4
...
.....9
after this substitution will look like
.....027:
.....127:
.....227:
.....327:
.....427:
...
.....927:
... which is not what you really expect ...

---------- Post updated 2010-10-28 at 12:09 AM ---------- Previous update was 2010-10-27 at 11:22 PM ----------

Maybe try this

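FLAG=0
grep -vE "^[[:blank:]]*$" input | { while read -r line
do
# joining is enabled once the first line starting with a digit is seen
( echo "$line" | grep '^[0-9]' >/dev/null 2>&1 ) && FLAG=1
# a line that does not end with a digit is continued on the next line
if ! ( echo "$line" | grep '[0-9]$' >/dev/null 2>&1 ) && [ $FLAG -ne 0 ]
then
        printf "%s " "$line"
else
        echo "$line"
fi
done } >output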
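
For the 128-column and pipe-delimiter questions, one alternative (not taken from the posts above) is to count fields instead of looking at digits: keep gluing lines together until the pending record holds the expected number of fields. The sketch below is only an illustration. `join_records` is a made-up name, and it assumes the delimiter never appears inside a field and that line breaks never occur inside the last column or directly after the last delimiter.

```shell
# join_records NCOLS [DELIM] < input
# Assumptions (not from the thread): every complete record has exactly
# NCOLS fields, and the delimiter never appears inside a field value.
join_records() {
    awk -v ncols="$1" -v delim="${2:-,}" '
    /^[ \t]*$/ { next }                          # skip blank lines
    {
        # glue this fragment onto the pending record
        buf = (buf == "") ? $0 : (buf " " $0)
        # enough fields collected: the record is complete
        if (split(buf, parts, delim) >= ncols) {
            gsub(/[ \t]+/, " ", buf)             # squeeze runs of whitespace
            print buf
            buf = ""
        }
    }
    END { if (buf != "") print buf }             # flush an incomplete last record
    '
}
```

With the 4-column sample from this thread it would be called as `join_records 4 < input > output`; for a pipe-delimited file with 128 columns, `join_records 128 '|' < input > output`. Spaces next to a delimiter are kept as they are, so a row such as `40,HP India Ltd` followed by `,400,666` comes out as `40,HP India Ltd ,400,666`.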