Remove Special Characters Within Text

Hi,

I have a "|" delimited file that is exported from a database.
There is one column in the file which has description/comments entered by some application user. It has "Control-M" character and "New Line" character in between the text.
Hence, when i export the data, this record with the new line character splits the record into 2.
Hence, the load fails. Following is an example :

1|user|i am happy.
2|user|i live
in india.
3|user|i am male.

I tried to replace the characters but was unable to do it between the 3rd column. And replace caused legal ones to be replaced as well.
Basically, i want to search a particular special character within the string of nth column.

Appreciate any inputs on this.

In your example, it appears that you have an extra <CR> and <LF> in the 3rd field that are before the end of field/line. Is that what you are trying to eliminate?

Perhaps this will get you thinking:

$ echo "jim|jones|hairy" | sed "s/$/|~/g" | awk '{FS=OFS="|"; $0=$0; gsub("\151","e",$3); print }'
jim|jones|haery|~

I added a delimiter | and ~ characters to end of each line; this may be needed so as only to change CR and LF within the 3rd field.
the \151 is the octal representation for i in hairy. Thus, find the codes for CR and LF and try the above for your example.

Yes, they are between the string.

Assuming you want to get rid of embedded carriage returns and change embedded newlines to spaces, you could try something like:

awk '
BEGIN {	FS = OFS = "|"}
{	gsub(/\r/, "")}
NF==3 {	if(out != "") print out
	out = $0
}
NF==1 {	out = out " " $0}
END {	if(out != "") print out}' input_file

If you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk instead of /usr/bin/awk .

Relying on what you describe, i.e. that the <CR> char is the indicator of the inappropriate line split (implying there will be no <CR> in the last line), this might do:

awk '/\r$/{sub(/\r/," ");getline X;$0=$0 X}1' file