Remove Special Characters Within Text

tarun.trehan · April 14, 2014, 9:22am

Hi,

I have a "|" delimited file that is exported from a database.
There is one column in the file which has description/comments entered by some application user. It has "Control-M" character and "New Line" character in between the text.
Hence, when i export the data, this record with the new line character splits the record into 2.
Hence, the load fails. Following is an example :

I tried to replace the characters but was unable to do it between the 3rd column. And replace caused legal ones to be replaced as well.
Basically, i want to search a particular special character within the string of nth column.

Appreciate any inputs on this.

joeyg · April 14, 2014, 9:54am

In your example, it appears that you have an extra <CR> and <LF> in the 3rd field that are before the end of field/line. Is that what you are trying to eliminate?

Perhaps this will get you thinking:

$ echo "jim|jones|hairy" | sed "s/$/|~/g" | awk '{FS=OFS="|"; $0=$0; gsub("\151","e",$3); print }'
jim|jones|haery|~

I added a delimiter | and ~ characters to end of each line; this may be needed so as only to change CR and LF within the 3rd field.
the \151 is the octal representation for i in hairy. Thus, find the codes for CR and LF and try the above for your example.

tarun.trehan · April 14, 2014, 10:11am

Yes, they are between the string.

Don_Cragun · April 14, 2014, 1:33pm

Assuming you want to get rid of embedded carriage returns and change embedded newlines to spaces, you could try something like:

awk '
BEGIN {	FS = OFS = "|"}
{	gsub(/\r/, "")}
NF==3 {	if(out != "") print out
	out = $0
}
NF==1 {	out = out " " $0}
END {	if(out != "") print out}' input_file

If you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk instead of /usr/bin/awk .

RudiC · April 14, 2014, 5:46pm

Relying on what you describe, i.e. that the <CR> char is the indicator of the inappropriate line split (implying there will be no <CR> in the last line), this might do:

awk '/\r$/{sub(/\r/," ");getline X;$0=$0 X}1' file