Awk to remove carriage return from 65th field

pinnacle · March 21, 2012, 2:10pm

Hi,

I have a pipe delimited file. There are around 700 columns in the file.

The 65th column has carriage return which is causing read issue with our ETL process. I would like to replace the new line characters in 65th field with "nothing"

i have return the following code and need help to fix it.

 
nawk -F"|" 'NF=65 && NF like \n {gsub(NF,\n,"") } file

Please help me fix this.

Corona688 · March 21, 2012, 2:18pm

\n is a newline, not a carriage return. Which do you mean?

If you mean an extra newline is breaking the line early, awk doesn't know the difference between the 'right' newlines and the 'wrong' ones, they're all just bytes; and it won't be able to delete a newline it hasn't read yet.

But you can count the number of fields to see if it broke early and save the partial record for next time.

awk -F"|" 'NF==65 { T=$0; next}; T { print T $0; T=""; next} 1

pinnacle · March 21, 2012, 4:23pm

When i do octal dump on the line of concern, it looks like the following. It has \r and \n. This is happening at field 65. The only thing i like to do is if the new line character is found in 65th field i want to join this line with the next line.

Help is really appreciated.

---------- Post updated at 03:23 PM ---------- Previous update was at 03:21 PM ----------

In the 65th field we are expecting data like

"ABC DEF"

but what we are receiving is

"ABC
DEF"

we like to convert to "ABC DEF" using awk

Appreciate help.

Corona688 · March 21, 2012, 4:41pm

Did my code do anything?

I'd make a slight change knowing there's actual CR's in there:

Whenever it sees a line ending in \r, it'll change \r to space, save that line, fetch the next, and print both together. Otherwise it will print lines unmodified.

awk -F"|" '/\r/ { sub(/\r/, " "); T=$0; getline; print T $0; next } 1'

pinnacle · March 21, 2012, 6:42pm

corona688:

Did my code do anything?

I'd make a slight change knowing there's actual CR's in there:

Whenever it sees a line ending in \r, it'll change \r to space, save that line, fetch the next, and print both together. Otherwise it will print lines unmodified.
awk -F"|" '/\r/ { sub(/\r/, " "); T=$0; getline; print T $0; next } 1'

Thanks Corona this works but needs few changes.

These works but need 3 small changes.

The replacement should not happen for all records, should happen only when the carriage return is present in the 65th field.
[/list]

Carriage return needs to be replaced with "" (empty) instead of " " space
And to replace double quotes " in 65th field with empty

Appreciate help

Corona688 · March 22, 2012, 1:41am

Why would carriage returns be wanted in any of the records, unless the entire file is delimited with \r\n ? Whether it is would be good to know.

awk -F"|" 'NF==65 { sub(/\r/, ""); T=$0; getline; print T $0; next} 1'

pinnacle · March 22, 2012, 11:43am

corona688:

Why would carriage returns be wanted in any of the records, unless the entire file is delimited with \r\n ? Whether it is would be good to know.
awk -F"|" 'NF==65 { sub(/\r/, ""); T=$0; getline; print T $0; next} 1'

Thanks Corona !!

There is a possibility that all the files at the end may have carriage return as the file is ftped to us, hence i want to look only in the 65th field.

This works

I modified and tried the below but does not work. I dont know where i am going wrong.

 
awk -F"|" 'NF==65 { sub(/\r/, ""); sub(/"/, " "); T=$0; getline; print T $0; next} 1' 

awk -F"|" 'NF==65 { sub(/\r/, ""); sub(/\"/, " "); T=$0; getline; print T $0; next} 1' 

awk -F"|" 'NF==65 { sub(/\r/, ""); sub(/\"\"/, " "); T=$0; getline; print T $0; next} 1'

Help is appreciated.

Corona688 · March 22, 2012, 11:51am

Try sub(/"" $/, " ", $65);