I have a pipe-delimited file with around 700 columns.
The 65th column contains a carriage return, which is causing read issues in our ETL process. I would like to replace the newline characters in the 65th field with "nothing".
I have written the following code and need help fixing it.
nawk -F"|" 'NF=65 && NF like \n {gsub(NF,\n,"") } file
\n is a newline, not a carriage return. Which do you mean?
If you mean an extra newline is breaking the line early, awk can't tell the 'right' newlines from the 'wrong' ones; they're all just bytes. It also won't be able to delete a newline it hasn't read yet.
But you can count the number of fields to see whether the line broke early, and save the partial record to join with the next line.
awk -F"|" 'NF==65 { T=$0; next }; T { print T $0; T=""; next } 1' file
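To see the field-count idea in action, here is a minimal sketch on a made-up four-column sample in which the stray newline falls in field 2 rather than field 65, so the test becomes NF==2. The sample data and the temporary file are assumptions for illustration only, and the sketch assumes each record is broken at most once.

```shell
# Made-up sample: 4 columns per record, with a stray newline inside field 2.
sample=$(mktemp)
printf 'a|b1\nb2|c|d\ne|f|g|h\n' > "$sample"

# The broken first line "a|b1" has only 2 fields, so it is saved in T
# and printed together with the line that follows it.
joined=$(awk -F"|" 'NF==2 { T=$0; next } T { print T $0; T=""; next } 1' "$sample")
printf '%s\n' "$joined"
rm -f "$sample"
```

For the real file the test would stay at NF==65, as in the command above.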
When I do an octal dump of the problem line, it looks like the following. It has both \r and \n, and this happens in field 65. All I want is this: if a newline character is found in the 65th field, join that line with the next line.
Help is really appreciated.
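For anyone who wants to run the same kind of check, a minimal sketch follows. It uses a made-up two-line sample, not the actual data from this post, and the temporary file and the cr_count variable are assumptions for illustration.

```shell
# Made-up sample, not the data from the post: a CR+LF pair inside field 2.
sample=$(mktemp)
printf 'a|b1\r\nb2|c|d\n' > "$sample"

# od -c prints each byte; a carriage return shows up as \r and a newline as \n.
od -c "$sample"

# Count the CR bytes to confirm they are really present in the file.
cr_count=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
echo "CR bytes: $cr_count"
rm -f "$sample"
```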
I'd make a slight change, knowing there are actual CRs in there:
Whenever it sees a line ending in \r, it'll change the \r to a space, save that line, fetch the next, and print both together. Otherwise it will print lines unmodified.
awk -F"|" '/\r$/ { sub(/\r$/, " "); T=$0; getline; print T $0; next } 1' file
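A quick way to try this out is sketched below on a made-up four-column sample with a CR+LF pair inside field 2. The sample data, the temporary file, and the nxt variable are assumptions for illustration. This variant also checks the return value of getline, so a CR on the very last line of the file does not cause a stale line to be printed twice.

```shell
# Made-up sample: 4 columns, with a CR+LF pair inside field 2 of the first record.
sample=$(mktemp)
printf 'a|b1\r\nb2|c|d\ne|f|g|h\n' > "$sample"

# Lines ending in CR are joined with the following line; the CR becomes a space.
# All other lines are printed unmodified by the trailing "1".
fixed=$(awk -F"|" '
/\r$/ {
    sub(/\r$/, " ")
    if ((getline nxt) > 0) {
        print $0 nxt      # join the broken line with its continuation
    } else {
        print $0          # no continuation: file ended on a CR
    }
    next
}
1' "$sample")
printf '%s\n' "$fixed"
rm -f "$sample"
```

To replace the CR with "nothing" instead of a space, as originally asked, the second argument of sub() can be changed to an empty string.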