I have a huge file (50 million rows) that contains certain non-printable ASCII characters. I am cleaning the file by deleting those characters using the following command:
Please note that I am keeping the following characters while cleaning the file: tab, linefeed, carriage return, and all keyboard (printable) characters.
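For reference, a deletion step that keeps exactly that set of characters can be sketched with `tr`. This is an assumption, not necessarily the command used in this thread (that command isn't shown), and `clean_file` is a hypothetical output name:

```shell
# Hypothetical sample input: line 2 contains BEL (\007) and ESC (\033),
# line 3 ends with a DOS-style carriage return.
printf 'ok line\tTAB\nbad\007line\033\ncrlf line\r\n' > unclean_file

# -c complements the set and -d deletes, so every byte NOT listed is removed.
# Kept: \11 tab, \12 linefeed, \15 carriage return, \40-\176 printable ASCII.
# LC_ALL=C makes tr treat the input as single bytes.
# clean_file is a hypothetical output name.
LC_ALL=C tr -cd '\11\12\15\40-\176' < unclean_file > clean_file
```

Using the complement (`-c`) means you list what to keep rather than trying to enumerate every control character.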
However, besides cleaning the file with my command, I also need to identify the rows that contain these non-printable ASCII characters and redirect them to another file.
Can anyone please advise how I can capture these rows (the ones with non-printable characters) in another file?
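One way to do both jobs in a single pass over a 50-million-row file is a short `awk` program. This is a minimal sketch under stated assumptions: the allowed set is tab, carriage return and printable ASCII (space through `~`), the rows copied to `nonPrint_lines` are the original, uncleaned rows, and `clean_file` is a hypothetical output name:

```shell
# Hypothetical sample input: line 2 has BEL and ESC, line 3 ends in CR.
printf 'ok line\tTAB\nbad\007line\033\ncrlf line\r\n' > unclean_file

# LC_ALL=C so that the range " -~" means bytes 0x20-0x7E.
# Rule 1: a row containing any byte outside the allowed set is copied
#         unchanged to nonPrint_lines.
# Rule 2: every row then has those bytes removed and goes to clean_file
#         (hypothetical name).
LC_ALL=C awk '
/[^\t\r -~]/ { print > "nonPrint_lines" }
{ gsub(/[^\t\r -~]/, ""); print > "clean_file" }
' unclean_file
```

Because rule 1 runs before `gsub` modifies `$0`, the extracted rows still show the offending characters, which makes them easier to inspect.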
With the default record separators, <newline> characters are stripped from $0 when each line is read, and the default print action (used when the pattern evaluates to true and no action is specified) adds a <newline> to the output. So, the two commands:
produce exactly the same output for any input file. (However, the results are unspecified if the last character in a non-empty input file is not a <newline> character.)
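Both halves of that explanation can be checked directly. A small sketch, where `sample_file` is a hypothetical two-line input:

```shell
# Hypothetical two-line input file.
printf 'first\nsecond\n' > sample_file

# With the default RS, each record ($0) is read without its trailing
# newline, so a pattern looking for "\n" never matches: no lines are printed.
awk '/\n/' sample_file | wc -l

# A pattern with no action uses the default print, which appends ORS
# (a newline), so this reproduces the input byte for byte.
awk '1' sample_file | cmp -s - sample_file && echo identical
```

This is why listing <newline> among the characters to keep makes no difference to which lines an awk pattern selects.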
Also, as MadeInGermany said, <carriage-return> is not a normal character in a UNIX/Linux text file. Unless you're processing DOS-format text files, you probably want to copy lines containing <carriage-return> characters from unclean_file to nonPrint_lines as well.
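If you follow that advice, a sketch of the extraction simply leaves `\r` out of the allowed set (assumed here to be tab plus printable ASCII), so DOS-style lines are caught too. The sample input below is hypothetical:

```shell
# Hypothetical sample input: line 2 has BEL and ESC, line 3 ends in CR.
printf 'ok line\tTAB\nbad\007line\033\ncrlf line\r\n' > unclean_file

# Allowed: tab and bytes 0x20-0x7E (C locale). \r is deliberately NOT
# allowed, so lines ending in CR are selected as well.
# A pattern with no action uses the default print.
LC_ALL=C awk '/[^\t -~]/' unclean_file > nonPrint_lines
```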