I noticed some strange behavior when reading a Windows input file with awk. Is there a trick to handle this? Do you have to convert the input file to a Unix input file first?
Yes, very simple. Put the following at the beginning of your awk script (but not in a BEGIN or END section!):
{ sub(/\r$/, "") }
It simply deletes a \r (CR) before the Unix newline (LF).
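To see why the leftover CR causes the strange behavior, here is a small sketch with made-up input (the sample text is an assumption, not taken from the original file). With CRLF input the CR stays attached to the last field. After the sub() call awk re-splits the record and the field is clean.

```shell
# Made-up CRLF sample line; the CR ends up inside the last field.
printf 'alpha beta\r\n' | awk '{ print length($2) }'
# prints 5: "beta" plus the trailing CR

# Strip the CR first; modifying $0 makes awk re-split the fields.
printf 'alpha beta\r\n' | awk '{ sub(/\r$/, "") } { print length($2) }'
# prints 4
```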
Where would you put this if it is a one-liner? This is what I was running, as you saw in the other post.
awk '/^--/{prev = prev ";"} {print prev} {prev = $0} END {print}' file
awk '{ sub(/\r$/, "") } /^--/{prev = prev ";"} {print prev} {prev = $0} END {print}' file
You don't have to, yet you still can, with e.g. dos2unix windows_input_file.txt
(if it's not available in your flavor of Unix, you can implement it yourself, e.g. with the awk command mentioned above)
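A minimal sketch of such a stand-in, assuming only a POSIX shell and an awk that understands \r in a regex. The function name crlf_to_lf is made up for illustration. Unlike dos2unix, which by default converts the file in place, this writes the result to standard output.

```shell
# crlf_to_lf is a hypothetical helper name, not a standard command.
# It writes the converted text to standard output; the input is untouched.
crlf_to_lf() {
    awk '{ sub(/\r$/, "") } { print }' "$@"
}

# Usage (file names are placeholders):
# crlf_to_lf windows_input_file.txt > unix_input_file.txt
```

With no file argument it reads standard input, so it can also sit in a pipeline.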
You can also modify RS (the Record Separator) so it ignores an optional Carriage Return before a NewLine.
The RS can be a pattern (in GNU/Awk), and the matched bytes are removed before the line is put into $0.
This works with both Linux and Windows files (but probably not MacOS).
$ printf '12345\r\n678\n\r\nTen\r\n' | awk -v RS='\r?\n' '{ print length, $0; }'
5 12345
3 678
0
3 Ten
$
Thank you :).
I have it in my version. Just didn't know if there was a trick with awk to handle it.
tr -d '\r' will also remove the carriage returns.
That doesn't work in Unix right? I am using Sunos.
Thank you :).
Unfortunately I don't have a SunOS system to test on, but tr has been around on *ix since the 70s. Are you saying you've tried it on your system and it doesn't exist, or doesn't work?
Sorry the first part was directed at Paul_Pedant. I edited the previous post to make it clearer.
Does tr provide output? It did not print any error messages, so I am not sure whether it actually worked. These are the two ways to use tr, right? I was looking at this tutorial, so I think I did it correctly.
-bash
: tr -d '\r' < file
-bash
: cat file | tr -d '\r'
-bash
:
What are the : ??
If you typed them in then they are no-op commands.
Perhaps the Solaris tr does not recognize the \r as a CR character?
Then instead use \015
See
man ascii
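A quick sketch of the octal form on made-up input. The quotes matter: without them the shell strips the backslash before tr ever sees it, so tr is asked to delete the characters 0, 1 and 5 instead of CR.

```shell
# Made-up CRLF input; od -c shows each byte, so a leftover CR would show as \r.
printf 'one\r\ntwo\r\n' | tr -d '\015' | od -c

# Unquoted, the shell turns \015 into 015 and tr deletes those three digits.
printf 'a0b1c5\n' | tr -d \015
# prints abc
```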
If you use awk anyway then use the sub() function rather than an extra tr command.
AT&T Unix SysVR4 shipped with oawk and nawk, see the history in this article
Unfortunately the Solaris /usr/bin/awk is a softlink to oawk (old poor awk), so you must use nawk or the Posix-compliant /usr/xpg4/bin/awk
(BTW all other Unix vendors link awk to nawk).
nawk '{sub(/\r$/, "")} /^--/{prev = prev ";"} {print prev; prev = $0} END {print prev}' file
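If a script has to run on several systems, one option is to pick the awk at run time. This is only a sketch: the candidate list is an assumption based on the paths mentioned in this thread, and pick_awk is a made-up helper name.

```shell
# pick_awk is a hypothetical helper; it prints the first candidate it finds.
# The candidate order is an assumption, adjust it for your system.
pick_awk() {
    for candidate in /usr/xpg4/bin/awk nawk gawk awk; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

AWK=$(pick_awk) || { echo "no awk found" >&2; exit 1; }

# Made-up CRLF input, just to show the chosen awk strips the CR.
printf 'a\r\n' | "$AWK" '{ sub(/\r$/, "") } { print }' | od -c
```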
awk on Solaris has always been problematical.
There are three or four versions available, and I don't have anywhere to test this properly. I did test GNU/awk with awk --traditional, and it just uses the literal first character of RS as the record separator, and ignores the rest of RS.
SunOS kept the original awk unchanged, because some of the sysadmin and other basic tools were written in that dialect, and Sun didn't want to risk breaking any of that.
Sun released nawk (new awk), which was a later version. I found the message below in a forum thread titled All awk's on Solaris are broken!, dated 2010; it suggests that some nawks do allow an RE for RS and older ones do not.
- fixed: nawk doesn't handle RS as a RE but as a single character, PR/30294
Solaris also may have /usr/xpg4/bin/awk or /usr/xpg6/bin/awk installed, and you can install GNU/awk (which may be one of those variants by another name).
There might be a Solaris-compatible mawk around, too. mawk 1.3.3 accepts a pattern for RS on my Linux Mint 19.3.
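Since I can't test on Solaris, here is a sketch of a probe you could run against each awk you have. It assumes the awk accepts -v, and rs_mode is a made-up helper name. It feeds two LF-terminated lines with RS set to the pattern: an awk that treats RS as a pattern sees two records, while one that uses only the first character (CR) sees a single record.

```shell
# rs_mode is a hypothetical helper: pass it the awk command to probe.
rs_mode() {
    n=$(printf 'a\nb\n' | "$1" -v RS='\r?\n' 'END { print NR }')
    if [ "$n" = "2" ]; then
        echo "regex RS"
    else
        echo "single-character RS"
    fi
}

rs_mode awk
# On Solaris you could also try: rs_mode nawk; rs_mode /usr/xpg4/bin/awk
```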
If the RS pattern does not work, you should be able to remove the CR that is left over when awk terminates a line at the NewLine, using { sub(/\r$/, "", $0); }
That should be faster than removing all CRs (because it is anchored to the end of the line), and it is just possible that CR can be intended as a valid character in the data.
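A small sketch of the second point, using a made-up record that has a CR inside the data as well as a CRLF line ending. Counting the bytes that survive shows which CRs each approach removes.

```shell
# tr removes every CR, including the one inside the data:
# 3 bytes remain (a, b, LF).
printf 'a\rb\r\n' | tr -d '\015' | wc -c

# The anchored sub() removes only the CR before the newline:
# 4 bytes remain (a, CR, b, LF).
printf 'a\rb\r\n' | awk '{ sub(/\r$/, "") } { print }' | wc -c
```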
It is added to the beginning of every line you type on. It's just how my terminal displays the prompt. It doesn't bother me, so I never changed it.
Like this?
-bash
: tr -d \015 < file
-bash
: tr -d '\015' < file
-bash
: cat file | tr -d '\015'
-bash
: cat file | tr -d \015
-bash
:
It looks like awk is linked to nawk. I guess the admins changed it to make their lives easier?
-bash
: ls -l /usr/bin/awk
lrwxrwxrwx 1 root root 18 Jul 28 2021 /usr/bin/awk -> ../../usr/bin/nawk
-bash
: ls -l /usr/ccs/bin/awk
lrwxrwxrwx 1 root root 18 Jul 28 2021 /usr/ccs/bin/awk -> ../../usr/bin/nawk
I have the Posix compliant one.
-bash
: ls -l /usr/xpg4/bin/awk
-r-xr-xr-x 1 root root 122248 Jan 24 2023 /usr/xpg4/bin/awk
I have the first one.
-bash
: ls -l /usr/xpg4/bin/awk
-r-xr-xr-x 1 root root 122248 Jan 24 2023 /usr/xpg4/bin/awk
-bash
: ls -l /usr/xpg6/bin/awk
/usr/xpg6/bin/awk: No such file or directory
Is gnu awk and gawk the same thing? I think I have gawk.
-bash
: ls -l /usr/bin/gawk
lrwxrwxrwx 1 root root 14 Apr 9 2018 /usr/bin/gawk -> ../gnu/bin/awk
-bash
: type -a gawk
gawk is /usr/bin/gawk
gawk is /usr/ccs/bin/gawk
-bash
: ls -l /usr/ccs/bin/gawk
lrwxrwxrwx 1 root root 14 Apr 9 2018 /usr/ccs/bin/gawk -> ../gnu/bin/awk
I do not have mawk.
-bash
: mawk
-bash: mawk: command not found
Ah ok, the newline and the : are part of your shell prompt.
echo "$PS1"
I guess your "file" is empty, or there is a \0 byte at the beginning. Check with
ls -l file
od file
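Another quick check, as a sketch on made-up input: count the lines that end in a CR. If the count is 0, the file has no Windows line endings to remove in the first place. For a real file, replace the printf pipe with the file name as the last awk argument.

```shell
# Made-up input: two CRLF lines and one plain LF line.
printf 'a\r\nb\nc\r\n' | awk '/\r$/ { n++ } END { print n + 0 }'
# prints 2
```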
Obviously. nawk fixes a lot, and I am 99.9% sure it won't break anything.
Yes, by convention gawk stands for GNU awk.