I have the following expression:
echo "Sun 12 Jul BST 2014\nSun 12 Jul 2014\nSun 12 Jul IS 2014" | awk '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print;}'
I ran the above code on an AIX box and the output is as follows:
Sun 12 Jul BST 2014
Sun 12 Jul 2014
I ran the same code on a Linux box and the output is as follows:
Sun 12 Jul 2014
What am I missing?
Is something in this expression not according to the POSIX standard?
Linux Box Detail
>uname -a
Linux ****** 2.6.18-348.3.1.el5 #1 SMP Tue Mar 5 13:19:32 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
AIX box detail
# uname -a
AIX ****** 1 6 00F63E7C4C00
This is what I get on my Linux box. The regex behaves the same way; my Linux system just doesn't interpret the "\n" escapes, so awk sees one long line and prints it whole:
[josephgr@oc0887178221 ~]$ echo "Sun 12 Jul BST 2014\nSun 12 Jul 2014\nSun 12 Jul IS 2014" | awk '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print;}'
Sun 12 Jul BST 2014\nSun 12 Jul 2014\nSun 12 Jul IS 2014
However, when I add -e on Linux:
$ echo -e "Sun 12 Jul BST 2014 \n Sun 12 Jul 2014 \n Sun 12 Jul IS 2014" | awk '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print}'
Sun 12 Jul 2014
-E is the default for echo on Linux, and it means "disable interpretation of backslash escapes".
$ uname -a
Linux oc0887178221.ibm.com 2.6.32-431.21.1.el6.x86_64 #1 SMP Tue Jun 3 19:11:40 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
RudiC
July 28, 2014, 10:41am
What's your awk version? Some Linuxes use mawk, which doesn't recognize the extended regex [A-Z]{3}. Also, use echo -e to have the "\n" interpreted correctly.
Many awk implementations on Linux (for example, gawk before version 4.0) do not recognize [A-Z]{3}, which is called an interval expression, by default. In gawk it can be enabled like so:
echo -e "Sun 12 Jul BST 2014\nSun 12 Jul 2014\nSun 12 Jul IS 2014" | awk --re-interval '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print;}'
Sun 12 Jul BST 2014
Sun 12 Jul 2014
From man page:
--re-interval Enable the use of interval expressions in regular expression matching (see Regular Expressions, below).
Interval expressions were not traditionally available in the AWK language. The POSIX standard added them, to make awk
and egrep consistent with each other. However, their use is likely to break old AWK programs, so gawk only provides
them if they are requested with this option, or when --posix is specified.
Thanks in2nix4life, that answer is very specific.
I got the point.
But is there any other way to handle the interval expression so that it works on both my AIX and my Linux versions?
Since [A-Z]{3} is basic functionality, there must be something else that can replace it on my Linux version.
What's wrong with --re-interval ? It does exactly what you want...
You could also use --posix...
Also, "echo -e" is not cross-platform. Use "printf" to get consistent behavior.
printf "%s\n" "line1" "line2"
Simplest solution... Wrap your awk command in a script and detect which OS the script is being executed on and run the correct awk command.
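As an alternative to keying on the OS name, the wrapper could probe the awk binary itself. The following is only a sketch under assumptions: the function names awk_has_intervals and filter_dates are made up for illustration, and the second probe assumes the local awk accepts the gawk option --re-interval.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the given awk command line treats {3}
# as an interval expression, non-zero otherwise.
# Usage: awk_has_intervals awk [options...]
awk_has_intervals() {
    printf 'AAA\n' |
        "$@" '/^[A-Z]{3}$/ { found = 1 } END { exit !found }' 2>/dev/null
}

# Hypothetical wrapper: filter stdin with whichever invocation works.
filter_dates() {
    re='/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/'
    if awk_has_intervals awk; then
        awk "$re"
    elif awk_has_intervals awk --re-interval; then
        awk --re-interval "$re"    # gawk-specific option
    else
        echo "this awk does not support interval expressions" >&2
        return 1
    fi
}
```

Usage would be something like printf '%s\n' "Sun 12 Jul BST 2014" "Sun 12 Jul 2014" | filter_dates. The probe runs on a one-line sample fed through its own pipe, so it does not consume the data arriving on the wrapper's stdin.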
Thanks RudiC and Corona.
Thanks in2nix4life, your solution is the only one, and the simplest, that I found for this after a lot of googling.
Summary, to help others in the future:
The simplest solution is to detect the OS and run the awk command accordingly:
# Pick the awk invocation by OS; printf replaces the non-portable "echo -e".
case "$(uname -s)" in
AIX)
    printf '%s\n' "Sun 12 Jul BST 2014" "Sun 12 Jul 2014" "Sun 12 Jul IS 2014" | awk '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print;}'
    ;;
Linux)
    # --re-interval is a gawk option that enables interval expressions
    printf '%s\n' "Sun 12 Jul BST 2014" "Sun 12 Jul 2014" "Sun 12 Jul IS 2014" | awk --re-interval '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print;}'
    ;;
*)
    echo "Unidentified operating system" >&2
    ;;
esac
RudiC's solution is the better choice when there are not many old scripts that use interval expressions.
I am using RudiC's solution, by the way:
[A-Z][A-Z][A-Z]
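For anyone finding this later, here is a minimal sketch of that replacement applied to the original test, with printf instead of echo so the input is the same on both systems. The function name match_dates is just an illustrative choice; the regex is the one from the first post with {3} spelled out.

```shell
#!/bin/sh
# Portable sketch: repeat the bracket expression instead of using {3},
# so no interval-expression support is needed from awk.
match_dates() {
    awk '/(Sun)+( 12)+( Jul )+([A-Z][A-Z][A-Z] )?(2014)/ { print }'
}

# Feed the three sample lines; printf emits one argument per line.
printf '%s\n' "Sun 12 Jul BST 2014" "Sun 12 Jul 2014" "Sun 12 Jul IS 2014" | match_dates
```

This should print the "BST 2014" line and the plain "Jul 2014" line, and skip the "IS 2014" line, with no extra awk options on either platform.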