awk regular expression works on AIX but not on Linux

kamlesh_pradhan · July 28, 2014, 10:14am

I have the following expression:

echo "Sun 12 Jul BST 2014\nSun 12 Jul 2014\nSun 12 Jul IS 2014" | awk '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print;}'

I ran the above code on an AIX box, and the output is as follows:

Sun 12 Jul BST 2014
Sun 12 Jul 2014

I ran the above code on a Linux box, and the output is as follows:

Sun 12 Jul 2014

What am I missing?
Is something here not according to the POSIX standard?

Linux Box Detail

>uname -a
Linux ****** 2.6.18-348.3.1.el5 #1 SMP Tue Mar 5 13:19:32 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

AIX box detail

# uname -a
AIX ****** 1 6 00F63E7C4C00

blackrageous · July 28, 2014, 10:30am

I get this on my Linux box. The results are actually the same; my Linux system just doesn't process the "\n" as a newline, so awk sees a single line.

[josephgr@oc0887178221 ~]$ echo "Sun 12 Jul BST 2014\nSun 12 Jul 2014\nSun 12 Jul IS 2014" | awk '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print;}'
Sun 12 Jul BST 2014\nSun 12 Jul 2014\nSun 12 Jul IS 2014

However, when I do this on Linux...

$ echo -e "Sun 12 Jul BST 2014 \n Sun 12 Jul 2014 \n Sun 12 Jul IS 2014" | awk '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print}'
 Sun 12 Jul 2014

-E is the default for echo in bash, the usual shell on Linux, and that means "disable interpretation of backslash escapes".

$ uname -a
Linux oc0887178221.ibm.com 2.6.32-431.21.1.el6.x86_64 #1 SMP Tue Jun 3 19:11:40 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

RudiC · July 28, 2014, 10:41am

What's your awk version? Some Linux distributions use mawk, which doesn't recognize the interval expression [A-Z]{3}. Also, use echo -e to have the "\n" interpreted as a newline.
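A rough sketch of how the version question could be answered from the shell: gawk prints a banner for --version and mawk for -W version, so the function tries both. The name awk_version is made up for this example, and the "unknown" fallback assumes an awk that supports neither flag prints nothing useful on stdout.

```shell
# Print the first line of the awk version banner, or "unknown" when
# neither version flag produces output (an assumption about vendor awks).
awk_version() {
    v=$(awk --version </dev/null 2>/dev/null | head -n 1)
    [ -n "$v" ] || v=$(awk -W version </dev/null 2>/dev/null | head -n 1)
    printf '%s\n' "${v:-unknown}"
}

awk_version
```

Stdin is redirected from /dev/null so that an awk which takes the flag for program text cannot sit waiting for input.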

in2nix4life · July 28, 2014, 10:48am

Many awk builds on Linux (gawk before version 4.0, for example) do not recognize [A-Z]{3}, which is called an interval expression, by default. In gawk it can be enabled like so:

echo -e "Sun 12 Jul BST 2014\nSun 12 Jul 2014\nSun 12 Jul IS 2014" | awk --re-interval '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print;}'
Sun 12 Jul BST 2014
Sun 12 Jul 2014

From man page:

--re-interval Enable the use of interval expressions in regular expression matching (see Regular Expressions, below). 
Interval expressions were not traditionally available in the AWK language. The POSIX standard added them, to make awk 
and egrep consistent with each other. However, their use is likely to break old AWK programs, so gawk only provides 
them if they are requested with this option, or when --posix is specified.
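Instead of guessing from the OS or the awk version, the behaviour described in the man page can be probed directly. The following is only a sketch; the function name awk_has_intervals is invented for this example. It relies on the fact that an awk without interval support takes the braces literally, so "aaa" does not match.

```shell
# Succeed when the awk on PATH treats {n} as a repetition count.
# "aaa" matches /^a{3}$/ only when interval expressions are active;
# otherwise the pattern means the literal text "a{3}" and nothing prints.
awk_has_intervals() {
    [ -n "$(printf 'aaa\n' | awk '/^a{3}$/' 2>/dev/null)" ]
}

if awk_has_intervals; then
    echo "interval expressions: supported"
else
    echo "interval expressions: not supported"
fi
```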

kamlesh_pradhan · July 28, 2014, 11:05am

Thanks in2nix4life, that answer is specific.
I got the point.

But is there any other way to handle the interval expression so that it works on both my AIX and Linux versions?

Since [A-Z]{3} is basic functionality, there must be something that can replace it on my Linux version.

Corona688 · July 28, 2014, 11:11am

What's wrong with --re-interval ? It does exactly what you want...

You could also use --posix...

Also, "echo -e" is not cross-platform. Use "printf" to get consistent behavior.

printf "%s\n" "line1" "line2"
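Applied to the sample data from this thread, that could look like the sketch below. The function name sample_lines is only for illustration. Each argument after the format string gets its own newline, so awk sees three separate records no matter how the local echo treats backslashes.

```shell
# Emit the three sample lines, one per printf argument.
sample_lines() {
    printf '%s\n' "Sun 12 Jul BST 2014" "Sun 12 Jul 2014" "Sun 12 Jul IS 2014"
}

# Count the records awk actually receives.
sample_lines | awk 'END { print NR " lines read" }'
```

This prints "3 lines read", whereas the plain echo version earlier in the thread handed awk a single line.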

in2nix4life · July 28, 2014, 11:19am

Simplest solution... Wrap your awk command in a script, detect which OS the script is running on, and run the matching awk command.

RudiC · July 28, 2014, 11:20am

Try [A-Z][A-Z][A-Z]
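Dropped into the original command, that suggestion might look like the sketch below. The function name match_dates is made up for the example, and printf stands in for echo so the input does not depend on the shell.

```shell
# Spell the repetition out instead of writing [A-Z]{3}, so the pattern
# does not depend on interval-expression support in the local awk.
match_dates() {
    awk '/(Sun)+( 12)+( Jul )+([A-Z][A-Z][A-Z] )?(2014)/ { print }'
}

printf '%s\n' "Sun 12 Jul BST 2014" "Sun 12 Jul 2014" "Sun 12 Jul IS 2014" |
    match_dates
```

This prints the "BST 2014" line and the plain "2014" line. The "IS 2014" line is rejected because "IS" has only two letters.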

kamlesh_pradhan · July 28, 2014, 12:25pm

Thanks RudiC, Corona,

Thanks in2nix4life, yours is the simplest solution I found for this after a lot of googling.

Summary to help others in the future.

The simplest solution is to detect the OS and run the regex statement accordingly:

case "$(uname -s)" in
	"AIX")
		# AIX awk accepted the interval expression as-is in the test above
		printf '%s\n' "Sun 12 Jul BST 2014" "Sun 12 Jul 2014" "Sun 12 Jul IS 2014" | awk '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print;}'
		;;
	"Linux")
		# assumes awk is gawk here; --re-interval is a gawk option
		printf '%s\n' "Sun 12 Jul BST 2014" "Sun 12 Jul 2014" "Sun 12 Jul IS 2014" | awk --re-interval '/(Sun)+( 12)+( Jul )+([A-Z]{3} )?(2014)/{print;}'
		;;
	*)
		echo "Unidentified operating system" >&2
		;;
esac

RudiC's solution is best when there are not many old scripts that use interval expressions.
I am using RudiC's solution, by the way.

[A-Z][A-Z][A-Z]