Multi platform script perl or awk

Hi gurus, I am trying to match records in the following format:

(-,username,domain1.co.uk)\
(-,username,domain2.co.uk)

Either awk or perl must be used. I am using Cygwin. I wrote the following code, which works and matches both of the above entries:

awk 'BEGIN {musr="(-,username,[^)]+.co.uk)"} {if ($0~musr) print $0}' netgroup

But if I try to make this regexp more specific, there is no output:

# 1st: match the record, then the trailing backslash, then the end of the line

"(-,username,[^)]+.co.uk)\\$"

# 2nd: match the end of the line immediately after the record, without a backslash

"(-,username,[^)]+.co.uk)$"

So I decided to rewrite the script in perl, hoping that perl can deal with backslashes and end-of-line symbols. For this purpose I used a2p this way:

echo  'BEGIN {musr="(-,username,[^)]+.co.uk)"} {if ($0~musr) print $0}' | a2p.exe 
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if $running_under_some_shell;
                        # this emulates #! processing on NIH machines.
                        # (remove #! line above if indigestible)

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
                        # process any FOO=bar switches

$, = ' ';               # set output field separator
$\ = "\n";              # set output record separator

$musr = '(-,username,[^)]+.co.uk)';

while (<>) {
    chomp;      # strip record separator
    if ($_ =~ $musr) {
        print $_;
    }
}

This generated perl script also matches both entries; however, if I try to make it more specific, I get the following errors:

1st:

$musr = "(-,username,[^)]+.co.uk)\\";
Trailing \ in regex m/(-,username,[^)]+.co.uk)\/ at perlmatch.pl line 18, <> line 1.

2nd:

$musr = "(-,username,[^)]+.co.uk)$";
Final $ should be \$ or $name at perlmatch.pl line 14, within string
syntax error at perlmatch.pl line 14, near "= "(-,username,[^)]+.co.uk)$""
Execution of perlmatch.pl aborted due to compilation errors.

3rd:

$musr = "(-,username,[^)]+.co.uk)\$";
[the output is nothing]

What am I doing wrong? My question also points to the fact that if somebody needs to use a script on several platforms (aix, solaris, linux), then using perl should be a better approach than dealing with (non)GNU utils and various (g|n)awk versions etc. Regards

Both awk and perl are going to take the ( ) as grouping brackets, not literal ones, unless you escape them.
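To illustrate the point about grouping brackets, here is a minimal sketch using the two sample records from the question. The temporary file is only a stand-in for the netgroup file, and the bracket expressions [(], [)] and [.] are one possible way of writing literal characters, not necessarily the one the poster had in mind.

```shell
# Build a two-line sample in a temporary file (hypothetical stand-in for netgroup).
sample=$(mktemp)
printf '%s\n' '(-,username,domain1.co.uk)\' '(-,username,domain2.co.uk)' > "$sample"

# Unescaped ( ) only group, so nothing in the pattern consumes the literal ")"
# that sits between ".co.uk" and the end of the line: no line matches.
unescaped=$(awk '/(-,username,[^)]+.co.uk)$/' "$sample")

# [(] and [)] match the literal brackets, so the anchored pattern now finds
# the record that ends without a backslash.
escaped=$(awk '/[(]-,username,[^)]+[.]co[.]uk[)]$/' "$sample")

printf 'unescaped: %s\n' "$unescaped"
printf 'escaped:   %s\n' "$escaped"
rm -f "$sample"
```

With the grouping form the anchored pattern prints nothing, which is the symptom described in the question; with literal brackets it prints only the domain2 record.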

The nice thing about awk/perl is you don't actually have to make big giant do-everything regexes to handle the logic.

$ cat mmatch.awk

{ N=0 }
     (NF==3) && ($1 == "(-") && ($2 == "username") && /\)\\$/ { T=$0 ; getline; N=1 }
N && (NF==3) && ($1 == "(-") && ($2 == "username") && /\)$/ { print T; print }

$ awk -F"," -f mmatch.awk data

(-,username,domain1.co.uk)\
(-,username,domain2.co.uk)

$

Thank you for the reply. Sorry, but I do not understand what you have done in the awk script.

For each line that comes in:

1) Set N=0
2) If there are 3 tokens in the line, it begins with "(-", the second token is "username", and it ends with ")\", save the line in T, get the next line, and set N to 1.
3) If N is nonzero, the line has three tokens, begins with "(-", the second token is "username", and it ends with ")", print this line and the last.

A slight change to your original awk should get the job done:

awk 'BEGIN {musr="[(]-,username,[^)]+.co.uk[)]\\\\?"} $0~musr' netgroup

Two of the \ chars are stripped by the shell, leaving \\ for awk

No, awk just needs an irrational amount of backslashes:

$ echo 'BEGIN {musr="[(]-,username,[^)]+.co.uk[)]\\\\?"} $0~musr'

BEGIN {musr="[(]-,username,[^)]+.co.uk[)]\\\\?"} $0~musr

$

You are providing the regular expression as a string literal. String literals have to be parsed. The string parser has its own set of escape sequences. The \\ sequences that you intend for the regular expression are being interpreted by the string parser first, and for each such pair it emits a single backslash. You can confirm this by printing the variable's value.
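The check suggested above can be sketched as follows. Only the tail of the pattern is used, to keep the output short; the variable name shown is just for illustration.

```shell
# The awk string literal contains four backslashes. The single quotes keep
# the shell from touching them, so any halving is done by awk's string parser.
shown=$(awk 'BEGIN { musr = "[)]\\\\?"; print musr }')
printf '%s\n' "$shown"
```

This prints [)]\\? with two backslashes, which is the text the regular expression parser receives afterwards.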

At a later time, the string-valued variable is used where a regular expression is expected. AWK then passes this string to the regular expression parser for compilation. At this point, the error occurs.

You have two choices. You can double the number of backslashes or you can specify the regular expression using a regular expression literal instead of a string literal (/pattern/ instead of "pattern").
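A minimal sketch of both choices, assuming the goal is to match the record that ends with a backslash. The temporary file stands in for netgroup, and the literal brackets and dots are written as bracket expressions so that only the trailing backslash needs escaping.

```shell
# Two-line sample in a temporary file (hypothetical stand-in for netgroup).
sample=$(mktemp)
printf '%s\n' '(-,username,domain1.co.uk)\' '(-,username,domain2.co.uk)' > "$sample"

# Choice 1: regexp held in a string. The string parser halves the backslashes,
# so four are written to leave \\ (one literal backslash) for the regexp parser.
from_string=$(awk 'BEGIN { musr = "[(]-,username,[^)]+[.]co[.]uk[)]\\\\$" } $0 ~ musr' "$sample")

# Choice 2: regexp literal. It is parsed once, so \\ is enough.
from_literal=$(awk '/[(]-,username,[^)]+[.]co[.]uk[)]\\$/' "$sample")

printf 'string:  %s\n' "$from_string"
printf 'literal: %s\n' "$from_literal"
rm -f "$sample"
```

Both commands select only the domain1 record, the one followed by a backslash.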

For more info, see Computed Regexps - The GNU Awk User's Guide (this applies to all awk implementations, not just gawk).

Regards,
Alister

---------- Post updated at 11:58 AM ---------- Previous update was at 11:53 AM ----------

Since they occur within single quotes, the shell isn't consuming any backslashes. As I said above, it's awk's string parser.

An improved version of your suggestion:

awk '$0 ~ /pattern_with_half_the_backslashes/' netgroup

Regards,
Alister


Thanks guys, I knew something was consuming the additional backslashes.