Passing a regex as a variable to awk and using it as the search pattern

Hi All,

I have an sftp session log from transferring multiple files by issuing "mput abc*.dat". The contents of the log file are below:
#################################################

Connecting to 10.75.112.194...
Changing to: /home/dasd9x/testing1
sftp> mput abc*.dat
Uploading abc140212095613.dat to /home/dasd9x/testing1/abc140212095613.dat
abc140212095613.dat                                                                                                        100%   21     0.0KB/s   00:00
Uploading abc140212095639.dat to /home/dasd9x/testing1/abc140212095639.dat
abc140212095639.dat                                                                                                        100%   25     0.0KB/s   00:00
Uploading abc140212095648.dat to /home/dasd9x/testing1/abc140212095648.dat
abc140212095648.dat                                                                                                        100%   43     0.0KB/s   00:00
Uploading abc140212095658.dat to /home/dasd9x/testing1/abc140212095658.dat
abc140212095658.dat                                                                                                        100%   35     0.0KB/s   00:00
Uploading abc140212095710.dat to /home/dasd9x/testing1/abc140212095710.dat
abc140212095710.dat                                                                                                        100%   27     0.0KB/s   00:00
Uploading abc140212095719.dat to /home/dasd9x/testing1/abc140212095719.dat
abc140212095719.dat                                                                                                        100%   40     0.0KB/s   00:00
Uploading abc14022012.dat to /home/dasd9x/testing1/abc14022012.dat
abc14022012.dat                                                                                                            100%   52     0.0KB/s   00:00
sftp> ls -l
drwxr-xr-x    0 600598020 600598020     1024 Feb 16 14:35 .
drwx------    0 600598020 600598020     1024 Feb 16 14:34 ..
-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 a.dat
-rw-r--r--    0 600598020 600598020       21 Feb 16 14:35 abc140212095613.dat
-rw-r--r--    0 600598020 600598020       25 Feb 16 14:35 abc140212095639.dat
-rw-r--r--    0 600598020 600598020       43 Feb 16 14:35 abc140212095648.dat
-rw-r--r--    0 600598020 600598020       35 Feb 16 14:35 abc140212095658.dat
-rw-r--r--    0 600598020 600598020       27 Feb 16 14:35 abc140212095710.dat
-rw-r--r--    0 600598020 600598020       40 Feb 16 14:35 abc140212095719.dat
-rw-r--r--    0 600598020 600598020       52 Feb 16 14:35 abc14022012.dat
-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 b.dat
-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 c.dat
-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 d.dat
sftp> quit

#################################################

This log has been captured in a file called sftp_log. I am now out of the sftp session and have only sftp_log for reference. I want to check the log and confirm that all the files matching abc*.dat were transferred with the correct size. To do that, I want to find the lines where the abc*.dat files appear in the long listing. The pattern abc*.dat is held in a variable named TRANSFERRING_FNAME, so I passed this variable to awk and searched for the desired lines with the command below:

awk -v fname="$TRANSFERRING_FNAME" 'substr($1,1,1) == "-" && $9 ~ fname {print $9 "|" $5}' sftp_log

But it returns nothing. I need only the lines below from sftp_log:
###################################################

-rw-r--r--    0 600598020 600598020       21 Feb 16 14:35 abc140212095613.dat
-rw-r--r--    0 600598020 600598020       25 Feb 16 14:35 abc140212095639.dat
-rw-r--r--    0 600598020 600598020       43 Feb 16 14:35 abc140212095648.dat
-rw-r--r--    0 600598020 600598020       35 Feb 16 14:35 abc140212095658.dat
-rw-r--r--    0 600598020 600598020       27 Feb 16 14:35 abc140212095710.dat
-rw-r--r--    0 600598020 600598020       40 Feb 16 14:35 abc140212095719.dat
-rw-r--r--    0 600598020 600598020       52 Feb 16 14:35 abc14022012.dat

###################################################

From these lines I want to extract each file name and its corresponding size, like below:
############################################

abc140212095613.dat|21
abc140212095639.dat|25
abc140212095648.dat|43
abc140212095658.dat|35
abc140212095710.dat|27
abc140212095719.dat|40
abc14022012.dat|52

############################################

The value in $TRANSFERRING_FNAME can vary, so we can't rely on a hard-coded value like 'abc'. Could anyone please advise?

Thanks & Regards,
Bijitesh

Just look for - at the beginning of the line and print the fields you want.

awk '/^-/ { print $9 "|" $5 }' datafile

Thanks. But in that case I'll end up getting these lines:
##############################################

-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 a.dat
-rw-r--r--    0 600598020 600598020       21 Feb 16 14:35 abc140212095613.dat
-rw-r--r--    0 600598020 600598020       25 Feb 16 14:35 abc140212095639.dat
-rw-r--r--    0 600598020 600598020       43 Feb 16 14:35 abc140212095648.dat
-rw-r--r--    0 600598020 600598020       35 Feb 16 14:35 abc140212095658.dat
-rw-r--r--    0 600598020 600598020       27 Feb 16 14:35 abc140212095710.dat
-rw-r--r--    0 600598020 600598020       40 Feb 16 14:35 abc140212095719.dat
-rw-r--r--    0 600598020 600598020       52 Feb 16 14:35 abc14022012.dat
-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 b.dat
-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 c.dat
-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 d.dat

##############################################

and selecting their name and size will get -
###############################

a.dat|0
abc140212095613.dat|21
abc140212095639.dat|25
abc140212095648.dat|43
abc140212095658.dat|35
abc140212095710.dat|27
abc140212095719.dat|40
abc14022012.dat|52
b.dat|0
c.dat|0
d.dat|0

###############################

But I don't want the details of a.dat, b.dat, c.dat and d.dat.

Regards,
Bijitesh

What are the exact contents of TRANSFERRING_FNAME? abc*.dat is not a regular expression, it's a glob. Interpreted as a regex it will work wrong: it matches abc.dat, abcadat, abcccc.dat, abccccccadat, and so forth, since * means "zero or more of the previous character" and . means "any single character". None of your file names contain "ab", any number of "c"s, one more character, and then "dat", which is why your command prints nothing.

For a regex I think you'd want '^abc.*\.dat'.
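To see the difference concretely, here is a small sketch that runs both patterns over a few of the file names from the listing. The helper name count_matches is made up for illustration, and the regex is written with [.] instead of \. because awk processes backslash escapes in -v assignments.

```shell
# A few file names taken from the "ls -l" part of the log.
names='a.dat
abc140212095613.dat
abc14022012.dat
b.dat'

# count_matches is a hypothetical helper: it prints how many names match a regex.
count_matches() {
    printf '%s\n' "$names" | awk -v re="$1" '$0 ~ re { n++ } END { print n + 0 }'
}

# The glob, misread as a regex, versus a regex with the intended meaning.
glob_as_regex=$(count_matches 'abc*.dat')
anchored_regex=$(count_matches '^abc.*[.]dat$')

echo "abc*.dat used as a regex matches $glob_as_regex names"
echo "the anchored regex matches $anchored_regex names"
```

The first count is 0 and the second is 2, which lines up with the original command returning nothing.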


Thanks Corona688 for correcting me. I am not good at regex and thought abc*.dat was one. But the TRANSFERRING_FNAME variable contains exactly abc*.dat. It is based on how "ls -lrt abc*.dat" works, which lists all matching files: abc.dat, abca.dat, abcccc.dat, abcccccca.dat, abc140212095613.dat, abc140212095639.dat, abc14022012.dat, etc. The script first does a multiple put in an sftp session as "mput $TRANSFERRING_FNAME". Then, with only $TRANSFERRING_FNAME as a clue, I have to search sftp_log. Obviously I can't change the contents of $TRANSFERRING_FNAME to '^abc.*\.dat'. Please help.

That's a slight fallacy known as useless use of ls *: it's not ls that understands what * means, it's the shell itself. That's why * works the same way with every command.

mput is independent of the shell however, and does its own processing of *, but it works the same way as the shell.
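A quick way to convince yourself that the shell does the expansion is to build the list without calling ls at all. This is only a sketch: the file names are invented, and it assumes mktemp -d is available on your system.

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.dat abc1.dat abc22.dat b.dat

# The shell replaces abc*.dat with the matching names before any command runs,
# so "set --" receives the expanded list directly.
set -- abc*.dat
count=$#
list=$(printf '%s\n' "$@")

echo "$count files:"
echo "$list"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

Here set -- receives abc1.dat and abc22.dat as two separate arguments, exactly as ls would have.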

You're going to need to modify your shell script I think. Perhaps you can tell it the prefix you want, instead of giving it the entire "abc*.dat", and the shell can replace that itself, and you can feed that into awk in the form you want as well.

To modify your shell script I'll need to actually see it of course.


This kludge might also work, but it's not pretty. It replaces all . by \., all * by .*, prepends ^, and appends $ to turn simple globs into regular expressions. If your awk doesn't have gsub, use nawk.

awk -v STR="*.dat" 'BEGIN { gsub(/[.]/, "\\.", STR); gsub(/[*]/, ".*", STR); STR="^" STR "$" }; /^-/ && ($9 ~ STR) { print $9 "|" $5 }' sftp_log

Thanks a lot Corona688 for this valuable advice. I'll check if this is possible. One thing is that TRANSFERRING_FNAME is assigned whatever value the user passes as the argument to the shell script. So it can contain anything from an exact name like abc140212095639.dat to abc*.dat or abc?.dat.

I was afraid of that. It's going to take pages of substitutions to allow awk to accept every possible kind of shell glob. I don't want to even think about getting ranges and backslashes to work...
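If the patterns only ever use * and ?, the conversion stays short. The following is a sketch under that assumption, not a general glob translator: glob_to_regex is a made-up function name, the sample log is a cut-down copy of the one posted, and patterns containing [, ], ^ or backslashes are deliberately not handled. The pattern is passed through the environment rather than -v so that awk does not interpret any escapes in it.

```shell
# glob_to_regex (hypothetical helper): turn a glob that uses only * and ?
# into an anchored regex. Brackets, ^ and backslashes are NOT handled.
glob_to_regex() {
    GLOB="$1" awk 'BEGIN {
        g = ENVIRON["GLOB"]
        re = "^"
        n = length(g)
        for (i = 1; i <= n; i++) {
            c = substr(g, i, 1)
            if (c == "*") re = re ".*"                            # any run of characters
            else if (c == "?") re = re "."                        # exactly one character
            else if (index(".+()|{}$", c) > 0) re = re "[" c "]"  # make it literal
            else re = re c
        }
        print re "$"
    }'
}

# Cut-down sample of the sftp log, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
sftp> ls -l
drwxr-xr-x    0 600598020 600598020     1024 Feb 16 14:35 .
-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 a.dat
-rw-r--r--    0 600598020 600598020       21 Feb 16 14:35 abc140212095613.dat
-rw-r--r--    0 600598020 600598020       52 Feb 16 14:35 abc14022012.dat
-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 b.dat
EOF

TRANSFERRING_FNAME='abc*.dat'
regex=$(glob_to_regex "$TRANSFERRING_FNAME")

# Print name|size for regular files whose name matches the converted pattern.
result=$(RE="$regex" awk '/^-/ && $9 ~ ENVIRON["RE"] { print $9 "|" $5 }' "$log")

echo "$regex"
echo "$result"
rm -f "$log"
```

With abc*.dat the generated regex is ^abc.*[.]dat$, and only the two abc files are printed with their sizes.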

How about giving awk a list of files, instead of giving it a glob? The shell can turn a glob into a list, and awk can handle a list easily.

VAR="*.ext"

# The shell (and not ls!) will turn *.ext into a list of matching files, 
# which we store in /tmp/$$.  $$ is the script's process ID, which we
# simply use as a convenient unique number.
ls $VAR > /tmp/$$

sftp ... > ftplog

awk -v LIST="/tmp/$$" 'BEGIN { while ((getline name < LIST) > 0) A[name]=1; close(LIST) }; /^-/ && ($9 in A) { print $9 "|" $5 }' ftplog

rm -f /tmp/$$
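Another option, if you would rather not keep a list file, is to let awk print every regular file from the log and have the shell itself decide which names match, using case. This is a sketch only: the sample log is a cut-down copy of the one posted, and note that it matches remote names against the pattern, so files already on the server that happen to match would be reported too.

```shell
# Cut-down sample of the sftp log, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
sftp> ls -l
drwxr-xr-x    0 600598020 600598020     1024 Feb 16 14:35 .
-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 a.dat
-rw-r--r--    0 600598020 600598020       21 Feb 16 14:35 abc140212095613.dat
-rw-r--r--    0 600598020 600598020       52 Feb 16 14:35 abc14022012.dat
-rw-r--r--    0 600598020 600598020        0 Feb 16 14:32 b.dat
EOF

TRANSFERRING_FNAME='abc*.dat'

result=$(
    awk '/^-/ { print $9 "|" $5 }' "$log" |
    while IFS='|' read -r name size; do
        # $TRANSFERRING_FNAME is left unquoted on purpose: case then treats
        # its value as a glob pattern instead of a literal string.
        case $name in
            $TRANSFERRING_FNAME) printf '%s|%s\n' "$name" "$size" ;;
        esac
    done
)

echo "$result"
rm -f "$log"
```

Because case uses the shell's own pattern matching, *, ? and bracket ranges behave the same way they do on the command line, with no regex conversion.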