Slow-running script. The problem seems to be the sed calls.
In summary, the script reads a list of users from file1. For each
username it searches two files (file2 and file3) for the username
and takes the value after the "=" on the following line. It then
compares these two values with each other.
If they are the same, the username goes to one output file; if not, to another.
For approximately 8000 lines in file1 it takes approximately 15 minutes to run,
which is not very good. Any suggestions on removing the bottleneck?
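awk '
BEGIN {
    # Read the list of user names from file1 into the keys of A1.
    F = "file1"
    while (( getline line < F ) > 0 )
    {
        A1[line]
    }
    close(F)

    # file2 and file3 hold "name=user" lines, each followed by a
    # "gud=value" line. For every user listed in file1, store the value
    # in A2 (from file2) or A3 (from file3), keyed by user name.
    F = "file2"
    while (( getline line < F ) > 0 )
    {
        split(line, V, "=")
        if ( V[2] in A1 )
        {
            i = V[2]
            getline line < F
            split(line, V, "=")
            A2[i] = V[2]
        }
    }
    close(F)

    F = "file3"
    while (( getline line < F ) > 0 )
    {
        split(line, V, "=")
        if ( V[2] in A1 )
        {
            i = V[2]
            getline line < F
            split(line, V, "=")
            A3[i] = V[2]
        }
    }
    close(F)
}
END {
    # A user matches only if both files have a non-empty value for it and
    # the two values are equal; every other user goes to no_matches.out.
    for ( k in A1 )
    {
        if ( (k in A2) && (k in A3) && A2[k] != "" && A2[k] == A3[k] )
            print k > "matches.out"
        else
            print k > "no_matches.out"
    }
}
' /dev/null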
Let me know how long it took to complete execution.
Your script reads and processes the two data files through six separate processes for every line of file1. That is not very efficient.
I'd use a language like awk, which makes looking up stored data much easier, but the format of your data files also makes this hard. Is that format fixed? Whichever way you choose, it would be much easier if each line simply held a username and its gud value.
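For illustration, here is what that simpler "username gud" layout could look like and how a single awk pass could produce it from the current files. This is a sketch assuming the data files hold "name=user" and "gud=value" lines in pairs, in that order, and that values contain no further "=" characters; the file name sample_file2 and its contents are made up, and the script works in a scratch directory so it does not touch real files.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data in the name=/gud= pair format.
cat > sample_file2 <<'EOF'
name=user1
gud=100
name=user2
gud=200
EOF

# With FS set to "=", $1 is the key ("name" or "gud") and $2 its value.
# Remember the user from each name= line and print it together with
# the value from the gud= line that follows it.
awk -F= '
$1 == "name" { u = $2; next }
$1 == "gud"  { print u, $2 }
' sample_file2 > sample_file2.pairs

cat sample_file2.pairs
```

With the data in that form, each file can be loaded into an awk array keyed by user in one read, instead of running sed once per user.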
I note that the original ksh script (when fixed to remove the syntax errors and properly terminate the function) includes user4 in matches.out and Yoda's awk script includes user4 in no_matches.out. When an entry in File1 does not appear in File2 or File3, should that entry be:
added to matches.out,
added to no_matches.out,
ignored, or
reported with a diagnostic saying the entry was not found?
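To make the four choices concrete, here is a sketch in which a hypothetical "missing" setting, passed with awk -v, selects what happens to a user found in neither data file. Everything beyond the four options listed is an assumption: the file names d1, d2, and users and their contents are made up, values are compared with a plain ==, and a user present in only one data file is simply treated as a non-match.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample inputs: two pair-format data files and a user list.
printf 'name=user1\ngud=a\nname=user2\ngud=b\n' > d1
printf 'name=user1\ngud=a\nname=user2\ngud=c\n' > d2
printf 'user1\nuser2\nuser4\n' > users

# "missing" is a hypothetical option for users found in neither data
# file: match, nomatch, warn, or anything else (e.g. ignore).
classify() {
    awk -F= -v missing="$1" '
    FNR == 1 { f++ }                                  # count input files
    f < 3 && $1 == "name" { u = $2; next }            # current user
    f < 3 && $1 == "gud"  { if (f == 1) r1[u] = $2; else r2[u] = $2; next }
    f == 3 {
        if (!($1 in r1) && !($1 in r2)) {
            if (missing == "match")        print > "matches.out"
            else if (missing == "nomatch") print > "no_matches.out"
            else if (missing == "warn")    print "not found: " $1 | "cat 1>&2"
            # any other value (e.g. ignore): write nothing
        }
        else if (($1 in r1) && ($1 in r2) && r1[$1] == r2[$1]) print > "matches.out"
        else print > "no_matches.out"
    }' d1 d2 users
}

rm -f matches.out no_matches.out
classify nomatch
```

With "nomatch", user4 lands in no_matches.out next to user2; with "ignore" it is dropped and only user2 is listed there.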
I don't understand why Yoda didn't use FS="=" instead of splitting lines after reading them, but until I know how to handle the issue above, I'm not going to post my awk script.
As always, if you try running this on a Solaris/SunOS system, use /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk instead of /bin/awk or /usr/bin/awk .
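One way to follow that advice from a wrapper script is to pick the awk once and call it through a variable. This is a sketch, not a standard recipe: the variable name AWK is an arbitrary choice, and it simply prefers the XPG awks if they exist, then nawk, then plain awk, so on systems without those paths (most Linux systems) it falls back harmlessly.

```shell
#!/bin/sh
# Prefer a POSIX-conforming awk: the XPG awks on Solaris/SunOS if
# present, then nawk, otherwise whatever "awk" is on PATH.
AWK=awk
for candidate in /usr/xpg4/bin/awk /usr/xpg6/bin/awk; do
    if [ -x "$candidate" ]; then
        AWK=$candidate
        break
    fi
done
if [ "$AWK" = awk ] && command -v nawk >/dev/null 2>&1; then
    AWK=nawk
fi

echo "Using $AWK"
"$AWK" 'BEGIN { print "ok" }'
```

The rest of the script would then run "$AWK" -F= '...' instead of a bare awk.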
awk -F= ' # Set field separator to "="
# 1st two input files consist of pairs of lines using the format:
# name=user
# gud=value
# in this order.
# So with FS set to "=", $1 will be "name" or "gud" and $2 will be the name of
# the user or the value associated with that user name.
# 3rd input file consists of lines using the format:
#    user
# Variables:
#    f      number of files seen
#    lf     name of last file seen
#    u      user from name=user line in 1st two input files
#    r1[u]  recorded value from 1st file from gud=value line for name u
#    r2[u]  recorded value from 2nd file from gud=value line for name u
FILENAME != lf { # If this is the first time we have a line from this file
f++ # increment the number of files seen and
lf = FILENAME # save the name of the current file for comparison.
}
$1 == "name" { # If this is a name= line in 1 of the 1st 2 files,
u = $2 # save the user name, and
next # skip to next input line.
}
$1 == "gud" { # If this is a gud= line in 1 of the 1st 2 files...
# If we are processing a line from the 1st file, set r1[] for the
# current user name to the value found on this line from the 1st input
# file; otherwise set r2[] for the current user name to the value found
# on this line from the 2nd input file.
if (f == 1) r1[u] = $2; else r2[u] = $2
}
f == 3 { # If this line is from the 3rd input file...
# If the user on this line was not in either of the 1st 2 files, save
# the name in no_results.out.
if(!($1 in r1) && !($1 in r2)) print > "no_results.out"
# otherwise,
# if the user on this line was not in one of the 1st 2 files or if the
# value associated with this user was an empty string in one of the
# files, save the name in no_gud.out.
else if(!($1 in r1) || r1[$1] == "" ||
!($1 in r2) || r2[$1] == "") print > "no_gud.out"
# otherwise,
# if the value for the user is different in the 1st 2 files, save the
# name in no_matches.out.
else if(r1[$1] != r2[$1]) print > "no_matches.out"
# otherwise,
# the user is in both files and the values match; save the name in
# matches.out.
else print > "matches.out"
}' File2 File3 File1 # The input files are File2, File3, and File1
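A quick way to check the four output files is to run the same classification on a tiny made-up data set with one user per category. This is a sketch: the demo file names and values are hypothetical; the per-user values are stored in arrays (r1[u], r2[u]) as the comments above describe; and the added "f < 3" guards (not in the script above) keep a user literally named "name" or "gud" in the user list from being mistaken for a data line.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)" || exit 1

# Made-up data: two pair-format files and a user list (like File2, File3, File1).
printf 'name=user1\ngud=10\nname=user2\ngud=20\nname=user3\ngud=\n' > demo_data1
printf 'name=user1\ngud=10\nname=user2\ngud=25\nname=user3\ngud=30\n' > demo_data2
printf 'user1\nuser2\nuser3\nuser4\n' > demo_users
rm -f matches.out no_matches.out no_gud.out no_results.out

awk -F= '
FILENAME != lf { f++; lf = FILENAME }       # count files as they start
f < 3 && $1 == "name" { u = $2; next }      # remember the current user
f < 3 && $1 == "gud" {                      # store the value per user
    if (f == 1) r1[u] = $2; else r2[u] = $2
    next
}
f == 3 {
    if (!($1 in r1) && !($1 in r2)) print > "no_results.out"
    else if (!($1 in r1) || r1[$1] == "" ||
             !($1 in r2) || r2[$1] == "") print > "no_gud.out"
    else if (r1[$1] != r2[$1]) print > "no_matches.out"
    else print > "matches.out"
}' demo_data1 demo_data2 demo_users

for out in matches.out no_matches.out no_gud.out no_results.out; do
    printf '%s: %s\n' "$out" "$(cat "$out")"
done
```

Here user1 (same value) should land in matches.out, user2 (20 vs 25) in no_matches.out, user3 (empty value in the first file) in no_gud.out, and user4 (in neither data file) in no_results.out.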