Performance problem in Shell Script

Hi,
I am a shell script beginner.
I wrote a shell script that takes each line of file1, searches for it in file2, and outputs the lines that do not exist in file2.
I wrote it using a nested while loop, but the problem is that it runs forever. Is there a way I can improve the performance of the script?
Both files contain 700K records each.
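
In outline (simplified, not my exact code), the script does something like this:

# Outer loop: read file1 line by line
while IFS= read -r line1
do
    found=0
    # Inner loop: scan all of file2 looking for a match
    while IFS= read -r line2
    do
        if [ "$line1" = "$line2" ]
        then
            found=1
            break
        fi
    done < file2
    # Print the line only if no match was found in file2
    if [ "$found" -eq 0 ]
    then
        echo "$line1"
    fi
done < file1

So for every line of file1 it reads all of file2 again.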

Hi sakthisivi, welcome to the forums.

To help improve your script, please share exactly what you have tried so far.

Greetings

Hi,

There is no need for a shell script loop to do such a thing; grep can do it alone.

Also, please share small but representative samples of the files.

[edit] I answered the wrong thread: :o

Can you tell me how I can do it with a grep command?

Typically, one would use:

grep -vxFf file2 file1

or try awk:

awk 'NR==FNR{A[$0]; next} !($0 in A)' file2 file1
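
In case it helps, here is the same awk one-liner spread out with comments (identical logic, just formatted for reading):

awk '
    NR == FNR {      # NR==FNR only while reading the first file, file2
        A[$0]        # remember each line of file2 as an array key
        next         # done with this line; read the next one
    }
    !($0 in A)       # now reading file1: select lines not stored above;
                     # no action block means the default action, print
' file2 file1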

These two approaches only work if the lines are exactly the same, with no leading or trailing whitespace in one file that is missing in the other...

--
Otherwise, you could try this adaptation of the awk approach, which normalizes the whitespace before comparing:

awk '{p=$0; $1=$1} NR==FNR{A[$0]; next} !($0 in A){print p}' file2 file1
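
For example, with two small made-up files, where file1 carries a trailing space after "beta":

printf 'alpha\nbeta \ngamma\n' > file1
printf 'alpha\nbeta\n' > file2
awk '{p=$0; $1=$1} NR==FNR{A[$0]; next} !($0 in A){print p}' file2 file1

prints only

gamma

whereas the strict versions above would also print "beta " because of the trailing space.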

--
On Solaris use /usr/xpg4/bin/grep and /usr/xpg4/bin/awk

from man grep:

GREP(1)                     General Commands Manual                    GREP(1)



NAME
       grep, egrep, fgrep - print lines matching a pattern

SYNOPSIS
       grep [OPTIONS] PATTERN [FILE...]
       grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]

...

       -F, --fixed-strings
              Interpret  PATTERN  as  a  list  of  fixed strings, separated by
              newlines, any of which is to be matched.

...

       -f FILE, --file=FILE
              Obtain patterns  from  FILE,  one  per  line.   The  empty  file
              contains zero patterns, and therefore matches nothing.

...

       -v, --invert-match
              Invert the sense of matching, to select non-matching lines.

grep -F or fgrep:

fgrep -xvf file2 file1

For an unknown reason (hash implementation?), awk is faster than grep here:

awk 'NR==FNR {s[$0]; next} !($0 in s)' file2 file1
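
If you want to verify that on your own data, you can time both variants (results will depend on your system and the grep/awk implementations):

time grep -vxFf file2 file1 > /dev/null
time awk 'NR==FNR {s[$0]; next} !($0 in s)' file2 file1 > /dev/null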

For 700K records, either will do. Handy as awk is, using grep doesn't require one to learn an entire new programming language.


I think the above grep command compares the first line with the first line, the second line with the second line, and the third line with the third line in both files. But I need to compare the first line of the first file with all the lines in the second file, the second line of the first file with all the lines in the second file, and so on, and print only the lines in the first file that do not match any line in the second file.
Thanks for your help on this.

You are mistaken: it tries to match every pattern (or entire line, in case you use fgrep) in the pattern file (the one supplied to the -f option) against every line in the files you present as "targets". But you must make sure that the patterns are formed in a way that they can be matched in the target files. Here, e.g., DOS line terminators can be a killer!
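
You can see that with a tiny made-up example where the line order differs between the two files:

printf 'a\nb\nc\n' > file1
printf 'c\na\n' > file2
grep -vxFf file2 file1

prints only

b

so "a" is matched even though it is line 1 in file1 and line 2 in file2. And if one of the files came from Windows, remove the DOS line terminators first, e.g.:

tr -d '\r' < file2 > file2.unix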


Thank you.
The awk command worked like a charm.
Are there any links where I can learn more about these?


@RudiC:
Thanks for the explanation.