Big pattern file matching within another pattern file in awk or shell

nitin_daharwal · November 19, 2015, 8:55pm

Hi

I need to do a pattern match between files.
I am new to shell scripting and have come up with this so far. It takes 50 seconds to process files of 2 MB. I need to tune this code because the files will be around 50 MB and I need to save time.
The main issue is that I need to search for the patterns from Keys in one file (File1), and the result then becomes the pattern for two other files (File2 and File3).

Is there any other way to do it?

File content looks like below.

**File1**

    
    20150816,ab311914,ab,abc040,2
    20150817,ab311914,ab,abc040,3
    20150818,ab311914,ab,abc040,4
    20150819,ab311914,ab,abc040,5
    20150820,ab311914,ab,abc040,6
    20150821,ab311914,ab,abc040,7
    20150822,ab311914,ab,abc040,8
    20150823,ab311914,ab,abc040,9
    20150824,ab311914,ab,abc040,10
    20150825,ab311914,ab,abc040,11

**File2**

    
    20150816,ab311914,ab,abc040,1
    20150817,ab311914,ab,abc040,2
    20150818,ab311914,ab,abc040,3
    20150819,ab311914,ab,abc040,5
    20150820,ab311914,ab,abc040,6
    20150821,ab311914,ab,abc040,7
    20150822,ab311914,ab,abc040,8
    20150823,ab311914,ab,abc040,9
    20150824,ab311914,ab,abc040,10
    20150825,ab311914,ab,abc040,1

**File3**

  
    20150816,ab,0
    20150817,ab,1
    20150818,ab,2
    20150819,ab,3
    20150820,ab,4
    20150821,ab,5
    20150822,ab,6
    20150823,ab,7
    20150824,ab,8

**Keys**

    ab311914,1

**Sample output**

  
     20150816,ab311914,ab,abv040,61
     20150817,ab311914,ab,abv040,62
     20150818,ab311914,ab,abv040,63
     20150819,ab311914,ab,abv040,64
     20150820,ab311914,ab,abv040,65
     20150821,ab311914,ab,abv040,66
     20150822,ab311914,ab,abv040,67
     20150823,ab311914,ab,abv040,68
     20150824,ab311914,ab,abv040,69
     20150825,ab311914,ab,abv040,70

**Shell script code so far**

    awk -F "," keys.txt '{print $1}'|while read key_
    do
        echo "$key_"
        grep $key_ file1.txt |grep -v ",-1$"|while read line;
        do
            patterna1=`echo $line|awk -F "," '{print $2 "," $3 "," $4 "," $5 "$"}' `
            patterna2=`echo $line| awk -F "," '{print $1 "," $2 "," $3 "," $4}'`
            patternb1=`grep $patterna1 file2.txt|head -1|awk -F "," '{print $1}'`
            patternb2=`grep $patternb1 file3.txt|awk -F "," '{print $3}'`
            echo $patterna2,$patternb2
        done  >> final.txt
    done

Don_Cragun · November 19, 2015, 10:43pm

I am totally confused.

Nothing in your code (which you imply is working but running too slowly) explains why the output has abv040,61 through abv040,70: abv040 does not appear anywhere in any of the input files, and neither do the values 61 through 70.

Furthermore, your code seems to only output two fields; not five.

And there is no line in File3 containing the string (or date) 20150825, so why is there a line in the output containing that string?

Please explain more clearly what you are trying to do.

RudiC · November 20, 2015, 4:55am

Some comments on top of what Don Cragun said:
- `awk -F "," keys.txt '{print $1}'` can't possibly work (the program text and the file name are in reverse order), and it is superfluous anyway: you could simply use `while IFS="," read key_ REST; do ...; done < keys.txt`.
- `grep -v ",-1$"` is pointless, as (at least in the samples given) no line ends in "-1".
- And for each line in keys.txt times each matching line in file1.txt, you run 10 processes to extract a few fields. No surprise that it is slow.
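
To illustrate the last point, here is a minimal sketch of doing the whole lookup in a single awk process instead of spawning several processes per line. It assumes what the posted loop appears to intend, which the thread has not confirmed: matches are on whole fields (the original uses substring `grep`), the first matching File2 line wins, and File1 lines with no match in File2 or File3 are skipped. The function name `lookup_values` and the demo rows are illustrative only and are not part of the original script.

```shell
#!/bin/sh
# Sketch only: replaces the nested grep/awk loops with one awk process.
# lookup_values is a hypothetical name; argument order: keys file2 file3 file1.
lookup_values() {
    awk -F, -v OFS=, '
        # keys file: remember every key (first field)
        FILENAME == ARGV[1] { keys[$1]; next }
        # file2: remember the first date seen for each combination of fields 2-5
        FILENAME == ARGV[2] {
            k = $2 FS $3 FS $4 FS $5
            if (!(k in first_date)) first_date[k] = $1
            next
        }
        # file3: map date to third field (assumes one line per date; last wins)
        FILENAME == ARGV[3] { value[$1] = $3; next }
        # file1: keep lines whose second field is a key and that do not end in -1
        ($2 in keys) && $NF != "-1" {
            k = $2 FS $3 FS $4 FS $5
            if ((k in first_date) && (first_date[k] in value))
                print $1, $2, $3, $4, value[first_date[k]]
        }
    ' "$1" "$2" "$3" "$4"
}

# Demo with a few rows taken from the samples in the question.
tmp=$(mktemp -d)
printf '%s\n' 'ab311914,1' > "$tmp/keys.txt"
printf '%s\n' '20150816,ab311914,ab,abc040,2' \
              '20150818,ab311914,ab,abc040,4' \
              '20150819,ab311914,ab,abc040,5' > "$tmp/file1.txt"
printf '%s\n' '20150817,ab311914,ab,abc040,2' \
              '20150819,ab311914,ab,abc040,5' > "$tmp/file2.txt"
printf '%s\n' '20150817,ab,1' '20150819,ab,3' > "$tmp/file3.txt"
result=$(lookup_values "$tmp/keys.txt" "$tmp/file2.txt" "$tmp/file3.txt" "$tmp/file1.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

In the demo, the File1 row ending in 4 has no counterpart in File2 and is dropped. For the real files the call would be `lookup_values keys.txt file2.txt file3.txt file1.txt > final.txt`. Each file is read once and the lookups use awk arrays, so run time should grow roughly linearly with file size, though this has not been measured on 50 MB inputs.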