Search pattern in a file taking input from another file

imrandec85 · August 26, 2016, 7:02am

Hi,

Below is my requirement

File1:
svasjsdhvassdvasdhhgvasddhvasdhasdjhvasdjsahvasdjvdasjdvvsadjhv
vdjvsdjasvdasdjbasdjbasdjhasbdasjhdbjheasbdasjdsajhbjasbjasbhddjb
svfsdhgvfdshgvfsdhfvsdadhfvsajhvasjdhvsajhdvsadjvhasjhdvjhsadjahs

File2:
sdh
hgv

I need a command that takes each row of file2 as a pattern, compares it with characters 6 to 8 of every row in file1, and prints the matching lines to output_file1.txt and the non-matching lines to output_file2.txt.

For example, the pattern "sdh" from file2 matches characters 6 to 8 of the 1st row in file1, so that row should be printed to output_file1.txt. The 2nd row of file1 does not match any pattern in file2, so it should be printed to output_file2.txt.

I have cut the characters from file1 using the cut command as follows, but could not reach the desired solution. Please help!

cut -c6-8 file1
sdh
dja
hgv

Thanks,
Imran.

vbe · August 26, 2016, 7:18am

cut will only give you the characters selected by -c6-8 ...
The command for searching a pattern is grep!

Give it a try.

imrandec85 · August 26, 2016, 7:36am

cut -c6-8 file1: the output of this command will be my search pattern to look up in file2.

I am not sure how to use the output of a command as a search pattern in the grep command.

vbe · August 26, 2016, 7:44am

Sorry, I did not read your request properly: matching the 3 characters at positions 6-8 against a file containing the 3-character patterns...
More complex... looks like we need some awk...
Unfortunately I am at a remote office on a PC with no access (yet...) to unix boxes (that is what I am here for...), so others will have to help you through, as I have no way of testing what I would suggest...
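
For reference, a grep-only route is also possible. This is a minimal sketch, assuming the patterns in file2 contain no regex metacharacters and no empty lines; patterns.txt is a hypothetical scratch file that is not part of the original question. Each pattern is prefixed with five "any character" dots so that it can only match at columns 6 to 8.

```shell
# Sample data taken from the question (File1 and File2).
cat > file1 <<'EOF'
svasjsdhvassdvasdhhgvasddhvasdhasdjhvasdjsahvasdjvdasjdvvsadjhv
vdjvsdjasvdasdjbasdjbasdjhasbdasjhdbjheasbdasjdsajhbjasbjasbhddjb
svfsdhgvfdshgvfsdhfvsdadhfvsajhvasjdhvsajhdvsadjvhasjhdvjhsadjahs
EOF
printf 'sdh\nhgv\n' > file2

# Turn each pattern into a regex anchored at column 6: "sdh" becomes "^.....sdh".
sed 's/^/^...../' file2 > patterns.txt

# -f reads the patterns from a file; -v inverts the selection.
grep    -f patterns.txt file1 > output_file1.txt
grep -v -f patterns.txt file1 > output_file2.txt
```

With the sample data, rows 1 and 3 of file1 should land in output_file1.txt and row 2 in output_file2.txt.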

RudiC · August 26, 2016, 7:44am

Try

awk 'FNR==NR {T[$1]; next} !(substr($0, 6, 3) in T) {print > "output_file2.txt"; next} 1' file2 file1
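
Note that in the command above the trailing 1 prints the matching lines to standard output, while only the non-matching lines go to a named file. A minimal sketch of one way to run it against the sample data from the question, assuming the matching lines should end up in output_file1.txt via shell redirection:

```shell
# Sample data taken from the question (File1 and File2).
cat > file1 <<'EOF'
svasjsdhvassdvasdhhgvasddhvasdhasdjhvasdjsahvasdjvdasjdvvsadjhv
vdjvsdjasvdasdjbasdjbasdjhasbdasjhdbjheasbdasjdsajhbjasbjasbhddjb
svfsdhgvfdshgvfsdhfvsdadhfvsajhvasjdhvsajhdvsadjvhasjhdvjhsadjahs
EOF
printf 'sdh\nhgv\n' > file2

# Non-matching lines are written by awk itself to output_file2.txt;
# matching lines reach stdout through the trailing 1 and are redirected here.
awk 'FNR==NR {T[$1]; next} !(substr($0, 6, 3) in T) {print > "output_file2.txt"; next} 1' file2 file1 > output_file1.txt
```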

RavinderSingh13 · August 26, 2016, 7:51am

Hello imrandec85,

The following may help you.

awk 'FNR==NR{A[$0];next} ((substr($0,6,3)) in A){print $0 > "output_file1.txt";delete A[substr($0,6,3)];next} !((substr($0,6,3)) in A){print > "output_file2.txt"}'   Input_file2   Input_file1

Here the lines that match a string from Input_file2 are written to output_file1.txt and the non-matching lines are written to output_file2.txt.
EDIT: Adding a multi-line form of the solution as well.

awk 'FNR==NR{
         A[$0];
         next
     }
     ((substr($0,6,3)) in A){
         print $0 > "output_file1.txt";
         delete A[substr($0,6,3)];
         next
     }
     !((substr($0,6,3)) in A){
         print > "output_file2.txt"
     }
    '  Input_file2   Input_file1

Thanks,
R. Singh

imrandec85 · August 26, 2016, 8:08am

Thank you RudiC and R. Singh.

Both of your solutions meet my requirement.

I am not very fond of unix commands. Could you please explain what is actually happening in the awk command?

awk 'FNR==NR{A[$0];next} ((substr($0,6,3)) in A){print $0 > "output_file1.txt";delete A[substr($0,6,3)];next} !((substr($0,6,3)) in A){print > "output_file2.txt"}'   Input_file2 Input_file1

Thanks once again!

RavinderSingh13 · August 26, 2016, 8:22am

Hello imrandec85,

The following may help you.

awk 'FNR==NR{                   #### FNR and NR are built-in awk variables. Both hold a line number; the only difference is that FNR is RESET at the start of each input file, while NR keeps increasing until the last input file is completely read. So FNR==NR is TRUE only while the first file on the command line (Input_file2) is being read, not the second one.
A[$0];                          #### Create an array named A whose index is $0 (the complete line) of Input_file2.
next}                           #### next is a built-in awk keyword; it skips all remaining statements for the current line and moves on to the next line.
((substr($0,6,3)) in A){        #### substr is a built-in awk function that extracts part of a string; its syntax is substr(string, starting position, number of characters). As per your requirement we need characters 6, 7 and 8 of each line of Input_file1, so substr($0,6,3) takes 3 characters starting at the 6th, and the in operator checks whether they are an index of array A (created while Input_file2 was being read). If they are, the following statements run.
print $0 > "output_file1.txt";  #### Print the complete line ($0) to the file named output_file1.txt as per your requirement.
delete A[substr($0,6,3)];       #### Delete that element from the array so that each pattern matches only once; a later line with the same 3 characters is then treated as non-matching.
next}                           #### Skip the remaining statements for this line.
!((substr($0,6,3)) in A){       #### Now handle the lines whose characters 6, 7 and 8 are NOT an index of array A (lines of Input_file1 with no pattern from Input_file2 at that position).
print > "output_file2.txt"}     #### Print those lines to the file named output_file2.txt as per your requirement.
'   Input_file2   Input_file1   #### Input_file2 (the patterns) is named first, then Input_file1 (the data).
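
To see the FNR/NR behaviour described above in isolation, here is a small sketch; the file contents and the name counters.txt are placeholders chosen for illustration only. It prints the file name, NR and FNR for every line of two input files.

```shell
# Two small placeholder files: 2 lines and 3 lines.
printf 'sdh\nhgv\n' > file2
printf 'line one\nline two\nline three\n' > file1

# FILENAME is the file currently being read; NR counts lines across
# all files, FNR restarts at 1 for each file.
awk '{print FILENAME, NR, FNR}' file2 file1 > counters.txt
cat counters.txt
```

For file2 the two numbers stay equal (1 1, then 2 2); once file1 starts, NR continues at 3 while FNR goes back to 1, which is why FNR==NR singles out the first file.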

Thanks,
R. Singh

RudiC · August 26, 2016, 9:33am

Should the patterns from file2 be recognized more than once? Then do not delete A[...]!
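
A minimal sketch of that variant without the delete, using made-up sample lines (not from the original question) in which two lines carry "sdh" at columns 6 to 8:

```shell
# Hypothetical sample: lines 1 and 2 both have "sdh" at columns 6-8.
printf '%s\n' 'aaaaasdhxxx' 'bbbbbsdhyyy' 'ccccchgvzzz' 'dddddxyzqqq' > file1
printf '%s\n' 'sdh' 'hgv' > file2

# No delete: every line whose columns 6-8 appear in file2 counts as a match,
# no matter how often the same pattern has already been seen.
awk 'FNR==NR {A[$0]; next}
     (substr($0, 6, 3) in A) {print > "output_file1.txt"; next}
     {print > "output_file2.txt"}' file2 file1
```

Here both "sdh" lines and the "hgv" line go to output_file1.txt and only the last line goes to output_file2.txt. With the delete in place, the second "sdh" line would go to output_file2.txt instead.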