How to use sed to search backward in a file for a particular pattern after a pattern is matched?

saurabh_kumar · July 17, 2013, 1:44pm

Hi,
I have two files file1.txt and file2.txt. Please see the attachments.

In file2.txt (which is actually the diff output between two versions of file1.txt), I extract the pattern 1172c1172. Now, in file1.txt I have to search for this pattern 1172c1172 and, if it is found, search backward for the path and print it (/home/saurabh/file1.txt). Please note that there may be many entries between the path (/home/saurabh/file1.txt) and the line containing the pattern 1172c1172.

file1.txt:

/home/saurabh/file7.txt
117c117
<          
---
>          

/home/saurabh/file1.txt
76c76
---
79c79
---
1172c1172
<          apple, banana, orange
---
>          apple, banana, mango

/home/saurabh/file7.txt
117c117
<   	silent, helpful       
---
>  	silent

file2.txt:

2388,2391d2387
< 1172c1172
< <          apple, banana, orange
< ---
< >          apple, banana, mango

Any suggestions will be highly appreciated.

Thanks,
Saurabh

rajamadhavan · July 18, 2013, 1:38am

This may not be the best way, but it should do the job for you.

var=$(grep "< [^- ><]" file2.txt | sed -e 's/< //g')
awk -v pat="$var" '/^\/.+/{a=$0;} $0 ~ pat{print a;exit}' file1.txt

RudiC · July 18, 2013, 8:13am

Try

awk 'NR==2 {SRC=$2; next} $1==SRC {print PATH} $1~/^\// {PATH=$1}' file2 file1
/home/saurabh/file1.txt

saurabh_kumar · July 18, 2013, 12:27pm

Thank you raja and RudiC.

These solutions seem to work for most of the test cases. I haven't had much experience with awk. Please let me know if I understand these solutions correctly.

Raja,
In the first line you extract the pattern from file2 and store it in the variable var. Then, in the second line, for any line starting with / the whole line ($0) is stored in a, and then you match against pat ($0 ~ pat) and print a. Can you please explain how matching $0 with pat works, because $0 will contain the path and pat will contain the pattern (1172c1172)?

RudiC,
Can you please explain your solution? Will this solution work if file2 has more than one entry?

2388,2391d2387
< 1172c1172
< <          apple, banana, orange
< ---
< >          apple, banana, mango
1277,1280d1276
< 117c117
< <   	silent, helpful
< ---
< >  	silent

RudiC · July 18, 2013, 2:01pm

That solution keeps the second field of the second line of file2 (the first file on the command line) as the search pattern. Then it reads file1 line by line; if it finds a path, it keeps it in PATH; if it finds the search pattern, it prints PATH, which holds the last path found.
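
To make that walk-through concrete, here is a minimal sketch that runs the same one-liner, with comments, on trimmed copies of the sample files from this thread. The temporary directory and the helper name find_path are assumptions made for illustration only; they are not part of the original command.

```shell
# Sketch only: the sample data is a trimmed copy of the files in this thread.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/file1.txt" <<'EOF'
/home/saurabh/file7.txt
117c117

/home/saurabh/file1.txt
76c76
---
1172c1172
<          apple, banana, orange
EOF

cat > "$workdir/file2.txt" <<'EOF'
2388,2391d2387
< 1172c1172
< <          apple, banana, orange
EOF

# find_path is a hypothetical helper: $1 is the diff file, $2 the file to search.
find_path() {
    awk '
        NR == 2    { SRC = $2; next }   # second line read overall: remember its 2nd field
        $1 == SRC  { print PATH }       # search pattern seen: print the most recent path
        $1 ~ /^\// { PATH = $1 }        # line starts with "/": remember it as the path
    ' "$1" "$2"
}

find_path "$workdir/file2.txt" "$workdir/file1.txt"
# prints /home/saurabh/file1.txt
```

Because awk reads forward and only remembers the latest path, no real backward search is needed.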

Neither of the solutions will handle two or more search patterns. You didn't specify that. How can search patterns be identified in file2? Is it always < ....c.... ? Or what patterns do you have in mind now?

saurabh_kumar · July 18, 2013, 3:37pm

Thanks RudiC for the explanation. The pattern will be

[[:digit:]]{1,}[acd][[:digit:]]{1,}

i.e. at least one digit, followed by any one of the characters a, c, d, followed by one or more digits.

Thanks

rajamadhavan · July 19, 2013, 1:15am

$0 ~ pat will be true only when awk reaches the line containing 1172c1172. That's when the script prints the previously saved path string (variable 'a') and exits. I hope that makes it clear.
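
As a small illustration of that behaviour, the sketch below feeds a few sample lines to the same two awk rules. The helper name last_path_before and the inline sample are hypothetical, chosen only for this example.

```shell
# Sample lines taken from file1.txt in this thread (trimmed).
sample='/home/saurabh/file7.txt
117c117
/home/saurabh/file1.txt
76c76
1172c1172
<          apple, banana, orange'

# last_path_before is a hypothetical helper: $1 is the pattern, input comes on stdin.
last_path_before() {
    awk -v pat="$1" '
        /^\//    { a = $0 }           # remember every line that starts with "/"
        $0 ~ pat { print a; exit }    # first line matching pat: print saved path, stop
    '
}

printf '%s\n' "$sample" | last_path_before '1172c1172'
# prints /home/saurabh/file1.txt
```

Note that pat is used as an unanchored regular expression, so a fragment such as 17c11 would also match the line 117c117. Comparing with $0 == pat instead would require the whole line to be equal.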

RudiC · July 19, 2013, 2:24pm

saurabh kumar:

Thanks RudiC for the explanation. The pattern will be
[[:digit:]]{1,}[acd][[:digit:]]{1,}
i.e. at least one digit, followed by any one of the characters a, c, d, followed by one or more digits.

Which is true for these two lines in file2

I'd say your regex is not selective enough...?

saurabh_kumar · July 20, 2013, 4:49am

Yes, the regex is not selective enough; apologies for the oversight. The match also has to start with < or >, so I need to add that to the regex above.
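
For illustration, one way the tightened pattern could be written as a POSIX extended regular expression and tried with grep is sketched below. It assumes exactly one space after the < or > marker and nothing after the trailing digits; the helper name matches_change_line is hypothetical.

```shell
# matches_change_line is a hypothetical helper. It succeeds when the line is
# "<" or ">", one space, digits, one of a/c/d, and digits, with nothing else.
matches_change_line() {
    printf '%s\n' "$1" | grep -Eq '^[<>] [0-9]+[acd][0-9]+$'
}

matches_change_line '< 1172c1172'    && echo "match: < 1172c1172"
matches_change_line '2388,2391d2387' || echo "no match: 2388,2391d2387"
```

With the anchors in place, the hunk header 2388,2391d2387 is no longer picked up, while < 1172c1172 still is.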

RudiC · July 20, 2013, 4:00pm

Well, try this on multiple patterns in file2:

awk     '/< [0-9]+[acd][0-9]+/          {SRC[$2]; next}
         $1 in SRC                      {print PATH}
         $1 ~ /^\//                     {PATH = $1}
        ' file2 file1
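
A commented variant of the same idea, run on trimmed sample data, might look like the sketch below. The FNR == NR guard and the anchored [<>] pattern are additions that are not in the command above, and the helper name find_paths and the temporary files are hypothetical.

```shell
# Sketch only: the sample data is a trimmed copy of the files in this thread.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/file1.txt" <<'EOF'
/home/saurabh/file1.txt
76c76
---
1172c1172
<          apple, banana, orange

/home/saurabh/file7.txt
117c117
<   silent, helpful
EOF

cat > "$workdir/file2.txt" <<'EOF'
2388,2391d2387
< 1172c1172
< <          apple, banana, orange
1277,1280d1276
< 117c117
< <   silent, helpful
EOF

# find_paths is a hypothetical helper: $1 is the diff file, $2 the file to search.
find_paths() {
    awk '
        FNR == NR {                          # true while the first file is being read
            if ($0 ~ /^[<>] [0-9]+[acd][0-9]+$/)
                SRC[$2]                      # store the pattern as an array key
            next
        }
        $1 in SRC  { print PATH }            # known pattern: print the most recent path
        $1 ~ /^\// { PATH = $1 }             # line starts with "/": remember the path
    ' "$1" "$2"
}

find_paths "$workdir/file2.txt" "$workdir/file1.txt"
# prints /home/saurabh/file1.txt, then /home/saurabh/file7.txt
```

Using the array keys as a set is what lets any number of patterns from file2 be checked with a single "in" test per line of file1.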