Grep for text between two strings for multiple occurrences.

I need to get the text between the two strings <app-deployment file=" and </app-deployment>. Since this pair appears more than once in the file, how can I store the text between each occurrence in a separate file?

Thus, I need the text below to go in found1.tmp

and the text below to go in found2.tmp

Reference: Search between two strings for multiple occurances

Hello mohtashims,

Could you please try the following and let me know if it helps.

awk '/<app-deployment file/{Q++} {E=E?E ORS $0:$0} /<\/app-deployment>/{print E > ("found" Q ".tmp"); E=""}'   Input_file
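The one-liner appends every line to E until a closing tag is seen, so any lines before the first <app-deployment> (or between two blocks) also end up in the next foundN.tmp. Below is a minimal sketch of a variant that copies only the lines inside each block, writing them straight to their file. The sample deploy.tmp and its <deployments> wrapper are hypothetical, since the original sample is not reproduced here.

```shell
#!/bin/sh
# Work in a scratch directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: two <app-deployment> blocks inside a wrapper element.
cat > deploy.tmp <<'EOF'
<deployments>
<app-deployment file="file1">
<name>cert</name>
</app-deployment>
<app-deployment file="file2">
<name>Security</name>
</app-deployment>
</deployments>
EOF

awk '
/<app-deployment file/ { Q++; out = "found" Q ".tmp"; inside = 1 }  # new block: pick the next output file
inside                 { print > out }                              # copy lines of the current block only
/<\/app-deployment>/   { if (inside) { close(out); inside = 0 } }   # block ends: close its file
' deploy.tmp
```

Closing each file once its block is done keeps the number of open files small, which matters for awk implementations with a low limit on simultaneously open output files.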

Thanks,
R. Singh

I am passing the Input_file as a variable (entry) and also changing the name under which the output is saved.

But it is failing.
Below is what I am doing:

ls *.xml | while IFS= read -r entry; do
echo "ENTRY:"$entry
awk '/<foreign-server name/{Q++} {E=E?E ORS $0:$0} /<\/foreign-server>/{print E > $entry"_found"Q".tmp";E=""}'  $entry
done
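Inside the single-quoted awk program the shell never expands $entry. awk reads it as a field reference: entry is an unset awk variable, so $entry is $0, and the redirection target becomes the text of the closing-tag line plus "_found1.tmp". Because that text contains a "/", awk typically cannot open the path. A minimal sketch of one fix, assuming names like jms.xml_found1.tmp are wanted: pass the shell value in with -v, quote "$entry", and loop over the glob instead of parsing ls. The foreign-server tags come from the post; the file jms.xml and its contents are hypothetical.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input file.
cat > jms.xml <<'EOF'
<foreign-server name="fs1">
<jndi>one</jndi>
</foreign-server>
<foreign-server name="fs2">
<jndi>two</jndi>
</foreign-server>
EOF

# Loop over the glob directly and quote the variable.
for entry in *.xml; do
    echo "ENTRY: $entry"
    # -v copies the shell variable into the awk variable "name";
    # inside the awk program refer to it as name, never as $entry.
    awk -v name="$entry" '
        /<foreign-server name/ { Q++ }
        { E = E ? E ORS $0 : $0 }
        /<\/foreign-server>/   { print E > (name "_found" Q ".tmp"); E = "" }
    ' "$entry"
done
```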

Hello mohtashims,

Could you please try the following and let me know if it helps.

for file in *.xml
do
     awk '/<app-deployment file/{Q++} {E=E?E ORS $0:$0} /<\/app-deployment>/{print E > ("found" Q ".tmp"); E=""}'  "$file"
done

But the problem here is that you haven't told us whether there are multiple xml files you want to parse. If you run the above command, only 2 files will be created in the end, because each run of awk overwrites found1.tmp and found2.tmp. So in case you want to append all the output to those files instead, change the command above a little:

for file in *.xml
do
     awk '/<app-deployment file/{Q++} {E=E?E ORS $0:$0} /<\/app-deployment>/{print E >> ("found" Q ".tmp"); E=""}'  "$file"
done
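With >>, awk appends instead of truncating, so found1.tmp collects the first block of every input file, in glob order. Since >> also appends across separate runs, the sketch below clears earlier results first. The two input files a.xml and b.xml are hypothetical.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Two hypothetical input files, each with two blocks.
for n in a b; do
    printf '%s\n' "<app-deployment file=\"$n-1\">" '</app-deployment>' \
                  "<app-deployment file=\"$n-2\">" '</app-deployment>' > "$n.xml"
done

rm -f found*.tmp    # ">>" never truncates, so remove results of earlier runs
for file in *.xml; do
    awk '/<app-deployment file/{Q++} {E=E?E ORS $0:$0} /<\/app-deployment>/{print E >> ("found" Q ".tmp"); E=""}' "$file"
done
```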

Let me know if your requirement is different or if you have additional conditions. I hope this helps.

Thanks,
R. Singh

Thank you for responding. You have not addressed the complete issue I am facing.

Let me elaborate.

I read all *.tmp files from a folder, one at a time, into a variable called entry in a while loop.

deploy.tmp in the OP is just one of many such files.

Now I want the parsing to happen on each file, one by one, and the output should be saved as below (assuming the loop picked deploy.tmp first):

and so on, depending on the number of matches found for text between <app-deployment file=" and </app-deployment>.
Likewise, when entry holds update.tmp, the output should be update.found1.tmp, update.found2.tmp, and so on.

Can you help?

Hello mohtashims,

I apologize for not understanding your requirement completely. I have tried my best to address the issue based on the information in your posts, so I would request that you tell us your complete requirement at once (instead of in bits and pieces). In my previous post I pointed out myself that the output would be overwritten into the same files, because we are reading multiple xml files in a loop.

Let me ask you a few questions to help solve this problem.

i- Earlier you mentioned having multiple xml files, but in your last post you show *.tmp files. How are these two related?
ii- If you have multiple files (xmls) and you have made a single big output file, did you try the solution from my first post?
iii- Please re-phrase your problem with a sample Input_file and the expected Output_file, in case you think the information provided is confusing people (at least I am confused). Please help us help you.

Thanks,
R. Singh

i- Earlier you mentioned having multiple xml files, but in your last post you show *.tmp files. How are these two related?

They are the same; I mistakenly wrote .xml instead of .tmp.

ii- If you have multiple files (xmls) and you have made a single big output file, did you try the solution from my first post?

I have multiple .tmp files. Each .tmp file will act as the Input_file in the while loop and then needs to be parsed using the two strings mentioned in the OP.

For each match found, it should save the parsed data in a new file.
So for two matches found between the strings, it should save output files with these names:

Input_file_found1.tmp
Input_file_found2.tmp

Check my previous post for the example.

iii- Please re-phrase your problem with a sample Input_file and the expected Output_file, in case you think the information provided is confusing people (at least I am confused). Please help us help you.

The sample input file is deploy.tmp in the OP, but as I said, I have many files like deploy.tmp which I will read one by one and parse using your suggestions.

For the OP, the output file names should now be deploy.tmp_found1.tmp, deploy.tmp_found2.tmp, and so on depending on the number of matches found. Here deploy.tmp is the input file name, which changes as we read files from that folder in the while loop.

I hope I have explained my requirement.

Hello mohtashims,

Could you please try the following and let me know how it goes. It will create file names such as File1found1.tmp and File1found2.tmp, i.e. the input file name followed by foundN.tmp.

for file in *.xml
do
     awk '/<app-deployment file/{Q++} {E=E?E ORS $0:$0} /<\/app-deployment>/{print E > (FILENAME "found" Q ".tmp"); E=""}'  "$file"
done

You can change for file in *.xml to for file in *.tmp, etc., as needed.
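A sketch of the naming requested earlier in the thread (deploy.tmp_found1.tmp), assuming a single awk run over all the files is acceptable: FNR == 1 resets the counter at the start of each input file and FILENAME supplies the prefix. The file contents and the out/ directory are hypothetical; writing results outside the current directory keeps a later *.tmp glob from treating them as input.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical inputs: deploy.tmp with two blocks, update.tmp with one.
printf '%s\n' '<app-deployment file="file1">' '<name>cert</name>' '</app-deployment>' \
              '<app-deployment file="file2">' '<name>Security</name>' '</app-deployment>' > deploy.tmp
printf '%s\n' '<app-deployment file="file3">' '<name>cert</name>' '</app-deployment>' > update.tmp

mkdir -p out   # keep results away from the *.tmp inputs
awk '
FNR == 1               { Q = 0 }                                      # new input file: restart numbering
/<app-deployment file/ { Q++; out = "out/" FILENAME "_found" Q ".tmp"; inside = 1 }
inside                 { print > out }                                # copy lines of the current block
/<\/app-deployment>/   { if (inside) { close(out); inside = 0 } }     # block ends: close its file
' *.tmp
```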

Thanks,
R. Singh

I fully second RavinderSingh13's point that your specifications could be WAY clearer and more precise FROM THE START. Amongst other uncertainties, your output file names have been specified to be

  • found1.tmp
  • deploy_found1.tmp
  • update.found1.tmp
  • deploy.tmp_found1.tmp
  • Input_file_found1.tmp (might be a variation of No. 2)

Wouldn't it save YOUR time, too, to make up your mind first and then post?

How about

awk '
FNR == 1                {FCNT = 0
                        }
/<app-deployment/       {FN = FILENAME
                         sub (/\./, ".found" (++FCNT) ".", FN)   # file3.tmp -> file3.found1.tmp
                        }
/<app-deployment/,
/<\/app-deployment/     {print > FN
                        }
' *.tmp

for f in *found*; do echo "$f:"; cat "$f"; done
file3.found1.tmp:
<app-deployment file="file1">
<name>cert</name>
<target>CS1</target>
<module-type>war</module-type>
</app-deployment>
file3.found2.tmp:
<app-deployment file="file2">
<name>Security</name>
<target>CS2</target>
<module-type>ear</module-type>
</app-deployment>
file4.found1.tmp:
<app-deployment file="file3">
<name>cert</name>
<target>CS1</target>
<module-type>war</module-type>
</app-deployment>
file4.found2.tmp:
<app-deployment file="file4">
<name>Security</name>
<target>CS2</target>
<module-type>ear</module-type>
</app-deployment>
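The two awk mechanisms used in the script above, shown in isolation as a minimal sketch; the name my.deploy.tmp and the input lines are hypothetical. sub() changes only the first match, so a name with several dots gets ".foundN" after its first dot. The sketch uses the regex constant /\./ because a backslash before "." in a string constant is left undefined by POSIX, and some awks drop it, leaving a regex "." that matches any character.

```shell
#!/bin/sh
# sub() edits its third argument in place and replaces only the first match.
name=$(awk 'BEGIN { FN = "my.deploy.tmp"; FCNT = 0; sub(/\./, ".found" (++FCNT) ".", FN); print FN }')
echo "$name"

# A range pattern selects every line from a match of the first regex through the
# next match of the second, inclusive; lines outside any range are skipped.
block=$(printf '%s\n' 'header' '<app-deployment file="x">' '<name>cert</name>' '</app-deployment>' 'footer' |
    awk '/<app-deployment/, /<\/app-deployment/')
echo "$block"
```

Note that /<app-deployment/ does not match the closing tag, because there the "<" is followed by "/", so each range starts only at an opening tag.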