search within the file in script

Maya_Pillai · May 17, 2010, 6:32am

Hi all,

I have to search a particular folder which contains around 2000 XML files.
I need to identify those files which have the value GSM under Product Value:

 
<item name="Product Value">
<value>GSM</value>
</item>

I need to copy those file names to another file.

This is my script. When I print $line, it shows the entire line, but the grep statement is taking it word by word. Also, the variables FOUNDIT and FOUNDENT are not being set correctly. Can somebody help me out?

 
#!/bin/ksh
cd /files/gsm
logfile=/users/gsm.txt
found=0
for i in `ls`
do
 
while read line
do
 
if [[ $found -eq 1 ]] ; then
 
found=0
FOUNDENT=`grep "GSM" ${line}`
if [ -z ${FOUNDENT} ]; then
echo $i |tee -a $logile
break
fi
fi
FOUNDIT=`grep "Product Value" ${line}`
if [ ! -z ${FOUNDIT} ]; then
found=0
else
found=1
fi
 
done< $i
done

zaxxon · May 17, 2010, 7:13am

Another approach:

for f in *; do tr -d '\n' < $f| grep -l '<item name="Product Value"><value>GSM</value></item>' && echo $f; done

It takes every file in the folder and removes the newlines from it. It then greps for the string you are looking for, which would otherwise be spread over 3 different lines. If grep is successful, it prints the name $f of the file. At this point you could place your copy command.
As written, with $f unquoted, the for loop only works if no file name contains blanks. Otherwise quote "$f", or use a while/read loop, for example.
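A minimal sketch of the same idea with every expansion quoted, so that file names containing blanks are handled. The function name list_gsm_files is hypothetical, and deleting carriage returns as well as newlines is an added assumption (it only matters if the files have Windows line endings). grep's output is sent to /dev/null rather than using -q, since not every grep supports -q.

```shell
# Print the names of the files in a directory whose content, with line
# breaks removed, contains the Product Value / GSM block.
# list_gsm_files is a hypothetical helper name.
list_gsm_files() {
    dir=${1:-.}
    pattern='<item name="Product Value"><value>GSM</value></item>'
    for f in "$dir"/*; do
        [ -f "$f" ] || continue            # skip directories and an unmatched glob
        # Delete newlines (and carriage returns) so the three lines become one.
        if tr -d '\r\n' < "$f" | grep "$pattern" > /dev/null; then
            printf '%s\n' "${f##*/}"       # print the bare file name
        fi
    done
}
```

Calling list_gsm_files /files/gsm would print one matching file name per line. The pattern still requires the three tags to follow each other with no spaces or tabs in between, so indented XML would not match.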

Maya_Pillai · May 17, 2010, 7:36am

Thanks zaxxon.

I tried to execute the script you have given, but it did not echo any file, even though many files match this condition. Can you suggest any remedies here?

zaxxon · May 17, 2010, 7:44am

I tried it with some example files using exactly your pattern and it worked; maybe there is a small typo or something?

$> cat file1
aowidaw
awd
yyppp
yyppp
yyppp
$>
$> cat file2
aoipwjda
aoipwjda
aoipwjda
<item name="Product Value">
<value>GSM</value>
</item>
aoipwjda
$>
$> cat file3
inow23nr23
inow23nr23
inow23nr23
inow23nr23
inow23nr23
$>
$> for f in *; do tr -d '\n' < $f| grep -l '<item name="Product Value"><value>GSM</value></item>' && echo $f; done
file2

Maya_Pillai · May 17, 2010, 8:12am

Tried again after making sure there are no typos. It did not return anything.

Not sure why.

Also, I need to fetch all file names where the value under Product Value contains GSM (like GSM management, GSM networks etc.). How do we modify it for that?

Franklin52 · May 17, 2010, 8:35am

If the lines with <item name> and <value> are consecutive you can try something like this:

gawk -F"[<>]" '/Product Value/{getline;if(match($3,"GSM"))print FILENAME;nextfile}' *.xml

Remove the ";nextfile" part if you don't use gawk.

Maya_Pillai · May 17, 2010, 8:42am

It gives an error. The file names have no extensions here.

$ gawk -F"[<>]" '/Product Value/{getline;if(match($3,"GSM"))print FILENAME;nextfile}' *

gawk: not found

$ awk -F"[<>]" '/Product Value/{getline;if(match($3,"GSM"))print FILENAME;nextfile}' *

awk: syntax error near line 1
awk: illegal statement near line 1

Franklin52 · May 17, 2010, 8:43am

Use nawk or /usr/xpg4/bin/awk on Solaris.

Maya_Pillai · May 17, 2010, 9:00am

This also gives an error on the first file.
ACCESS_POLICY_MANAGER
nawk: illegal statement
input record number 146, file ACCESS_POLICY_MANAGER
source line number 1

Franklin52 · May 17, 2010, 9:18am

Remove the nextfile statement from the code if you don't use gawk:

nawk -F"[<>]" '/Product Value/{getline;if(match($3,"GSM"))print FILENAME}' *.xml
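Without nextfile, the command above can print a file name more than once if the block occurs several times in one file. A sketch of a variant that uses neither getline nor nextfile and reports each file once is given below. The function name list_gsm_values and the AWK override variable are hypothetical; on Solaris, AWK would need to point at nawk or /usr/xpg4/bin/awk. Like the original, it assumes the <value> line directly follows the <item name="Product Value"> line.

```shell
# Print each file whose Product Value item is followed by a <value> line
# containing GSM (for example "GSM", "GSM networks", "GSM management").
# list_gsm_values and the AWK variable are hypothetical names.
list_gsm_values() {
    "${AWK:-awk}" -F'[<>]' '
        FNR == 1 { pending = 0; reported = 0 }   # new file: reset state
        reported { next }                        # file already printed
        pending {                                # line right after the item tag
            pending = 0
            if ($2 == "value" && $3 ~ /GSM/) { print FILENAME; reported = 1 }
            next
        }
        /<item name="Product Value">/ { pending = 1 }
    ' "$@"
}
```

Run from inside the folder, list_gsm_values * > /users/gsm.txt would collect the names in a file. Because the files are read line by line and never joined, very large files do not produce one very long line.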

shahhe · May 17, 2010, 4:08pm

Which operating systems are you using?
gawk is GNU awk, and you may have to install it if you are not using Linux. On many Linux distributions awk is gawk.

zaxxon's approach may fail if the XML files are big: some grep implementations have a limit on the length of line they can process.