Grep pattern file and count occurrences

Guys,
I am trying to use grep to read from a pattern file and count occurrences of each word.
The input file is:

firstplace
secondplace
secondpot

The pattern file is:

place
first
second

I want the following:
1. Count the number of times each keyword in the pattern file occurs at the beginning of a line in the input file.

So the result should be:

place 0
first 1
second 2

2. Count the number of times each keyword in the pattern file occurs at the end of a line in the input file.

So the result should be:

place 2
first 0
second 0

I tried:

grep -of patternfile input >result

but it didn't work.

Any help here?

Closest I can get quickly:

while read line; do echo "^$line"; done <file2 | grep -of- file1 | sort | uniq -c
      1 first
      2 second
while read line; do echo "$line\$"; done <file2 | grep -of- file1 | sort | uniq -c
      2 place
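The two while read loops just prepend ^ or append $ to each pattern; that step can equally be done with awk, which may matter later if the loop itself becomes a bottleneck. A sketch, using the sample data from this thread (file names file1 and file2 as above):

```shell
# sample files from the thread
printf 'firstplace\nsecondplace\nsecondpot\n' > file1   # input
printf 'place\nfirst\nsecond\n' > file2                 # patterns

# prepend ^ to every pattern, then count matches at line start
awk '{print "^" $0}' file2 | grep -of - file1 | sort | uniq -c

# append $ to every pattern, then count matches at line end
awk '{print $0 "$"}' file2 | grep -of - file1 | sort | uniq -c
```

As with the while read version, patterns that never match simply do not appear in the uniq -c output.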

For something more refined you might need an awk (or equivalent) solution.


Is this homework/coursework?

Let variable a be your keyword list, and let d be your input file:

$ cat d
firstplace
secondplace
secondpot

a='first second place pot'

for o in $a; do
  echo $o at the beginning of line is `grep -Poce '^'$o'' d` time
  echo $o after at least one leading character is `grep -Poce '^\w+?'$o'' d` time
  echo
done

first at the beginning of line is 1 time
first after at least one leading character is 0 time

place at the beginning of line is 0 time
place after at least one leading character is 2 time

second at the beginning of line is 2 time
second after at least one leading character is 0 time

pot at the beginning of line is 0 time
pot after at least one leading character is 1 time
a bit verbose, but....
awk -f ah.awk patternFile inputFile
where ah.awk is:

# first file (the pattern file): remember every keyword
FNR==NR {
  f2[$0]
  next
}
# second file (the input file): test each keyword against the current line
{
  for(i in f2) {
    if($0 ~ ("^" i))
      beg[i]++
    if($0 ~ (i "$"))
      end[i]++
  }
}
END {
    print "beginning"
    for(i in f2)
       print i, beg[i]+0

    print "\nend"
    for(i in f2)
       print i, end[i]+0
}

Rudi, thanks.
Your command works fine, but as both files are huge, grep fails, I think.
Can you give me something equivalent in awk?

---------- Post updated at 10:04 PM ---------- Previous update was at 09:58 PM ----------

I think awk is the solution as both files are huge, but the ah.awk you gave doesn't work. The script just keeps running, no output.
Can you check if there is any error in the code, please?

---------- Post updated at 10:22 PM ---------- Previous update was at 10:04 PM ----------

For my ah.awk I tried adding #!/usr/bin/awk -f as the top line.
It still didn't work.

Thanks, guys, for helping.

---------- Post updated at 10:30 PM ---------- Previous update was at 10:22 PM ----------

I think the issue here is not grep.
I think the while read line loop doesn't work for big files.
How do I replace it with awk?

It works on the sample files you gave me. There must be something different with your actual files - and not just the size.

Do cat -vet patternFile and post a sample of the output here (using code tags).

And answer the question asked in post #3! Until you answer the question you are not likely to get any more help.

Homework and coursework questions can only be posted in the Homework & Coursework Questions forum under special homework rules.

Please review the rules, which you agreed to when you registered, if you have not already done so.

If you did not post homework, please explain the company you work for and the nature of the problem you are working on.

If you did post homework in the main forums, please review the guidelines for posting homework and repost.

Here is a sample of the cat -vet patternfile output:

zeros$
zest$
zests$
zilla$
zillas$
zimbabwe$
zinc$
zine$
zines$
zing$
zings$
zion$
zip$
zippy$
zips$
zodiac$
zombie$

Should I use the same ah.awk code you posted above, or do I need to add anything else to ah.awk?

Thanks for helping.

Looks good - should work.
Please address Don's question in post #8 before proceeding any further.

Hmm, I am not a student, so this isn't homework/coursework.
I'm learning these commands as I need to analyze some big project files.
It's kind of work for me.
Is this OK?

Working through / analysing a huge data file with a huge pattern file will take its time no matter which tool you deploy, be it grep or awk or what have you; so saying "doesn't work" may be premature. You should see the process working in e.g. top, piling up memory and/or processor time. Did any of the proposals given here "work" (i.e. yield the desired/anticipated result) for smaller data sets and patterns? If yes, the logic is OK, and you have to address the performance question, e.g. by splitting the patterns into smaller chunks.


OK, got it.
Thanks, guys.