Grep pattern file and count occurrences

Guys,
I am trying to use grep to read from a pattern file and count occurrences of each word.
The input file is:

firstplace
secondplace
secondpot

The pattern file is:

place
first
second

I want the following:
1. Count the number of times each keyword in the pattern file occurs at the beginning of a line in the input file.

So the result should be:

place 0
first 1
second 2

2. Count the number of times each keyword in the pattern file occurs at the end of a line in the input file.

So the result should be:

place 2
first 0
second 0

I tried:

grep -of patternfile input >result

but it didn't work.

Any help here?

Closest I can get quickly:

while read line; do echo "^$line"; done <file2 | grep -of- file1 | sort | uniq -c
      1 first
      2 second
while read line; do echo "$line\$"; done <file2 | grep -of- file1 | sort | uniq -c
      2 place
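The two while read loops just prepend ^ or append $ to each pattern; that step can equally be done with awk, which may matter later if the loop itself becomes a bottleneck. A sketch, using the sample data from this thread (file names file1 and file2 as above):

```shell
# sample files from the thread
printf 'firstplace\nsecondplace\nsecondpot\n' > file1   # input
printf 'place\nfirst\nsecond\n' > file2                 # patterns

# prepend ^ to every pattern, then count matches at line start
awk '{print "^" $0}' file2 | grep -of - file1 | sort | uniq -c

# append $ to every pattern, then count matches at line end
awk '{print $0 "$"}' file2 | grep -of - file1 | sort | uniq -c
```

As with the while read version, patterns that never match simply do not appear in the uniq -c output.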

For something more refined you might need an awk (or equivalent) solution.


Is this homework/coursework?

Let variable a be your keyword list, and let d be your input file:

$ cat d
firstplace
secondplace
secondpot

a='first second place pot'

for o in $a; do
  echo $o at the beginning of line is `grep -Poce '^'$o'' d` time
  echo $o after at least one leading character is `grep -Poce '^\w+?'$o'' d` time
  echo
done

first at the beginning of line is 1 time
first after at least one leading character is 0 time

place at the beginning of line is 0 time
place after at least one leading character is 2 time

second at the beginning of line is 2 time
second after at least one leading character is 0 time

pot at the beginning of line is 0 time
pot after at least one leading character is 1 time
a bit verbose, but....
awk -f ah.awk patternFile inputFile
where ah.awk is:

# first file (the pattern file): remember every keyword
FNR==NR {
  f2[$0]
  next
}
# second file (the input file): test each keyword against the current line
{
  for(i in f2) {
    if($0 ~ ("^" i))
      beg[i]++
    if($0 ~ (i "$"))
      end[i]++
  }
}
END {
    print "beginning"
    for(i in f2)
       print i, beg[i]+0

    print "\nend"
    for(i in f2)
       print i, end[i]+0
}

Rudi, thanks.
Your command works fine, but as both files are huge, grep fails, I think.
Can you give me something equivalent in awk?

---------- Post updated at 10:04 PM ---------- Previous update was at 09:58 PM ----------

I think awk is the solution as both files are huge, but the ah.awk you gave doesn't work. The script just keeps running, no output.
Can you check if there is any error in the code, please?

---------- Post updated at 10:22 PM ---------- Previous update was at 10:04 PM ----------

For my ah.awk I tried adding #!/usr/bin/awk -f as the top line.
It still didn't work.

Thanks, guys, for helping.

---------- Post updated at 10:30 PM ---------- Previous update was at 10:22 PM ----------

I think the issue here is not grep.
I think the while read line loop doesn't work for big files.
How do I replace it with awk?

It works on the sample files you gave me. There must be something different with your actual files - and not just the size.

Do cat -vet patternFile and post a sample of the output here (using code tags).

And answer the question asked in post #3! Until you answer the question you are not likely to get any more help.

Homework and coursework questions can only be posted in the Homework & Coursework Questions forum under special homework rules.

Please review the rules, which you agreed to when you registered, if you have not already done so.

If you did not post homework, please explain the company you work for and the nature of the problem you are working on.

If you did post homework in the main forums, please review the guidelines for posting homework and repost.

Here is a sample of the cat -vet patternfile output:

zeros$
zest$
zests$
zilla$
zillas$
zimbabwe$
zinc$
zine$
zines$
zing$
zings$
zion$
zip$
zippy$
zips$
zodiac$
zombie$

Should I use the same ah.awk code you posted above, or do I need to add anything else to ah.awk?

Thanks for helping.

Looks good - should work.
Please address Don's question in post #8 before proceeding any further.

Hmm, I am not a student, so this isn't homework/coursework.
I'm learning these commands as I need to analyze some big project files.
It's kind of work for me.
Is this OK?

Working through / analysing a huge data file with a huge pattern file will take its time no matter which tool you deploy, be it grep or awk or what have you; so saying "doesn't work" may be premature. You should see the process working in e.g. top, piling up memory and/or processor time. Did any of the proposals given here "work" (i.e. yield the desired/anticipated result) for smaller data sets and patterns? If yes, the logic is OK, and you have to address the performance question, e.g. by splitting the patterns into smaller chunks.


OK, got it.
Thanks, guys.