Search string within a file and list common words from the line having the search string

Hi,

I need your help with a scripting issue. I am not very good at this, so I am asking for help.

I have a file that looks similar to this:

Hello, i am human and name=ABCD.
How are you?
Hello, i am human and name=PQRS.
I am good.
Hello, i am human and name=ABCD.
Good bye.
Hello, i am human and name=XYZ.

Now I need to search this file for the keyword "human", list the values that follow "name=", and count how many times each value occurs.
That is, the output would be:

Count of "human"=4
Count for name "ABCD"=2
Count for name "PQRS"=1
Count for name "XYZ"=1

Please tell me how to do this.
With the following I can get the lines containing "human" and the total count of "human":

COUNT=$(grep -c "human" "$LFILE")
grep "human" "$LFILE"
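Building on those two commands, one possible sketch (not taken from the replies below) adds sed, sort and uniq to pull out the values after "name=" and count them. It assumes every matching line ends in `name=VALUE.` as in the sample and that VALUE contains no spaces; `count_names` is a hypothetical function name.

```shell
# count_names FILE  (hypothetical helper)
# Assumes each "human" line ends in "name=VALUE." and VALUE has no spaces.
count_names() {
    file=$1
    # number of lines containing "human" (same result as grep ... | wc -l)
    printf 'Count of "human"=%s\n' "$(grep -c "human" "$file")"
    # keep the text after "name=", drop the trailing ".",
    # bring identical values together and count them
    grep "human" "$file" |
        sed -e 's/.*name=//' -e 's/\.$//' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ printf "Count for name \"%s\"=%s\n", $2, $1 }'
}
```

Called as `count_names requirement_check_count`, this prints the lines in the requested format for the sample file; the names come out in sorted order rather than file order.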

Please help
Thanks

Hi,

You could use the following script for this:

count_xyz=0
count_pqrs=0
count_abcd=0
abcd="ABCD"
xyz="XYZ"
pqrs="PQRS"
count_human=0
while IFS= read -r line
do
    # non-empty only if the line contains "human"
    check_human=$(echo "$line" | grep "human")
    # non-empty only if the line contains "name"
    name_check=$(echo "$line" | grep "name")
    # the text after "=" up to the first "." (e.g. ABCD)
    value_name_check=$(echo "$name_check" | awk -F"=" '{print $2}' | grep -v '^$' | cut -f1 -d.)

    if [[ -n ${check_human} ]]
    then
        let "count_human = count_human + 1"
    fi

    if [[ "$value_name_check" == "$abcd" ]]
    then
        let "count_abcd = count_abcd + 1"
    fi

    if [[ "$value_name_check" == "$pqrs" ]]
    then
        let "count_pqrs = count_pqrs + 1"
    fi

    if [[ "$value_name_check" == "$xyz" ]]
    then
        let "count_xyz = count_xyz + 1"
    fi

done < "requirement_check_count"

echo "Count for human is=" $count_human
echo "Count for ABCD=" $count_abcd
echo "Count for PQRS=" $count_pqrs
echo "Count for XYZ=" $count_xyz

The output will then be:

$ ksh check_requirement_check_count.ksh
Count for human is= 4
Count for ABCD= 2
Count for PQRS= 1
Count for XYZ= 1

Thanks,
R. Singh

Hi R Singh,

Thanks for your reply. I forgot to mention one thing: the values after "name=" are not fixed; they can change. I used ABCD/PQRS/XYZ only as examples.

What I need is to extract the string after "name=" from every line containing "human" and then count the total occurrences of each "name=" value.
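If a pure shell loop is preferred, a sketch along the lines of R. Singh's script can replace the fixed counters with an associative array keyed by the name, so new names are picked up automatically. This assumes ksh93 or bash 4+ (for `typeset -A`) and that every "human" line contains a non-empty `name=VALUE.`; `count_dynamic` is a hypothetical function name.

```shell
# count_dynamic FILE  (hypothetical helper; needs ksh93 or bash 4+)
function count_dynamic {
    typeset -A count                 # name -> number of occurrences
    typeset line value count_human=0
    while IFS= read -r line
    do
        case $line in
            *human*) ;;              # keep lines that contain "human"
            *) continue ;;           # skip everything else
        esac
        count_human=$((count_human + 1))
        value=${line#*name=}         # strip everything up to and including "name="
        value=${value%%.*}           # strip the first "." and everything after it
        count[$value]=$(( ${count[$value]:-0} + 1 ))
    done < "$1"

    echo "Count for human is= $count_human"
    for value in "${!count[@]}"
    do
        echo "Count for $value= ${count[$value]}"
    done
}
```

Run it as `count_dynamic requirement_check_count`. The order in which `${!count[@]}` lists the names is not defined, so sort the name lines if the order matters.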

An awk solution:

$ awk '/human/ {humans++; values[$2]++} END {printf "Humans: %s\n", humans; for (i in values) {printf "%s: %s\n", i, values[i]}}' FS== file
Humans: 4
XYZ.: 1
ABCD.: 2
PQRS.: 1

Note that this picks up everything after the "=", including the trailing period. If you want only the value after "name=" (without the period), it gets a little more complex.
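One way to pick out the `name=` value without the trailing period, sketched here rather than taken from the thread, is to strip everything up to `name=` and everything from the first `.` with `sub()`, and to remember the order in which names first appear so the output matches the requested layout. The "value ends at the first period" rule is an assumption based on the sample lines; `count_humans` is a hypothetical function name.

```shell
# count_humans FILE  (hypothetical helper)
count_humans() {
    awk '
    /human/ {
        humans++
        value = $0
        sub(/.*name=/, "", value)   # drop everything up to and including "name="
        sub(/\..*/, "", value)      # drop the first "." and everything after it
        if (!(value in count))
            order[++n] = value      # remember the order names first appear in
        count[value]++
    }
    END {
        printf "Count of \"human\"=%d\n", humans
        for (i = 1; i <= n; i++)
            printf "Count for name \"%s\"=%d\n", order[i], count[order[i]]
    }' "$1"
}
```

Because `for (i in count)` visits keys in no defined order, the separate `order` array is what keeps ABCD, PQRS, XYZ in file order.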

A small modification of CarloM's code (using the match() function and passing a variable to awk):

awk -v var="human" '$0 ~ var {count++; match($0, /=.*$/); a = substr($0, RSTART+1, RLENGTH-2); array[a]++} END {print "count of " var "=" count; for (i in array) print "count of name " i "=" array[i]}' filename
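`match()` sets `RSTART` (where the match begins) and `RLENGTH` (its length). Matching `name=` followed by non-period characters keeps the match to the value itself and avoids lazy quantifiers such as `.+?`, which POSIX extended regular expressions (used by awk) do not define. A sketch, with `name_values` as a hypothetical function name and the search word passed in as `var`:

```shell
# name_values FILE WORD  (hypothetical helper)
name_values() {
    awk -v var="$2" '
    # lines containing var that also have "name=" followed by the value
    $0 ~ var && match($0, /name=[^.]*/) {
        # RSTART is where "name=" begins and RLENGTH covers "name=VALUE",
        # so skip the 5 characters of "name=" to keep only VALUE
        print substr($0, RSTART + 5, RLENGTH - 5)
    }' "$1"
}
```

It prints one value per matching line, so `name_values requirement_check_count human | sort | uniq -c` gives the per-name counts.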

Hello,

here is a script solution for the same problem:

> Output_latest                  # start with an empty output file
count_human=0

while IFS= read -r line
do
    count=0
    check_human=$(echo "$line" | grep "human" | grep -v '^$')
    name_check=$(echo "$line" | grep "name" | grep -v '^$')
    # the text after "=" up to the first "." (e.g. ABCD)
    value_name_check=$(echo "$name_check" | awk -F"=" '{print $2}' | grep -v '^$' | cut -f1 -d.)

    # re-read the whole file and count the lines containing "<value>."
    while IFS= read -r line1
    do
        check_count=$(echo "$line1" | grep -v '^$' | grep "${value_name_check}\.")
        if [[ -n ${check_count} ]]
        then
            let "count = count + 1"
        fi
    done < "requirement_check_count"

    if [[ -n ${check_human} ]]
    then
        let "count_human = count_human + 1"
    fi
    if [[ -n "$value_name_check" ]]
    then
        echo "Count for" $value_name_check "is" $count >> Output_latest
    fi
done < "requirement_check_count"
echo "Count for human is=" $count_human

# each name was written once per occurrence; print every line only once
awk '!x[$0]++' Output_latest

The output will be as follows:

$ ksh check_requirement_check_count1.ksh
Count for human is= 4
Count for ABCD is 2
Count for PQRS is 1
Count for XYZ is 1

Thanks,
R. Singh

The formatting may not be exactly what you expected, but all the needed information will be there,
assuming your file is formatted exactly as you showed:

$ sed '/human/!d;s/.*name=/human\n/' yourfile | sort | uniq -c
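For reference, here is the same pipeline split into steps with comments, wrapped in a hypothetical function `count_with_sed`. It assumes GNU sed, since `\n` in the replacement text is a GNU extension, and sets `LC_ALL=C` so that `sort` orders lines byte by byte regardless of locale.

```shell
# count_with_sed FILE  (hypothetical wrapper around the one-liner)
count_with_sed() {
    # /human/!d           delete every line that does not contain "human"
    # s/.*name=/human\n/  replace everything up to "name=" with "human" and a
    #                     newline, so each kept line becomes "human" plus the value
    sed -e '/human/!d' -e 's/.*name=/human\n/' "$1" |
        # sort brings identical lines together (LC_ALL=C: byte order, any locale)
        LC_ALL=C sort |
        # uniq -c prints each distinct line once, prefixed by its count
        uniq -c
}
```

For the sample file this yields counts of 2, 1, 1 and 4 for `ABCD.`, `PQRS.`, `XYZ.` and `human`; the "human" count comes from the extra line the substitution inserts for every matching line.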

Hi ctsgnb,
Could you please explain the command you posted? The output is unformatted, but that's fine.
Also, what if the line were:
Hello, i am human and name="ABCD" now.

In this case, how would we extract ABCD?

Thanks in advance!
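A minimal sketch for that case (not taken from the replies) removes everything up to `name="` and then everything from the closing quote; if the line has no quoted value it falls back to the unquoted `name=ABCD.` form. `quoted_name` is a hypothetical function name, and the sketch assumes the value itself contains no double quotes.

```shell
# quoted_name FILE  (hypothetical helper)
quoted_name() {
    awk '
    /human/ {
        value = $0
        if (sub(/.*name="/, "", value)) {   # quoted form: name="ABCD"
            sub(/".*/, "", value)           # drop the closing quote and the rest
        } else {                            # unquoted form: name=ABCD.
            sub(/.*name=/, "", value)
            sub(/\..*/, "", value)          # drop the first "." and the rest
        }
        print value
    }' "$1"
}
```

Piping its output through `sort | uniq -c` gives the per-name counts as before.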

If you want to count all fields, try this:

$ awk -F'[ ,=]' '/^Hello/{for(i=1;i<=NF;i++)A[$i]++}END{for (i in A)if(i!="")print "count of " "\""i"\"""=" OFS A[i]}' OFS=\\t file
count of "i"=    4
count of "XYZ."=    1
count of "and"=    4
count of "name"=    4
count of "Hello"=    4
count of "ABCD."=    2
count of "am"=    4
count of "PQRS."=    1
count of "human"=    4

If you want to count only a few fields, you can do it like this:

$ awk -F'[ ,=]' '/^Hello/{for(i=1;i<=NF;i++){if(($i~"ABCD")||($i~"PQRS")||($i~"XYZ")||($i~"human"))A[$i]++}}END{for (i in A)print "count of " "\""i"\"""=" OFS A[i]}' OFS=\\t file
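Note that `$i~"ABCD"` is a regular-expression test, so it would also count a field such as `ABCDE`. A sketch that compares fields exactly instead, after dropping a trailing `.`, could take the word list as a variable; `count_words` and the space-separated list format are assumptions for illustration. Unlike the command above, it reports `ABCD` rather than `ABCD.`.

```shell
# count_words FILE "WORD1 WORD2 ..."  (hypothetical helper)
count_words() {
    awk -F'[ ,=]' -v words="$2" '
    BEGIN {
        # build a lookup set from the space-separated word list
        n = split(words, w, " ")
        for (i = 1; i <= n; i++) want[w[i]] = 1
    }
    /^Hello/ {
        for (i = 1; i <= NF; i++) {
            f = $i
            sub(/\.$/, "", f)            # "ABCD." -> "ABCD"
            if (f in want) count[f]++    # exact comparison, not a regex match
        }
    }
    END { for (f in count) printf "count of \"%s\"=\t%d\n", f, count[f] }' "$1"
}
```

Usage: `count_words file "ABCD PQRS XYZ human"`; as with the other `for (x in array)` loops, the output order is not defined.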
count of "XYZ."=    1
count of "ABCD."=    2
count of "PQRS."=    1
count of "human"=    4

Like this?

sed '/human/!d;s/.*name=//' file | sort | uniq -c
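grep "human" infile | awk '
    { for (i = 1; i <= NF; i++)
          if ($i ~ /name=/) count[$i]++ }   # count each "name=VALUE." field
    END { print "human = " NR               # NR = number of "human" lines read
          for (a in count) {
              t = length(a) - 6             # drop "name=" (5 chars) and the trailing "."
              print substr(a, 6, t), "=", count[a]
          } }'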