Script to find a word in multiline text

I have a problem with this bash/awk script. I am trying to find words like SELECT, FROM, or WHERE and to print the number of the line. I've tried many things but always I got many numbers displayed...

#!/bin/bash

search_words() {
    local sql_array="$1"
    local query="$2"
    local words=("${@:3}")
    # local results=()
    declare -A results  # asoc. array

    # create array of words, this starts with a query word like SELECT
    local all_words=("$query" "${words[@]}")
    # iterate through the words
    for word in "${all_words[@]}"; do
    result=$(awk -v word="$word" '{
        if( match($0, word) ) { 
            print NR; exit;  # ONLY THE LINE NUMBER OF THE MATCH IS NEEDED HERE. THIS COULD BE PROBLEM.
        }
    }' <<< "$sql_array")
# read -p "WORD:$word RESULT:$result searched in $sql_array"
        if [ -z "$result" ]; then
            result=0
        fi
        results["$word"]="$result"  # OR MAYBE HERE IS PROBLEM IN SAVING THE RESULT?
    done

    # RETURN THE RESULTS ARRAY ... OR MAYBE EVEN HERE?
    echo "${results[@]}"
}

# MAIN TEST WITH A DUMMY CODE
clear
sql_array="
    'SELECT'    => 'f.*',
    'FROM'      => array(
        FORUMS_TABLE        => 'f'
    ),
    'LEFT_JOIN' => array(),
    'ORDER_BY'  => 'f.left_id',
    'WHERE'     => 'f.id = 1'
"

# HERE ARE THE QUERY WORDS
words=("FROM" "LEFT_JOIN" "ON" "WHERE")

# LET'S CALL THE FUNCTION
results=$(search_words "$sql_array" "SELECT" "${words[@]}")

echo "SELECT: ${results["SELECT"]}"
read -p "Press key to continue"
# This should print the words from array with each one number:
for word in "${!results[@]}"; do
    echo "$word: ${results[$word]}"
done

for word in "${words[@]}"; do
    echo "$word: ${results[$word]}"
done

@visitor123

NB: I have not looked at your script in any detail, but to find the line(s) a word appears on, try something like the following

cat visitor123.txt
#this contains a simple sql query

SELECT
  first_name,
  last_name,
  salary
FROM 
  employees
where 
  salary > 3800;


SELECT empid, first_name, last_name, startdate FROM hrinfo;

#awk example

awk '/FROM|WHERE/ {print NR}' visitor123.txt
7
13

#grep case insensitive example

grep -n -i -E '(WHERE|FROM)' visitor123.txt
7:FROM 
9:where 
13:SELECT empid, first_name, last_name, startdate FROM hrinfo;

there are many ways .. .perl also a popular choice.

I have some success using awk, but I fail when it comes to save or access the values in bash associative array. Can you help with the last part? This is the outcome so far:

WORD:SELECT RESULT:2 searched in sql_array
WORD:FROM RESULT:3 searched in sql_array
WORD:LEFT_JOIN RESULT:6 searched in sql_array
WORD:ON RESULT: searched in sql_array
WORD:WHERE RESULT:8 searched in sql_array
SELECT: 
FROM: 
LEFT_JOIN: 
ON: 
WHERE:
#!/bin/bash

# Funkce pro prohledání slov v textu
search_words() {
    local sql_array="$1"
    local query="$2"
    local words=("${@:3}")
    # local results=()
    declare -A results=()  # Deklarace asociativního pole pro ukládání výsledků

    sql_array=$(echo "$sql_array" | sed -e 's/^[ \t]*//') # Trim begin of the lines

    # Vytvoření pole slov s query na prvním místě
    local all_words=("$query" "${words[@]}")
    # Procházení slov a hledání jejich výskytu v stringu
    for word in "${all_words[@]}"; do
    result=$(echo "$sql_array" | awk -v word="$word" '{
	    if ($0 ~ word) {
	        print NR;  # Vypíše číslo řádku, pokud se slovo nachází na začátku řádku
	        next;  # Přejde na další řádek
	    }
           }')
read -p "WORD:$word RESULT:$result searched in sql_array"
        if [ -z "$result" ]; then
            result=0
        fi
        # results+=("$result")
        results["$word"]="$result"  # Uložení výsledku pod klíčem slova
    done

    # Vrácení asociativního pole výsledků
    # echo "${results[@]}"
    results+=("$result")
}

# Hlavní kód
clear
sql_array="
    'SELECT'    => 'f.*',
    'FROM'      => array(
        FORUMS_TABLE        => 'f'
    ),
    'LEFT_JOIN' => array(),
    'ORDER_BY'  => 'f.left_id',
    'WHERE'     => 'f.id = 1'
"

# Pole se slovy, která hledáme
words=("FROM" "LEFT_JOIN" "ON" "WHERE")

# Volání funkce pro prohledání slov
results=$(search_words "$sql_array" "SELECT" "${words[@]}")

# Vypsání výsledků
echo "SELECT: ${results[0]}"
for ((i=1; i<${#words[@]}+1; i++)); do
    echo "${words[i-1]}: ${results[i]}"
 done
read -p "stop"
echo "SELECT: ${results["SELECT"]}"
read -p "Press key to continue"
# Vypsání výsledků
for word in "${!results[@]}"; do
    echo "$word: ${results[$word]}"
done

for word in "${words[@]}"; do
    echo "$word: ${results[$word]}"
done

Use https://www.shellcheck.net (or a locally installed version) - it might give you some hints on what you're attempting to "break" in your code, and why.

Just to give you more insight:
In a function, using declare -A results has exactly the same effect as using local -A results - your array named results exists only inside the function.

results+=("$result") inside function is (most likely) not used correctly - it appends your associative array with some "indexed" elements.

Since you do not modify a global results array, and you're assigning an empty string to your results variable outside the function, there's no way it can display what you want correctly.

2 Likes

Thank you.
"Corrected" the global declaration. And this:
results["$word"]="$result"

I got this error

results=$(search_words "$sql_array" "SELECT" "${words[@]}")
^-- [SC2178](https://www.shellcheck.net/wiki/SC2178) (warning): Variable was used as an array but is now assigned a string.

Yes, either you stick to only modifying the global array variable contents (inside the function), and not try to assign anything outside (as the function's execution result)
OR
you assign to a global variable (and optionally convert to an array) function's stdout
- currently you're trying to mix both, don't.

Your function does not echo anything, so there's nothing to assign. And even if it did echo something to stdout, it would be assigned as a string to any globally declared variable, when you assign a result of command substitution variable=$(function)

Good practice - try not to overuse the same names for globally declared and locally declared (inside functions) variables, using distinct names will make your life easier.

2 Likes

So the problem is that the bash function will always return string?
So comment # echo "${results[@]}"
and this should be called with no results assign?
$(search_words "$sql_array" "SELECT" "${words[@]}")

There's no assignment statement variable=$(search_words "$sql_array" "SELECT" "${words[@]}"),
then command substitution $(search_words "$sql_array" "SELECT" "${words[@]}") on its own
is pretty much useless, just run search_words "$sql_array" "SELECT" "${words[@]}" as a separate command.

1 Like

Thank you so I have solved it:

#!/bin/bash
declare -A results=()  

search_words() {
    local sql_array="$1"
    local query="$2"
    local words=("${@:3}")

    sql_array=$(echo "$sql_array" | sed -e 's/^[ \t]*//') # Trim begin of the lines

    # COMPLETE WORDS ARRAY
    local all_words=("$query" "${words[@]}")
    # ITERATE THROUGH WORDS TO BE FOUND
    for word in "${all_words[@]}"; do
        result=$(echo "$sql_array" | awk -v word="$word" '$0 ~ word {print NR; exit}')
read -p "WORD:$word RESULT:$result searched in sql_array"
        if [ -z "$result" ]; then
            result=0
        fi

        # SAVE THE RESULT TO GLOBAL ARRAY
        results["$word"]="$result"
    done
}

# MAIN CODE
clear
sql_array="
    'SELECT'    => 'f.*',
    'FROM'      => array(
        FORUMS_TABLE        => 'f'
    ),
    'LEFT_JOIN' => array(),
    'ORDER_BY'  => 'f.left_id',
    'WHERE'     => 'f.id = 1'
"

# WORDS TO FIND (SECONDARY ONES)
words=("FROM" "LEFT_JOIN" "ON" "WHERE")

# SIMPLE CALL
search_words "$sql_array" "SELECT" "${words[@]}"

# ITERRATE THROUGH
for word in "${!results[@]}"; do
    echo "WORD:$word RESULT:${results[$word]} searched in sql_array"
done

echo "--------------"
# EXPECTED RESULTS:
echo "SELECT:${results[SELECT]}"
echo "FROM: ${results[FROM]}"
echo "LEFT_JOIN: ${results[LEFT_JOIN]}"
echo "ON: ${results[ON]}"
echo "WHERE: ${results[WHERE]}"