Awk: passing shell variables through and extracting text

bathtime · February 19, 2018, 9:35am

Hello, new to the forums and to awk. Glad to be here. :o

I want to pass two shell ( #!/bin/sh ) variables through to awk and use them. They will determine where to start and stop text extraction.

The code with the variables hard-coded in awk works fine; the same code, but with the shell variables, does not.

Here is the hard-coded (working) code:

echo "Here is a blah ... blah ... blah very nice string." | awk -F 'Here is a' '{print $2}' RS='very nice string.'

The result, and it is the result I seek, is:

blah ... blah ... blah

Here is the code with variables:

textA="Here is a"
textB="very nice string."

echo "Here is a blah ... blah ... blah very nice string." | awk  -v var1="$textA" -v var2="$textB" -F var1 '{print $2}' RS=var2

The result is just a blank line: not nothing, but a carriage return.

Any ideas?

rdrtx1 · February 19, 2018, 10:09am

textA="Here is a"
textB="very nice string."

echo "Here is a blah ... blah ... blah very nice string." | awk  -v var1="$textA" -v var2="$textB" '{sub(var2, ""); sub(var1, ""); print}'

bathtime · February 19, 2018, 1:52pm

Thank you for such a timely response!

I don't think I was clear in my question: I want just the text between the two phrases and nothing else. If the two phrases do not both match in one field, I want nothing returned, and any additional text before textA or after textB should not appear in the output.

Okay, let me just give you guys the code I'm using and not dilly dally:

Working code to extract the local weather in my city.

#!/bin/bash

wget -q -O- "hamiltonweather.ca/traffic/" | awk -F '<h2>' '{sub(/ �/, ""); print $2}' RS='</h2>' | grep -o '[^,]*$'

Returns:

Light Rain 4.3°C

The code works beautifully when hard-coded (I added some of your code in there, rdrtx1, to get rid of a dot, and grep clears a blank line at the end), but as soon as I add the variables it doesn't work as intended:

Non-working code with variables added:

#!/bin/bash

textBefore='<h2>'
textAfter='</h2>'

wget -q -O- "hamiltonweather.ca/traffic/" | awk -v var1="$textBefore" -v var2="$textAfter" -F var1 '{sub(/ �/, ""); print $2}' RS=var2 | grep -o '[^,]*$'

Returns nothing at all. How do I make awk understand and use the variables as in the working code?

rdrtx1 · February 19, 2018, 1:58pm

textA="Here is a"
textB="very nice string."

echo "123123123 Here is a blah ... blah ... blah very nice string. aaa aaa aaa" | awk  -v var1="$textA" -v var2="$textB" '$0 ~ var1 ".*" var2 {sub(var2 ".*", ""); sub(".*" var1, ""); print}'
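One caveat, as far as I can tell: var1 and var2 are used as regular expressions here, so the "." at the end of textB matches any character, and phrases containing characters such as *, [ or ( could misbehave. Below is a minimal sketch of a literal-text alternative that uses only index(), substr() and length(). The names a, b, rest and between are made up for illustration. Unlike the greedy ".*" version, it stops at the first occurrence of the closing phrase.

```shell
textA="Here is a"
textB="very nice string."
line="123123123 Here is a blah ... blah ... blah very nice string. aaa aaa aaa"

# index() searches for plain text, so no character in textA/textB is
# treated as a regex operator. (awk -v still interprets backslash
# escapes, so this sketch assumes the phrases contain no backslashes.)
between=$(printf '%s\n' "$line" | awk -v a="$textA" -v b="$textB" '
{
    s = index($0, a)                    # position of the opening phrase
    if (s == 0) next                    # no opening phrase: print nothing
    rest = substr($0, s + length(a))    # everything after the opening phrase
    e = index(rest, b)                  # first closing phrase after it
    if (e == 0) next                    # no closing phrase: print nothing
    print substr(rest, 1, e - 1)
}')

echo "[$between]"
```

The brackets in the final echo only make the leading and trailing spaces of the extracted text visible.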

Don_Cragun · February 19, 2018, 2:38pm

Your sample data included in your echo statement is nothing at all like the data you are looking for in the output from wget.
Maybe the following would come closer to doing what you want with a little less bother:

#!/bin/bash
textBefore='<h2>'
textAfter='</h2>'

wget -q -O- "hamiltonweather.ca/traffic/" | awk -F "($textBefore|$textAfter)" 'NF==3{print $2}'

I can't test it on my laptop since it doesn't have a wget utility, but it should come close to doing what you want. If it doesn't, please show us the output the above wget command produces (in CODE tags), so we can see what the data you're trying to process really looks like.

Please also get into the habit of letting us know what operating system and shell you're using whenever you start a thread in this forum. The utilities (and the options they support) vary from system to system and shell features vary from shell to shell. Telling us details about your environment helps us provide you with suggestions that will work in your environment.

And, please use CODE tags (not ICODE and not QUOTE tags) for sample full-line and multi-line input, output, and code segments. Use ICODE tags when displaying partial-line sample input, output, and code segments in-line with other text.
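To illustrate what the -F "(...|...)" command above does, here is a small sketch run against a few made-up lines rather than the real wget output (the sample lines are assumptions, not the actual page). Either tag acts as a field separator, so a line holding exactly one <h2>...</h2> pair splits into three fields and the middle one is the text between the tags.

```shell
textBefore='<h2>'
textAfter='</h2>'

# Three hypothetical input lines: no heading, one heading, two headings.
heading=$(printf '%s\n' \
    '<p>no heading here</p>' \
    '<h2>Light Rain</h2>' \
    '<h2>one</h2><h2>two</h2>' |
    awk -F "($textBefore|$textAfter)" 'NF==3{print $2}')

# Only the line with exactly one pair has NF == 3.
echo "$heading"
```

The line without a heading has one field and the line with two headings has five, so both are skipped by the NF==3 test.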

RudiC · February 19, 2018, 4:26pm

You're not too far off. Try

wget -q -O- "hamiltonweather.ca/traffic/" | awk -F "$textBefore" -v RS="$textAfter"  '{sub(/ �/, ""); print $2}'

EDIT: Or

wget -q -O- "hamiltonweather.ca/traffic/" | awk -vT1="$textBefore" -vT2="$textAfter" 'match ($0, T1 ".*" T2) {print substr ($0, RSTART+4, RLENGTH-9)}'
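The reason the original attempt printed nothing useful, as far as I understand it: awk does not expand its own variables on the command line, so -F var1 sets the field separator to the literal text "var1" and RS=var2 sets the record separator to the literal text "var2". Letting the shell expand "$textBefore" and "$textAfter" instead, as in the first command above, passes the real strings. A small sketch of the difference, with made-up sample input:

```shell
textA="Here is a"

# -F var1 splits on the literal text "var1", not on the contents of
# the awk variable var1 that -v created.
literal=$(echo "one var1 two" | awk -v var1="$textA" -F var1 '{print $2}')
echo "[$literal]"     # prints: [ two]

# Letting the shell expand the variable hands awk the real separator.
expanded=$(echo "Here is a blah" | awk -F "$textA" '{print $2}')
echo "[$expanded]"    # prints: [ blah]
```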

bathtime · February 19, 2018, 5:11pm

rdrtx1:

textA="Here is a"
textB="very nice string."

echo "123123123 Here is a blah ... blah ... blah very nice string. aaa aaa aaa" | awk  -v var1="$textA" -v var2="$textB" '$0 ~ var1 ".*" var2 {sub(var2 ".*", ""); sub(".*" var1, ""); print}'

Works like a charm!

Don_Cragun:

Thank you. And for me it does indeed work. I found that I didn't even need to use the NF==3 code as it was my original code that necessitated its use in the first place. :rolleyes:

I'm on Debian, using bash, mksh, and sh, all of which work with this code... It seems I used the incorrect code tags; I see them now. I thought something was a trifle amiss. :rolleyes:

RudiC,

wget -q -O- "hamiltonweather.ca/traffic/" | awk -F "$textBefore" -v RS="$textAfter"  '{sub(/ �/, ""); print $2}'

This code worked, but for some reason it adds another line as mine had done. It's great to see some resolution on my own attempt!
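The extra line most likely comes from the text that follows the last </h2>: it forms one more record that contains no <h2>, so $2 is empty and print emits an empty line. A sketch of one way to skip such records is to print only when the separator was actually found (NF > 1). It assumes an awk that accepts a multi-character RS, as gawk and mawk do (POSIX leaves that case unspecified), and the sample page text is made up.

```shell
textBefore='<h2>'
textAfter='</h2>'
page='<p>intro</p><h2>Light Rain</h2><p>footer</p>'

# NF > 1 is true only for records that contain the opening tag,
# so the trailing "<p>footer</p>" record is skipped instead of
# being printed as an empty line.
guarded=$(printf '%s\n' "$page" |
    awk -F "$textBefore" -v RS="$textAfter" 'NF > 1 {print $2}')

echo "$guarded"
```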

wget -q -O- "hamiltonweather.ca/traffic/" | awk -vT1="$textBefore" -vT2="$textAfter" 'match ($0, T1 ".*" T2) {print substr ($0, RSTART+4, RLENGTH-9)}'

Also works. It looks like a lot more code to get the job done, though. I'll have to look at the man pages and find out what it all does. This is my third day using awk.

bathtime · February 20, 2018, 5:16pm

Not sure if anyone cares, but I ran some speed tests on the code you guys helped me with:

#!/bin/sh

# Run:    time ./testtime

A='center">'
B='</p>'
C='>'
D='&deg'

count=0
while [ "$count" -lt 10000 ]; do

	# mawk  -v v1="$A" -v v2="$B" -v v3="$C" -v v4="$D" '$0 ~ v1 ".*" v2 {sub(v2 ".*", ""); sub(".*" v1, ""); print $0; next;} $0 ~ v3 ".*" v4 {sub(v4 ".*", ""); sub(".*" v3, ""); print $0 "°C"; exit}' canada.html

	# Awk  Run x  Shell Input    CPU

	# awk  10000x mksh piped   @ 55%  0m23.27s real     0m14.97s user     0m09.03s system
	# awk  10000x mksh unpiped @ 51%  0m21.67s real     0m12.68s user     0m08.17s system
	# nawk 10000x mksh unpiped @ 51%  0m21.59s real     0m12.63s user     0m08.27s system
	# gawk 10000x mksh unpiped @ 51%  0m21.64s real     0m12.55s user     0m08.40s system
	# mawk 10000x bash unpiped @ 53%  0m07.83s real     0m04.54s user     0m02.81s system
	# mawk 10000x mksh unpiped @ 51%  0m07.00s real     0m04.29s user     0m02.18s system
	# mawk 10000x dash unpiped @ 52%  0m06.49s real     0m04.02s user     0m01.91s system
	# mawk 10000x sh   unpiped @ 52%  0m06.40s real     0m04.01s user     0m01.93s system


	# mawk -vT1="$A" -vT2="$B" -vT3="$C" -vT4="$D" 'match ($0, T1 ".*" T2) {print substr ($0, RSTART+146, RLENGTH-150); next;} match ($0, T3 ".*" T4) {print substr ($0, RSTART+31, RLENGTH-35) "°C"; exit;}' canada.html

	# Awk  Run x  Shell Input    CPU

	# awk  10000x mksh piped   @ 55%  0m21.14s real     0m12.78s user     0m09.29s system
	# awk  10000x mksh unpiped @ 51%  0m19.70s real     0m10.71s user     0m08.22s system
	# nawk 10000x mksh unpiped @ 51%  0m19.68s real     0m10.72s user     0m08.27s system
	# gawk 10000x mksh unpiped @ 51%  0m19.68s real     0m10.81s user     0m08.15s system
	# mawk 10000x bash unpiped @ 53%  0m07.70s real     0m04.59s user     0m02.71s system
	# mawk 10000x mksh unpiped @ 51%  0m07.01s real     0m04.33s user     0m02.16s system
	# mawk 10000x sh   unpiped @ 52%  0m06.34s real     0m04.00s user     0m01.86s system
	# mawk 10000x dash unpiped @ 52%  0m06.33s real     0m03.99s user     0m01.87s system

	count=$((count+1))
done
beep

RudiC's code ran fastest in these tests, with rdrtx1's code close behind. Mawk was fastest overall, unpiped and run in sh/dash. I would have tested the other suggestions, but none of them worked with this site. These two implementations also worked on every site I tried them on!

Also, I let the output print to the screen, as I have not yet looked into redirecting it. The input was a saved HTML page from a weather website; no network access was involved.
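For what it's worth, a minimal sketch of how the output could be discarded during such a timing loop so that terminal drawing does not affect the numbers: redirect awk's standard output to /dev/null. The sample text, the awk program and the loop count of 100 are placeholders, not the commands timed above.

```shell
count=0
while [ "$count" -lt 100 ]; do
    # > /dev/null throws the extracted text away instead of printing it
    echo "Here is a blah very nice string." | awk '{print $4}' > /dev/null
    count=$((count + 1))
done
echo "ran $count iterations"
```

Saved as a script, this can be run under time in the same way as the test script.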

Amazing the difference it can make to switch a few things around.

One more thing. Here is a weather script that anyone can use, in case someone bumps into this on the net through a search:

#!/bin/sh

# Hamilton is used in this example. Replace with your own city's link. 
URL="accuweather.com/en/ca/hamilton/l8l/weather-forecast/55490"

# Formulas:
# RSTART  = +length of A
# RLENGTH = -length of (A + B)

A="txt : '"
B="',"
C="temp:'"
D="',  realfeel:'"

# Use 'echo' as a hack to make both print outputs print on the same line
#echo $(wget -q -O- "$URL" | mawk -v v1="$A" -v v2="$B" -v v3="$C" -v v4="$D" '$0 ~ v1 ".*" v2 {sub(v2 ".*", ""); sub(".*" v1, ""); print $0; next;} $0 ~ v3 ".*" v4 {sub(v4 ".*", ""); sub(".*" v3, ""); print $0 "°C"; exit;}')
echo $(wget -q -O- "$URL" | mawk -vT1="$A" -vT2="$B" -vT3="$C" -vT4="$D" 'match ($0, T1 ".*" T2) {print substr ($0, RSTART+7, RLENGTH-9); next;} match ($0, T3 ".*" T4) {print substr ($0, RSTART+6, RLENGTH-20) "°C"; exit;}')
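The hard-coded offsets in the script above (+7, -9, +6, -20) follow the formulas in its comments, so they have to be recounted whenever A, B, C or D change. A sketch of letting awk compute them with length() instead is shown here. The sample line is invented for illustration and is not real output from the site.

```shell
A="txt : '"
B="',"
# Hypothetical line resembling the page source; not fetched from the site.
line="    txt : 'Light Rain', icon: 12"

# RSTART + length(T1) skips the opening phrase, and
# RLENGTH - length(T1) - length(T2) drops both phrases from the match.
condition=$(printf '%s\n' "$line" | awk -v T1="$A" -v T2="$B" '
    match($0, T1 ".*" T2) {
        print substr($0, RSTART + length(T1), RLENGTH - length(T1) - length(T2))
    }')

echo "$condition"
```

T1 and T2 are still treated as regular expressions by match(), so this only holds as long as the phrases contain no regex operators, which is the case for the four strings used in the script.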

Anyways, that's all. Thank you guys!