How to combine tail, sed, and awk in one line?

Hello,
I am trying to create an iptables script with tail, sed, and awk.

1st step: Search for the keyword "secret" in the access.log file
2nd step: Get the first column (IP address) of the matching lines
3rd step: Save it to a file

This is what I did so far:

grep.sh

#!/bin/bash
while true;
do
tail -f /var/log/apache2/access.log | awk '{print $1}' 
#tail -f /var/log/apache2/access.log | grep --line-buffered secret | awk '/secret/' | awk '{print $1}'     #not working as expected
#tail -f /var/log/apache2/access.log | grep --line-buffered debug | awk '$1' > ip.txt                          #gives all lines but not column1
#echo "iptables -A INPUT -s "$1" -j DROP" >> black
#echo "$A" >> black
#chmod 755 black
#./black
done

It prints out all IP addresses but does not search for the keyword "secret".
When I remove the # comment symbol at the beginning of the following lines, it does not print the matching output.

Is it possible to assign the pipe output to a variable so that I can use that value in the following steps for the iptables command?

Thanks
Boris

tail -f /var/log/apache2/access.log | awk '/secret/{print $1}' 
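Regarding the question of storing the result in a variable: a minimal sketch, assuming bash and a plain log file that is read once (not followed with tail -f). Command substitution, $(...), returns only after the command inside it exits, so it cannot capture output from tail -f, which never ends. The temporary file and its log lines are hypothetical sample data standing in for /var/log/apache2/access.log, and the generated iptables commands are only printed, not executed.

```shell
#!/bin/bash
# Hypothetical sample log; stands in for /var/log/apache2/access.log
log=$(mktemp)
cat > "$log" <<'EOF'
11.22.33.44 - - [13/Oct/2016:21:51:06 -0400] "GET /login?secret HTTP/1.1" 200 510 "-" "Mozilla/5.0"
55.66.77.88 - - [13/Oct/2016:21:52:10 -0400] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
99.88.77.66 - - [13/Oct/2016:21:53:42 -0400] "GET /secret/page HTTP/1.1" 404 209 "-" "Mozilla/5.0"
EOF

# Filter and extract in one awk call, then capture the result.
# This works because awk reads a finite file and exits; with `tail -f`
# the command substitution would never return.
ips=$(awk '/secret/{print $1}' "$log")
rm -f "$log"

# Use each captured address in a later step (printed only, not executed).
# Unquoted $ips is split on whitespace, which is safe for IP addresses.
for ip in $ips; do
    printf '/sbin/iptables -A INPUT -s %s -j DROP\n' "$ip"
done
```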

Hello baris35,

If you want to continuously read the lines, search for a string, and append the first field of each matching line to an output file, then the following may help you too.

tail -f /var/log/apache2/access.log | awk '/TEST1/{print $1 >> "Output_file"}'

When you are done tailing the logs (that is, once you have exited the above command), you can check the file named Output_file for the matched entries.

NOTE: In the above command, you can change the search string from TEST1 to whatever text you need.
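A possible explanation for an output file that stays empty while tail -f is running (which is what Boris reports further down the thread): awk normally buffers what it writes to a file and flushes it only when the buffer fills or awk exits normally, and interrupting the pipeline can discard whatever is still buffered. A minimal sketch, assuming an awk that supports fflush(filename) (gawk, mawk, and BWK awk do): flushing after every print keeps the file current. To stay runnable it reads a finite, hypothetical sample file instead of tail -f output; the temporary filenames are placeholders for Output_file.

```shell
#!/bin/bash
# Hypothetical sample input; in real use this is the output of tail -f
log=$(mktemp)
out=$(mktemp)
printf '%s\n' \
  '10.0.0.1 - - [13/Oct/2016:21:51:06 -0400] "GET /?TEST1 HTTP/1.1" 200 510' \
  '10.0.0.2 - - [13/Oct/2016:21:51:07 -0400] "GET / HTTP/1.1" 200 510' > "$log"

# Append the first field of each matching line to $out and flush that file
# after every write, so other processes see the data before awk exits.
awk -v out="$out" '/TEST1/{print $1 >> out; fflush(out)}' "$log"

result=$(cat "$out")
rm -f "$log" "$out"
printf 'Output_file contains: %s\n' "$result"
```

In the live pipeline the same idea would read: tail -f /var/log/apache2/access.log | awk '/TEST1/{print $1 >> "Output_file"; fflush("Output_file")}'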

Thanks,
R. Singh

I'm confused. People seem to be ignoring the tail -f flag in the pipelines in this code. Note that the command:

tail -f /var/log/apache2/access.log | awk '{print $1}'

will run forever. No command following this in your script will ever be executed.

If you want to look at the current contents of the log (instead of looking at the current and future contents of the log), use:

awk '{print $1}' /var/log/apache2/access.log

With tail -f, you don't need (or want) a while true loop, but you need to combine all of the tail and awk pipelines into a single pipeline (not in a loop).

Regarding the while true loop, are you concerned that the log file might be rotated/aged/re-written/whatever? If so, does your version of tail have the -F or --retry flags? These will keep looking for the file if it becomes inaccessible, so you keep reading the new log files that are created by normal ageing processes.

The alternative would be to have some way to terminate your tail and then restart it, all within the while true loop.

Can you put this code into context and explain why you have written it like this so far? I'm sure we can help you to find a solution.

Kind regards,
Robin

Hello,
Thanks for your answers. Following your suggestions, I changed the script as shown below:

#!/bin/bash
#while true;                                                               ----removed according to your suggestions
#do                                                                           ----removed according to your suggestions
prefix="/sbin/iptables -A INPUT -s "
tail -f /var/log/apache2/access.log | awk '/debug/{print $1 >> "ip2"}' | awk '!a[$0]++' ip2 >> ip | echo "$prefix$ip" > fail2ban_ip | sed -e 's/$/ -p tcp -dports 80 -j DROP/' -i fail2ban_ip > block | chmod 755 block | ./block

It did not give any output. Then I tested the command below from the command line and saw that it just produces an empty file. I checked access.log and I saw the search word in the latest lines.

tail -f /var/log/apache2/access.log | awk '/debug/{print $1 >> "ip2"}'

Output file ip2 is empty.
I will keep searching for a solution to this.

Thanks
Boris

You don't seem to understand the concept of a pipeline. In a pipeline, the standard output from the first command in the pipeline becomes the standard input to the second command in the pipeline. If there are more than two commands in the pipeline, the output from every command in the pipeline but the last becomes the standard input for the next command in the pipeline. And all stages in the pipeline are free to run asynchronously, as long as each command reading from its standard input has data to read and the next element of the pipeline is still there to read what each process writes to the pipe. But in your pipeline:

tail -f /var/log/apache2/access.log |
  awk '/debug/{print $1 >> "ip2"}' |
  awk '!a[$0]++' ip2 >> ip |
  echo "$prefix$ip" > fail2ban_ip |
  sed -e 's/$/ -p tcp -dports 80 -j DROP/' -i fail2ban_ip > block |
  chmod 755 block |
  ./block

The first awk command writes nothing to standard output. The second awk does not read anything from its standard input. (It reads from a file named ip2, which might or might not exist when that awk tries to open it, since it is being created asynchronously by the previous awk. If the file does exist, there is a good chance that it will be processed by the 2nd awk before the 1st awk adds anything to it.) And the second awk creates a file named ip, but does not write anything to the next stage of the pipeline; that may be OK, because echo never reads from its standard input. Note also that the variable ip, which is being passed to echo, is not defined by this script. And, since the output from echo is directed to a file, again nothing is written to the standard input of the next stage of the pipeline. But, again, that may be OK, because sed, when given a filename operand, does not read from its standard input. And sed, when invoked with a -i option, either gives you a syntax error for an unrecognized option or updates its input file in place. In either of these cases sed writes nothing to its standard output, so the file created by the redirection of standard output from sed will be an empty file. Then we have the next stage of the pipeline (chmod 755 block), which neither reads from standard input nor writes to standard output. And, finally, we have the last stage of the pipeline (./block), which either executes an empty file (if the chmod completed before this stage starts) or fails because it is not executable (if the file hadn't been created yet, or had been created but had not yet been made executable).
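For contrast with the pipeline analyzed above, here is a minimal sketch of the same steps written as separate commands that run one after another, each finishing before the next starts, so there is no race between them. Assumptions: the search word debug and port 80 come from Boris's pipeline; the log lines are hypothetical; the tcp destination-port option is spelled --dport (as in Boris's later ban.sh), not -dports; and as a dry run it only writes the iptables commands to a file instead of executing them.

```shell
#!/bin/bash
# Hypothetical sample log with "debug" in two requests from the same address
dir=$(mktemp -d)
log="$dir/access.log"
cat > "$log" <<'EOF'
11.22.33.44 - - [13/Oct/2016:21:51:06 -0400] "GET /mydrive/?debug HTTP/1.1" 200 510 "-" "Mozilla/5.0"
11.22.33.44 - - [13/Oct/2016:21:51:09 -0400] "GET /mydrive/?debug HTTP/1.1" 200 510 "-" "Mozilla/5.0"
55.66.77.88 - - [13/Oct/2016:21:52:00 -0400] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
EOF

prefix="/sbin/iptables -A INPUT -s "

# Step 1: first field of matching lines, duplicates removed -> file ip
awk '/debug/ && !seen[$1]++ {print $1}' "$log" > "$dir/ip"

# Step 2: one iptables command per address -> file block
while IFS= read -r ip; do
    printf '%s%s -p tcp --dport 80 -j DROP\n' "$prefix" "$ip"
done < "$dir/ip" > "$dir/block"

# Step 3 would be: sh "$dir/block"   (omitted: this is a dry run)
commands=$(cat "$dir/block")
rm -rf "$dir"
printf '%s\n' "$commands"
```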

I don't know what you are trying to do with this pipeline, but there are so many different concurrent operations going on here with nothing to guarantee that they happen in the proper sequence that the actual behavior will vary considerably depending on what shell you use to run this script, what operating system you use to run this script, the number of other processes that are running on the system when you run this script, and the phase of the moon.

I don't have an apache2 access log file to look at, so I have no idea what is in some of the data you seem to be trying to extract from records in that file. And, you haven't really given us an explanation of what you are trying to do with this script. Without a much better description of what you're trying to do and a sample log file to use as a test case, I don't see how I can help you.

Hello Don,
Thanks for your help. I understood the point.
Here is the final status:

#!/bin/bash
while true; do
tail -f /var/log/apache2/access.log | awk '/myword/' /var/log/apache2/access.log | awk '{print $1}' | awk '!a[$0]++' > ip | while \ 
inotifywait -e close_write ip; do ./ban.sh; done
done

I also have a script to block the IPs, and it kicks in within a second.

Kind regards
Boris

Hello Boris,
I still don't quite think you get it. Without seeing an example of what is in your log file, I'm just guessing, but try this instead of what you have now:

#!/bin/bash
while true
do	awk '/myword/ && !a[$1]++ {print $1}' /var/log/apache2/access.log > ip
	./ban.sh
done

The tail in your pipeline isn't doing anything but wasting system resources: the awk that follows it is given the log file as an operand, so it never reads what tail writes to the pipe. I think the awk above does the same thing as the three awk commands in your pipeline. I don't have an inotifywait utility on my system, but it looks like it is waiting for the file opened by the awk command to be closed. If that is what it is doing, it makes MUCH more sense to just wait for the awk command to complete instead of sticking more stuff in a pipeline that doesn't belong in a pipeline. Your awk command's output is being redirected to a file, so there is nothing for your while loop to read; therefore your while loop should not be in the pipeline.
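A small check of the claim that one awk can do the work of three, using hypothetical log lines. The condition !a[$1]++ is true only the first time a given first field is seen, because the post-increment yields the old count (0, which is false) before incrementing it, so the single command filters, de-duplicates, and prints in one pass.

```shell
#!/bin/bash
# Hypothetical sample log: 11.22.33.44 matches twice, 55.66.77.88 never matches
log=$(mktemp)
cat > "$log" <<'EOF'
11.22.33.44 - - [13/Oct/2016:21:51:06 -0400] "GET /a HTTP/1.1" 200 510 "http://vps_ip:44056/mydrive/?myword" "Mozilla/5.0"
55.66.77.88 - - [13/Oct/2016:21:51:07 -0400] "GET /b HTTP/1.1" 200 510 "-" "Mozilla/5.0"
11.22.33.44 - - [13/Oct/2016:21:51:08 -0400] "GET /c HTTP/1.1" 200 510 "http://vps_ip:44056/mydrive/?myword" "Mozilla/5.0"
99.88.77.66 - - [13/Oct/2016:21:51:09 -0400] "GET /d?myword HTTP/1.1" 200 510 "-" "Mozilla/5.0"
EOF

# Three processes: filter, extract the first field, de-duplicate
three=$(awk '/myword/' "$log" | awk '{print $1}' | awk '!a[$0]++')

# One process: filter and de-duplicate on the first field, then print it
one=$(awk '/myword/ && !a[$1]++ {print $1}' "$log")

rm -f "$log"
printf '%s\n' "$one"
```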

If the intent is to call your script every time a line of data is written to the file ip by your awk script, that would be something more like:

#!/bin/bash
while true
do	awk '/myword/ && !a[$1]++ {print $1}' /var/log/apache2/access.log |
	  while IFS= read -r ip
	  do	printf '%s\n' "$ip" >> ip
		./ban.sh
	  done
done
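A runnable sketch of that per-line pattern, assuming hypothetical sample data and a hypothetical shell function ban standing in for ./ban.sh (here it only records the address it is given). The outer while true loop is left out so the example terminates. Note that in bash each part of a pipeline runs in a subshell, so variables set inside the while loop are not visible after it; writing to a file, as the loop does with ip, avoids that issue.

```shell
#!/bin/bash
dir=$(mktemp -d)
log="$dir/access.log"
cat > "$log" <<'EOF'
11.22.33.44 - - [13/Oct/2016:21:51:06 -0400] "GET /?myword HTTP/1.1" 200 510
11.22.33.44 - - [13/Oct/2016:21:51:07 -0400] "GET /?myword HTTP/1.1" 200 510
55.66.77.88 - - [13/Oct/2016:21:51:08 -0400] "GET /?myword HTTP/1.1" 200 510
EOF

# Hypothetical stand-in for ./ban.sh: records which address it was called for
ban() { printf 'ban %s\n' "$1" >> "$dir/ban.log"; }

# One pass of the loop body: each unique address is appended to ip, then banned
awk '/myword/ && !a[$1]++ {print $1}' "$log" |
  while IFS= read -r ip
  do  printf '%s\n' "$ip" >> "$dir/ip"
      ban "$ip"
  done

banned=$(cat "$dir/ban.log")
rm -rf "$dir"
printf '%s\n' "$banned"
```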

Hope this helps,
Don

---
Note, the missing pipe symbol noted in post #14 in this thread has now been fixed above to avoid confusing anyone else reading this thread.

Hello Don,

Here is the sample log:

11.22.33.44 - - [13/Oct/2016:21:51:06 -0400] "GET /mydrive/admin/load.php?&action=get_current&JsHttpRequest=1-xml HTTP/1.1" 200 510 "http://vps_ip:44056/mydrive/?myword" "Mozilla/5.0"

The code below is not printing the ip file:

#!/bin/bash
while true
do	awk '/myword/ && !a[$1]++ {print $1}' /var/log/apache2/access.log
	while IFS= read -r ip
	do	printf '%s\n' "$ip" >> ip
		./ban.sh
	done
done

Final working status:

#!/bin/bash
while true 
do awk '/myword/' /var/log/apache2/access.log | awk '{print $1}' | awk '!a[$0]++' > ip | while inotifywait -e close_write ip; do ./ban.sh; done
	PID=`ps -eaf | grep syncapp | grep -v grep | awk '{print $2}'`
if [[ "" !=  "$PID" ]]; then
  echo "killing $PID"
  kill -9 $PID
fi

done

Terminal Output:

root@root:~# ./grep3.sh
Setting up watches.
Watches established.
ip CLOSE_WRITE,CLOSE
Setting up watches.
Watches established.

Could you please let me know if I could make it shorter or make the system less busy?

Thanks in advance
Boris

Hello baris35,
My repeated attempts to explain how pipelines work have clearly failed. You are still stringing together things in a pipeline that are completely unrelated to each other.

You say you now have final working code. It contains what appears to be an infinite loop inside an infinite loop. It contains a ps, grep, grep, awk pipeline and an if statement in the outer loop that will never be executed, because the inner loop will never terminate. It runs the command ./ban.sh inside the inner loop, but we have no idea what that does, nor why you would want to run it repeatedly. You have given us no indication of why you need an infinite loop waiting for a file to be opened and closed repeatedly when that file will only be opened and closed once by the preceding code.

You say that the code I suggested doesn't print the list of IP addresses it found. This is true; it only writes them to a file (the file named ip), just like your earlier code did.

If you're only going to show us code that doesn't work and then complain that our suggestions to help you improve it don't work, there is nothing we can do. If you're willing to write a clear description of what your code is trying to do, explain what your current code is doing correctly, and explain what your current code is failing to do; then we might be able to help you correct your code so it will do what you want it to do.

Hello Don,
I meant that it's not creating a file named ip.

Thanks for your time and suggestions

Kind regards
Boris

Of course not. At least, not immediately. ip is not written to by the awk command. The awk prints to the screen, and the while loop reads from stdin, so you are expected to type in the IPs found manually (which I doubt is what you really want).

As has been pointed out before, the three awks can be combined into one, as done in the earlier example. Redirecting stdout into the file ip AND piping it into something (here: a while loop) doesn't work; use one or the other. (Usually, the redirection takes precedence.)
A while loop checking a closed file is somewhat pointless: either it exists or it does not.
The ban.sh script is unknown, and so is grep3.sh, so there is not a snowball's chance in hell for us to judge.
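RudiC's point about combining a redirection and a pipe can be checked directly. A minimal sketch, assuming bash: the > redirection takes over the command's standard output, so the next stage of the pipeline reads end-of-file right away. (zsh with its MULTIOS option is an exception: it treats the pipe as an additional output and sends the data to both.)

```shell
#!/bin/bash
dir=$(mktemp -d)

# printf's stdout is redirected to the file, so cat receives nothing
piped=$(printf '11.22.33.44\n' > "$dir/ip" | cat)

saved=$(cat "$dir/ip")
rm -rf "$dir"
printf 'through the pipe: "%s", in the file: "%s"\n' "$piped" "$saved"
```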

Wouldn't it make sense to describe the entire situation (file/directory structure, input samples, scripts used, final action/output desired) so that you can get useful help?

Ouch. Yes, the script I suggested:

#!/bin/bash
while true
do	awk '/myword/ && !a[$1]++ {print $1}' /var/log/apache2/access.log
	while IFS= read -r ip
	do	printf '%s\n' "$ip" >> ip
		./ban.sh
	done
done

was missing a pipe symbol (as pointed out by RudiC). I intended to write:

#!/bin/bash
while true
do	awk '/myword/ && !a[$1]++ {print $1}' /var/log/apache2/access.log |
	  while IFS= read -r ip
	  do	printf '%s\n' "$ip" >> ip
		./ban.sh
	  done
done

But, as I said before, I have no idea if this is what you want to do, since you still have not specified what you are trying to do! And we have no idea what ./ban.sh does, whether you want to run it each time an IP address is added to the file ip, or if you just want to run it once each time awk processes your apache2 log file.

Hello,

First of all, I have to say that I admire your approach, as you paid attention to all my questions.

Here is my ban.sh file:

#!/bin/bash
prefix="/sbin/iptables -A INPUT -s "
while read -r line
do
 echo "${prefix}$line"
done <ip > fail2ban_ip
sed -e 's/$/ -p tcp --dport 44056 -j DROP/' -i fail2ban_ip
chmod 755 fail2ban_ip
./fail2ban_ip
sleep 2
exit 0

Don's latest code is waiting for a new attack in the background, and it works!

Thank you Don & Rudic

Kind regards
Boris

Since you're using sed anyhow, why the while loop?
How about

sed  's:^:/sbin/iptables -A INPUT -s :; s:$: -p tcp --dport 44056 -j DROP:' ip | sh

to replace the entire script?
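Before piping generated commands into sh, it can be worth running the generator alone to see exactly what would be executed. A dry-run sketch of the sed command above, with a hypothetical ip file containing two addresses:

```shell
#!/bin/bash
# Hypothetical ip file with one address per line
ipfile=$(mktemp)
printf '%s\n' 11.22.33.44 55.66.77.88 > "$ipfile"

# Same sed as above, but captured instead of piped to sh.
# ':' is the s-command delimiter because the replacement text contains '/'.
cmds=$(sed 's:^:/sbin/iptables -A INPUT -s :; s:$: -p tcp --dport 44056 -j DROP:' "$ipfile")
rm -f "$ipfile"
printf '%s\n' "$cmds"
```

Once the printed commands look right, appending | sh (as in the suggestion above) executes them.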

Thanks Rudic,

#!/bin/bash
sed  's:^:/sbin/iptables -A INPUT -s :; s:$: -p tcp --dport 44056 -j DROP:' ip | sh
exit 0

Kind regards
Boris

Looking back at what has been discussed in this thread, I think the entire script can be condensed into one single awk line, making ban.sh unnecessary (unless it is used elsewhere as well). Please try

awk '/myword/ && !a[$1]++ {print "/sbin/iptables -A INPUT -s " $1 " -p tcp --dport 44056 -j DROP"}' /var/log/apache2/access.log | sh

and come back with the results, commented.
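A dry run of the single awk line: drop the final | sh and inspect the output first. A sketch using the sample log line Boris posted earlier, plus a repeated address and a non-matching line (both hypothetical):

```shell
#!/bin/bash
log=$(mktemp)
cat > "$log" <<'EOF'
11.22.33.44 - - [13/Oct/2016:21:51:06 -0400] "GET /mydrive/admin/load.php?&action=get_current&JsHttpRequest=1-xml HTTP/1.1" 200 510 "http://vps_ip:44056/mydrive/?myword" "Mozilla/5.0"
11.22.33.44 - - [13/Oct/2016:21:51:10 -0400] "GET /mydrive/?myword HTTP/1.1" 200 510 "-" "Mozilla/5.0"
55.66.77.88 - - [13/Oct/2016:21:52:00 -0400] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
EOF

# Same awk as above, without the final `| sh`: one command per unique matching IP
cmds=$(awk '/myword/ && !a[$1]++ {print "/sbin/iptables -A INPUT -s " $1 " -p tcp --dport 44056 -j DROP"}' "$log")
rm -f "$log"
printf '%s\n' "$cmds"
```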

I was looking at this too. To get the continuous looping effect as updates are added to the log file, I think we need three processes instead of just two:

tail -f  /var/log/apache2/access.log |
  awk '/myword/ && !a[$1]++ {print "/sbin/iptables -A INPUT -s " $1 " -p tcp --dport 44056 -j DROP"}' |
  sh

If the log file rotates, you would need to either restart this script when the log rotates; or, if the tail utility on your system has an option to reopen its input file if the input file name changes, add the appropriate option(s) to make that happen.

Just a comment that needs to be tested and proven right or wrong: you may want to play with buffering (e.g. man stdbuf on Linux) to improve reaction times.
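One way to act on that suggestion without relying on stdbuf (which, per its documentation, has no effect on programs that set their own buffering or bypass stdio): call fflush() after each print, so every generated command enters the pipe as soon as its log line is processed instead of waiting for awk's output buffer to fill. A minimal sketch, assuming an awk with fflush() support; to stay runnable it reads a finite, hypothetical sample file and pipes into cat in place of sh. Live use would combine it with tail -f and | sh as in the previous post.

```shell
#!/bin/bash
# Hypothetical sample log with two different matching addresses
log=$(mktemp)
cat > "$log" <<'EOF'
11.22.33.44 - - [13/Oct/2016:21:51:06 -0400] "GET /mydrive/?myword HTTP/1.1" 200 510 "-" "Mozilla/5.0"
55.66.77.88 - - [13/Oct/2016:21:52:00 -0400] "GET /mydrive/?myword HTTP/1.1" 200 510 "-" "Mozilla/5.0"
EOF

# fflush() pushes each command into the pipe immediately; without it, awk
# writing to a pipe typically holds output until its buffer fills or it exits.
out=$(awk '/myword/ && !a[$1]++ {
        print "/sbin/iptables -A INPUT -s " $1 " -p tcp --dport 44056 -j DROP"
        fflush()
     }' "$log" | cat)
rm -f "$log"
printf '%s\n' "$out"
```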