Grok filter to extract substring from path and add to host field in logstash

Hi,

I am reading data from files in Logstash, with the path defined as *.log, etc.

The file names look like app1a_test2_heep.log, cdc2a_test3_heep.log, etc.

How can I configure Logstash so that the part of the file name before the first underscore (app1a, cdc2a, ...) is extracted and written to the host field, replacing the default host?

For example:

fileName: app1a_test2_heep.log

host => app1a

My path field looks like this:
path => /data/app1a_test2_heep.log
I want to extract the string before the first underscore and put it in the host field, replacing the default host. What filter would do this?

Thanks in advance,
Regards,
Ravi

I am unable to decipher what you are trying to do.
Are you trying to get a list of files to read?
Are you trying to extract a list of hosts from a list of files?
Are you trying to create a list of pathnames to process from a list of files?
Is the list of files in a file, or the current directory, or some other directory?

Please clearly explain what you are trying to do, show us what you have done (using CODE tags), show us the output you're getting from what you have done (using CODE tags), and show us the output you're trying to get (using CODE tags).

What operating system are you using?
What shell are you using?
What tools are you trying to use?

I am trying to extract the host name from the file names.

Hello Ravi,

Could you please try the following and let me know if it helps?

echo "/data/app1a_test2_heep.log" | awk '{match($0,/\/.*_/);gsub(/.*\//,X,$0);gsub(/_.*/,Y,$0);print $0}'
OR
echo "cdc2a_test3_heep.log" | awk '{match($0,/\/.*_/);gsub(/_.*/,Y,$0);print $0}'

You can use either of the above, as needed. Let me know if you have any questions.

Thanks,
R. Singh

Could you please explain why you used the match function here? What does it do?
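
For reference, awk's match() only reports where the regular expression matched, through the built-in variables RSTART and RLENGTH. It does not change $0, so in the commands above its result is never used and the gsub calls do all the work. A minimal sketch that shows this (the variable name result is only for illustration):

```shell
# match() records the start and length of the match; $0 is left untouched.
result=$(echo "/data/app1a_test2_heep.log" |
    awk '{ match($0, /\/.*_/); print RSTART, RLENGTH, $0 }')
echo "$result"   # prints: 1 18 /data/app1a_test2_heep.log
```

The match starts at character 1 (the leading slash) and runs through the last underscore, which is character 18.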

You refused to answer my questions about where your filenames are located and what you're really trying to do. Maybe this will help a little bit:

#!/bin/ksh
for path in /data/*.log *.log
do	host=${path##*/}
	host=${host%%[_.]*}
	printf 'pathname: %s\nhost: %s\n\n' "$path" "$host"
done

This was written and tested using the Korn shell, but will work with any shell that performs basic parameter substitutions as required by the POSIX standards. Depending on what files are present in /data and in the current directory, it produces output similar to the following:

pathname: /data/app1a_test2_heep.log
host: app1a

pathname: /data/cdc2a_test3_heep.log
host: cdc2a

pathname: abc_xyz.log
host: abc

pathname: xyz.log
host: xyz
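
To see what each parameter expansion in the loop above contributes, the two steps can be run on a single pathname. This is only a sketch that reuses the variable names from the script:

```shell
path=/data/app1a_test2_heep.log
host=${path##*/}       # remove the longest prefix matching "*/"  -> app1a_test2_heep.log
host=${host%%[_.]*}    # remove the longest suffix starting at "_" or "." -> app1a
printf 'host: %s\n' "$host"
```

Because %% removes the longest matching suffix, the cut happens at the first underscore or dot in the file name, which is why xyz.log yields xyz.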

Thanks, Akshay, for pointing that out. I was first trying something else with match and later switched to gsub; when posting, I forgot to remove the match call. It can also be written as follows.

echo "cdc2a_test3_heep.log" | awk '{gsub(/.*\//,X,$0);gsub(/_.*/,Y,$0);print $0}'
OR
echo "/data/app1a_test2_heep.log" | awk '{gsub(/.*\//,X,$0);gsub(/_.*/,Y,$0);print $0}'

Thanks,
R. Singh

Okay, Ravinder. Please keep that in mind before posting answers, since leftover code can confuse readers. You can also simplify it like this:

$ echo "cdc2a_test3_heep.log" | awk 'gsub(/.*\/|_.*/,"")'
cdc2a

$ echo "/data/app1a_test2_heep.log" | awk 'gsub(/.*\/|_.*/,"")'
app1a
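
One caveat with the pattern-only form: awk prints the line only when gsub() returns a non-zero count, so a name with neither a slash nor an underscore (xyz.log, for instance) produces no output at all. Below is a sketch of a variant that always prints, assuming you also want to cut at the first dot as the ksh script does. host_from_path is a hypothetical helper name, not something from the posts above:

```shell
host_from_path() {
    # Drop the directory part first, then everything from the first "_" or ".".
    printf '%s\n' "$1" | awk '{ sub(/.*\//, ""); sub(/[_.].*/, ""); print }'
}

host_from_path /data/app1a_test2_heep.log   # prints app1a
host_from_path xyz.log                      # prints xyz
```

Using two separate sub() calls keeps the directory removal and the suffix removal independent, so an underscore or dot in a directory name cannot swallow the file name.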