Grep command to search for a regular expression in a line and print only the string after the match

Hello,

One step in a shell script I am writing involves using grep to search for a regular expression in a line and print only the string after the match.

An example line is below:

/logs/GRAS/LGT/applogs/lgt-2016-08-24/2016-08-24.8.log.zip:2016-08-24 19:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"

The line can be of variable length but will always contain a date-time stamp.
In the line above, the timestamp is 2016-08-24 19:12:48,602.

I want the output of the command to be:

[ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"

If I run the command below, I get the timestamp itself as output:

echo $line | grep -Eo '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-3][0-9][ ][0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]'

Please suggest.

I'm afraid you can't remove the search pattern from grep's result. Either use a different pattern (e.g. "[ttp"), or use a different tool (like sed or awk).
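
One exception that may be worth knowing: if your grep is GNU grep built with PCRE support (an assumption, many other greps lack -P), the \K escape drops everything matched so far from the reported match, so -o prints only what follows the timestamp. A minimal sketch using the sample line from the question:

```shell
# Sample line taken from the question.
line='/logs/GRAS/LGT/applogs/lgt-2016-08-24/2016-08-24.8.log.zip:2016-08-24 19:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"'

# -P selects Perl-compatible regexes (GNU grep only, if compiled in).
# \K resets the start of the reported match, so -o prints only the text after it.
result=$(printf '%s\n' "$line" | grep -oP '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \K.*')
printf '%s\n' "$result"
```

Where -P is not available, sed or awk remain the portable choice.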


Hello Ramneekgupta91,

Could you please try the following (tested with GNU awk)?

awk --re-interval '{match($0,/[0-9]{4}-[0-2][0-9]-[0-9]{2} [0-2][0-9]:[0-5][0-9]:[0-5][0-9].*/);print substr($0,RSTART+24,RLENGTH-24)}'  Input_file
OR
awk --re-interval '{match($0,/[0-9]{4}-[0-2][0-9]-[0-9]{2} [0-2][0-9]:[0-5][0-9]:[0-5][0-9],[0-9]{3} .*/);print substr($0,RSTART+24,RLENGTH-24)}'  Input_file

Output will be as follows.

[ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"
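
The 24 in the commands above is the length of the timestamp plus the space after it (23 characters + 1). As a hedged alternative sketch, matching only the timestamp and printing from RSTART + RLENGTH avoids hard-coding that number. The variable name ts is my own, and the digits are spelled out instead of using {n} so that awks without interval support should also accept it:

```shell
# Sample line taken from the question.
line='/logs/GRAS/LGT/applogs/lgt-2016-08-24/2016-08-24.8.log.zip:2016-08-24 19:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"'

# Timestamp followed by one space; "ts" is a hypothetical variable name.
ts='[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] '

# match() sets RSTART and RLENGTH; print everything after the matched timestamp.
# Lines without a timestamp are skipped because match() returns 0 for them.
result=$(printf '%s\n' "$line" | awk -v ts="$ts" 'match($0, ts) { print substr($0, RSTART + RLENGTH) }')
printf '%s\n' "$result"
```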

Adding one more solution here.

awk --re-interval '{sub(/.*[0-9]{4}-[0-2][0-9]-[0-9]{2} [0-2][0-9]:[0-5][0-9]:[0-5][0-9],[0-9]{3} /,X,$0);print}'  Input_file

EDIT: The solutions above would also accept a month greater than 12 or an hour greater than 23. One could trust the data to be in a correct time format, but an exact match is safer, so I tightened the regex as shown further down.
So let's say we have the following Input_file:

/logs/GRAS/LGT/applogs/lgt-2016-08-24/2016-08-24.8.log.zip:2016-08-24 19:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"
/logs/GRAS/LGT/applogs/lgt-2016-08-24/2016-08-24.8.log.zip:2016-18-24 19:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S$$$$$$$"
/logs/GRAS/LGT/applogs/lgt-2016-28-24/2016-08-24.8.log.zip:2016-28-24 59:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"
/logs/GRAS/LGT/applogs/lgt-2016-29-24/2016-08-24.8.log.zip:2016-29-24 59:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"
/logs/GRAS/LGT/applogs/lgt-2016-29-24/2016-08-24.8.log.zip:2016-12-24 59:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"
/logs/GRAS/LGT/applogs/lgt-2016-29-24/2016-08-24.8.log.zip:2016-11-24 59:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"
/logs/GRAS/LGT/applogs/lgt-2016-29-24/2016-08-24.8.log.zip:2016-13-24 59:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"
/logs/GRAS/LGT/applogs/lgt-2016-29-24/2016-08-24.8.log.zip:2016-13-24 79:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"

I have added some more scenarios to it, with invalid months such as 29 and hours greater than 23. The following code should handle those cases.

awk --re-interval '{match($0,/[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01]) ([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9],[0-9]{3} .*/);if(substr($0,RSTART+24,RLENGTH-24)){print substr($0,RSTART+24,RLENGTH-24)}}'  Input_file

Output will be as follows.

[ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"

Because only the 1st line meets the criteria, only that one is printed.
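
To see the range checks in isolation, here is a small sketch with single | alternations for month (01-12), day (01-31) and hour (00-23). The three input lines are shortened, made-up variants of the Input_file above, and the variable name re is my own:

```shell
# Month 01-12, day 01-31, hour 00-23, then minutes, seconds and three digits.
re='[0-9][0-9][0-9][0-9]-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01]) ([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9],[0-9][0-9][0-9] '

# Hypothetical, shortened log lines: one valid, one with month 18, one with hour 59.
result=$(printf '%s\n' \
    'app.log:2016-08-24 19:12:48,602 valid line' \
    'app.log:2016-18-24 19:12:48,602 month 18' \
    'app.log:2016-12-24 59:12:48,602 hour 59' |
    awk -v re="$re" 'match($0, re) { print substr($0, RSTART + RLENGTH) }')
printf '%s\n' "$result"
```

Only the first line passes all three range checks, so only its remainder is printed.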

Adding a non-one-liner form of the solution too.

awk --re-interval '{
    match($0,/[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01]) ([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9],[0-9]{3} .*/);
    if(substr($0,RSTART+24,RLENGTH-24)){
        print substr($0,RSTART+24,RLENGTH-24)
    }
}' Input_file
 

Thanks,
R. Singh


Depending on your shell (bash, ksh93 and zsh support this form), you could do something like:

echo "${line/*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-3][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]/}"
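
If the script has to run under a plain POSIX sh, where the ${var/pattern/} form is not available, a prefix-removal expansion may do the same job. This is only a sketch; the trailing space inside the pattern is my addition so that the result does not start with a blank:

```shell
# Sample line taken from the question.
line='/logs/GRAS/LGT/applogs/lgt-2016-08-24/2016-08-24.8.log.zip:2016-08-24 19:12:48,602 [ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"'

# ${var#pattern} removes the shortest prefix matching the shell pattern:
# "*" covers everything before the timestamp, then the timestamp and one space.
result="${line#*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] }"
printf '%s\n' "$result"
```

No external process is started, which can matter when the step runs once per line in a loop.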
sed -n '/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\},[0-9]\{3\}/s/.*[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\},[0-9]\{3\} *\(.*\)/\1/p' logfile

rdrtx1's impressive proposal, boiled down somewhat:

sed -n '/^.*[0-9]\{4\}\(-[0-9]\{2\}\)\{2\} \([0-9]\{2\}:\)\{2\}[0-9]\{2\},[0-9]\{3\} \(.*\)/ s//\3/p' file
[ttp-/57.20.70.159:8111-35] ERROR com.lufthansa.lgt.exception.filter.LGTExceptionResolver - The error is : For input string: " S"

or, if your sed offers EREs, try

sed -En '/^.*[0-9]{4}(-[0-9]{2}){2} ([0-9]{2}:?){3},[0-9]{3} (.*)/ s//\3/p' file

If you put your sed command in a file, it shouldn't be a problem.

sed -n -f commands.sed

Your commands.sed file would be something like:
/regular expression you are looking for/{
s/xxx/xxx/
p
d
}

The "-n" option suppresses automatic printing, so a line is displayed only when a p command prints it. Change the s command to format the line as needed; using more than one is no problem. The p displays the line and the d deletes it (perhaps not necessary).
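
As a concrete sketch of that layout for the timestamp case in this thread (the temporary directory, sample.log and its two lines are made up for illustration; the d is left out since -n already suppresses the default output):

```shell
# Work in a throwaway directory; file names here are hypothetical.
tmpdir=$(mktemp -d)

# The address selects lines containing a timestamp; s strips up to and including it; p prints.
cat > "$tmpdir/commands.sed" <<'EOF'
/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\},[0-9]\{3\} /{
s/.*[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\},[0-9]\{3\} //
p
}
EOF

printf '%s\n' \
    'app.log:2016-08-24 19:12:48,602 ERROR something failed' \
    'a line without a timestamp' > "$tmpdir/sample.log"

result=$(sed -n -f "$tmpdir/commands.sed" "$tmpdir/sample.log")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```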

HTH

There is no need for -E (ERE) here; the default (BRE) is fine for getting the timestamp:

timestamp='[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-3][0-9][ ][0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]'
echo "$line" | grep -o "$timestamp"

sed uses BRE by default. You want everything after the timestamp, so you simply need to cut the timestamp out, i.e. substitute it with nothing:

timestamp='[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-3][0-9][ ][0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]'
echo "$line" | sed "s/$timestamp//"

and you probably also want to cut everything before the timestamp and any spaces after it:

 echo "$line" | sed "s/.*$timestamp *//"
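
One thing to keep in mind with a plain s command: a line that contains no timestamp passes through unchanged. If such lines should be dropped instead, a possible variant is -n together with the p flag. The two input lines below are made up for illustration:

```shell
timestamp='[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-3][0-9][ ][0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]'

# Hypothetical input: the second line has no timestamp.
input='app.log:2016-08-24 19:12:48,602 ERROR something failed
a line without a timestamp'

# -n suppresses the default output; the p flag prints only lines where s replaced something.
result=$(printf '%s\n' "$input" | sed -n "s/.*$timestamp *//p")
printf '%s\n' "$result"
```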

Thanks R. Singh for the information.

I have used

awk --re-interval '{sub(/.*[0-9]{4}-[0-2][0-9]-[0-9]{2} [0-2][0-9]:[0-5][0-9]:[0-5][0-9],[0-9]{3} /,X,$0);print}'  Input_file

and it works perfectly fine.

However, I could only understand the regex part of the command.
Can you explain the meaning of each part of the command apart from the regex?

Also, could you guide me on how I can learn awk and sed?

Thanks
Ramneek

Hello Ramneekgupta91,

The following may help you with that.

awk --re-interval   #### Enable interval expressions such as {4} in regular expression matching.
'{sub(              #### sub() is an awk built-in function: it replaces the first match of a regex with the given text, in a variable or in the whole line (whichever you name as the target).
/.*[0-9]{4}         #### .* matches everything from the start of the line; [0-9]{4} matches four digits in a row, e.g. the year 2016.
-[0-2][0-9]         #### A literal - (dash), then one digit from 0 to 2 and one digit from 0 to 9, for the month.
-[0-9]{2}           #### A literal - (dash), then two digits in a row, for the day.
 [0-2][0-9]         #### A space, then one digit from 0 to 2 and one digit from 0 to 9, e.g. 09, 07 or 21, for the hours.
:[0-5][0-9]         #### A : (colon), then one digit from 0 to 5 and one digit from 0 to 9, e.g. 51 or 02, for the minutes.
:[0-5][0-9]         #### A : (colon), then one digit from 0 to 5 and one digit from 0 to 9, e.g. 51, 23 or 02, for the seconds.
,[0-9]{3} /         #### A comma, three digits in a row and a space (as in your Input_file); the / closes the regex.
,X,                 #### The replacement text. X is a variable that was never assigned, so awk treats it as an empty string and the matched text is replaced with nothing.
$0);                #### The target of the substitution: $0 is the complete current line.
print}              #### Print the line after the substitution.
'  Input_file       #### The input file to read.
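
A tiny sketch may make the role of X clearer. The input string 'abc 123 def' is made up; the point is that an unassigned awk variable behaves like "" in sub(), and that the third argument defaults to $0 when left out:

```shell
# X was never assigned, so it is the empty string: the match is deleted from $0.
result=$(printf '%s\n' 'abc 123 def' | awk '{ sub(/.*[0-9]+ /, X, $0); print }')

# The same substitution with an explicit "" and the default target ($0).
same=$(printf '%s\n' 'abc 123 def' | awk '{ sub(/.*[0-9]+ /, ""); print }')

printf '%s\n%s\n' "$result" "$same"
```

Both commands print def, so writing "" explicitly is arguably the more readable choice.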
 

But I would suggest using the other solution for stricter regex matching.

awk --re-interval '{match($0,/[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01]) ([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9],[0-9]{3} .*/);if(substr($0,RSTART+24,RLENGTH-24)){print substr($0,RSTART+24,RLENGTH-24)}}'  Input_file

The best way to learn anything is practice and reading, so keep asking good questions (and do make a few attempts of your own before posting, since that is how you learn) and
try to learn from the forum's posts (this is one of the best forums for learning UNIX/Linux/scripting/admin), from good books, from man pages, etc.

Thanks,
R. Singh