Pattern match with awk

jskobs1 · January 29, 2024, 7:21am

We have data as below.

/* ------- pattern_1 --- */
kjfhas

/* ----------------- string ----------------- */
aadaew

/*--keyword-----*/
2134asdf
@@@@asdf

The requirement is to extract the keyword between the comment markers (/* -- ... -- */) on each comment line, like below.

pattern_1 
string 
keyword

code:

awk '/\/\*.*[a-z0-9_]+.*\*\// {print $3}' file

This works for the first two comment lines because the keyword is separated by spaces. It does not work when the comment line has no spaces (like the third one). Any help?

MadeInGermany · January 29, 2024, 8:03am

The $3 does not fit then.
Most simple: a capture group and a reference.
sed can do it, and perl is the master:

perl -lne 'm#/\*.*?(\w+).*?\*/# and print $1' file

-l strip the trailing newline on input, add it back on print
-n loop around the input, no default print
-e next argument is perl code

m match
# delimiter
.*? minimal (non-greedy) "catch all"
( ) capture group
$1 reference to 1st capture group
\w a "word" character

This would also show comment strings that are preceded or followed by other text:

pretext /* -- comment -- */ posttext

And

perl -lne 'print $1 while m#/\*.*?(\w+).*?\*/#g' file

would even print repeated comments:

pretext /* -- comment1 -- */ midtext /* -- comment2 -- */ posttext
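The remark that sed can do it could look roughly like the following. This is only a sketch: [[:alnum:]_] is used as a stand-in for Perl's \w, and the scratch directory and the variable name keywords are assumptions made for the demo.

```shell
# Work in a scratch directory so the demo does not clobber a real "file".
cd "$(mktemp -d)" || exit 1

# Sample input from the first post.
cat > file <<'EOF'
/* ------- pattern_1 --- */
kjfhas

/* ----------------- string ----------------- */
aadaew

/*--keyword-----*/
2134asdf
@@@@asdf
EOF

# Capture the first run of word characters after /* and replace the whole line with it.
# -n plus the p flag prints only the lines where the substitution matched.
keywords=$(sed -n 's#.*/\*[^[:alnum:]_]*\([[:alnum:]_]\{1,\}\).*\*/.*#\1#p' file)
printf '%s\n' "$keywords"
```

On the sample data this prints pattern_1, string and keyword. Unlike the while ... /g variant, it yields only one keyword per line.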

munkeHoller · January 29, 2024, 8:45am

Another possibility:

grep '/\*' file | tr -d '/ *-'
pattern_1
string
keyword

jskobs1 · January 29, 2024, 1:34pm

Hi @MadeInGermany, Thanks much for the useful information.

An extension of the requirement is to split the file into three separate files, as below:

pattern_1.txt

/* ------- pattern_1 --- */
kjfhas

string.txt

/* ----------------- string ----------------- */
aadaew

keyword.txt

/*--keyword-----*/
2134asdf
@@@@asdf

The awk code below works if there are spaces in the comment lines. If there are no spaces (like the third one), it does not work:

awk '/\/\*.*[a-z0-9_]+.*\*\// {if (x) close(x); split($0,a," "); if (a[3] != "") {x=a[3]".jil"} else {next}} {if (x) print > x}'  file

Can this be achieved in perl?

MadeInGermany · January 29, 2024, 3:05pm

Sure.

perl -lpe 'if (m#/\*.*?(\w+).*?\*/#) { open(FH, ">", $1.".jil"); select FH; }' file

-p loop around the input, default print

open FH, ">", $1.".jil" handle FH, for writing, filename $1.jil
. string concatenation operator
select FH use FH as default

munkeHoller · January 29, 2024, 3:45pm

awk '$0 ~ /^\/\*/ { outputFile=gensub(/[ \*\-\/]/, "","G" ) ".txt"}
{ print $0 > outputFile }
' file
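Note that gensub() is specific to GNU awk. A sketch of the same split using only sub() and close(), which should also run on other awks, might look like this. The variable names name and out and the scratch directory are assumptions for the demo, and it takes only the first word of the comment as the file name.

```shell
# Work in a scratch directory so the demo does not clobber a real "file".
cd "$(mktemp -d)" || exit 1

# Sample input from the first post.
cat > file <<'EOF'
/* ------- pattern_1 --- */
kjfhas

/* ----------------- string ----------------- */
aadaew

/*--keyword-----*/
2134asdf
@@@@asdf
EOF

awk '
/\/\*.*\*\// {
    name = $0
    sub(/^.*\/\*[^A-Za-z0-9_]*/, "", name)   # drop everything up to the first word character after /*
    sub(/[^A-Za-z0-9_].*$/, "", name)        # drop everything after that word
    if (name != "") {
        if (out != "") close(out)            # close the previous output file
        out = name ".txt"
    }
}
out != "" { print > out }                    # lines before the first comment are dropped
' file

ls
```

Because the blank separator lines belong to the preceding section, pattern_1.txt and string.txt each end with an empty line.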

DrScriptt · February 27, 2024, 2:54am

Here's a quick test of GNU awk on Linux.

awk '($1 ~ /^\/\*/){sub(/\/\* *--* */, "", $0); sub(/ *--* *\*\//, "", $0); print}'

Work on the entire line because the number of fields is variable.

sub(/pattern/, "replacement", field) is your friend.

/\/\* *--* */ looks for a literal forward slash / followed by a literal asterisk *, followed by zero or more space characters, followed by one or more hyphens, followed by zero or more space characters.

"", replaces the previous pattern with nothing.

$0 operates on the entire line.

The second sub(...) does something similar, but it looks for zero or more spaces followed by one or more hyphen characters, followed by zero or more space characters, followed by a literal asterisk *, followed by a literal forward slash /.

Given that these are simply regular-expression substitutions, you could probably do this with sed. But you asked about awk, so that's how I answered.

N.B. Not all awks are created equal. I usually use nawk on Solaris because the default awk there is basic.
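For comparison, the same two substitutions could be written in sed roughly as follows. This is a sketch under the same assumptions as the awk version (the comment starts at the beginning of the line and the keyword is wrapped in hyphens); the scratch directory and the variable name keywords exist only for the demo.

```shell
# Work in a scratch directory so the demo does not clobber a real "file".
cd "$(mktemp -d)" || exit 1

# Sample input from the first post.
cat > file <<'EOF'
/* ------- pattern_1 --- */
kjfhas

/* ----------------- string ----------------- */
aadaew

/*--keyword-----*/
2134asdf
@@@@asdf
EOF

# On lines starting with /*: strip the leading "/* --- " part, then the trailing " --- */" part, then print.
keywords=$(sed -n '/^\/\*/{s#/\* *--* *##; s# *--* *\*/##; p;}' file)
printf '%s\n' "$keywords"
```

On the sample data this prints pattern_1, string and keyword, one per line.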