awk cut column based on string

Shirishlnx · February 13, 2012, 5:58am

Using awk, I need to cut out the column that contains the word "-Tag", regardless of where it appears in the line, and the match should be case insensitive.

-Tag:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical

Please guide.

--Shirish Shukla

---------- Post updated at 05:58 AM ---------- Previous update was at 05:50 AM ----------

I have come up with this, but it is case sensitive:

# echo "-tag:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical" | awk '{for(i=1;i<=NF;i++) if($i ~ /tag/) print $i}'
-tag:messages


itkamaraj · February 13, 2012, 6:01am

 
$ cat test.txt
-tag:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical
-TAG:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical
-Tag:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical
-tAG:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical
$ nawk '{for(i=1;i<=NF;i++){if($i~/[Tt][Aa][Gg]/)print $i}}' test.txt
-tag:messages
-TAG:messages
-Tag:messages
-tAG:messages

balajesuri · February 13, 2012, 6:04am

grep -io "\-tag:messages" inputfile

Shirishlnx · February 13, 2012, 6:19am

@itkamaraj

Sorry, nawk is not installed; I have to achieve this with awk only.
Thanks..

---------- Post updated at 06:19 AM ---------- Previous update was at 06:16 AM ----------

@balajesuri

Thanks, I am aware of that, but I want to achieve this with awk only.

--Shirish

itkamaraj · February 13, 2012, 6:41am

Use awk instead of nawk.

Shirishlnx · February 13, 2012, 11:17am

Thanks All !!!

Here is what I used: IGNORECASE=1 with awk

[root@nagios Shirish@Shukla]# echo "-Tag:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical" | \
> awk 'IGNORECASE=1 {for(i=1;i<=NF;i++) if($i ~ /tAg/) print $i}'
-Tag:messages
[root@nagios Shirish@Shukla]# echo "-Tag:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical" | \
> awk 'IGNORECASE=1 {for(i=1;i<=NF;i++) if($i ~ /tag/) print $i}'
-Tag:messages
[root@nagios Shirish@Shukla]# echo "-Tag:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical" | \
> awk 'IGNORECASE=1 {for(i=1;i<=NF;i++) if($i ~ /taG/) print $i}'
-Tag:messages
[root@nagios Shirish@Shukla]#

itkamaraj · February 14, 2012, 2:29am

Whenever you post a question, include your OS and shell details.

That makes it easier to give suggestions and ideas.

---------- Post updated at 12:59 PM ---------- Previous update was at 12:58 PM ----------

Note that IGNORECASE only works in GNU awk.
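
To illustrate the point, here is a minimal sketch that uses IGNORECASE only when gawk is actually installed and otherwise falls back to tolower(), which any POSIX awk should provide. The variable names line and result are just for illustration, and the sample line is the one from the first post.

```shell
line='-Tag:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical'

if command -v gawk >/dev/null 2>&1; then
    # gawk: IGNORECASE set once in BEGIN makes every regex match case-insensitive
    result=$(printf '%s\n' "$line" |
        gawk 'BEGIN { IGNORECASE = 1 } { for (i = 1; i <= NF; i++) if ($i ~ /tag/) print $i }')
else
    # any other awk: lower-case the field before matching against a lower-case regex
    result=$(printf '%s\n' "$line" |
        awk '{ for (i = 1; i <= NF; i++) if (tolower($i) ~ /tag/) print $i }')
fi
printf '%s\n' "$result"
```

Either branch should print -Tag:messages for this input.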

otheus · February 14, 2012, 4:45am

The standard awk is fairly weak. If you don't have access to GNU awk, install it. All the above solutions rely on GNU awk or nawk or at least Sun's xpg awk (which is an old version of nawk).

awk -v IGNORECASE=1 '{if( match($0,/-Tag:([^[:space:]]*)/,found)) print found[1]; }'

With nawk you might do something similar, but using sub() because nawk's match() isn't as cool as GNU's.
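
For what it's worth, here is one way it might look without GNU's three-argument match(). Instead of sub(), this sketch uses the two-argument match() together with RSTART and RLENGTH, and tolower() in place of IGNORECASE. The sample line and the names line, value and lower are made up for illustration, and it assumes the value ends at the next space or tab.

```shell
line='-P:/var/log/messages -TAG:messages -K:Error'

value=$(printf '%s\n' "$line" | awk '{
    lower = tolower($0)                 # same length as $0, so offsets carry over
    if (match(lower, /-tag:[^ \t]*/)) {
        # skip the 5 characters of "-tag:" and keep the rest of the match
        print substr($0, RSTART + 5, RLENGTH - 5)
    }
}')
printf '%s\n' "$value"
```

Like the gawk version above, this prints only the value after the tag (messages here), not the whole column.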

ctsgnb · February 14, 2012, 5:03am

awk -F"[ :-]" 'tolower($2)~/tag/{print "-"$2":"$3}' yourfile

or

awk '{split($1,a,":")}tolower(a[1])~/-tag/{print $1}' yourfile

or

awk '{NF=1;split($1,a,":")}tolower(a[1])~/-tag/' yourfile

$ cat tst
-tag:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical
-TAG:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical
-Tag:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical
-tAG:messages -P:/var/log/messages -P:/var/log/maillog -K:Error -K:Warning -K:critical
$ awk -F"[ :-]" 'tolower($2)~/tag/{print "-"$2":"$3}' tst
-tag:messages
-TAG:messages
-Tag:messages
-tAG:messages
$ awk '{split($1,a,":")}tolower(a[1])~/-tag/{print $1}' tst
-tag:messages
-TAG:messages
-Tag:messages
-tAG:messages
$ awk '{NF=1;split($1,a,":")}tolower(a[1])~/-tag/' tst
-tag:messages
-TAG:messages
-Tag:messages
-tAG:messages
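
The three commands above look for the tag in the first column only. If the tag can sit anywhere in the line, as in the original question, a loop over the fields with tolower() is one possible variation. This is only a sketch; the two sample lines and the names input and cols are invented for the example.

```shell
input='-P:/var/log/messages -tAG:messages -K:Error
-K:Warning -P:/var/log/maillog -TAG:maillog'

# check every field, lower-cased, against an anchored lower-case pattern
cols=$(printf '%s\n' "$input" |
    awk '{ for (i = 1; i <= NF; i++) if (tolower($i) ~ /^-tag:/) print $i }')
printf '%s\n' "$cols"
```

The ^ anchor keeps fields that merely contain "tag" somewhere in the middle from matching.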

Scrutinizer · February 14, 2012, 8:04am

Are you certain about that, otheus? I was under the impression that /usr/xpg4/bin/awk was introduced to Solaris later and does more to approach POSIX standards than nawk on Solaris does. nawk stands for "new awk", but that is only relative to the ancient original awk...

otheus · February 14, 2012, 9:54am

Not 100% sure, but I know Kernighan was maintaining nawk at least through 2007, and the OpenBSD project has been maintaining it since. As for Solaris, I think they brought awk over from System V back in the 90s, or maybe even before then with SunOS 4.x.

---------- Post updated at 03:54 PM ---------- Previous update was at 03:16 PM ----------

Follow-up:
From the FIXES file in awk.zip downloaded from Kernighan's web page:

Jun 1, 2003:
	subtle change to split: if source is empty, number of elems
	is always 0 and the array is not set.

From Solaris 10 (2005) xpg-awk:

$ /usr/xpg4/bin/awk 'BEGIN { print split(null,out,FS) }' </dev/null
0

So it would seem Solaris DID keep nawk up to date w.r.t. Kernighan's version.

Then again....

Jan 1, 2002:
	length(arrayname) returns number of elements; thanks to 
	arnold robbins for suggestion

And on Sun's implementation:

$ /usr/xpg4/bin/awk 'BEGIN { split("test",out,/es/); print out[1]; print length(out)}' </dev/null
t
0

Scrutinizer · February 14, 2012, 10:49am

@otheus, interesting. I think, though, that you should be comparing these against Solaris nawk, not /usr/xpg4/bin/awk, which should not be following Kernighan's changes but should rather strive to be POSIX compliant, no? What is the output of the same commands with nawk?

otheus · February 14, 2012, 12:03pm

First, I think you should split this thread into the Underground forum, for instance, and link to it.

Second, Kernighan *is* the author of nawk. What Solaris did to what they call nawk is anyone's guess.

Third, Solaris lists the nawk man page and xpg4/awk man page as the same entity (yet oddly, the files differ vastly in size).

Fourth, nawk explicitly errors with length(arrayname):

$ nawk 'BEGIN { split("test",out,/es/); print out[1]; print length(out)}' </dev/null
t
nawk: can't read value of out; it's an array name.
 source line number 1

Scrutinizer · February 14, 2012, 1:19pm

I think you are right, let's do that if you think it is interesting (I do), but what shall we call the thread? "/usr/xpg4/bin/awk vs. nawk on Solaris"? I thought that in post #8 you meant that on Solaris /usr/xpg4/bin/awk is an old version of nawk, i.e. of the current version on Solaris. My point was, and is, that nawk on Solaris is not as compliant as /usr/xpg4/bin/awk, and therefore the latter is preferable to nawk on Solaris.

But on rereading, you seem to be referring to a recent version of nawk on different systems. On many other systems, though, nawk either does not exist or is a link to gawk or mawk, and on yet others awk is nawk (or bwk).

Yes, Kernighan is the author of nawk, but length() operating on an array is an added feature and is not part of the POSIX specification (and unnecessary).
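
To make the "unnecessary" part concrete: the element count is available without length(arrayname), either from the return value of split() or by counting keys in a for-in loop. A small sketch, reusing the split("test", out, /es/) call from the posts above (the names n, c and counts are just for illustration):

```shell
counts=$(awk 'BEGIN {
    n = split("test", out, /es/)   # split() returns the number of pieces
    c = 0
    for (k in out) c++             # counting keys works for any array
    print n, c
}' </dev/null)
printf '%s\n' "$counts"
```

Both numbers should come out as 2, since "test" split on "es" gives the two pieces "t" and "t".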

Shirishlnx · February 15, 2012, 3:14am

Sure .. Have to be ...

But I was not aware that IGNORECASE only works in GNU awk. Thanks.

I have checked that this works fine on HP-UX, Solaris and AIX, on various Linux flavours (SUSE/Red Hat/CentOS), and under the sh and ksh shells too.

--Shirish

ctsgnb · February 15, 2012, 4:33am

You can apply the tolower() or toupper() function to the string when testing the match; the result is the same as a case-insensitive comparison.
(See the examples given in post #9.)
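
The same idea also works when the word to look for comes from a shell variable: lower-case both sides and compare. This sketch uses index() rather than a regex, so the word is treated as a literal string that must start the column. The sample line and the names line, word and hit are assumptions for the example.

```shell
line='-K:Error -tAg:messages -P:/var/log/maillog'
word='-Tag'

hit=$(printf '%s\n' "$line" | awk -v word="$word" '{
    w = tolower(word)
    for (i = 1; i <= NF; i++)
        if (index(tolower($i), w) == 1) print $i   # field begins with the word
}')
printf '%s\n' "$hit"
```

For this input it should print -tAg:messages.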