extract column with awk

xshang · September 24, 2012, 12:19pm

This is a example TXT file:

ID CDR1 CDR2 CDR3
1 CDR1   CDR3
2 CDR1 CDR2 CDR3
3     CDR3

What I want to do is extract the fourth column with awk, however, there are some blank space in some field. I get wrong result when it print out awk result.

awk '{print $1,$4}' awk_column_test.txt

How will i deal with these blank spaces?

If I want print 'NULL' when the field is blank, how will I should do?

Thank you very much!

in2nix4life · September 24, 2012, 12:38pm

cat file

ID CDR1 CDR2 CDR3
1 CDR1   CDR3
2 CDR1 CDR2 CDR3
3     CDR3

awk '$4 == ""{$4="NULL"}{print $4}' file

CDR3
NULL
CDR3
NULL

Corona688 · September 24, 2012, 12:46pm

What separator is used in your text file? Tab, space?

You might be able to tell it to use a single space or tab as the separator rather than globbing all whitespace...

awk -F"\t" '{ print $1,$4 }'
awk -F" " '{ print $1,$4 }'

tukuyomi · September 24, 2012, 12:47pm

Try this

~/unix.com$ awk -F\| 'gsub(/ /,"|")&&$4=$4?$4:"NULL"{print $1,$4}' f

The above script is overcomplicated, use Corona688's or in2nix4life's suggestion instead

xshang · September 24, 2012, 1:08pm

Sorry, I can't express clearly what I mean.

The txt file could be understand like this:

ID CDR1 CDR2 CDR3
1 CDR1 NULL CDR3
2 CDR1 CDR2 CDR3
3 NULL NULL CDR3

The results I want to get should be four "CDR3". "NULL" here represents blank space, so maybe I should deal with the original file?

Thanks a lot!

---------- Post updated at 12:08 PM ---------- Previous update was at 11:50 AM ----------

Sorry, I can't express clearly what I mean. I use space as the separator.

The txt file could be understand like this:

ID CDR1 CDR2 CDR3
1 CDR1 NULL CDR3
2 CDR1 CDR2 CDR3
3 NULL NULL CDR3

The results I want to get should be four "CDR3". "NULL" here represents blank space, so maybe I should deal with the original file?

Thanks a lot!

corona688:

What separator is used in your text file? Tab, space?

You might be able to tell it to use a single space or tab as the separator rather than globbing all whitespace...
awk -F"\t" '{ print $1,$4 }'
awk -F" " '{ print $1,$4 }'

Corona688 · September 24, 2012, 1:13pm

Try -F"[ ]" if you're on Linux

tukuyomi · September 24, 2012, 2:02pm

From your sample code, it's impossible to find CDR3 on second and fourth line (ID 1 & 3). Is there any [tab] and [spaces] mixed?

beta17 · September 24, 2012, 2:53pm

maybe this helps you:

[root@sandbox tmp]# awk '{print $NF}' test
CDR3
CDR3
CDR3
CDR3

$NF = last position

Don_Cragun · September 24, 2012, 3:56pm

This input file is worse than any of you seem to have imagined. This file uses <space> as the field separator and uses <space> as the contents of an "empty" field. Since the field separator is also field contents in some cases, there is no obvious way to specify an ERE that matches just the field separators. The following awk script converts the spaces that are data to the string "NULL" and leaves the spaces that are separators alone. It seem to do what was requested. (Note that the 2nd printf statement just shows how the input is modified along with the desired output. Remove it to just get the desired output.)

#!/bin/ksh
awk ' { gsub("  ", " NULL")
        printf("%s %s\n", $1, $4)
        printf("%s %s from %s\n", $1, $4, $0)
}' input