UNIX - requirement

morbid_angel · June 16, 2013, 11:21pm

Hi All,
I have a source file with the following data:

Name ~ Groups
Muni~abc,was,USA_ax,123
Chaitanya~USA_12,was
Balaji~123,xyz,was
Ramu~123,xyz

From the second column, I want to extract only the groups that match the pattern 'USA_%' or are equal to 'was', and ignore all other groups.

Expected output:

Name | Groups
Muni|was,USA_ax
Chaitanya|USA_12,was
Balaji|was
Ramu|

How can I do this in UNIX?

Yoda · June 16, 2013, 11:49pm

What have you tried so far? Share your thoughts to solve this problem.

morbid_angel · June 17, 2013, 12:25am

I don't have any idea how to do this. I was wondering whether it can be done with grep.

Yoda · June 17, 2013, 12:30am

grep is definitely not an option, since your expected output uses a different field separator.

I would suggest using awk instead:

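awk -F'[~,]' '
        # Header line: only the separator changes.
        NR == 1 {
                sub ( /~/, "|" )
        }
        # Data lines: $1 is the name, $2..$NF are the groups.
        NR > 1 {
                s = $1 "|"
                for( i = 2; i <= NF; i++ )
                {
                        # Keep "was" and any group starting with "USA_".
                        if( $i == "was" || $i ~ /^USA_/ )
                                s = s $i OFS
                }
                # Drop the trailing comma left by the loop (X is unset, so it is empty).
                sub( /,$/, X, s )
                $0 = s
        }
        # A true pattern with no action prints the record.
        1
' OFS=, file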
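
A variation on the same idea, offered only as a sketch: split just the group list with split() and join the kept groups with a comma between them, so there is no trailing comma to strip. The shell variables (sample, result) and the awk names (g, kept) are made up for this illustration, and the anchored /^USA_/ is one reading of the 'USA_%' pattern from the first post.

```shell
# Sample input copied from the first post.
sample='Name ~ Groups
Muni~abc,was,USA_ax,123
Chaitanya~USA_12,was
Balaji~123,xyz,was
Ramu~123,xyz'

result=$(printf '%s\n' "$sample" | awk -F'~' '
    # Header: swap the separator and move on.
    NR == 1 { sub(/~/, "|"); print; next }
    {
        n = split($2, g, ",")          # g[1..n] holds the groups
        kept = ""
        for (i = 1; i <= n; i++)
            if (g[i] == "was" || g[i] ~ /^USA_/)
                kept = (kept == "" ? g[i] : kept "," g[i])
        print $1 "|" kept
    }
')

printf '%s\n' "$result"
```

Because the comma is added only between kept groups, a name with no matching group (Ramu here) comes out as the name followed by a bare "|".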

Scrutinizer · June 17, 2013, 12:34am

It cannot be done with grep, because the input needs to be transformed.

morbid_angel · June 17, 2013, 1:50pm

Great logic. Thanks a lot.

---------- Post updated at 12:50 PM ---------- Previous update was at 12:26 AM ----------

I assume you used this to remove the trailing ",":

sub( /,$/, X, s )

What is the purpose of the 1 here? I'm just curious about the logic. Is it for printing?

        NR > 1 {
                s = $1 "|"
                for( i = 2; i <= NF; i++ )
                {
                        if( $i == "was" || $i ~ /^USA_/ )
                                s = s $i OFS
                }
                sub( /,$/, X, s )
                $0 = s
        }
        1
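
The trailing-comma removal can be checked in isolation with a small sketch. The sample string is made up. In the script, X is never assigned, so awk treats it as an empty string; "" is used here to make that explicit.

```shell
trimmed=$(awk 'BEGIN {
    s = "Muni|was,USA_ax,"     # what the loop builds: every kept group is followed by a comma
    sub(/,$/, "", s)           # $ anchors the match to the end of s, so only the last comma goes
    print s
}')

printf '%s\n' "$trimmed"
```

Only the final comma is removed; the comma between the two groups stays.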

Corona688 · June 17, 2013, 1:57pm

Yes, it is for printing. It's a pattern with no action: awk evaluates the expression for every line and, when it is true, runs the default action, which is to print the line. It could technically be any expression, but a '1' is always true, so every line gets printed.
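
A quick way to see this, as a sketch with made-up two-line input: a bare 1 behaves like an explicit { print }, while a bare 0 prints nothing.

```shell
with_one=$(printf 'a\nb\n' | awk '1')            # pattern always true, default action prints
with_print=$(printf 'a\nb\n' | awk '{ print }')  # no pattern, explicit print
with_zero=$(printf 'a\nb\n' | awk '0')           # pattern always false, nothing printed

printf '%s\n' "$with_one"
```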

morbid_angel · June 17, 2013, 2:33pm

OK. What kind of expression can be used there? Do you have a link to material on this kind of pure awk programming?

Corona688 · June 17, 2013, 2:38pm

The kind where zero or a blank string is false, and anything else is true.

You could have (NR != 5), to avoid printing the 5th line (since NR holds the number of the current record).

You could have $1, to avoid printing blank lines, since $1 means the first column, and if it's blank you don't want to print it.

And so forth.
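
Both can be tried on throwaway input (the sample lines and variable names below are made up). One caveat: a first field that looks like the number 0 also counts as false, so NF, the field count, is a common alternative when the goal is only to skip blank lines.

```shell
# Skip the 5th record.
skip_fifth=$(printf 'one\ntwo\nthree\nfour\nfive\nsix\n' | awk 'NR != 5')

# $1 as a pattern: the blank line is dropped, and so is the line whose first field is 0.
by_field=$(printf '0 zero\n\nx 1\n' | awk '$1')

# NF as a pattern: only the blank line (zero fields) is dropped.
by_nf=$(printf '0 zero\n\nx 1\n' | awk 'NF')

printf '%s\n' "$skip_fifth" "$by_field" "$by_nf"
```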