Hi All,
I have a source file with data
Name ~ Groups
Muni~abc,was,USA_ax,123
Chaitanya~USA_12,was
Balaji~123,xyz,was
Ramu~123,xyz
From the second column i want to extract only the groups that matches the pattern 'USA_%' or if the group = 'was', and ignore any other columns.
Expected output * **
Name | Groups
Muni|was,USA_ax
Chaitanya|USA_12,was
Balaji|was
Ramu|
How can I do this in UNIX
Yoda
June 16, 2013, 11:49pm
2
What have you tried so far? Share your thoughts to solve this problem.
I am not have any idea about this .Thinking how can I do this by grep
Yoda
June 17, 2013, 12:30am
4
grep is definitely not an option as I see your expected output has different field separator.
I would suggest using awk instead:
awk -F'[~,]' '
NR == 1 {
sub ( /~/, "|" )
}
NR > 1 {
s = $1 "|"
for( i = 2; i <= NF; i++ )
{
if( $i == "was" || $i ~ /USA.*/ )
s = s $i OFS
}
sub( /,$/, X, s )
$0 = s
}
1
' OFS=, file
It cannot be done with grep, because the input needs to be transformed..
great logic. Thanks a lot
---------- Post updated at 12:50 PM ---------- Previous update was at 12:26 AM ----------
i hope for removing last character "," you have used this
sub( /,$/, X, s )
what is the purpose of 1 here .just curious to know the logic .Is it for printing ?
NR > 1 {
s = $1 "|"
for( i = 2; i <= NF; i++ )
{
if( $i == "was" || $i ~ /USA.*/ )
s = s $i OFS
}
sub( /,$/, X, s )
$0 = s
}
1
Yes, it is for printing. It's a logical expression that determines whether a line gets printed or not. It could be any expression technically, but a '1' makes it always print.
ok .What kind of expression we have ? Do you have any link for these kind of pure awk programming .
The kind where zero or a blank string is false, and anything else is true.
You could have (NR != 5), to avoid printing the 5th line (since NR means number of records).
You could have $1, to avoid printing blank lines, since $1 means the first column, and if it's blank you don't want to print it.
And so forth.