Extract values of duplicate keys

Viernes · January 15, 2013, 12:21am

I have two questions that are related, so it would be great if you can help me with both!

Question1:
I have a file A that looks like this:

a x
b y
b z
c w

I want to get something like:

a x
b y; z
c w

The keys (a, b, c) never contain spaces, but the values after them might.

Question2:
Next, I have a file B that contains:

x
y
q

And I want to compare it with the value column of the merged file A:

x
y; z
w

I want to count how many lines of B appear among the values of A. In this case the count is 2 (x and y).

bartus11 · January 15, 2013, 4:12am

Answer1:

awk '{x=$0;sub("^[^ ]+ ","",x);a[$1]=(a[$1])?a[$1]"; "x:x}END{for (i in a) print i,a[i]}' fileA
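
For anyone following along, here is a spelled-out sketch of the same idea, run on the sample data from Question 1. It differs from the one-liner in two ways that are my own choices, not part of the answer above: it prints keys in the order they first appear (a plain `for (i in a)` loop gives no ordering guarantee), and it tests key membership with `in`, so a value such as 0 is not mistaken for "no value yet". The names `seen`, `order`, `vals` and the temporary directory are made up for the illustration.

```shell
# Recreate the sample fileA from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'a x' 'b y' 'b z' 'c w' > "$tmp/fileA"

# Merge the values of duplicate keys, keeping keys in first-seen order.
merged=$(awk '
{
    key = $1
    val = $0
    sub(/^[^ ]+ /, "", val)          # drop the key and the space after it
    if (!(key in seen)) {            # first time this key shows up
        seen[key] = 1
        order[++n] = key
        vals[key] = val
    } else {                         # duplicate key: append with "; "
        vals[key] = vals[key] "; " val
    }
}
END {
    for (i = 1; i <= n; i++) print order[i], vals[order[i]]
}' "$tmp/fileA")

printf '%s\n' "$merged"
rm -r "$tmp"
```

On the sample input this prints `a x`, `b y; z` and `c w`, one per line.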

Answer2:

awk '{x=$0;sub("^[^ ]+ ","",x);a[$1]=(a[$1])?a[$1]"; "x:x}END{for (i in a) print i,a[i]}' fileA | cut -d" " -f2- | grep -cf fileB -

Viernes · January 15, 2013, 6:53am

I ran this on two files, foo and foo2:

cat > foo
x
y; z
w
cat > foo2 
x
y
q

Here's what I got:

awk '{x=$0;sub("^[^ ]+ ","",x);a[$1]=(a[$1])?a[$1]"; "x:x}END{for (i in a) print i,a[i]}' foo | cut -d" " -f2- | grep -cf foo2 -
1

Is there a way to get 2, since both x and y match?
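
A possible explanation for the 1, plus a sketch of one way to get 2. The pipeline expects the original two-column file A. Here foo already holds only the values, so the awk step treats the first word of each line as a key and strips it, which reduces "y; z" to "z" before grep ever sees it. Also, `grep -c` counts matching lines of its input (the A side), not lines of the pattern file. The sketch below instead splits each line of foo on "; " and counts the lines of foo2 that exactly equal one of those values. It assumes values are always joined by "; " exactly; the names `have`, `parts` and the temporary directory are made up for the illustration.

```shell
# Recreate foo (merged values of A) and foo2 (file B) in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'x' 'y; z' 'w' > "$tmp/foo"
printf '%s\n' 'x' 'y' 'q'    > "$tmp/foo2"

# First file: split every line on "; " and remember each value.
# Second file: count the lines that exactly equal a remembered value.
count=$(awk '
NR == FNR {
    n = split($0, parts, /; /)
    for (i = 1; i <= n; i++) have[parts[i]] = 1
    next
}
($0 in have) { c++ }
END { print c + 0 }                  # c + 0 prints 0 when nothing matched
' "$tmp/foo" "$tmp/foo2")

echo "$count"
rm -r "$tmp"
```

For the sample files this prints 2, because x and y are both found and q is not. Running `grep -cf foo2 foo` directly also happens to print 2 here, but it counts lines of foo that contain any pattern as a substring, so the two approaches can disagree on other data.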