is it possible to count the number of keys based on state and cell and output it as a simple matrix.
Ex: cell1-state1 has 2 keys
cell3-state1 has 4 keys.
Note: Insert 0 if no data available.
input
key states cell
key1 state1 cell1
key1 state2 cell1
key1 state3 cell1
key1 state4 cell1
key1 state5 cell1
key1 state6 cell1
key1 state7 cell1
key1 state8 cell1
key1 state9 cell1
key1 state10 cell1
key1 state11 cell1
key1 state12 cell1
key1 state13 cell1
key1 state14 cell1
key1 state15 cell1
key2 state1 cell1
key2 state2 cell1
key2 state3 cell1
key2 state4 cell1
key2 state5 cell1
key2 state6 cell1
key2 state7 cell1
key2 state8 cell1
key2 state9 cell1
key2 state10 cell1
key2 state11 cell1
key2 state12 cell1
key2 state13 cell1
key2 state14 cell1
key2 state15 cell1
key1 state1 cell2
key1 state2 cell2
key1 state3 cell2
key1 state4 cell2
key1 state5 cell2
key1 state6 cell2
key1 state7 cell2
key1 state8 cell2
key1 state9 cell2
key1 state10 cell2
key1 state11 cell2
key1 state12 cell2
key1 state13 cell2
key1 state14 cell2
key1 state15 cell2
key1 state1 cell3
key1 state14 cell3
key2 state1 cell3
key3 state1 cell3
key4 state1 cell3
output
cell state1 state2 state3 state4 state5 state6 state7 state8 state9 state9 state10 state11 state12 state13 state14 state15
cell1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
cell2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
cell3 4 0 0 0 0 0 0 0 0 0 0 1 0 1 0
I think it is. are the values of column 2 & 3 static values you know about?
col2 has state1 to state15
col3 has cell1-cell9
and
col1 has many keys 1- several thousands
maybe you can create a loop and concatenate the value of col2&3 to a name. this name can be used as variable and set.
for example:
1st row -> found key1, state3, cell5 -> $state3cell5 = "_key1"
before adding the key to the var check if the key(substring) already exists.
after the loop you can count in each variable matches of "/_key\d{1,3}/".
Hi,
Try this one,
awk '{f1[$3]++;f2[$2]++;a[$3"^"$2]++;}END{ORS="";print " ";for(j in f2){print j" ";c++;if(c>5){print " ";}}print "\n";for(i in f1){print i;for(k in f2){l=i"^"k;if(a[l] == ""){print " ";}else{print " "a[l]" ";}}print "\n";}}' file
My Output is,
state7 state8 state9 state10 state11 state12 state1 state13 state2 state14 state3 state15 state4 state5 state6
cell1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
cell2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
cell3 4 1
if you have gawk, you can sort the header with below code,
gawk '{f1[$3]++;f2[$2]++;a[$3"^"$2]++;}END{asort(f1);ORS="";print " ";for(j in f2){print j" ";c++;if(c>5){print " ";}}print "\n";for(i in f1){print i;for(k in f2){l=i"^"k;if(a[l] == ""){print " ";}else{print " "a[l]" ";}}print "\n";}}' file
i dont have gwk installed here, so i didn't used that.
Cheers,
Ranga:)
2 Likes
gawk is installed but not getting output
$ gawk '{f1[$3]++;f2[$2]++;a[$3"^"$2]++;}END{asort(f1);ORS="";print " ";for(j in f2){print j" ";c++;if(c>5){print " ";}}print "\n";for(i in f1){print i;for(k in f2){l=i"^"k;if(a[l] == ""){print " ";}else{print " "a[l]" ";}}print "\n";}}' rm2
state12 state7 state13 state8 state14 state9 state15 state1 state2 states state3 state4 state10 state5 state11 state6
4
1
2
3
but awk version is working great.
key1 state1 cell3
key1 state14 cell3
key2 state1 cell3
key3 state1 cell3
key4 state1 cell3
What exactly is it doing?
please refer to the previous post. I just updated it. Thanx
rangarasan:
Hi,
Try this one,
awk '{f1[$3]++;f2[$2]++;a[$3"^"$2]++;}END{ORS="";print " ";for(j in f2){print j" ";c++;if(c>5){print " ";}}print "\n";for(i in f1){print i;for(k in f2){l=i"^"k;if(a[l] == ""){print " ";}else{print " "a[l]" ";}}print "\n";}}' file
My Output is,
state7 state8 state9 state10 state11 state12 state1 state13 state2 state14 state3 state15 state4 state5 state6
cell1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
cell2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
cell3 4 1
if you have gawk, you can sort the header with below code,
gawk '{f1[$3]++;f2[$2]++;a[$3"^"$2]++;}END{asort(f1);asort(f2);asort(a);ORS="";print " ";for(j in f2){print j" ";c++;if(c>5){print " ";}}print "\n";for(i in f1){print i;for(k in f2){l=i"^"k;if(a[l] == ""){print " ";}else{print " "a[l]" ";}}print "\n";}}' file
i dont have gwk installed here, so i didn't used that.
Cheers,
Ranga:)
Just make a try, i am not sure it will work perfectly since i dont have gawk.
your code is useful to count combination of col 2&3..
but doesn't count the occurrence of how many different keys for these pair of values are in the file?
but doesn't count the occurrence of how many different keys for these pair of values are in the file?
I didn't get your point clearly. we have counts for each combiantion right ?
ctsgnb
February 28, 2012, 5:34am
13
I didn't take care of the matrix format, but all the infos you need are in the output :
awk '{i=$3 FS $2}!a[$0]++{c++}END{for(i in c) print i,c}' yourfile
If you need the detail of which keys :
awk '{i=$3 FS $2}!a[$0]++{c++;x=d;d=(x?x FS:z) $1}END{for(i in c) print i,c,d}' yourfile
You can add a | sort
with the options of your choice to make it ordered by cells, or by states etc ...
yes. and we have different keys for combinations.
it was asked for the amount of different keys for each if I understood correctly..
awk '{f1[$3]++;f2[$2]++;a[$3"^"$2]++;}END
[snip...]
f1 is the different cell and f2 is different keys in the input.
Cheers,
Ranga:)
@Ranga : I think cell3 states are wrong in the output of your code.
quincyjones:
is it possible to count the number of keys based on state and cell and output it as a simple matrix.
Ex: cell1-state1 has 2 keys
cell3-state1 has 4 keys.
Note: Insert 0 if no data available.
input
key states cell
key1 state1 cell1
key1 state2 cell1
key1 state3 cell1
key1 state4 cell1
key1 state5 cell1
key1 state6 cell1
key1 state7 cell1
key1 state8 cell1
key1 state9 cell1
key1 state10 cell1
key1 state11 cell1
key1 state12 cell1
key1 state13 cell1
key1 state14 cell1
key1 state15 cell1
key2 state1 cell1
key2 state2 cell1
key2 state3 cell1
key2 state4 cell1
key2 state5 cell1
key2 state6 cell1
key2 state7 cell1
key2 state8 cell1
key2 state9 cell1
key2 state10 cell1
key2 state11 cell1
key2 state12 cell1
key2 state13 cell1
key2 state14 cell1
key2 state15 cell1
key1 state1 cell2
key1 state2 cell2
key1 state3 cell2
key1 state4 cell2
key1 state5 cell2
key1 state6 cell2
key1 state7 cell2
key1 state8 cell2
key1 state9 cell2
key1 state10 cell2
key1 state11 cell2
key1 state12 cell2
key1 state13 cell2
key1 state14 cell2
key1 state15 cell2
key1 state1 cell3
key1 state14 cell3
key2 state1 cell3
key3 state1 cell3
key4 state1 cell3
output
cell state1 state2 state3 state4 state5 state6 state7 state8 state9 state9 state10 state11 state12 state13 state14 state15
cell1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
cell2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
cell3 4 0 0 0 0 0 0 0 0 0 0 1 0 1 0
Your expected output is wrong, just check that we dont have key for state12 and cell3. is that typo error ?
key1 state1 cell3
key1 state14 cell3
key2 state1 cell3
key3 state1 cell3
key4 state1 cell3
You mean 0 if the key is not found,
Yeah i have missed that requirment.
Note: Insert 0 if no data available.
Check with the below code for that requirment.
awk '{f1[$3]++;f2[$2]++;a[$3"^"$2]++;}END{ORS="";print " ";for(j in f2){print j" ";c++;if(c>5){print " ";}}print "\n";for(i in f1){print i;for(k in f2){l=i"^"k;if(a[l] == ""){print " 0 ";}else{print " "a[l]" ";}}print "\n";}}' Input_File
Cheers,
Ranga:)
1 Like
that's confusing to me. I just noticed you didn't used or checked for '$1' (col1 contains keys?)
how to not increase without check if same key for a combination occurred again?
Also can you add this minor change please. normalize the output matrix by dividing total no.of unique keys * 100 ? with a different command by piping
command1 input | command2
For ex: cell1-state1 (2/4)*100
f1[$3]++;f2[$2]++;
we are not using the above arrays count of particular keys. even we can use the below
f1[$3]=1;f2[$2]=1;
yeah you have got point that if i have same key exist more than once for the same combination of cell and state
key1 state1 cell3
key1 state14 cell3
key2 state1 cell3
key3 state1 cell3
key4 state1 cell3
key4 state1 cell3
The above code will display the count as 5 for cell3 and state1 but the actual count count is 4 bcos the same key is exist. but thats not in the input which we have too. if its the case, check with the below code,
awk '{
f1[$3]=1;f2[$2]=1;if(chk[$3"^"$2] != $1){a[$3"^"$2]++;chk[$3"^"$2]=$1;}
}END{ORS="";print " ";for(j in f2){print j" ";c++;if(c>5){print " ";}}print "\n";for(i in f1){print i;for(k in f2){l=i"^"k;if(a[l] == ""){print " 0 ";}else{print " "a[l]" ";}}print "\n";}}' Input_File
Cheers,
Ranga:)
---------- Post updated at 04:49 AM ---------- Previous update was at 04:44 AM ----------
quincyjones:
Also can you add this minor change please. normalize the output matrix by dividing total no.of unique keys * 100 ? with a different command by piping
command1 input | command2
For ex: cell1-state1 (2/4)*100
2 is count of keys.
4 is total no of keys.
right ?
where you need this output ?
Expected output format ?
Cheers,
Ranga