I want to count the unique occurrences of each string in the first and second columns, based on the numbers in the third column (or the fourth and fifth columns). I used the following program.
for i in f1/*.txt;
do
awk '
BEGIN {
path=sprintf("%s", "/home/gomez/Desktop/f2/")
}
!s[1":"$1":"$3]++{sU[$1]++;tot++}
!s[2":"$2":"$3]++{sU[$2]++;tot++}
END {
sub(/.*\//,"",FILENAME)
for (x in sU)
print x, sU[x] > path FILENAME;
print "Total No -",tot > path FILENAME;
}' $i;
done
Welcome to the forum, gomez.
Could you please describe your logic? I am confused.
bbb - 26 has two entries, but you are expecting one.
When you say you count based on the third column, please specify the meaning of the first and second numbers (separated by a hyphen) and how they are used in the count.
Yes, bbb 26 has two entries, but I need to print only unique occurrences. In the same way, aab 263 has 3 entries and I need to count it as one. I hope that makes my logic clear.
Please look carefully at your input file and desired output and make sure you are not missing anything. I also see you edited the data from the original.
I need to count the strings in the first and second columns separately, even if the strings are the same. I have 2 bga in the output because one bga is from the first column, where its value is 230, and the other is from the second column, where its value is 232. In the same way, aab and abb are counted separately: aab is from the first column and abb is from the second column.
awk '
{
v = $1 FS $3 FS $4
if ( ! ( v in A ) )
A[v]++
v = $2 FS $3 FS $5
if ( ! ( v in A ) )
A[v]++
}
END {
for ( k in A )
{
n = split ( k, T )
R[T[1] FS T[n]]++
}
for ( k in R )
{
print k, R[k]
c += R[k]
}
print "Total No -", c
}
' file
Thank you for your answer. I tried to print the results of each file into another directory, f2, with your code. The results of each file are not being written to f2.
for i in f1/*.txt;
do
awk '
BEGIN {
path=sprintf("%s", "/home/gomez/Desktop/f2/")
}
{
v = $1 FS $3 FS $4
if ( ! ( v in A ) )
A[v]++
v = $2 FS $3 FS $5
if ( ! ( v in A ) )
A[v]++
}
END {
for ( k in A )
{
n = split ( k, T )
R[T[1] FS T[n]]++
}
for ( k in R )
{
print k, R[k]
c += R[k]
}
print "Total No -", c
}
' $i;
done
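A likely reason nothing shows up in f2: the script above sets `path` in BEGIN but never uses it, so every `print` still goes to standard output. Below is a minimal sketch of one way to send the output to a per-file target from inside awk. The temporary directories (`in/` standing in for f1, `out/` standing in for /home/gomez/Desktop/f2/), the `-v path=...` assignment, the `out` variable, and the sample rows are all assumptions made for illustration, not the real data from this thread.

```shell
# Hypothetical layout: in/ stands in for f1/, out/ for /home/gomez/Desktop/f2/.
work=$(mktemp -d)
mkdir -p "$work/in" "$work/out"

# Made-up sample rows (columns 4 and 5 are invented for this sketch).
printf 'aab abb 263 1 2\naab abb 263 1 2\nbbb bga 26 3 4\n' > "$work/in/sample.txt"

for i in "$work"/in/*.txt
do
    awk -v path="$work/out/" '
    {
        # Remember each (string, col3, col4/col5) combination once.
        v = $1 FS $3 FS $4
        if ( ! ( v in A ) ) A[v]++
        v = $2 FS $3 FS $5
        if ( ! ( v in A ) ) A[v]++
    }
    END {
        out = FILENAME
        sub(/.*\//, "", out)    # drop the directory part of the input name
        out = path out          # build the target name once, then reuse it

        for ( k in A ) {
            n = split(k, T)
            R[T[1] FS T[n]]++
        }
        for ( k in R ) {
            print k, R[k] > out
            c += R[k]
        }
        print "Total No -", c > out
    }' "$i"
done
```

Building the target name in a variable first also sidesteps the `print ... > path FILENAME` form, which some awk implementations parse differently unless the concatenation is parenthesized.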
---------- Post updated at 04:22 PM ---------- Previous update was at 04:17 PM ----------
Hi Ahamed,
Thank you for your answer. I tried to print the results of each file into another directory, f2, with your code. The results of each file are not being written to f2.
for i in f1/*.txt;
do
awk '
BEGIN {
path=sprintf("%s", "/home/gomez/Desktop/f2/")
}
!($1$2$3 in data){
data[$1$2$3]++
b[$1":"$4]++
b[$2":"$5]++
}
END{
for(i in b){
split(i,a,/:/)
print a[1],a[2],b[i]
s+=b[i]
}
print "Total No - " s
}' $i;
done
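In this version too, `path` is assigned but never used, so the results only go to the terminal. One option that leaves the awk program's `print` statements alone is to let the shell redirect each run into a file named after the input. This is only a sketch under assumptions: the temporary `in/` and `out/` directories stand in for f1 and f2, and the sample rows are invented for illustration.

```shell
# Hypothetical layout: in/ stands in for f1/, out/ for the f2 directory.
work=$(mktemp -d)
mkdir -p "$work/in" "$work/out"

# Made-up sample rows, not the data from this thread.
printf 'aab abb 263 230 232\naab abb 263 230 232\n' > "$work/in/sample.txt"

for i in "$work"/in/*.txt
do
    awk '
    !($1$2$3 in data) {
        data[$1$2$3]++
        b[$1 ":" $4]++
        b[$2 ":" $5]++
    }
    END {
        for (i in b) {
            split(i, a, ":")
            print a[1], a[2], b[i]   # b[i] is the element, not the bare array name
            s += b[i]
        }
        print "Total No - " s
    }' "$i" > "$work/out/$(basename "$i")"   # the shell picks the output file
done
```

With the redirection on the shell side, the awk program needs no BEGIN block and no knowledge of the target directory; quoting `"$i"` also keeps file names with spaces intact.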