When I slightly modify the code suggested by Chubler_XL to be:
awk '
{
	mc = NF > mc ? NF : mc
	for(i=NF; i; i--) {
		T[$i]
		C[i FS $i]++
	}
}
END {
	printf "Value"
	for(i=1; i<=mc;i++) printf "\tCOL%d",i
	for(v in T) {
		printf "\n%s", v
		for(i=1; i<=mc;i++) printf "\t%d",C[i FS v]
	}
	printf "\n"
}' a2.txt
and store this in a file named Chubler_XL, make it executable, and run the command:
./Chubler_XL > Chubler_XL.out
and I slightly modify the code suggested by Yoda to be:
awk '
BEGIN {
	n = split ( "./. 0/0 0/1 1/1", T )
}
{
	for ( i = 1; i <= NF; i++ )
		R[i FS $i] += 1
}
END {
	printf "VAL\t"
	for ( i = 1; i <= NF; i++ )
		printf "COL%d\t", i
	printf "\n"
	for ( j = 1; j <= n; j++ )
	{
		printf "%s\t", T[j]
		for ( i = 1; i <= NF; i++ )
			printf "%d\t", R[i FS T[j]]
		printf "\n"
	}
}
' a2.txt
and store this in a file named Yoda, make it executable, and run the command:
./Yoda > Yoda.out
and I write the code:
awk -v line_count="$(wc -l < a2.txt)" '
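function check() {
	# Report any field whose column sum differs from the input line count.
	printf("Checking fields 2 through %d in file: %s\n", NF, f)
	for(i = 2; i <= NF; i++)
		if(c[i] != line_count)
			printf("file %s: field %d count %d\n", f, i, c[i])
	split("", c)	# Clear the sums before the next file.
}
FNR == 1 {
	line_count += 0	# Force the wc output to be treated as a number.
	if(f == "")
		printf("Evaluating output produced from %d lines in a2.txt\n",
			line_count)
	else
		check()
	f = FILENAME
	next		# Skip the header line.
}
{	# Add the counts on this line to the per-field sums.
	for(i = 2; i <= NF; i++)
		c[i] += $i
}
END {	check()
}' *.out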
and store that in a file named counter, make it executable, and run it, I get the output:
Evaluating output produced from 83 lines in a2.txt
Checking fields 2 through 305 in file: Chubler_XL.out
Checking fields 2 through 305 in file: Yoda.out
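Since counter prints a "file ...: field ... count ..." line only when a sum does not match, and no such line appears, this shows that the sum of the values in each of the 304 count fields (fields 2 through 305) does indeed equal the number of lines (83) found in the file you attached in post #6.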
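To make the idea behind that check concrete, here is a minimal sketch run on a made-up three-line, two-column sample instead of a2.txt. The sample data and the variable names sample and sums are my own assumptions for illustration. It counts values per column the same way the scripts above do, and then shows that every column's counts add up to the number of input lines:

```shell
# Hypothetical three-line sample standing in for a2.txt (tab-separated).
sample=$(mktemp)
printf '0/0\t0/1\n./.\t1/1\n0/0\t0/0\n' > "$sample"

# Count each value per column, then sum every column over all values seen.
sums=$(awk '
{
	if (NF > mc) mc = NF
	for (i = 1; i <= NF; i++) {
		T[$i]		# Remember every distinct value.
		C[i FS $i]++	# Count this value in column i.
	}
}
END {
	for (i = 1; i <= mc; i++) {
		s = 0
		for (v in T) s += C[i FS v]
		printf "COL%d sum=%d lines=%d\n", i, s, NR
	}
}' "$sample")

printf '%s\n' "$sums"
rm -f "$sample"
```

If every input line has the same number of fields, each column's sum has to equal the line count, which is exactly the property counter tests.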
I see no indication that either of these suggestions is producing incorrect results, although neither of them produces output that is at all close to the output you showed us in post #4. I do note that the output you showed us in post #4 only shows counts for the three values "0/0", "0/1", and "1/1"; but the data in a2.txt also includes some entries with the value "./.", which is included in the output produced by the code Chubler_XL suggested and in the output produced by the code Yoda suggested (after changing it to look for those four values instead of the values "A", "B", "C", and "D" that you said were included as values in your statements in post #1).
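If you want to see for yourself which distinct values a file contains, something like the following sketch should do it. It is shown here on a made-up three-line sample rather than on a2.txt, and the sample data and the variable names sample and distinct are assumptions for illustration only:

```shell
# Hypothetical three-line sample standing in for a2.txt (tab-separated).
sample=$(mktemp)
printf '0/0\t0/1\n./.\t1/1\n0/0\t0/0\n' > "$sample"

# Count how often each distinct value occurs anywhere in the file.
distinct=$(awk '
{ for (i = 1; i <= NF; i++) seen[$i]++ }
END { for (v in seen) printf "%s\t%d\n", v, seen[v] }
' "$sample" | LC_ALL=C sort)

printf '%s\n' "$distinct"
rm -f "$sample"
```

Replacing "$sample" with a2.txt (and dropping the mktemp, printf, and rm lines) would list every value that actually appears in your data, including "./.".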
If you'd like to show us the code you used to produce the output for the first 26 columns you showed us in post #4, maybe we can help explain why that code failed to correctly interpret the output produced by Chubler_XL's code or Yoda's code.