Awk: What is the difference between: X[a,b,c] - X[a][b,c] - X[a][b][c]

awk appears to behave inconsistently. With the same variable it will give the message:

fatal: attempt to use array `X["2"]' in a scalar context

and, if I try to correct that, then:

fatal: attempt to use a scalar value as array

I'm using a three-dimensional array. There seems to be a significant difference in how awk (GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)) treats the following:

X[a,b,c]
X[a][b,c]
X[a][b][c]

What is the difference between these? Where can I find it documented?

To demonstrate, with a small piece of code:

BEGIN {
        X[1,"Jim","The quick brown fox jumps over the lazy dogs"]++;
        Y[1]["Jim","The quick brown fox jumps over the lazy dogs"]++;
        Z[1]["Jim"]["The quick brown fox jumps over the lazy dogs"]++;
        X[2,"Jack","The quick brown foxs jumps over the lazy dogs"]++;
        Y[2]["Jack","The quick brown foxs jumps over the lazy dogs"]++;
        Z[2]["Jack"]["The quick brown foxs jumps over the lazy dogs"]++;
        X[3,"Joe","The quick brown foxs jumps over the lazy dogs"]++;
        Y[3]["Joe","The quick brown foxs jumps over the lazy dogs"]++;
        Z[3]["Joe"]["The quick brown foxs jumps over the lazy dogs"]++;
      }
END {
for ( x in X )
        print x;
for ( y in Y )
        print y;
for ( z in Z )
        {
        for ( a in Z[z] )
                print z " : " a;  # appending ":" Z[z][a] here gives: awk: demo.awk:20: fatal: attempt to use array `Z["1"]["Jim"]' in a scalar context
        }
}

$ awk -f demo.awk </dev/null

1JimThe quick brown fox jumps over the lazy dogs
2JackThe quick brown foxs jumps over the lazy dogs
3JoeThe quick brown foxs jumps over the lazy dogs
1
2
3
1 : Jim
2 : Jack
3 : Joe

As the comment in the code says, in the last loop, if I try

Z[z][a]

which seems to be the correct syntax, it gives the error:

fatal: attempt to use array `Z["1"]["Jim"]' in a scalar context

split(Y[1],y_bits,SUBSEP);

gives:

awk: demo.awk:14: fatal: attempt to use array `Y["1"]' in a scalar context

The same error occurs for

split(Z[1],z_bits,SUBSEP);

but

split(X[1],x_bits,SUBSEP);

works.

Welcome to the forum.

None of the awk versions I know provide the array constructs you present as the second and third versions. man awk :

So - I'm surprised you don't get an error message like I do for your script:

awk: line 4: syntax error at or near [

Awk, and in this case gawk in particular, has supported multidimensional arrays for a long time. On many Linux systems 'awk' is gawk, but not on all of them (some ship mawk as 'awk'), and the X[a][b] form is a gawk extension, so the distinction matters here.

You can see one manual entry here:

https://www.gnu.org/software/gawk/manual/html_node/Arrays-of-Arrays.html

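It seems there are two mechanisms. In the case of:

X[a,b]

awk (gawk) joins the subscripts into a single string index, using a separator character known as 'SUBSEP'. So, given an index k of X (for example from 'for (k in X)'), you can get 'a' and 'b' back by:

split(k,x_bits,SUBSEP);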
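
A minimal sketch of that round trip, for illustration only. The helper name parts_of and the sample data are made up here; nothing in it is gawk-specific, so it should behave the same in any POSIX awk. Note that split() is applied to the index string, not to the array itself:

```awk
# Split a SUBSEP-joined index back into its components.
# Returns the number of components and fills parts[1..n].
function parts_of(key, parts) {
    return split(key, parts, SUBSEP)
}

BEGIN {
    # One flat array; the index is the single string
    # "1" SUBSEP "Jim" SUBSEP "fox".
    X[1, "Jim", "fox"] = 42

    for (k in X) {
        n = parts_of(k, p)
        print n " parts: " p[1] ", " p[2] ", " p[3] " -> " X[k]
    }

    # Membership test for a composite index uses the parenthesised form.
    if ((1, "Jim", "fox") in X)
        print "found"
}
```

Run it with 'awk -f' as in the earlier examples; since it only has a BEGIN block, it needs no input.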

gawk also provides 'true' arrays. What I'm trying to understand is the difference between these. The manual page is:

   Arrays
       Arrays are subscripted with an expression between square brackets ([ and ]). If the expression is an expression list (expr, expr ...) then the array subscript is a string
       consisting of the concatenation of the (string) value of each expression, separated by the value of the SUBSEP variable. This facility is used to simulate multiply dimensioned
       arrays. For example:

              i = "A"; j = "B"; k = "C"
              x[i, j, k] = "hello, world\n"

       assigns the string "hello, world\n" to the element of the array x which is indexed by the string "A\034B\034C".  All arrays in AWK are associative, i.e., indexed by string values.

       The special operator in may be used to test if an array has an index consisting of a particular value:

              if (val in array)
                   print array[val]

       If the array has multiple subscripts, use (i, j) in array.

       The in construct may also be used in a for loop to iterate over all the elements of an array.  However, the (i, j) in array construct only works in tests, not in for loops.

       An  element  may be deleted from an array using the delete statement.  The delete statement may also be used to delete the entire contents of an array, just by specifying the array
       name without a subscript.

       gawk supports true multidimensional arrays. It does not require that such arrays be ``rectangular'' as in C or C++.  For example:

              a[1] = 5
              a[2][1] = 6
              a[2][2] = 7

       NOTE: You may need to tell gawk that an array element is really a subarray in order to use it where gawk expects an array (such as in the second argument to split()).  You  can  do
       this by creating an element in the subarray and then deleting it with the delete statement.
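
Because such arrays need not be rectangular, a loop cannot assume that every element at a given level is a subarray. A rough sketch of how one might walk the man page's example, assuming gawk 4.0 or later for isarray(); the function name walk and its path argument are invented for this illustration:

```awk
# Recursively visit every scalar in a (possibly ragged) array of arrays.
# 'path' accumulates the subscripts seen so far, for printing only.
# Returns the number of scalar elements found.
function walk(arr, path,    i, n) {
    n = 0
    for (i in arr) {
        if (isarray(arr[i]))
            n += walk(arr[i], path "[" i "]")   # descend into the subarray
        else {
            print path "[" i "] = " arr[i]      # a plain scalar element
            n++
        }
    }
    return n
}

BEGIN {
    a[1] = 5        # scalar at the first level
    a[2][1] = 6     # a[2] is a subarray
    a[2][2] = 7
    total = walk(a, "a")
    print total " scalar elements"
}
```

The order in which 'for (i in arr)' visits elements is not specified, so the per-element lines may come out in any order.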

The version is:

$ awk -V
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)

---------- Post updated at 01:37 PM ---------- Previous update was at 01:06 PM ----------

OK, I think I've worked out the answer.

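The two types of array are entirely different and incompatible. You cannot use split with SUBSEP to recover the subscripts if you define the array as X[a][b][c], because each subscript is then a separate level of subarray and nothing is joined with SUBSEP.

It does allow you to mix the two types, as in X[a][b,c], as I did, but it is a really bad idea to do this as it is easy to mistake a subarray for a scalar element - this was my problem.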
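
For what it's worth, gawk itself seems to treat the mixed form consistently: in Y[a][b,c], Y[a] is a real subarray, and each of its indexes is the single string b SUBSEP c. A small sketch under that reading, assuming gawk 4.0 or later; the function name count_first and the data are hypothetical:

```awk
# Count the elements whose inner (SUBSEP-joined) index has 'want'
# as its first component.
function count_first(arr, want,    a, k, p, n) {
    n = 0
    for (a in arr)               # a: outer subscript, arr[a] is a subarray
        for (k in arr[a]) {      # k: "b" SUBSEP "c", one flat string
            split(k, p, SUBSEP)
            if (p[1] == want)
                n++
        }
    return n
}

BEGIN {
    Y[1]["Jim", "fox"]  = 42
    Y[2]["Jack", "dog"] = 40
    Y[2]["Jim", "cat"]  = 39

    for (a in Y)
        for (k in Y[a]) {
            split(k, p, SUBSEP)          # split the inner index, not Y[a]
            print a " : " p[1] " : " p[2] " -> " Y[a][k]
        }
    print count_first(Y, "Jim") " entries for Jim"
}
```

The errors only appear when Y[a] is used where a scalar is expected, as in split(Y[1],y_bits,SUBSEP).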

Here is some working code that sets up, then lists, the elements in the three dimensional array correctly - no use of split and it's OK.

BEGIN {
        Z[1]["Jim"]["The quick brown fox jumps over the lazy dogs"]=42;
        Z[1]["Harry"]["Colorless green ideas sleep furiously"]=41;
        Z[2]["Jack"]["The quick brown foxes jumps over the lazy dogs"]=40;
        Z[2]["Harry"]["Colorless green ideas sleep furiously"]=39;
        Z[3]["Joe"]["The quick brown foxes jumps over the lazy dog"]=38;
        Z[4]["James"]["The quick brown fox jumps over a lazy dog"]=37;
        Z[5]["Jimmy"]["The quick brown fox jumps over the lazy dog again"]=36;
      }
END {
print "Z[a,b,c]";
for ( a in Z )
        for ( b in Z[a] )
                for ( c in Z[a][b] )
                        print "a of Z[a]: " a "\t| b of Z[a][b] :\t" b  "\t| c of Z[a][b][c]:\t" c " |\t\tvalue of Z[a][b][c]:\t" Z[a][b][c];
}

Output:

$ awk -f demo.awk </dev/null
Z[a,b,c]
a of Z[a]: 1	| b of Z[a][b] :	Harry	| c of Z[a][b][c]:	Colorless green ideas sleep furiously |		value of Z[a][b][c]:	41
a of Z[a]: 1	| b of Z[a][b] :	Jim	| c of Z[a][b][c]:	The quick brown fox jumps over the lazy dogs |		value of Z[a][b][c]:	42
a of Z[a]: 2	| b of Z[a][b] :	Harry	| c of Z[a][b][c]:	Colorless green ideas sleep furiously |		value of Z[a][b][c]:	39
a of Z[a]: 2	| b of Z[a][b] :	Jack	| c of Z[a][b][c]:	The quick brown foxes jumps over the lazy dogs |		value of Z[a][b][c]:	40
a of Z[a]: 3	| b of Z[a][b] :	Joe	| c of Z[a][b][c]:	The quick brown foxes jumps over the lazy dog |		value of Z[a][b][c]:	38
a of Z[a]: 4	| b of Z[a][b] :	James	| c of Z[a][b][c]:	The quick brown fox jumps over a lazy dog |		value of Z[a][b][c]:	37
a of Z[a]: 5	| b of Z[a][b] :	Jimmy	| c of Z[a][b][c]:	The quick brown fox jumps over the lazy dog again |		value of Z[a][b][c]:	36