Question on assigning array elements to a regular variable

Environment: Bash shell on RHEL 8.4

This is a bit of a newbie question.

This post is based on another post in this forum from 4 days ago.

The original poster (OP), Niklaus, said his for loop wasn't working well.

MIG found the root cause of the issue. It's because of the way the elements of one array (fruits) were copied to another array (fruit_array).

OP was wrongly using

fruit_array=( ${fruits} )

The correct way to do this (copy the elements of one array to another array) is as follows:

fruit_array=( "${fruits[@]}" )

This got me thinking, because I have an important shell script that has been in production for more than a year now. It is something like the one below (array_print.sh), which I have modified to match Niklaus's example.

I had to convert the elements in array1 to upper case. Because I didn't know how to do this on the fly, I assigned the converted contents of array1 to another variable named employee_list, as seen below.

And then, on multiple occasions (in functions, actually), I had to re-assign (copy) the elements of employee_list to another array called emp_array in the following way, which is kind of the same mistake Niklaus made:

emp_array=( ${employee_list} )

But, my code is working fine (iterates correctly) with emp_array=( ${employee_list} ).

In fact, if I use emp_array=( "${employee_list[@]}" ) as shown in Variant2 below, the loop won't work well.

I think this is because the employee_list variable is not a real array or something. Right?

My question:

Is the emp_array=( ${employee_list} ) assignment seen in Variant1 (currently in production) error prone? Currently, it's working fine though.

--## Variant1 (my current working production version looks like this)
-- ## using emp_array=( ${employee_list} )

$ cat array_print.sh
#!/bin/bash

CUST_DIR=/some/long/path/Linux

printf "Enter the alpha-numeric Employee ID.\nIf there are multiple Employees to process, separate the IDs using commas : "
IFS="," read -a array1
unset IFS

## converting array elements to Upper case
export employee_list=$(echo "${array1[@]}" | tr '[a-z]' '[A-Z]')

### Is the below assignment error prone ?
emp_array=( ${employee_list} )

for ((idx=0; idx < ${#emp_array[*]}; idx +=1 ))

do
    echo @$CUST_DIR/${emp_array[$idx]}\/june/process_junePayment_${emp_array[$idx]}.sql
done


-- Executing array_print.sh. It works fine. 

$ ./array_print.sh
Enter the alpha-numeric Employee ID.
If there are multiple Employees to process, separate the IDs using commas : JOHN, Keith, Steven
@/some/long/path/Linux/JOHN/june/process_junePayment_JOHN.sql
@/some/long/path/Linux/KEITH/june/process_junePayment_KEITH.sql
@/some/long/path/Linux/STEVEN/june/process_junePayment_STEVEN.sql
$

---### Variant2
---## Now, using emp_array=( "${employee_list[@]}" )

$ cat array_print.sh
#!/bin/bash

CUST_DIR=/some/long/path/Linux

printf "Enter the alpha-numeric Employee ID.\nIf there are multiple Employees to process, separate the IDs using commas : "
IFS="," read -a array1
unset IFS

## converting array elements to Upper case
export employee_list=$(echo "${array1[@]}" | tr '[a-z]' '[A-Z]')


emp_array=( "${employee_list[@]}" )

for ((idx=0; idx < ${#emp_array[*]}; idx +=1 ))

do
    echo @$CUST_DIR/${emp_array[$idx]}\/june/process_junePayment_${emp_array[$idx]}.sql
done


## The loop doesn't iterate correctly. In fact, it iterates only once but prints the array elements weirdly like below.

$ ./array_print.sh
Enter the alpha-numeric Employee ID.
If there are multiple Employees to process, separate the IDs using commas : JOHN, KEITH, STEVEN
@/some/long/path/Linux/JOHN KEITH STEVEN/june/process_junePayment_JOHN KEITH STEVEN.sql

First of all:

IFS="," read -a array1
unset IFS

Unsetting IFS might cause unwanted behavior (actually I have no experience with it).
If the goal is to undo a previous IFS= then this is

  1. wrong: the original IFS consists of 3 characters: space tab newline.
  2. unnecessary: the previous assignment is prefixed to read i.e. is only valid for the read command.
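Point 2 can be checked directly by printing IFS before and after such a read. A small sketch, assuming bash and using a here-string as a stand-in for keyboard input (the sample IDs are made up):

```shell
#!/usr/bin/env bash
# Save the current IFS in a printable, unambiguous form.
before=$(printf '%q' "$IFS")

# The IFS="," prefix is only in effect for this one read command.
IFS="," read -r -a array1 <<< "john,keith,steven"

# IFS should now be exactly what it was before the read.
after=$(printf '%q' "$IFS")

printf 'IFS before: %s\n' "$before"
printf 'IFS after:  %s\n' "$after"
printf 'elements:   %s\n' "${#array1[@]}"
```

In a default bash session both lines show the same value, $' \t\n' (space, tab, newline), and array1 holds three elements, so no unset IFS is needed afterwards.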

--

employee_list=$( )
is an assignment of a plain string to a normal variable, so some information is lost. E.g. the 3 array elements
"a" "b c" "d" would be joined to "a b c d".
The variable is successfully exported to the environment, because the Unix environment consists of strings (not objects). Bash ignores an attempt to export an array; its manual notes that array variables may not (yet) be exported.
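A sketch of both effects, assuming bash; the element values are only illustrative. The joined string re-splits into four words instead of three, and the exported array does not reach a child bash process:

```shell
#!/usr/bin/env bash
# Three elements; the middle one contains a space.
array1=( "a" "b c" "d" )

# $(echo ...) joins the elements into one plain string.
joined=$(echo "${array1[@]}")

# Splitting that string again yields four words, not the original three.
resplit=( $joined )

# Copying array to array keeps the original three elements.
kept=( "${array1[@]}" )

# Bash does not place array variables in the environment of child processes.
export array1
child=$(bash -c 'printf "%s" "${array1-not in environment}"')

printf 'joined:  [%s]\n' "$joined"
printf 'resplit: %d elements\n' "${#resplit[@]}"
printf 'kept:    %d elements\n' "${#kept[@]}"
printf 'child:   %s\n' "$child"
```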

If the environment (available to all spawned commands) is not needed, then let employee_list be an array!
You can use a loop to modify each element:

employee_list=( )
for v in "${array1[@]}"
do
  employee_list+=( "$(echo "$v" | tr '[a-z]' '[A-Z]')" )
done

Bash has a variable modifier for this operation:

employee_list=( )
for v in "${array1[@]}"
do
  employee_list+=( "${v^^}" )
done

The arr+=( ) form appends an element to an array,
whereas var+=string appends a string to a normal variable.
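The difference is easy to see with declare -p. A minimal sketch assuming bash; the last pair of lines shows a common slip, += without parentheses on an array, which appends text to element 0 instead of adding an element:

```shell
#!/usr/bin/env bash
arr=( "john" )
arr+=( "keith" )       # array: adds a second element

str="john"
str+="keith"           # normal variable: appends text

oops=( "john" "keith" )
oops+="X"              # no parentheses: appends "X" to element 0

declare -p arr str oops
```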

A variable modifier can be applied to the whole array in one go:

employee_list=( "${array1[@]^^}" )
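For example, assuming bash 4.0 or later (where the ^^ case modifiers were introduced) and some made-up IDs, each element is uppercased separately and an element containing a space stays a single element:

```shell
#!/usr/bin/env bash
array1=( "John" "b c" "alice" )

# ^^ is applied to every element; the quotes keep "b c" as one element.
employee_list=( "${array1[@]^^}" )

printf '<%s>\n' "${employee_list[@]}"
```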

With typeset -u, everything that is assigned to the variable is converted to uppercase, so a modifier is not needed:

typeset -u employee_list=( "${array1[@]}" )

BTW, whenever you want to split an array into its members, you need @, not *.
A small exercise:

printf "%s\n" "a" "b"
printf "%s\n" "a b"
printf "%s\n" "${array1[@]}"
printf "%s\n" "${array1[*]}"

You cannot differentiate between these with "echo".
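Counting arguments makes the difference visible where echo cannot. A sketch assuming bash; count_args is a hypothetical helper made up for this illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

array1=( "a" "b c" "d" )

n_at=$(count_args "${array1[@]}")    # one argument per element
n_star=$(count_args "${array1[*]}")  # one string, elements joined by the first IFS character
n_bare=$(count_args ${array1[@]})    # unquoted: split into words again (and globbed)

printf '"@": %s  "*": %s  unquoted: %s\n' "$n_at" "$n_star" "$n_bare"
```

With these three elements the counts are 3, 1 and 4: only the quoted @ form reproduces the array as it was.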

The "quotes" allow $-substitution but protect against word splitting and filename generation.
"${var} $( ) $(( ))" are substituted but not further expanded.
"${array1[*]}" yields one composed and protected string.
"${array1[@]}" splits to array members that are then protected.

A minor note: the args to tr are character sets, and do not use the same character-class range construct as regular expressions.

$ echo 'My Aunt Sally' | tr 'a-z' 'A-Z'
MY AUNT SALLY
$ echo '[My] [Aunt] [Sally]' | tr '[a-z]' '>A-Z<'
>MY< >AUNT< >SALLY<

But as @MadeInGermany points out, shell substitution modifiers are faster and simpler than external processes for any reasonably sized data (i.e. anything small enough that you would not loop a read over it in the shell).

True, the extra [ ] are not needed and misleading. But if they map to the same [ ] then there is no harm.

In this case, no difference because the strings are of equal length and identical [ and ].

But most of the other tr options (--complement, --delete, --squeeze-repeats, --truncate-set1), the thirteen actual character-class args like [:alnum:], and the rule "SET2 is extended to length of SET1 by repeating its last character as necessary" are all accidents waiting to happen if you bracket either of the sets with spurious characters.

tr has accumulated two entirely different styles of option over the years, to the extent that it has become hard to predict what the man page means. Caveat Emptor.

For example:

$ echo 'abcdefghijklmnopqrstuvwxyz' | tr '[a-z]' '12345[Z*012]K'
2345ZZZZZZZZZZKKKKKKKKKKKK

[ maps to 1, so abcd map to 2345. The next ten characters from the range a-z map to Z (because the 012 is Octal). The other twelve characters from the range a-z all map to K (because set2 is expanded to the same length as set1 by repeating the last character). The other 102 ASCII (or possibly 230 8-bit) characters are unchanged. And I don't even want to think about UTF-8, as neither the man page nor the info page address that.

I was probably mistaken to start my previous comment with "A minor note:".

Yes, the simple variable is assigned to an array. It is unquoted because you want it to split into words and assign each word to an array member. But an embedded space will split it, too.
Also it will try filename generation (e.g. a * will be expanded to all filenames in the current directory).
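Both hazards can be reproduced in a scratch directory. The sketch below assumes bash and creates two throwaway files; it also shows two ways to avoid filename generation: switching it off with set -f around the unquoted split, or letting read -a do the splitting, since read does not glob:

```shell
#!/usr/bin/env bash
# Scratch directory with two files, so that * has something to match.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
touch file1 file2

employee_list="A B *"

# Unquoted expansion: split on spaces, then * expands to the file names.
globbed=( ${employee_list} )

# Mitigation 1: disable filename generation just for the split.
set -f
noglob=( ${employee_list} )
set +f

# Mitigation 2: read -a splits on IFS but never globs.
read -r -a via_read <<< "$employee_list"

printf 'unquoted: %s\n' "${globbed[*]}"
printf 'set -f:   %s\n' "${noglob[*]}"
printf 'read -a:  %s\n' "${via_read[*]}"

# Clean up the scratch directory.
cd "$OLDPWD" && rm -rf "$tmp"
```

Neither mitigation helps with the embedded space; for that, the value has to stay an array from the start.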
Example with an embedded space:

Enter the alpha-numeric Employee ID.
If there are multiple Employees to process, separate the IDs using commas : a,b c,d
@/some/long/path/Linux/A/june/process_junePayment_A.sql
@/some/long/path/Linux/B/june/process_junePayment_B.sql
@/some/long/path/Linux/C/june/process_junePayment_C.sql
@/some/long/path/Linux/D/june/process_junePayment_D.sql

Here is safe and simple code:

#!/bin/bash

CUST_DIR=/some/long/path/Linux

printf "Enter the alpha-numeric Employee ID.\nIf there are multiple Employees to process, separate the IDs using commas : "

## Convert assigned values to uppercase
typeset -u employee_list

## read -a makes it an array
IFS="," read -a employee_list

## Looping over array values (simpler than looping over the indexes)
for e in "${employee_list[@]}"
do
  echo "@$CUST_DIR/${e}/june/process_junePayment_${e}.sql"
done

A critical input and its output:

Enter the alpha-numeric Employee ID.
If there are multiple Employees to process, separate the IDs using commas : a,b c,*
@/some/long/path/Linux/A/june/process_junePayment_A.sql
@/some/long/path/Linux/B C/june/process_junePayment_B C.sql
@/some/long/path/Linux/*/june/process_junePayment_*.sql

The embedded space and the asterisk are preserved.

Thank you very much, MIG and Paul.

Two questions:

  1. About case conversion:
    typeset doesn't seem to convert mixed case to upper case, and googling typeset and mixed case didn't turn up anything.
    In the code below, alice (all lower case) didn't get converted to upper case either. The same goes for the "Second execution" shown below.
    Or am I missing something here?
    If not, I may have to stick with the tr approach.

  2. Any way to remove leading and trailing spaces of elements in an array?

Is there any way I can remove leading and trailing spaces from the elements of an array 'on the fly', i.e. without assigning the trimmed version to another array?
As you can see below, the leading spaces before alice, Frank and Scott caused the generated paths to contain a space after the slash and after the underscore.

$ cat array_print2.sh
#!/bin/bash

CUST_DIR=/some/long/path/Linux

printf "Enter the alpha-numeric Employee ID.\nIf there are multiple Employees to process, separate the IDs using commas : "

## Convert assigned values to uppercase
typeset -u employee_list

## read -a makes it an array
IFS="," read -a employee_list

## Looping over array values (simpler than looping over the indexes)
for e in "${employee_list[@]}"
do
  echo "@$CUST_DIR/${e}/june/process_junePayment_${e}.sql"
done
$

---### First execution

$ ./array_print2.sh
Enter the alpha-numeric Employee ID.
If there are multiple Employees to process, separate the IDs using commas : John, alice, Frank, Scott
@/some/long/path/Linux/John/june/process_junePayment_John.sql
@/some/long/path/Linux/ alice/june/process_junePayment_ alice.sql
@/some/long/path/Linux/ Frank/june/process_junePayment_ Frank.sql
@/some/long/path/Linux/ Scott/june/process_junePayment_ Scott.sql
$

--## Second execution

$ ./array_print2.sh
Enter the alpha-numeric Employee ID.
If there are multiple Employees to process, separate the IDs using commas : john,alice,henry
@/some/long/path/Linux/john/june/process_junePayment_john.sql
@/some/long/path/Linux/alice/june/process_junePayment_alice.sql
@/some/long/path/Linux/henry/june/process_junePayment_henry.sql
$

Aside from what others will respond, I recommend you do some basic reading of the bash documentation; you'll find details about typeset and a myriad of other useful information.

Additionally, man builtins will provide details on typeset...

It looks like typeset only operates during assignments. This works (Bash 4.4.20):

typeset -u employee_list="${employee_list}"

Found it! In Bash, typeset is a synonym for declare, and declare -u says "When the variable is assigned a value ...". It is all in there somewhere, but often not entirely obvious.
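A small sketch of that wording, assuming bash 4 or later (where declare -u is available): setting the attribute leaves an existing value alone, and only the next assignment converts it:

```shell
#!/usr/bin/env bash
name="alice"

declare -u name     # only sets the attribute; the current value is not touched
before=$name

name=$name          # a new assignment applies the uppercase conversion
after=$name

printf 'before: %s  after: %s\n' "$before" "$after"
```

This is the same reason the typeset -u employee_list="${employee_list}" line above works: it is a fresh assignment.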

After 40+ years in various shells, I still keep the Bash Reference Manual open most of the time. Today, I struggled with readarray for about 20 minutes because I forgot its -t option.

I absolutely agree with @munkeHoller that reading the Bash Ref is always excellent, but somewhat daunting as it is (IIRC) 160+ pages. I generally use the contents and index to find specific solutions, and then browse through the rest of that section to get a more rounded view of the subject.

There are some dodges that are really hard to come up with. For example, trimming leading and trailing whitespace from a variable takes two separate substitutions, and the patterns you need are fairly obscure. But it happens that the read built-in strips that whitespace (with the default IFS), and you can pass it a variable using a thing called a Here-String.
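One well-known form of those two substitutions, shown as a sketch assuming bash (trim is a hypothetical helper name): each step first computes the run of whitespace at one end and then removes exactly that text, quoted so it is matched literally:

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip leading and trailing whitespace from $1.
trim() {
  local s=$1
  # ${s%%[![:space:]]*} is the leading whitespace; remove it from the front.
  s=${s#"${s%%[![:space:]]*}"}
  # ${s##*[![:space:]]} is the trailing whitespace; remove it from the end.
  s=${s%"${s##*[![:space:]]}"}
  printf '%s' "$s"
}

result=$(trim "   alice smith  ")
empty=$(trim "    ")
printf '[%s] [%s]\n' "$result" "$empty"
```

Inner whitespace ("alice smith") is kept, and a value consisting only of whitespace becomes empty.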

So the neat (and idiomatic) way to remove leading and trailing spaces, and uppercase the text, all at once, is actually: read myVar <<<"${myVar^^}".
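Applied to the loop from this thread, a sketch assuming bash 4 or later; the here-string on the first read stands in for the keyboard, and the sample input is made up. Inside the loop IFS is back to its default, so the second read strips the spaces around each ID:

```shell
#!/usr/bin/env bash
CUST_DIR=/some/long/path/Linux

# Stand-in for the interactive prompt: note the stray spaces around the IDs.
IFS="," read -r -a employee_list <<< "John, alice ,  Frank"

paths=()
for emp in "${employee_list[@]}"
do
  # Uppercase with ${emp^^}; read (default IFS) trims leading/trailing blanks.
  read -r emp <<< "${emp^^}"
  paths+=( "@$CUST_DIR/${emp}/june/process_junePayment_${emp}.sql" )
done

printf '%s\n' "${paths[@]}"
```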

Perhaps use ${var^^} where var needs to be UPPERCASED...

  1. The typeset -u should work! It must come before any assignment to the variable, and read is an assignment. Perhaps it's a bug in your bash version?
  2. A leading or trailing space can be removed by another variable modifier.

Try the following, which uses typeset -u on a simple variable emp that gets its values from the for loop and from = assignments:

#!/bin/bash
 
CUST_DIR=/some/long/path/Linux
 
printf "Enter the alpha-numeric Employee ID.\nIf there are multiple Employees to process, separate the IDs using commas : "

## read -a makes it an array
IFS="," read -a employee_list

## Convert assigned values to uppercase
typeset -u emp

## Looping over array values (simpler than looping over the indexes)
for emp in "${employee_list[@]}"
do
  # In case the typeset -u does not work:
  # emp=${emp^^}
  # Delete a leading space 
  emp=${emp# }
  # Delete a trailing space
  emp=${emp% }
  # Substitute remaining space by dash
  emp=${emp// /-}
  echo "@$CUST_DIR/${emp}/june/process_junePayment_${emp}.sql"
done
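Note that ${emp# } and ${emp% } each remove exactly one space. If the input may contain several, a variant using extended glob patterns could replace those two lines; this sketch assumes bash with shopt -s extglob, and the sample value is made up:

```shell
#!/usr/bin/env bash
shopt -s extglob   # enables the +( ) patterns used below

emp="   b c  "
emp=${emp##+([[:space:]])}   # delete all leading whitespace
emp=${emp%%+([[:space:]])}   # delete all trailing whitespace
emp=${emp// /-}              # remaining inner spaces become dashes
printf '[%s]\n' "$emp"
```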