Reading columns from a file in bash

Hello, I have a file in the following format:

id sample platform R1 R2 gene1 gene2 gene3
1       abc     llumina       R1_001.fastq.gz      R2_001.fastq.gz   apoe    prnpp    asp
2       def     llumina       R1_001.fastq.gz      R2_001.fastq.gz   apoe    prnpp    
3       ghi     llumina       R1_001.fastq.gz      R2_001.fastq.gz   apoe    

The first 6 columns are always filled; the last two columns are only filled for some entries. I have a main script that reads this file to process these ids further. I generally use:

while read -r id sample platform R1 R2 gene1
do
<mainscript.sh>
done < samples.txt

But how can I include the last two columns as well, given that gene2 and gene3 are empty for many entries?

This is what I have tried so far: read all the lines in the file and, if gene2 and/or gene3 is empty, move on to the next line:

while read id sample platform R1 R2 gene1 gene2 gene3
do
    echo "$id $sample $gene1 $gene2" | {
        read id sample platform R1 R2 gene1 gene2
        [ -z "$gene2" ] && continue
        for i in $gene2; do eval echo "\"$i\""; done
    }
done < samples.txt

I could do it for gene2 but wasn't able to implement it for gene3.

Any suggestions would be helpful.

Thank you.

This statement means that if there is no gene2, the loop moves on to the next line before gene3 is ever looked at:

 [ -z "$gene2" ] && continue

When you use the read statement, fields are assigned to the variable names from left to right, so all the preceding fields have to be present in order to get gene3.

Otherwise gene3 is an "empty" variable, and the same can happen to gene2.
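A minimal way to see this for yourself, assuming bash (the `<<<` here-string is a bash feature) and using the third sample line from the question in place of samples.txt:

```shell
#!/usr/bin/env bash
# Feed read a line with only 6 of the 8 fields (sample 3 from the question).
line='3 ghi llumina R1_001.fastq.gz R2_001.fastq.gz apoe'
read -r id sample platform R1 R2 gene1 gene2 gene3 <<< "$line"

# Fields are assigned left to right; names with no matching field are set to "".
echo "gene1=[$gene1] gene2=[$gene2] gene3=[$gene3]"   # prints: gene1=[apoe] gene2=[] gene3=[]
```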

You can create dummy values for those variables when they are empty, but since a separate script processes the data, the dummy values could affect that script's output.

Try creating an intermediate file with dummy values, using the word 'dummy' for the empty fields. This is done with awk; the output file is called tmp.tmp, and that is what your script will have to use as input.

This could also be set up to write to a pipe that your existing script reads.

awk '{
        # check to see what type of line we have:
        # if col 1 is a number, then we have to play the dummy game
        if (int($1) != 0)
        {
            if (NF == 6) { $7 = "dummy"; $8 = "dummy" }
            if (NF == 7) { $8 = "dummy" }
        }
        print $0
    }' inputfile > tmp.tmp

So the last 2 fields in tmp.tmp can be a real value or just filler ("dummy"), but there will always be 8 fields. The last two fields must not be blank, otherwise read leaves those variables empty.
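Since the awk step can also feed a pipe, here is a sketch that sends its output straight into the read loop using bash process substitution, so no tmp.tmp file is needed. Assumptions: the input is called samples.txt (recreated in a temporary directory only so the example runs), and the loop just collects `id:gene` pairs in a hypothetical `genes_seen` array where your main script would do the real work. The `[ "$g" = "dummy" ]` test is where the filler values are filtered out again.

```shell
#!/usr/bin/env bash
# Recreate the question's samples.txt in a scratch directory (demo only).
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'id sample platform R1 R2 gene1 gene2 gene3' \
  '1 abc llumina R1_001.fastq.gz R2_001.fastq.gz apoe prnpp asp' \
  '2 def llumina R1_001.fastq.gz R2_001.fastq.gz apoe prnpp' \
  '3 ghi llumina R1_001.fastq.gz R2_001.fastq.gz apoe' > samples.txt

genes_seen=()   # hypothetical stand-in for whatever the main script does
while read -r id sample platform R1 R2 gene1 gene2 gene3; do
  [ "$id" = "id" ] && continue            # skip the header line
  for g in "$gene1" "$gene2" "$gene3"; do
    [ "$g" = "dummy" ] && continue        # filler value, nothing to process
    genes_seen+=("$id:$g")
  done
done < <(awk '{
    # pad data lines (numeric first column) to 8 fields with "dummy"
    if (int($1) != 0) {
        if (NF == 6) { $7 = "dummy"; $8 = "dummy" }
        if (NF == 7) { $8 = "dummy" }
    }
    print $0
}' samples.txt)

# prints 1:apoe 1:prnpp 1:asp 2:apoe 2:prnpp 3:apoe, one per line
printf '%s\n' "${genes_seen[@]}"
```

Process substitution (`< <(...)`) is used instead of `awk ... | while read` so the loop runs in the current shell and `genes_seen` is still set after the loop ends.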

If you want to add a default/dummy within a bash script, you could do this:

my_var="${my_var:-default}"

This says "assign the value of $my_var to itself, or the string default if it is empty or not set." You can have another variable instead of the string default if you want to.
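Applied to the optional gene columns, a minimal sketch (using the second sample line from the question in place of samples.txt) might look like this:

```shell
#!/usr/bin/env bash
# Sample 2 has gene1 and gene2 but no gene3.
line='2 def llumina R1_001.fastq.gz R2_001.fastq.gz apoe prnpp'
read -r id sample platform R1 R2 gene1 gene2 gene3 <<< "$line"

gene2="${gene2:-dummy}"   # has a value, so it is kept as-is
gene3="${gene3:-dummy}"   # empty, so it becomes "dummy"

echo "$gene2 $gene3"      # prints: prnpp dummy
```

The colon in `:-` matters here: read sets gene3 to an empty string rather than leaving it unset, so `${gene3-dummy}` (without the colon) would leave it empty.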

Does that help?

Robin

I do not see the problem.
Read enough fields, because joining is easier than splitting.
Here is an exercise to demonstrate:

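while read -r id sample platform R1 R2 gene1 gene2 gene3 junk
do
  echo "----"
  echo "process non-empty genes"
  # unquoted: empty variables expand to nothing, so they are skipped
  for i in $gene1 $gene2 $gene3; do echo "$i"; done
  echo "process all 3 genes"
  # quoted: empty variables still produce an (empty) iteration
  for i in "$gene1" "$gene2" "$gene3"; do echo "$i"; done
done < samples.txt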
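To see why reading enough fields matters: when there are fewer names than fields, read puts everything left over into the last name, which you would then have to split yourself. A short sketch using the first sample line from the question:

```shell
#!/usr/bin/env bash
line='1 abc llumina R1_001.fastq.gz R2_001.fastq.gz apoe prnpp asp'

# Too few names: the last one (gene1) receives the rest of the line.
read -r id sample platform R1 R2 gene1 <<< "$line"
too_few="$gene1"
echo "gene1=[$too_few]"    # prints: gene1=[apoe prnpp asp]

# One name per column, plus a catch-all, keeps each gene separate.
read -r id sample platform R1 R2 gene1 gene2 gene3 junk <<< "$line"
echo "gene1=[$gene1] gene2=[$gene2] gene3=[$gene3] junk=[$junk]"
# prints: gene1=[apoe] gene2=[prnpp] gene3=[asp] junk=[]
```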