User input and run awk using the input

cmccabe · February 22, 2016, 4:58pm

I am trying to let a user enter text, store that text in a variable $gene, and pass it to an awk command that uses those values in some calculations. However, I get syntax errors when I try. Thank you :).

The awk runs fine when a pre-defined file is used, but the input could also come from the user.

/home/cmccabe/Desktop/loop.sh: line 87: syntax error near unexpected token `for'
/home/cmccabe/Desktop/loop.sh: line 87: `for f in /home/cmccabe/Desktop/NGS/API/2-12-2015/bedtools/*base_counts.txt ; do'

other() {
printf "\n\n"
printf "Please enter the gene(s) of interest, use a comma between multiple: "; IFS="," read -a gene
        printf "the indicated genes will now be loaded and used to calculate coverage\n"
        [ -z "$gene" ] && printf "\n No ID supplied. Leaving match function." && sleep 2 && return
        [ "$gene" = "end" ] && printf "\n Leaving match function." && sleep 2 && return
        for ((i=0; i<${#gene[@]}; i++))

logfile=/home/cmccabe/Desktop/NGS/API/2-12-2015/process.log
for f in /home/cmccabe/Desktop/NGS/API/2-12-2015/bedtools/*base_counts.txt ; do
     echo "Start custom panel creation: $(date) - File: $f"
     bname=$(basename $f)
     pref=${bname%%.txt}
     awk '
 NR == FNR {input[$0]; next}
 {
    split($5, a, "-")
    if (a[1] in input) {
         key = $4 OFS $5
         n[key]++
         sum[key] += $7
     }
 }
 END {
     for (key in n) 
         printf "%s %.1f\n", key, sum[key]/n[key]
 }
' /home/cmccabe/Desktop/panels/$gene $f | awk '{split($2,a,"-"); print a[1] "\t" $0}' | sort | cut -f2-> /home/cmccabe/Desktop/NGS/API/2-12-2015/bedtools/${pref}_genescoverage.bed
      echo "End custom panel creation: $(date) - File: $f"
done >> "$logfile"
printf "coverage calculated and log created\n"
}

Don_Cragun · February 22, 2016, 5:29pm

Give us a little help here. Your script is failing on line 87. But the script you have shown us isn't nearly that long???

What operating system are you using?

What shell are you using?

Note, however, that the syntax for a shell for loop is different than the syntax for an awk for loop. Shell:

for ...
do      command...
done

awk:

for ...
        command

or:

for ... {
        command
        ...
}
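
To make the contrast concrete, here is a minimal sketch that runs the same three-step loop once in bash and once in awk. The function names count_shell and count_awk are made up for illustration and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical helpers, not from the original script.

# Shell for loop: the body must sit between "do" and "done".
count_shell() {
    local i
    for ((i = 1; i <= 3; i++))
    do
        printf 'shell %d\n' "$i"
    done
}

# awk for loop: no do/done; a single statement (or a { ... } block) follows.
count_awk() {
    awk 'BEGIN {
        for (i = 1; i <= 3; i++)
            printf "awk %d\n", i
    }'
}

count_shell
count_awk
```

Both functions print three numbered lines; only the loop delimiters differ.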

cmccabe · February 22, 2016, 5:33pm

I am on Ubuntu 14.04 using bash.
Thank you :).

Full code:

menu() {
    while true
    do
        printf "\n please make a selection from the MENU \n
        ==================================
        \t 1  Incidental Findings
        \t 2  CHARGE Syndrome
        \t 3  PFS Syndrome
        \t 4  Other
        ==================================\n\n"
        printf "\t Your choice: "; read menu_choice

        case "$menu_choice" in
        1) incidental ;;
        2) charge ;;
        3) pfs ;;
        4) other ;;
        *) printf "\n Invalid choice."; sleep 2 ;;
        esac
    done
}
echo "$menu_choice"

charge() {
printf "\n\n"
printf "the charge syndrome genes will now be loaded and used to calculate coverage\n"
logfile=/home/cmccabe/Desktop/NGS/API/2-12-2015/process.log
for f in /home/cmccabe/Desktop/NGS/API/2-12-2015/bedtools/*base_counts.txt ; do
     echo "Start custom panel creation: $(date) - File: $f"
     bname=$(basename $f)
     pref=${bname%%.txt}
     awk '
 NR == FNR {input[$0]; next}
 {
    split($5, a, "-")
    if (a[1] in input) {
         key = $4 OFS $5
         n[key]++
         sum[key] += $7
     }
 }
 END {
     for (key in n) 
         printf "%s %.1f\n", key, sum[key]/n[key]
 }
' /home/cmccabe/Desktop/panels/CHARGE_unix.bed $f | awk '{split($2,a,"-"); print a[1] "\t" $0}' | sort | cut -f2-> /home/cmccabe/Desktop/NGS/API/2-12-2015/bedtools/${pref}_Chargecoverage.bed
      echo "End custom panel creation: $(date) - File: $f"
done >> "$logfile"
printf "coverage calculated and log created\n"
}
pfs() {
printf "\n\n"
printf "the pfs syndrome genes will now be loaded and used to calculate coverage\n"
logfile=/home/cmccabe/Desktop/NGS/API/2-12-2015/process.log
for f in /home/cmccabe/Desktop/NGS/API/2-12-2015/bedtools/*base_counts.txt ; do
     echo "Start custom panel creation: $(date) - File: $f"
     bname=$(basename $f)
     pref=${bname%%.txt}
     awk '
 NR == FNR {input[$0]; next}
 {
    split($5, a, "-")
    if (a[1] in input) {
         key = $4 OFS $5
         n[key]++
         sum[key] += $7
     }
 }
 END {
     for (key in n) 
         printf "%s %.1f\n", key, sum[key]/n[key]
 }
' /home/cmccabe/Desktop/panels/PFS_unix.bed $f | awk '{split($2,a,"-"); print a[1] "\t" $0}' | sort | cut -f2-> /home/cmccabe/Desktop/NGS/API/2-12-2015/bedtools/${pref}_Pfscoverage.bed
      echo "End custom panel creation: $(date) - File: $f"
done >> "$logfile"
printf "coverage calculated and log created\n"
}
other() {
printf "\n\n"
printf "Please enter the gene(s) of interest, use a comma between multiple: "; IFS="," read -a gene
        printf "the indicated genes will now be loaded and used to calculate coverage\n"
        [ -z "$gene" ] && printf "\n No ID supplied. Leaving match function." && sleep 2 && return
        [ "$gene" = "end" ] && printf "\n Leaving match function." && sleep 2 && return
        for ((i=0; i<${#gene[@]}; i++))

logfile=/home/cmccabe/Desktop/NGS/API/2-12-2015/process.log
for f in /home/cmccabe/Desktop/NGS/API/2-12-2015/bedtools/*base_counts.txt ; do
     echo "Start custom panel creation: $(date) - File: $f"
     bname=$(basename $f)
     pref=${bname%%.txt}
     awk '
 NR == FNR {input[$0]; next}
 {
    split($5, a, "-")
    if (a[1] in input) {
         key = $4 OFS $5
         n[key]++
         sum[key] += $7
     }
 }
 END {
     for (key in n) 
         printf "%s %.1f\n", key, sum[key]/n[key]
 }
' /home/cmccabe/Desktop/panels/$gene $f | awk '{split($2,a,"-"); print a[1] "\t" $0}' | sort | cut -f2-> /home/cmccabe/Desktop/NGS/API/2-12-2015/bedtools/${pref}_genescoverage.bed
      echo "End custom panel creation: $(date) - File: $f"
done >> "$logfile"
printf "coverage calculated and log created\n"
}

while true; do
    read -p "Do you want to get coverage of a specific panel?" yn
    case $yn in
        [Yy]* ) menu; break;;
        [Nn]* ) exit;;
        * ) echo "Please answer yes or no.";;
    esac
done

sea · February 23, 2016, 1:42am

This:

printf "Please enter the gene(s) of interest, use a comma between multiple: "
IFS="," read -a gene
        printf "the indicated genes will now be loaded and used to calculate coverage\n"

Will not work as you expect.

In fact, you are telling read to catch only the first gene and no others.
You set IFS to ',', which is meant to separate the genes, but you read into only one variable, 'gene', while the input is split into as many fields as the user separates with ','.

In other words:
Move the IFS= part to a later step, when parsing the user input.
Parsing is done after reading; if it is done while reading, the number of values must be fixed (say you pass 3 genes, then you must read 3 variables, not just one).

I'm no scientist, but as far as I know a gene name doesn't contain spaces, so users could separate the genes with spaces or commas. Once the IFS change is removed that doesn't matter; in fact, the list is even simpler to work with if users do not use ',' as the separator.

Only use the IFS lines below (shown in red in the original post) if you insist on using commas to separate the list; with spaces they are not required at all.

read genes
oIFS="$IFS"
IFS=","
for gene in $genes;do
	echo "Working with genom: $gene"
done
IFS="$oIFS"

Other than that, please correct the for loops as Don already described.

Thank you, and I hope this helps.

Don_Cragun · February 23, 2016, 3:56am

In case what sea said wasn't clear, line 84 in your script:

        for ((i=0; i<${#gene[@]}; i++))

is missing a do and a done.

Since the indentation in your code seems to be random, I can't tell what you intend to include inside that for loop (i.e., where the done should be placed).
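
One way to confirm this kind of problem without running the script is bash -n, which reads the commands and checks syntax only. The sketch below is an illustration under assumptions: parses_ok is a made-up helper, and the loop body in the "fixed" text is a guess, since the intended body is not known.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if the given script text parses cleanly.
# "bash -n" checks syntax without executing anything.
parses_ok() {
    printf '%s\n' "$1" | bash -n 2>/dev/null
}

# The loop header as posted: no "do" and no "done".
broken='for ((i=0; i<3; i++))
for f in a b; do echo "$f"; done'

# The same header with "do" and "done" around an assumed body.
fixed='for ((i=0; i<3; i++))
do
    for f in a b; do echo "$f"; done
done'

for name in broken fixed
do
    # ${!name} expands to the value of the variable whose name is in $name.
    if parses_ok "${!name}"; then
        printf '%s: syntax OK\n' "$name"
    else
        printf '%s: syntax error\n' "$name"
    fi
done
```

Running bash -n /home/cmccabe/Desktop/loop.sh in the same way would report the syntax error without starting the menu.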

cmccabe · February 23, 2016, 10:54am

I updated that portion of the code, and it does seem to append the entered genes to a file GENE.txt. The problem is that even though each gene is on a new line, a space is put in after each one, so no calculation results.

PTPN11,SCN1A,FBN1

GENE.txt looks like

PTPN11
SCN1A
FBN1

However, if only one gene is entered (PTPN11), the calculation works fine.

I apologize about the indenting, I am a scientist and not a programmer. Can you recommend some books on correct indentation? Thank you :).

other() {
printf "\n\n"
printf "%s \n" "Please enter gene(s), use a comma between multiple:"
OLDIFS=$IFS
IFS=","
read -a genes
for (( i = 0; i < ${#genes[@]}; i++ ))
    do
    printf "%s \n" "${genes[$i]}" >> /home/cmccabe/Desktop/panels/GENE.txt
    done
IFS=$OLDIFS

MadeInGermany · February 23, 2016, 4:10pm

There is no strict rule for indentation.
Its purpose is to let you quickly see the structure, but people differ.
I usually put for/do/done on one indentation level and indent the code block in between one level further.
The same goes for if/then/else/fi (with further indentation of the code blocks in between).
--
If you really use the $gene array (and consistently use the ${gene[ ]} index), then your original IFS="," read -a makes sense, and you don't need the extra IFS handling that sea suggested.

other() {
printf "\n\n"
printf "%s \n" "Please enter gene(s), use a comma between multiple:"
IFS="," read -a genes
for (( i = 0; i < ${#genes[@]}; i++ ))
do
    printf "%s \n" "${genes[$i]}"
done > /home/cmccabe/Desktop/panels/GENE.txt
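
As a small check of the point about IFS="," read -a, the sketch below reads a comma-separated line into an array and then shows that IFS in the calling shell is unchanged, because the prefix assignment applies to that one read command only. The function name read_genes and the -r option (which keeps backslashes literal) are additions for illustration, not part of the code above.

```shell
#!/bin/bash
# Hypothetical helper: split one comma-separated line from stdin into the
# global array "genes". The IFS="," prefix affects this read command only.
read_genes() {
    IFS="," read -r -a genes
}

# A here-string stands in for the user typing at the prompt.
read_genes <<< "PTPN11,SCN1A,FBN1"

printf 'count: %d\n' "${#genes[@]}"
printf '%s\n' "${genes[@]}"

# IFS in the calling shell is still the default space, tab, newline.
printf 'IFS is now: %q\n' "$IFS"
```

This prints count: 3 followed by the three gene names on separate lines, so no save and restore of IFS is needed with this form.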

cmccabe · February 23, 2016, 5:54pm

Using the suggested code, if two genes are entered, say PTPN11,SCN1A, I can see them both on separate lines in GENE.txt, but there is a space at the end of each line in the file, so no calculation is done. Thank you

GENE.txt

PTPN11 
SCN1A

---------- Post updated at 04:54 PM ---------- Previous update was at 03:29 PM ----------

This seems to work:

printf "%s \n" "Please enter gene(s), use a comma between multiple:"
OLDIFS=$IFS
IFS=","
read -a genes
for (( i = 0; i < ${#genes[@]}; i++ ))
do
    printf "%s\n" "${genes[$i]}" >> /home/cmccabe/Desktop/panels/GENE.bed
done

I removed IFS=$OLDIFS as well.
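
A likely reason the change from "%s \n" to "%s\n" matters: the awk program stores each whole line of the gene file as an array key (input[$0]), so a key such as "PTPN11 " with a trailing space never equals the name "PTPN11". The sketch below illustrates this under assumptions: count_matches is a made-up helper, and the two gene names fed on stdin stand in for the real base_counts data.

```shell
#!/bin/bash
# Hypothetical helper: load the gene list in file $1 as awk array keys
# (same "NR == FNR {input[$0]; next}" idiom as the script), then count how
# many of the names PTPN11 and SCN1A, fed on stdin, are found in it.
count_matches() {
    printf 'PTPN11\nSCN1A\n' |
        awk 'NR == FNR {input[$0]; next}
             $1 in input {n++}
             END {print n + 0}' "$1" -
}

tmpdir=$(mktemp -d)
genes=(PTPN11 SCN1A)

# printf reuses its format for every argument, so each gene gets one line.
printf '%s \n' "${genes[@]}" > "$tmpdir/with_space.txt"   # trailing space
printf '%s\n'  "${genes[@]}" > "$tmpdir/no_space.txt"     # no trailing space

printf 'with trailing space: %s\n' "$(count_matches "$tmpdir/with_space.txt")"
printf 'without:             %s\n' "$(count_matches "$tmpdir/no_space.txt")"

rm -rf "$tmpdir"
```

The first count is 0 and the second is 2. Running od -c on the gene file is a quick way to make a stray space visible. Note also that with the restore line removed, IFS stays set to "," for the rest of the script, whereas the prefixed form IFS="," read -a leaves it untouched; and because >> appends, names from an earlier run remain in GENE.bed unless the file is cleared first.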