Bash to extract file prefix from input to use in output

cmccabe · October 25, 2017, 8:24am

In the bash below, which does execute, I am trying to extract the contents of ${id} (for example 1234); ${id} stores a value that changes each time.
After the path is removed, the contents of ${id} are stored in pref so they can be used in the output file name. Currently I am not able to extract the 1234 in the for loop and cannot figure out why. Thank you :).

awk: cmd. line:2: fatal: cannot open file `C:/path/to/file/.txt
.hg19_multianno.txt' for reading (No such file or directory)

for f in C:/path/to/file/${id}.txt.hg19_multianno.txt ; do
     bname=`basename $f`
     pref=${bname%%$.*.txt}
     awk -v F=49 -v T=96 '
BEGIN{FS=OFS="\t"}
{ b=T+1
t=T<NF?T:NF
for(i=F;i<NF-t+F;i++) $i=$(b++)
NF=--i}1' $f > C:/path/to/file/${pref}_dbremoved.txt
done

desired C:/path/to/file/${pref}_dbremoved.txt

1234_dbremoved

Aia · October 25, 2017, 9:46am

cmccabe:

[...]

for f in C:/path/to/file/${id}.txt.hg19_multianno.txt ; do
   bname=`basename $f`
   pref=${bname%%$.*.txt}
   awk -v F=49 -v T=96 '
BEGIN{FS=OFS="\t"}
{ b=T+1
t=T<NF?T:NF
for(i=F;i<NF-t+F;i++) $i=$(b++)
NF=--i}1' $f > C:/path/to/file/${pref}_dbremoved.txt
done

Let's reflect on what's highlighted. The id variable needs to be populated before it can be used. Now, let's assume it contains a value that ultimately results in a file path. A loop is not necessary to iterate over what would be just one item.
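As a minimal sketch of the prefix extraction on its own, assuming the file name has the form 1234.txt.hg19_multianno.txt (the value of id below is a hypothetical sample): in `${bname%%$.*.txt}` the stray `$` is taken literally, so the pattern never matches and nothing is removed. Dropping it and using `${bname%%.*}` strips everything from the first dot onward.

```shell
# Hypothetical sample value; in practice id has to be set before the path is built.
id=1234
f="C:/path/to/file/${id}.txt.hg19_multianno.txt"

bname=$(basename "$f")   # file name without the directory part
pref=${bname%%.*}        # remove the longest suffix that starts at a dot
echo "${pref}_dbremoved.txt"
```

With id set to 1234 this prints 1234_dbremoved.txt.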

cmccabe · October 25, 2017, 9:58am

There may be multiple ${id} values in the directory; however, each unique ${id} is stored in a file called list, one per line, in the same path. So if 1234, 5678, and 9123 are each an ${id}, the list will look like:

list.txt

1234.txt
5678.txt
9123.txt

Would this file need to be used to populate each id? I am not sure if that helps. Thank you :).

Aia · October 25, 2017, 10:17am

Start by reading the file named list that contains the information.

dir_path=$PWD
while IFS= read -r line
do
    id=${line%.*}
    echo "$dir_path/${id}.whatever_else"
done < list

And build upon that.
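A self-contained sketch of that idea, assuming a list file with one NAME.txt entry per line. The scratch directory and the `ids` variable are hypothetical and exist only for illustration. The trailing carriage-return strip is an assumption that may matter if the list was saved with Windows line endings.

```shell
# Hypothetical setup: a scratch directory holding a list file like the one described.
dir_path=$(mktemp -d)
printf '%s\n' 1234.txt 5678.txt 9123.txt > "$dir_path/list"

cr=$(printf '\r')
ids=""
while IFS= read -r line
do
    line=${line%"$cr"}            # assumption: drop a trailing CR from Windows line endings
    [ -n "$line" ] || continue    # skip blank lines
    id=${line%.*}                 # 1234.txt -> 1234
    ids="${ids:+$ids }$id"        # collect the ids only to show what was read
    echo "$dir_path/${id}.txt.hg19_multianno.txt"
done < "$dir_path/list"
echo "$ids"
```

Each pass through the loop has its own id, so anything that needs id belongs inside the loop body.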

cmccabe · October 26, 2017, 8:47am

So the portion in bold identifies ${id} as 1234 perfectly; however, it does not store it so that the for loop can use it and write the output file with the 1234. Do I not need a loop even though there could be multiple ${id} values? Thank you :).

# read id from target.txt
dir_path=C:/Users/cmccabe/Desktop/annovar
while IFS= read line
do
    id=${line%.*}
    echo "$dir_path/${id}.hg19_multianno.txt"
done < target.txt

# remove -dbnsfp33a fields 49-96 from multianno
${id}=(line)
for f in 'C:\Users\cmccabe\Desktop\annovar\${id}.txt.hg19_multianno.txt' ; do
     bname=`basename $f`
     pref=${bname%%$.*.txt}
     awk -v F=49 -v T=96 '
BEGIN{FS=OFS="\t"}
{ b=T+1
t=T<NF?T:NF
for(i=F;i<NF-t+F;i++) $i=$(b++)
NF=--i}1' $f > C:/Users/cmccabe/Desktop/annovar/${pref}_dbremoved.txt
done

Aia · October 26, 2017, 9:17am

Work from this point.

dir_path=C:/Users/cmccabe/Desktop/annovar
while IFS= read -r line
do
    id=${line%.*}
    awk -v F=49 -v T=96 '
        BEGIN{ FS=OFS="\t" }
        { b=T+1
          t=T<NF?T:NF
          # shift the fields after T left, starting at F
          for(i=F;i<NF-t+F;i++) $i=$(b++)
          # then drop the leftover tail
          NF=--i
        }1' "$dir_path/${id}.hg19_multianno.txt" > "$dir_path/${id}_dbremoved.txt"
done < target.txt

Modified on the fly. Not tested.
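To see what the awk program does to a single row, here is a minimal sketch on a hypothetical 8-column line, with F=3 and T=5 standing in for 49 and 96. Note that the `NF` assignment sits after the for loop, not inside it. The sketch also assumes an awk that rebuilds the record when `NF` is assigned, as gawk and mawk do.

```shell
# Hypothetical 8-column row; remove columns 3 through 5.
result=$(printf 'a\tb\tc\td\te\tf\tg\th\n' |
awk -v F=3 -v T=5 '
    BEGIN { FS = OFS = "\t" }
    {
        b = T + 1                # first field to keep after the removed range
        t = T < NF ? T : NF      # clamp T for rows shorter than T
        for (i = F; i < NF - t + F; i++) $i = $(b++)   # shift the tail left
        NF = --i                 # truncate the leftover tail
    } 1')
printf '%s\n' "$result"
```

The row comes out as a, b, f, g, h separated by tabs: columns c, d, and e are gone and the remaining columns have moved left.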

cmccabe · October 31, 2017, 7:32am

Thank you very much :).