array + if in linux shell scripting

Hi,

I have two sets of files with different numbers of columns and rows.
Each file in set A has a single row with 20 columns.
Each file in set B has thousands of rows with 5 columns.
Both sets contain the same number of files.

I want to save all 20 columns of an A file in variables, one by one, and then compare each of them with the 5th column of the corresponding B file. If it matches, the 4th column of the B file should be printed.

Thanks in advance!

It would be easier if you post sample data: input and expected output.

Hi radoulov,
Thanks for your interest.
Sample files are:
A files sample:

3.764724 3.765135 3.780947 3.785922 2.918862 3.791665 3.776370 2.918862 3.773137 3.778527 3.773680 2.943051 3.782568 3.771691 3.773778 3.765135 0.000000 3.773137 3.773778 3.779468

B files sample :

1  AAN  97  APN  7.789069
1  AAN  98  ASK  9.827249
1  AAN  99  DAS  7.531465
2  BBM  1    GFD  3.786426
2  BBM  2    RYT  0.000000
2  BBM  3    RGF  3.764724
2  BBM  4    MKH  6.393094
2  BBM  5    POU  8.005275
2  BBM  6    PHE   6.145675...... AND SO ON..

I want to check: if the first field of the A file (i.e. 3.764724) equals the 5th column of the B file and the 2nd column is "BBM", then
echo the 4th column of the B file.

Likewise, I need to check this for every field of every A file against its corresponding B file.

Thanks

Something to start with: this command gives the output for one Afile and one Bfile:

awk '
NR==1{s=$1; next}
$2=="BBM" && $5==s {print $4}
' Afile Bfile
for i in $(<file1); do awk '($5 " " $2) ~ /'$i' BBM/ {print $4}' file2; done
RGF
RYT

Hi Franklin,

I have tried your code. It works well.
But it shows output only for a single A file and its corresponding B file.
Now I need to run it for multiple files and save the output to a new file.

I tried the following code, but it doesn't work:

fname=list1
exec<$fname
while read line
       do
 
        f2name=list2
        exec<$f2name
        while read line2
        do
 
        awk ' NR==1{a=$1; next} $2=="BBM" && $5==a {print $4}' $line $line2> $line"456"
        done
done

thanks for the help!!!!
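
One thing that may be going wrong in the attempt above: both loops read from standard input, and the inner `exec<$f2name` redirects standard input for the whole script, so the outer loop can no longer continue reading list1. The nested loops would also pair every A file with every B file. Below is a minimal sketch that reads the two lists in lockstep on separate file descriptors instead. `pair_lists` is a hypothetical helper name, and the sketch assumes that line N of list1 corresponds to line N of list2.

```shell
# Sketch only: pair_lists is a hypothetical helper, not code from the thread.
# Assumes line N of the first list belongs with line N of the second list.
pair_lists() {
    # $1 = list of A files, $2 = list of B files, $3 = value required in column 2
    while read -r afile <&3 && read -r bfile <&4; do
        awk -v key="$3" '
            NR == 1 { s = $1; next }            # single row of the A file: keep its first field
            $2 == key && $5 == s { print $4 }   # B rows whose 5th column equals that field
        ' "$afile" "$bfile" > "${afile}456"
    done 3< "$1" 4< "$2"                        # list1 on fd 3, list2 on fd 4
}
```

Called as `pair_lists list1 list2 BBM`, this would write one output file per A file, named like the A file with "456" appended, as in the attempt above.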

---------- Post updated at 07:40 AM ---------- Previous update was at 06:04 AM ----------

Hi ygemici,

The code you gave also works for a single A file and a single B file.
The output shown is:

RGF
HGD
ALA
ALA
ALA
ALA
ALA
ALA
ALA
ALA
ALA
ALA
ALA

However, I need to display only RGF.

Kindly help me work with multiple files at a time.
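
A likely reason for the extra lines: `for i in $(<file1)` runs once for each of the 20 values in the A file, and each value is spliced into an unanchored regular expression in which `.` matches any character. Below is a minimal sketch that tests only the first value, using an exact comparison instead of a regex. `first_field_match` is a hypothetical helper name, and passing the column-2 label as a third argument is an assumption made for illustration.

```shell
# Sketch only: first_field_match is a hypothetical helper, not code from the thread.
first_field_match() {
    # $1 = A file, $2 = B file, $3 = label required in column 2 (e.g. BBM)
    first=$(awk 'NR == 1 { print $1; exit }' "$1")   # first field of the A file only
    awk -v want="$first" -v key="$3" '
        $2 == key && $5 == want { print $4 }         # exact comparison, no regex
    ' "$2"
}
```

For example, `first_field_match Afile Bfile BBM` compares only the first A value against column 5 of the B file.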

Do you have the files in one directory?

What are the names of the A files and the B files?

Yes, all the files are in the same directory.

The A files are named like npkf_123 and
qb5f_123,
and list1 contains the names of all these files.

The B files are named like mist_npkf,
mist_qb5f, etc.,
and list2 contains the names of all B files.

I have created the shell script given above in the same directory.

list1 and list2?

Post some lines of those files and please use code tags when posting code or data examples.

 
List 1: 
npkf_123
qb5f_123
 
list2 : 
mist_npkf
mist_qb5f 
 

Assuming the name of a Bfile is "mist_" followed by the part of the Afile name before the "_":

while read -r Afile
do
  file=${Afile%_*}   # part of the Afile name before the last "_"
  Bfile="mist_"$file
  awk 'NR==1{s=$1; next}
  $2=="BBM" && $5==s {print $4}' "$Afile" "$Bfile" > "$file"456
done < List1
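
The naming rule assumed above can be checked on its own before running the loop. This is a small sketch of the `${Afile%_*}` expansion; `derive_bfile` is a hypothetical helper name used only for illustration.

```shell
# Sketch only: derive_bfile is a hypothetical helper illustrating the assumed naming rule.
derive_bfile() {
    afile=$1
    stem=${afile%_*}            # remove the shortest trailing "_..." part: npkf_123 -> npkf
    printf 'mist_%s\n' "$stem"  # B file name = "mist_" + stem
}

derive_bfile npkf_123   # prints mist_npkf
```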

Hi franklin,

The above code gives the error :

Hi.

I imagine this line:

Bfile="mist_" file

should be:

Bfile="mist_"$file

Hi,
the above code also gives an error.

However, the B files start with "dist_". I can't understand what the problem is.

Hi everyone,

I have two lists with an equal number of files:

list1:
f_1a0s.pdb123fname
f_1a0t.pdb123fname
f_1acc.pdb123fname
f_1af6.pdb123fname
f_1aij.pdb123fname
f_1ap9.pdb123fname
f_1bh3.pdb123fname
f_1brx.pdb123fname
f_1c3w.pdb123fname

and

List2:
f_1a0s.pdbdist
f_1a0t.pdbdist
f_1acc.pdbdist
f_1af6.pdbdist
f_1aij.pdbdist
f_1ap9.pdbdist
f_1bh3.pdbdist
f_1brx.pdbdist
f_1c3w.pdbdist

Content of files from list1 is :

3.789894  3.775013        3.720026        3.766262        3.729790        3.775523        3.759575        3.781067        3.789970        0.000000        3.785133        3.756248        3.773160        3.720026        3.737493        3.773160        0.000000        3.772912        3.775922        3.737493

and content of files from list2 is:

126     ALA     124     VAL     5.242442
126     ALA     125     GLY     3.839224
126     ALA     126     ALA     0.000000
126     ALA     127     LEU     3.789894
126     ALA     128     THR     5.824391
126     ALA     129     LYS     8.606871
126     ALA     130     VAL     10.355069
126     ALA     131     TYR     9.790689
126     ALA     132     SER     11.737067

I want to compare the 1st column of the list1 files with the 5th column of the list2 files.
If they are equal and the 2nd column of the list2 file is "ALA", the 4th column of the list2 file must be printed.
I used the code below:

fname=list1
exec<$fname
while read line1
do
fname2=membdistlist
exec<$fname2
while read line2
do
awk 'NR==1{s=$1; next} $2=="ALA" && $5==s {print $4} ' $line1 $line2
done
done


Output: it prints "ALA" multiple times, which is the wrong output.
However, the command:

awk 'NR==1{s=$1; next} $2=="ALA" && $5==s {print $4} ' f_1a0s.pdb123fname f_1a0s.pdbdist

which takes a single file from list1 and a single file from list2, shows the correct output.
Can anyone help me run this code for multiple files?
Thanks in advance

If you say the awk command gives you what you want, you can just make a loop to process all files like this:

while read f1 f2 ; do 
  awk 'NR==1{s=$1; next} $2=="ALA" && $5==s {print $4} ' $f1 $f2
done <(paste list1 List2)

But I'm afraid that the awk command will do what you want only if file1 contains only one line.

Hi mirni,

Thanks for your efforts,
but the code shows an error: "unexpected end of file".
Can you please figure it out?

ummm... sorry I forgot a redirection operator:

while read f1 f2 ; do 
  awk 'NR==1{s=$1; next} $2=="ALA" && $5==s {print $4} ' $f1 $f2
done < <(paste list1 List2)

or, in a more concise way:

paste list1 List2 | while read f1 f2 ; do 
  awk 'NR==1{s=$1; next} $2=="ALA" && $5==s {print $4} ' $f1 $f2
done 

Hi Mirnis,

The first code above still shows an error,

whereas the second one works perfectly. Thank you very much for this.

Now I want to compare each of the 20 fields in the list1 files with the 5th field in the list2 files and print the corresponding 4th fields, separated by tabs.

paste lista listb | while read f1 f2;
do
awk 'NR==1{ala=$1; next} $2=="ALA" && $5==ala {print $4"\t"}' | awk 'NR==1{val=$2; next} $2=="VAL" && $5==val {print $4"\t"} ' | 
awk 'NR==1{leu=$3; next} $2=="VAL" && $5==leu {print $4"\t"} ' | awk 'NR==1{iso=$4; next} $2=="VAL" && $5==iso {print $4"\t"} ' 
$f1 $f2 > $f1"output"
done

and so on for all 20 fields.

Is it possible in a single script, or do I need to write 20 such programs and then merge them into one?

Hi, you can try this:

paste list1 list2 | while read f1 f2;do
awk 'NR==FNR{a[$20]=$20;b=$4;next}a[$5]{print b"\t"$4}' $f1 $f2;done

regards
ygemici
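
As an alternative for comparing all 20 fields in one awk call, here is a minimal sketch under stated assumptions. `match_all_fields` is a hypothetical helper name. The sketch assumes the A file has a single row, matches values as text (so 3.764724 and 3.7647240 would not match), keeps only the first B row seen for each 5th-column value, and prints "-" where a field has no match. It does not filter on the 2nd column, because the thread does not say which label belongs to which field.

```shell
# Sketch only: match_all_fields is a hypothetical helper, not code from the thread.
match_all_fields() {
    # $1 = A file (one row of values), $2 = B file (5 columns)
    awk '
        NR == FNR {                              # first file: remember every field of the row
            n = NF
            for (i = 1; i <= NF; i++) want[i] = $i
            next
        }
        !($5 in col4) { col4[$5] = $4 }          # second file: first 4th column per 5th-column value
        END {
            for (i = 1; i <= n; i++)             # one tab-separated output line, in A-file order
                printf "%s%s", ((want[i] in col4) ? col4[want[i]] : "-"), (i < n ? "\t" : "\n")
        }
    ' "$1" "$2"
}
```

Inside the `paste list1 list2 | while read f1 f2` loop shown earlier, this could be called as `match_all_fields "$f1" "$f2" > "${f1}output"`.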
