awk comparison and substitution

waddle · November 15, 2011, 5:07am

Hi,
here's my - not so easy to describe - problem: I want to compare the values of one file (FileA) with a cutoff-value and, if this comparison is true, substitute those values with those in the second file (FileB). However, there are many FileA's (FileA[1->200]), whereas there is only one FileB. Every FileA has three lines, each containing one value.

3.4
3.5
3.6

FileB has 3 columns and > 200 lines.

0.0154    0.0139    0.0227
0.0198    0.0259    0.0231
0.0126    0.0216    0.0174
0.0115    0.0145    0.0237
...            ...            ...

The three values of FileA1 should now be compared line by line with the cutoff value. If the comparison is true, the corresponding value of FileB should be assigned. However, those corresponding values are all contained within a single line of FileB. So I need some kind of script that substitutes line x of FileA[1] (if value > or < cutoff) with the field in line 1, column x of FileB.
My script so far:

# getting number of line of FileB in which values of FileA are contained

n=`echo "$1" |sed 's/.*\([0-9]\{1,3\}\).*/\1/'`

# comparison and substitution

awk -v val=$n '{
    getline < "$1"
        for(i=1; i<=NF; i++){
            if($i >= 3.5){
                print $i 
            }
        else{
            getline < "FileB.txt"
            NR==n {print $i}
            }
        }
        } ' $1 FileB.txt > $1_new.txt

Since I'm a beginner in awk, it's very intuitive aaaand - of course - doesn't work.

output should look something like this:

0.0154
3.5
3.6

Any help would be greatly appreciated!

waddle

rdcwayx · November 15, 2011, 5:53am

I still don't understand why 3.4 is chosen and replaced by 0.0154. Why not 3.5 or 3.6? What is the cutoff value for each line in FileB?

Can you explain in more detail?

waddle · November 15, 2011, 6:31am

The script should check FileA[1] for lines containing a value greater than or equal to a cutoff, in this example 3.5. If the comparison is true, the original value should be printed; if it is false, that value (in this example, line 1 of FileA[1]) should be replaced with the corresponding value in FileB, here line 1 (since it's FileA[1]), column 1 (since it's the first value in FileA). Again, notice that the corresponding values are listed in one column in FileA but in one line in FileB.

I hope it's easier to understand now...
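
As a rough sketch of the rule described here (not code from the thread): for a single FileA, only one row of FileB is needed, and line k of FileA pairs with column k of that row. The names cut, row and b are made up for illustration, the row number is passed in by hand, and the sample data is taken from the first post.

```shell
#!/bin/sh
# Sample data from the thread, written to a scratch directory.
tmp=$(mktemp -d)
printf '3.4\n3.5\n3.6\n' > "$tmp/FileA1"
printf '0.0154 0.0139 0.0227\n0.0198 0.0259 0.0231\n' > "$tmp/FileB"

# row = the line of FileB that belongs to this FileA (assumed: 1 for FileA1).
result=$(awk -v cut=3.5 -v row=1 '
    NR == FNR {                     # first file on the command line: FileB
        if (FNR == row)             # remember only the row we need
            for (i = 1; i <= NF; i++) b[i] = $i
        next
    }
    # second file: FileA; keep the value if it reaches the cutoff,
    # otherwise print column FNR of the saved FileB row
    { print (($1 + 0 >= cut + 0) ? $1 : b[FNR]) }
' "$tmp/FileB" "$tmp/FileA1")

echo "$result"
rm -rf "$tmp"
```

With the sample data this should print 0.0154, 3.5 and 3.6, the output asked for in the first post.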

Klashxx · November 15, 2011, 7:18am

Try this:

# cat FileA1
3.4
3.5
3.6

# cat FileA2
3.1
3.3
3.8

awk -v val=3.5 '
NR==FNR{o[NR""1]=$1
        o[NR""2]=$2
        o[NR""3]=$3
        next}
{
if ( FILENAME  != lFN ) 
   L++
val+=0
cmp=$1+0
if ( val <= cmp ) 
   print 
else 
   print o[L""FNR] 
lFN=FILENAME}' FileB FileA*
0.0154
3.5
3.6
0.0198
0.0259
3.8

waddle · November 15, 2011, 8:29am

Thanks for your time and work Klashxx,

the code works, though I now have some new problems:

1. I don't really understand it (but I'll try to).
2. The code only works when I paste it into the shell, not when I try to run the script (but that's not the main point).
3. I have to type in all FileA's (>200) consecutively (see 4.) (also not the main point).
4. I can't compare one FileA individually: if I take FileA[145], the values (if necessary) become substituted with those of line 1 from FileB, not with line 145.

Thanks again,
waddle

Klashxx · November 15, 2011, 9:04am

The basic thing here is the naming of the files: you need a constant pattern, say FileA1, FileA2, FileA3, ... FileA200.

# cat FileA4
3.6
3.3
3.8

# cat ren.sh             
#!/usr/bin/ksh

value="${1}"
patFileA="${2}"
FileB="${3}"

awk -v val="${value}" '
NR==FNR{o[NR""1]=$1
        o[NR""2]=$2
        o[NR""3]=$3
        next}
{
if ( FILENAME  != lFN ) 
   extF=substr(FILENAME,match(FILENAME,/[0-9]/))
val+=0
cmp=$1+0
if ( val <= cmp ) 
   print 
else 
   print o[extF""FNR] 
lFN=FILENAME}' ${FileB} ${patFileA}*

# ren.sh 3.5 FileA4 FileB
3.6
0.0145
3.8

Use

ren.sh 3.5 FileA FileB

to process all the files.
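
If each FileA should get its own result file instead of one combined stream on standard output, a variation along these lines may help. This is only a sketch under assumptions: the out/ directory, the _new.txt suffix (borrowed from the first post) and the names cut, row, b and dest are illustrative, and the file names are assumed to end in the row number as in FileA1, FileA2.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '3.4\n3.5\n3.6\n' > FileA1
printf '3.1\n3.3\n3.8\n' > FileA2
printf '0.0154 0.0139 0.0227\n0.0198 0.0259 0.0231\n' > FileB
mkdir out                                   # results go here so FileA* never matches them

awk -v cut=3.5 '
    NR == FNR {                             # FileB: store every cell as b[row, col]
        for (i = 1; i <= NF; i++) b[FNR, i] = $i
        next
    }
    FNR == 1 {                              # a new FileA starts
        if (dest != "") close(dest)         # keep the number of open files small
        row = FILENAME
        sub(/^[^0-9]*/, "", row)            # drop the non-digit prefix: "FileA2" -> "2"
        row += 0                            # force a number
        dest = "out/" FILENAME "_new.txt"
    }
    { print (($1 + 0 >= cut + 0) ? $1 : b[row, FNR]) > dest }
' FileB FileA*

new1=$(cat out/FileA1_new.txt)
new2=$(cat out/FileA2_new.txt)
printf '%s\n---\n%s\n' "$new1" "$new2"
cd - > /dev/null && rm -rf "$tmp"
```

Because the row is taken from the file name rather than from a counter, a single file such as FileA145 can be processed on its own and still picks line 145 of FileB.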

waddle · November 15, 2011, 11:39am

Hi Klashxx,
I took your code and modified it slightly for my purposes. The files do indeed follow a naming pattern: [0-9]{1,3}[A-Z]{3}. Your code gives me correct output only for those FileA's in which the comparison is true.
For FileA no. 2, however, I always get the same output: if the comparison is true, I get the three values contained in this file plus as many "2"s as FileB has lines. If the comparison is not true, I get as many empty lines as FileB has lines.
Do you have any explanation for this?
cheers,
waddle

Klashxx · November 16, 2011, 8:18am

Post the example file (content and name) that generates the wrong result.

waddle · November 16, 2011, 8:55am

ok, so here are all my files:

FileA's:
1ABC:

3.75289
3.74839
3.74117

2DEF:

3.45011
3.44657
3.46905

3GHI:

3.27445
3.27389
3.30938

etc. etc.

FileB:

0.0154    0.0139    0.0227
0.0198    0.0259    0.0231
0.0126    0.0216    0.0174
0.0115    0.0145    0.0237
0.0146    0.0124    0.0149
0.0128    0.0142    0.0161
...

# cat ren.sh             
#!/usr/bin/ksh

echo "Please choose cutoff value", read cut

FileB="${1}"
patFileA="${2}"


awk -v val=cut '
NR==FNR{o[NR""1]=$1
        o[NR""2]=$2
        o[NR""3]=$3
        next}
{
if ( FILENAME  != lFN ) 
   extF=substr(FILENAME,match(FILENAME,/[0-9]/))
val+=0
cmp=$1+0
if ( val <= cmp ) 
   print 
else 
   print o[extF""FNR] 
lFN=FILENAME}' ${FileB} ${patFileA}*

sorry to annoy you that much,
thanks so far
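
One detail in the script as posted may account for part of the symptoms: `-v val=cut` hands awk the literal string "cut" rather than the contents of the shell variable, and a non-numeric string becomes 0 once `val+=0` runs, so every value passes the comparison. In addition, the echo line with the comma prints the words "read cut" instead of running read. The following small check is only an illustration; the variable names literal and expanded are made up.

```shell
#!/bin/sh
cut=3.5

# Without "$", awk receives the three letters c-u-t, which count as 0 in arithmetic.
literal=$(awk -v val=cut 'BEGIN { print val + 0 }')

# With the shell expansion, awk receives the number typed by the user.
expanded=$(awk -v val="$cut" 'BEGIN { print val + 0 }')

echo "literal=$literal expanded=$expanded"
```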

Klashxx · November 16, 2011, 11:38am

OK, you have to adjust the regex to match the file pattern.

# ls [0-9]*[A-Z]*            
1ABC  2DEF  3GHI

#!/usr/bin/ksh

echo "Please choose cutoff value: \c"
read cut

FileB="${1}"
patFileA="${2}"


awk -v val="${cut}" '
NR==FNR{o[NR""1]=$1
        o[NR""2]=$2
        o[NR""3]=$3
        next}
{
if ( FILENAME  != lFN ) 
   extF=substr(FILENAME,match(FILENAME,/^[0-9]*/),RLENGTH)
val+=0
cmp=$1+0
if ( val <= cmp ) 
   print 
else 
   print o[extF""FNR] 
lFN=FILENAME}' ${FileB} ${patFileA}*

# ren.sh FileB "[0-9]*[A-Z]*"
Please choose cutoff value: 3.45
3.75289
3.74839
3.74117
3.45011
0.0259
3.46905
0.0126
0.0216
0.0174

For an individual file:

# ren.sh FileB 1ABC          
Please choose cutoff value: 3.75
3.75289
0.0139
0.0227
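
To see why the regex matters, the two file-name expressions used in this thread can be compared side by side. This is a sketch for illustration only: the names tail, lead and keys are made up, and 145XYZ is a hypothetical file name that follows the stated pattern.

```shell
#!/bin/sh
# Compare the two filename-to-row expressions used in the thread.
keys=$(awk 'BEGIN {
    n = split("FileA4 2DEF 145XYZ", name, " ")
    for (k = 1; k <= n; k++) {
        f = name[k]
        # Earlier version: everything from the first digit to the end.
        tail = substr(f, match(f, /[0-9]/))
        # Later version: only the leading run of digits.
        match(f, /^[0-9]*/)
        lead = substr(f, RSTART, RLENGTH)
        print f, "[" tail "]", "[" lead "]"
    }
}')
echo "$keys"
```

For 2DEF the earlier expression yields "2DEF", so a lookup such as o["2DEF" FNR] finds nothing and an empty line is printed, while the anchored version yields "2". The anchored version in turn yields an empty string for names like FileA4 that start with letters, so the regex has to follow the naming scheme. Calling match() as a separate statement before reading RSTART and RLENGTH also avoids relying on the order in which a given awk evaluates function arguments.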

waddle · November 17, 2011, 7:34am

Hi Klashxx,
I applied your code exactly, but it just didn't work out for me. I still got some incorrect output with a lot of empty lines. It's OK though, I solved it with another (more artless) bash script.
Thanks anyway!