Add values < or eq to 1000

repinementer · November 9, 2009, 10:06pm

Make a list keyed on the first column of input1, with its corresponding value (2nd column, bold). Then search input2 for rows with the same key whose 2nd-column value (bold) differs from that value by 1000 or less, and print them along with the other columns.

input1

x1 10 hfffhf 646474_jhg
x2 100 jkfgjj 765755_jg

input2

x1 -990 jgjgjggjhgh
x1 -991 jgjhgjgg
x1 1010 nbnmmbmb
x1 1011 jhgjhg
x2 1100 ghjgjhg
x2 1111 jbhgjg
x2 -900 jghgh
x2 -899 97jjkh

output

x1 10 hfffhf 646474_jhg x1 -990 jgjgjggjhgh
x1 10 hfffhf 646474_jhg x1 1010 nbnmmbmb
x2 100 jkfgjj 765755_jg x2 1100 ghjgjhg
x2 100 jkfgjj 765755_jg x2 -900 jghgh

Thanx

thegeek · November 11, 2009, 4:57am

Could you please explain it better !

Franklin52 · November 11, 2009, 5:19am

Try this:

awk 'NR==FNR{a[$1]=$0;next}
a[$1] && !($2 % 10) {print a[$1], $0}
' file1 file2

repinementer · November 16, 2009, 3:09am

I sounded a bit rude in my previous post. My apologies, and thanks for the code.
The code gives me trouble when the numbers get larger.

example
input1

x1 115404863 hfffhf 646474_jhg

input2

x1 115405673 hfffhf 646474_jhg x1 -990 jgjgjggjhgh

Franklin52 · November 16, 2009, 3:30am

Sorry, I don't think I understand your question. Can you clarify the question, maybe with more examples?

repinementer · November 16, 2009, 3:57am

input1 contains keys in the first column (x1, x2 and so on). The second column contains values ranging from small to large (1 to 10000000 and so on). input2 has the same layout.

The logic is to find the values in input2 that are within +/-1000 of the corresponding value in input1 (along with the corresponding columns, not shown below).
As you can see below, the 1st input value is 10000. Values within +/-1000 of it, i.e. between 9000 (10000-1000) and 11000 (10000+1000), are in the output; the others are not.

input1

x1  10000

input2

output

x1  10000  9000
x1   10000  11000

Franklin52 · November 16, 2009, 5:12am

Please be more specific. I've spent a lot of time trying to understand your question but without any luck (and I'm sure I'm not the only one).

What if the value of the 2nd column in the 1st file is 10, 18, 29, 100 or 1000?

ghostdog74 · November 16, 2009, 6:41am

repinementer, you have 150++ posts; I am sure you are already competent enough to start showing some of your own code. What have you tried? Search back through your previous posts and see how awk is used to solve some of your other similar problems.

repinementer · November 16, 2009, 10:43pm

Here are the detailed input files, with explanations in the output (bold)
input1

x1    10
x1    100
x2    1000
x2    10000
x3    989
x4    345
x10    8767736477736
xx    234
xy    387889999

input2

x1    1010
x1    1100
x1    900
x1    800
x1    10000
x2    2000
x2    3000
x3    1989
x3    2000
x4    1345
x4    1500
x10    8767736478730
x10    10000000
xx    1234
xy    387888999

output

x1    10    x1    800    [x1: 10 +/- 1000 = 1010 or -990, i.e. all the values between -990 and 1010 with the same key x1; those are 800, 900, 1010]
x1    10    x1    900
x1    10    x1    1010
x1    100    x1    1100    [x1: 100 +/- 1000 = 1100 or -900, i.e. all the values between -900 and 1100 with the same key x1; those are 800, 900, 1010, 1100]
x1    100    x1    800
x1    100    x1    900
x1    100    x1    1010
x2    1000    x2    2000    [x2: 1000 +/- 1000 = 2000 or 0, i.e. all the values between 0 and 2000 with the same key x2; that is 2000]
x2    10000    NULL    NULL    [has none]
x3    989    x3    1989    [x3: 989 +/- 1000 = 1989 or -11, i.e. all the values between -11 and 1989 with the same key x3; that is 1989]
x4    345    x4    1345    [x4: 345 +/- 1000 = 1345 or -655, i.e. all the values between -655 and 1345 with the same key x4; that is 1345]
x10    8767736477736    x10    8767736478730    [same as above]
xx    234    xx    1234    [same as above]
xy    387889999    xy    387888999    [same as above]

---------- Post updated at 07:38 PM ---------- Previous update was at 06:22 AM ----------

Guys, I tried, but there is still a bug in the output.

awk 'NR==FNR{a[$1]=$2;next} a[$1] && ($2 <= a[$1]+1000) && ($2 >= a[$1]-1000) {print a[$1], $0}' test1.txt test2.txt

output for the above code

100  x1    1010
100  x1    1100
100  x1    900
100  x1    800
989  x3    1989
345  x4    1345
8767736477736  x10    8767736478730
234  xx    1234
387889999	xy    387888999

---------- Post updated at 07:43 PM ---------- Previous update was at 07:38 PM ----------

And I would be grateful if you could suggest how to define the other columns in the same script, like a[$1]=$1 ..., so that I can print all the other columns along with the output.

danmero · November 17, 2009, 12:07am

This will match your required output and hopefully solve your problem

awk 'NR==FNR{a[$1]=$1;b[$1]=$2;c[$1]=$0;next}a[$1] && b[$1]<($2+x) && b[$1]>($2-x){print c[$1],$0}' x=1001  input1 input2
x1 10 hfffhf 646474_jhg x1 -990 jgjgjggjhgh
x1 10 hfffhf 646474_jhg x1 1010 nbnmmbmb
x2 100 jkfgjj 765755_jg x2 1100 ghjgjhg
x2 100 jkfgjj 765755_jg x2 -900 jghgh
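
Two details of the one-liner above may be worth spelling out. An operand of the form x=1001 placed among the file names is a variable assignment that awk performs when it reaches that operand, so here x is set before input1 is read. And because the values are whole numbers, the strict tests against 1001 accept the same rows as <= and >= tests against 1000. The following is only a small check of both points, using made-up numbers (first column is the reference value, second column is the candidate); the variable names strict and incl are mine:

```shell
# Compare the strict test with x=1001 against the inclusive test with 1000.
# "-" tells awk to read standard input after the x=1001 assignment is applied.
out=$(printf '10 1010\n10 1011\n10 -990\n10 -991\n' |
      awk '{ strict = ($1 < $2 + x && $1 > $2 - x)
             incl   = ($2 <= $1 + 1000 && $2 >= $1 - 1000)
             print $2, strict, incl }' x=1001 -)
printf '%s\n' "$out"
```

Each output line shows the candidate value followed by the result of the two tests (1 for true, 0 for false); the two results agree on every row. With non-integer values the two forms would no longer be interchangeable.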

repinementer · November 17, 2009, 8:34pm

Hi. Thanks for explaining the array definitions using a, b and c. Very helpful. The output of your code is almost the same as mine. The lines I highlighted in bold in the desired output below are missing. Do you know why?

input1

x1    10
x1    100
x2    1000
x2    10000
x3    989
x4    345
x10    8767736477736
xx    234
xy    387889999

input2

x1    1010
x1    1100
x1    900
x1    800
x1    10000
x2    2000
x2    3000
x3    1989
x3    2000
x4    1345
x4    1500
x10    8767736478730
x10    10000000
xx    1234
xy    387888999

Output_for_above_Codes

x1    100  x1    1010
x1    100  x1    1100
x1    100  x1    900
x1    100  x1    800
x3    989  x3    1989
x4    345  x4    1345
x10    8767736477736  x10    8767736478730
xx    234  xx    1234
xy    387889999 xy    387888999

DesiredOutput

x1    10    x1    800    
x1    10    x1    900
x1    10    x1    1010    
x1    100    x1    1100
x1    100    x1    800
x1    100    x1    900
x1    100    x1    1010
x2    1000    x2    2000      
x2    10000    NULL    NULL            
x3    989    x3    1989         
x4    345    x4    1345        
x10    8767736477736    x10    8767736478730                   
xx    234    xx    1234                                                     
xy    387889999    xy    387888999

danmero · November 17, 2009, 9:44pm

Try this solution:

# cat awk.script
NR==FNR{
                a[$0]=$0
                next
        }
        {
                b[$0]=$0
                next
        }
END{
        for(x in a){
                        for(y in b){
                                        split(a[x],c)
                                        split(b[y],d)
                                        if( c[1]==d[1] &&
                                            ( d[2] <= c[2] + 1000 ) &&
                                            ( d[2] >= c[2] - 1000 ) \
                                           ){
                                                print a[x] "\t" b[y]
                                            }
                                   }
                   }
    }
# awk -f awk.script input1 input2 | sort
x1    10        x1    1010
x1    10        x1    800
x1    10        x1    900
x1    100       x1    1010
x1    100       x1    1100
x1    100       x1    800
x1    100       x1    900
x10    8767736477736    x10    8767736478730
x2    1000      x2    2000
x3    989       x3    1989
x4    345       x4    1345
xx    234       xx    1234
xy    387889999 xy    387888999

Do you need the NULL line (x2 10000 NULL NULL), and why?

repinementer · November 17, 2009, 10:27pm

NULL is because the value for that input1 key has no values within the +/- 1000 range in input2.
It would be great to have this small addition.

I'm getting following error.

$ awk -f awk.script test1.txt test2.txt
gawk: awk.script:16:                                             ( d[2] >= c[2] - 1000 ) \
gawk: awk.script:16:                                                                      ^ backslash not last character on line

I can run the script without the backslash. What is the significance of \ ?

danmero · November 17, 2009, 11:23pm

Let's try again

# cat awk.script
BEGIN   {
        t="\t"
        n="NULL"
        }
NR==FNR{
                a[$0]=$0
                next
        }
        {
                b[$0]=$0
                next
        }
END{
        for(x in a){
                        for(y in b){
                                        split(a[x],c)
                                        split(b[y],d)
                                        if( c[1]==d[1]                  &&\
                                            ( d[2] <= c[2] + 1000 )     &&\
                                            ( d[2] >= c[2] - 1000 )       \
                                           ){
                                                print a[x] t b[y]
                                                o[a[x]]++
                                            }
                                   }
                        if(!o[a[x]])       {
                                        print a[x] t n t n
                                   }
                   }
    }
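
The for (x in a) loops above visit the entries in an unspecified order, which is why the earlier run was piped through sort. One possible variation, offered only as a sketch, keeps the rows of both files in numerically indexed arrays so that the output follows the order of input1 and still prints NULL for rows with no match. The names r1, r2, n1, n2 and hit are my own, and the two sample files are cut down from the data in this thread:

```shell
# Scratch copies of a few rows from the thread (illustrative only).
tmp=$(mktemp -d)
printf 'x1 10\nx2 10000\n' > "$tmp/input1"
printf 'x1 1010\nx1 10000\nx1 800\nx2 2000\n' > "$tmp/input2"

out=$(awk '
NR == FNR { r1[++n1] = $0; next }      # rows of input1, in file order
          { r2[++n2] = $0 }            # rows of input2, in file order
END {
    for (i = 1; i <= n1; i++) {
        split(r1[i], c)
        hit = 0
        for (j = 1; j <= n2; j++) {
            split(r2[j], d)
            # same key, and value within +/- 1000 of the input1 value
            if (c[1] == d[1] && d[2] <= c[2] + 1000 && d[2] >= c[2] - 1000) {
                print r1[i] "\t" r2[j]
                hit = 1
            }
        }
        if (!hit) print r1[i] "\tNULL\tNULL"
    }
}' "$tmp/input1" "$tmp/input2")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Like the script above, this compares every input1 row with every input2 row, so it is only meant for files of modest size.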

repinementer · November 17, 2009, 11:49pm

Still the same error.
Anyway, without the backslash (\) the script works great. Thank you.
I still have doubts about the previous scripts (the awk one-liners). What is wrong with them?

danmero · November 18, 2009, 7:37am

My awk (FreeBSD) requires the backslash (\).
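
On the significance of the backslash: a trailing \ tells awk to continue the statement on the next line, and gawk reports "backslash not last character on line" when something follows it, possibly a space or a carriage return picked up while copying the script. A newline directly after && or || is accepted by POSIX awk without any backslash, so one way to sidestep the difference between implementations is to end each continued line with the operator and keep the closing parenthesis on the line of the last condition. A minimal sketch, with a made-up input line:

```shell
# Fields: key1 value1 key2 value2 (a hypothetical joined row).
out=$(printf 'x1 10 x1 800\n' | awk '{
    # Each continued line ends with &&, so no backslash is needed.
    if ($1 == $3 &&
        $4 <= $2 + 1000 &&
        $4 >= $2 - 1000) {
        print "in range"
    }
}')
echo "$out"
```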

The one-liners use the first column value as the array key to hold the second column value. Because you have multiple records with the same key (x1 and x2 in this example), each one overwrites the previous one, and only the last one is saved in the array, as in your example.
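
The overwrite can be seen directly with a small check. This is a minimal sketch, assuming a two-row sample with the same key/value layout as input1; the scratch directory and the variable names by_key and by_record are only for illustration:

```shell
# Recreate a two-row sample of input1 in a scratch directory (illustrative only).
tmp=$(mktemp -d)
printf 'x1 10\nx1 100\n' > "$tmp/input1"

# Keyed by $1 alone: the second x1 row replaces the first.
by_key=$(awk '{ a[$1] = $2 } END { print a["x1"] }' "$tmp/input1")

# Keyed by the whole record: both rows survive.
by_record=$(awk '{ a[$0] = $0 } END { n = 0; for (k in a) n++; print n }' "$tmp/input1")

echo "a[\"x1\"] = $by_key, distinct records = $by_record"
rm -rf "$tmp"
```

Only the value 100 is left under the key x1, which is why the rows for x1 10 (and for x2 1000) never appear in the one-liner output, while keying on the whole record keeps both rows.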

repinementer · November 18, 2009, 9:18am

That makes sense. Thanks a lot for your cooperation.