Make multiple awk files into an executable

Hello everyone,

The following are my input files.

The following are my sequence of steps.

Can someone please tell me how to combine these steps into a single script, so that I start it with 1.txt and 2.txt and, after execution, it gives me the final 1_2_distance_5K.txt file?

Thanks in advance.

Hi jacobs.smith,

What would the output be for the input in your first post? Please paste the input and the output of your process. That would make it easier to accomplish the task and to check that the result is correct.


Hi Birei,

Thanks for your post.

I somehow missed the output.

Here is the output. One small change: in my last step, instead of -5000 to 5000, I chose -1000000 to 1000000 to suit the sample input. In that case, my final output would be this one

Basically, I want to grab those records where column 2 of file1 and column 2 of file2 are within a distance of 1000000 of each other, and print the whole record from both files.

Please feel free to post any comments you might have.

Thanks in advance.

Give the following awk script a try:

$ cat 1.txt
chr1 14765298 14766727 def
chr1 16759093 16760238 def
chr1 16759236 16760238 def
chr1 20782516 20784428 him
chr1 20989962 20991078 her
chr2 31672150 31673532 abc
chr2 33157721 33158124 abc
chr3 34542283 34542962 abc
chr3 38248682 38251416 abc
chr4 58562053 58567653 abc
$ cat 2.txt
chr1 21438731 21439423 26.12
chr1 33939851 33940673 34.76
chr1 36779864 36780494 20.16
chr1 36817091 36817917 27.22
chr2 36977015 36977908 19.27
chr3 40475125 40475885 21.58
chr3 40483838 40484616 15.3
chr4 40502827 40503675 10.61
chr4 40532299 40533156 14.78
chr5 43593022 43594143 24.33
$ cat script.awk
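BEGIN {
        if ( ARGC != 3 ) {
                print "Usage: awk -f script.awk <file1> <file2>"
                exit 1
        }
}

# First file: store every record, indexed by its line number.
FNR == NR {
        f1_data[ FNR ] = $0
        f1_count = FNR
        next
}

# Second file: compare the current record with every stored record.
FNR < NR {
        for ( i = 1; i <= f1_count; i++ ) {
                nf = split( f1_data[ i ], fields )
                # Different chromosome: skip only this stored record.
                # ("next" here would abandon the whole file2 record.)
                if ( fields[1] != $1 ) {
                        continue
                }
                substraction = fields[2] - $2
                if ( substraction >= -1000000 && substraction <= 1000000 ) {
                        # Rebuild the file1 record without its first column.
                        f1_line = ""
                        for ( j = 2; j <= nf; j++ ) {
                                f1_line = (f1_line ? f1_line " " : "" ) fields[j]
                        }
                        printf "%s %s %d\n", $0, f1_line, substraction
                }
        }
}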
$ awk -f script.awk 1.txt 2.txt 
chr1 21438731 21439423 26.12 20782516 20784428 him -656215
chr1 21438731 21439423 26.12 20989962 20991078 her -448769
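
To get back to the original question of running everything as a single executable, here is a minimal sketch of a shell wrapper that embeds the awk program. The script name `distance.sh`, the function name `pair_by_distance`, the optional third argument for the limit, and the output name pattern (which uses the numeric limit rather than "5K") are all assumptions made for illustration, not something taken from the posts above.

```shell
#!/bin/sh
# distance.sh (hypothetical name): pair records of two files that are on the
# same chromosome and whose column-2 values differ by at most LIMIT.
# Usage: ./distance.sh <file1> <file2> [limit]   (limit defaults to 5000)

pair_by_distance() {
    awk -v limit="$3" '
        FNR == NR { f1[++n] = $0; next }       # first file: remember every record
        {
            for (i = 1; i <= n; i++) {
                nf = split(f1[i], a)
                if (a[1] != $1) continue        # other chromosome: try next record
                d = a[2] - $2
                if (d >= -limit && d <= limit) {
                    rest = ""
                    for (j = 2; j <= nf; j++) rest = rest " " a[j]
                    printf "%s%s %d\n", $0, rest, d
                }
            }
        }
    ' "$1" "$2"
}

# Only run when both input files are given on the command line.
if [ "$#" -ge 2 ]; then
    limit=${3:-5000}
    # Assumed naming scheme: 1.txt + 2.txt + 5000 -> 1_2_distance_5000.txt
    out="$(basename "$1" .txt)_$(basename "$2" .txt)_distance_${limit}.txt"
    pair_by_distance "$1" "$2" "$limit" > "$out"
fi
```

After `chmod +x distance.sh`, running `./distance.sh 1.txt 2.txt 1000000` should write the two lines shown above to 1_2_distance_1000000.txt.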

Hi Birei,

Thanks for your time.

But it is not producing any output. All I get is a blank result, even though I did exactly what you wrote.

Sure?

Same input and same awk program? Try debugging with prints inside the script to see where it fails.

I can't help much because I can't reproduce your problem, but post your OS and awk version; perhaps other users will have an idea.
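
As one way of debugging with prints, the sketch below counts how far the comparison gets instead of printing the matches themselves. The function name `debug_counts` and the three counters are hypothetical, and it assumes the same layout as above (chromosome in column 1, position in column 2).

```shell
# Report how many records were loaded, how many comparisons were made on the
# same chromosome, and how many pairs fell within the limit.
debug_counts() {
    awk -v limit="$3" '
        FNR == NR { chrom[++n] = $1; pos[n] = $2; next }   # first file
        {
            for (i = 1; i <= n; i++) {
                if (chrom[i] != $1) continue
                same_chrom++
                d = pos[i] - $2
                if (d >= -limit && d <= limit) matches++
            }
        }
        END {
            printf "file1 records: %d\n", n
            printf "same-chromosome comparisons: %d\n", same_chrom
            printf "pairs within limit: %d\n", matches
        }
    ' "$1" "$2"
}
```

Called as `debug_counts 1.txt 2.txt 5000`, a zero in the second line would suggest that the chromosome names never match, while a zero only in the third line would point at the positions or the limit.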


Hi Birei,

I tried using gawk -f script.awk and it works, but only for the input files in this post. When I try it with other files, it doesn't output anything.

I suspect the input files in this post are space-separated.

Mine are tab separated.

Any thoughts?

Thanks again.

Ah, ok. So it works as I posted it. It was strange otherwise.

Tabs shouldn't be a problem, because by default awk splits fields on any run of spaces or tabs, so your files must have some other issue. Post an example that doesn't work.
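
A quick check of the default field splitting, using made-up one-line inputs (the variable names `spaces` and `tabs` are only for illustration):

```shell
# With the default field separator, spaces and tabs give the same fields.
spaces=$(printf 'chr1 100 200 abc\n'     | awk '{ print NF, $2 }')
tabs=$(printf 'chr1\t100\t200\tabc\n'    | awk '{ print NF, $2 }')
echo "$spaces"
echo "$tabs"
```

Both commands report four fields and the same second column.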

OK. This is what I have checked so far.

Spaces versus tabs doesn't matter. I checked it.

gawk versus awk doesn't matter. I checked it.

When the limit is -1000000 to 1000000, it works fine.

But, when I change it to -5000 to 5000, it doesn't.

I am checking this script against files I have already processed. With the same two input files and the -5000 to 5000 range, I previously got 161 records.

But, this script doesn't generate anything.

I know it is weird. Any thoughts?

Post both input files (part of them) and expected output to see where the script fails.

My files have more than a million records.

I just checked the script with only those records that fall within the -5K to 5K range.

It works fine.

But, when I give the main files, it doesn't generate anything.

What do you think?

Without seeing the data which doesn't work, it's extremely difficult to say.