Hi all,
I have a huge file (450 GB). It is tab-delimited and looks like this:
x1 A 50020 1
x1 B 50021 8
x1 C 50022 9
x1 A 50023 10
x2 D 50024 5
x2 C 50025 7
x2 F 50026 8
x2 N 50027 1
:
:
Now I want to extract a subset of this file: the rows where column 1 is x10 and column 3 is between 600000 and 30000000. I wrote the following Perl script, but it doesn't work:
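#!/usr/bin/perl
use strict;
use warnings;

die "Usage: $0 input_file output_file\n" unless @ARGV == 2;
my ($file1, $file2) = @ARGV;    # input file, output file

# Open each file once, and report a failure instead of continuing silently.
open(my $in,  '<',  $file1) or die "Cannot open $file1: $!";
open(my $out, '>>', $file2) or die "Cannot open $file2: $!";

while (my $line = <$in>) {
    chomp($line);
    my @array = split(/\t/, $line);
    next unless @array >= 3;    # skip lines that have no third column
    # Keep rows where column 1 is x10 and column 3 is within the range.
    if ($array[0] eq 'x10' && $array[2] >= 600000 && $array[2] <= 26279795) {
        print $out "$line\n";
    }
}

close $in;
close $out or die "Cannot close $file2: $!";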
I guess the input and output files are both too big for my script to handle.
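To make the goal concrete, the filter I am after could probably be sketched in shell with awk, which reads the file line by line, so memory use should not grow with file size. This is only a sketch, not something verified against the real 450 GB file. It assumes the fields really are separated by single tabs and that column 3 is always numeric. The function name extract_subset and the file names are made up for illustration, and the bounds in the example call are the ones from the Perl script.

```shell
# Hypothetical helper: print the rows whose column 1 equals a key and whose
# column 3 lies in the inclusive range [lo, hi]. Single pass over the file.
extract_subset() {
    # $1 = input file, $2 = key (e.g. x10), $3 = lower bound, $4 = upper bound
    awk -F'\t' -v key="$2" -v lo="$3" -v hi="$4" \
        '$1 == key && $3 + 0 >= lo + 0 && $3 + 0 <= hi + 0' "$1"
}

# Example call (placeholder file names):
# extract_subset input.tsv x10 600000 26279795 > output.tsv
```

Redirecting the output once with > means the output file is opened a single time rather than once per matching line.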
Does anyone know a good way to do this? Perl or shell scripts are preferred.
Any help will be appreciated!