I have multiple (~80) files to grep for a pattern (some as big as 30 GB, with over a billion lines), with all matches appended to a single output file. Since I have a 96-core machine, I sent each grep job to the background to speed up the search:
file1.tab
chr1A_part1 123241847 123241848
chr1A_part1 123241848 123241849
chr1A_part1 123241849 123241850
chr1A_part1 123241850 123241851
......
The input files uniformly have three fields per row, and so should the output file:
for file in $(cat files.list); do
grep -F chr1A ${file} >> subset_chr1A.tab &
done
but I found that some of the matching lines are broken, and the output file is a mess:
subset_chr1A.tab
chr1A_part1 123241847 123241848
chr1A_part1 123241848 123241849
chr1A_part1 1232
41849 123241850
ch1
chr1A_part1 12
3241850
chr1A_part1 123441848 123441849
123541851
...
It seems to me the problem comes from the concurrent writes: 80 grep jobs for 80 files are all appending to the same output file at once. Since grep by default prints whole matching lines, I assumed each row would be written out as an unbroken unit, but that is not what happened in my case.
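In fact, I can reproduce similar mangling with a small test that doesn't involve grep at all (the file name and writer loop below are made up for illustration): two background writers append to the same file, but each emits its line through two separate write() calls, the way a block-buffered program can flush mid-line.

```shell
# Two background writers append to the same file, each emitting a line
# via two separate write() calls -- imitating a block-buffered program
# that flushes in the middle of a line. File name is made up for this demo.
out=interleave_demo.tab
: > "$out"
for w in 1 2; do
  for n in $(seq 200); do
    printf 'writer%s 123' "$w" >> "$out"      # first half of the line
    printf '241847 123241848\n' >> "$out"     # second half, with newline
  done &
done
wait
# No bytes are lost (appends with >> never overwrite each other),
# but lines from the two writers can come out spliced together:
intact=$(grep -c '^writer[12] 123241847 123241848$' "$out" || true)
echo "intact lines: $intact out of 400"
```

When I run something like this, the total byte and newline counts always come out right, yet some lines are often spliced together mid-number, which looks exactly like my output above.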
What is wrong here?