So, the idea is to pick every 2000th record (each time with a different starting line) and put the selection into a new file; this continues to the end of the input file.
I've tried AWK/sed with while loops, but it does not work because AWK/sed don't seem to accept shell variables ($i) in their arguments. The code is in C shell.
This works perfectly for a limited number of files. However, if I deal with the entire data set, which is the actual case, there will be 2000 output files, and the code gives me the error "too many output files"; e.g. if I try 2000 instead of 3 in (NR-1)%3+1, it doesn't like it!!
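On the variable problem: a shell variable inside a single-quoted awk program is never expanded by the shell, which is usually why $i appears to be ignored. A minimal sketch of one way around it, assuming a POSIX awk (nawk or gawk) that supports -v. It is written for the Bourne shell rather than C shell, and the names sample.txt, step, start and out.N are made up for illustration:

```shell
#!/bin/sh
# Sketch: hand shell variables to awk with -v instead of writing $start
# inside the quoted awk program. All file and variable names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: the numbers 1..12, one per line.
i=1
while [ "$i" -le 12 ]; do echo "$i"; i=$((i + 1)); done > sample.txt

step=4      # take every 4th line
start=1     # first starting line
while [ "$start" -le "$step" ]; do
    # Lines start, start+step, start+2*step, ... go to out.<start>
    awk -v s="$start" -v n="$step" 'NR >= s && (NR - s) % n == 0' sample.txt > "out.$start"
    start=$((start + 1))
done

cat out.2
```

Each pass opens only one output file, so it cannot hit the open file limit, but it reads the input once per output file, which may be slow for 2000 files.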
Thanks!!
Well, the actual text file has 59850 records, where I have to pick every 3325th record and output to different files!! So I have to put in some number around 18, which still gives me the same error, "too many output files"!!
Can you please help me more? Also, what if I have more than one field in the input file? Will a for loop inside the AWK structure do?
Here is a version that works without running into the open file limit. It's an awk script and should be in a file:
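BEGIN {
    print RF_CNT;                # confirm inputs; can be removed
    print LN;
}
{
    i = (NR - 1) % LN + 1;       # index of the file this line belongs to
}
(NR <= RF_CNT) {                 # output first records
    a[NR] = "file" NR;
    print > a[NR];               # create (or truncate) the file
    close(a[NR]);                # expensive but stays under fopen limit
    next;                        # do not append the same line again below
}
(i in a) {                       # see if this line is a candidate
    print >> a[i];               # add it to its file
    close(a[i]);                 # close it to stay under fopen limit
}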
Use this command line for it:
awk -f num.awk -v RF_CNT=20 -v LN=2000 num.txt
The value of RF_CNT is the number of files to make. The first line of each file will contain line N of the input file (i.e. file 4 contains line 4 of the input file).
The value of LN is the line offset/modulo that determines whether a line needs to go to a file, and to which file.
num.txt is the input file. I used a file with the numbers 1-8000, one per line.
It opens and closes the output files to avoid the open file limits.
The BEGIN clause is just to confirm inputs and can be removed.
I switched to SUN systems and the code 'kinda' worked. I mean, the number of output files was correct, but I had only one line in each output file. For the test case, I had one input file with 40950 records and I wanted every 2275th record printed to each file, which should result in 40950/2275=18 lines in each file.
In the last file, file2275, I had the last line of the input file; in file2274 I had the 40949th line of the input file, and so on down to file1!!!!
I do not know what's wrong and am really confused.
The code works for small number of modulos like up to 10 but doesn't like big numbers!!
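A symptom like this (one line per file, and always the last matching line) is what you would see if a file is closed and then reopened with ">" for every record, because ">" truncates the file each time it is reopened. That is only a guess at what happened here, since the exact script is not shown. A small sketch of the difference, with a made-up 9-line input and hypothetical file names:

```shell
#!/bin/sh
# Sketch: ">" versus ">>" when the output file is closed after every line.
# infile, truncN and appN are hypothetical names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: the numbers 1..9, one per line.
i=1
while [ "$i" -le 9 ]; do echo "$i"; i=$((i + 1)); done > infile

# Truncating variant: every reopen with ">" discards the earlier lines.
awk '{ out = "trunc" ((NR - 1) % 3 + 1); print > out; close(out) }' infile

# Appending variant: ">>" keeps what is already in the file.
awk '{ out = "app" ((NR - 1) % 3 + 1); print >> out; close(out) }' infile

echo "trunc1: $(tr '\n' ' ' < trunc1)"
echo "app1:   $(tr '\n' ' ' < app1)"
```

With ">>" the old output files have to be removed before a rerun, otherwise the new lines are appended to the previous results.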
JP2542a
I tried your code but it gives me an error on the line beginning with (( i = NR....): a syntax error, bailing out near this line!!
Surprisingly, when I tried your first code on SUN, i.e.
awk '{print > "file" (NR-1)%3+1}' infile
it worked pretty well!! Whereas when I tried the code with close(out), the outputs were garbage!
So, if I have 10 fields and I want to have each field processed separately, what should I do? By this I mean that if 2000 files are generated from one column, I would like to end up with 2000*10 files. Does a for loop
for (i=1; i<=NF; i++) work? If so, where in the code?
Dear JP2542a,
The following is the code I've used. The bug shows up when I try it on Solaris machines, while when I run it on SUNs it doesn't complain but it doesn't generate any files. Don't bother yourself if it's something weird, because I've successfully run Sottn's code.
I should very much appreciate your help:)
I'll probably get back here with some other questions.
Cheers
BEGIN {
print RF_CNT;
print LN;
}
(NR <= RF_CNT) {
a[NR] = "file" NR;
print >a[NR];
close(a[NR]);
}
(( i = NR %LN + 1 ) in a ) {
print >> a;
close(a);
}
Well, (NR-1)%3+1 was just a test. The increment between lines is actually 3325 in my real input file, so I replace 3 with 3325! So I am dealing with something like 33250 files in total, considering 10 fields!
cheers
I tried to change the code to get the desired outputs but it didn't happen. Assuming (NR-1)%3+1, I'll bring a simple example to show what I want to have:
So, for each field there will be three files: three containing only the contents of field 1, and the same for the other two fields. In general, the total number of output files would be 3*3=9.
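One way this could look, as a sketch only: a for loop over the fields placed inside the main awk action, writing each field to its own set of files. The input data, the value n=3 and the naming scheme fieldF_fileK are all assumptions made for illustration, and a POSIX awk (nawk or gawk) is assumed for -v:

```shell
#!/bin/sh
# Sketch: split every field into its own set of n files.
# infile and the fieldF_fileK naming scheme are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 6 records with 3 whitespace-separated fields.
cat > infile <<'EOF'
a1 b1 c1
a2 b2 c2
a3 b3 c3
a4 b4 c4
a5 b5 c5
a6 b6 c6
EOF

awk -v n=3 '{
    k = (NR - 1) % n + 1              # which of the n files this record goes to
    for (f = 1; f <= NF; f++) {       # one set of n files per field
        out = "field" f "_file" k
        print $f >> out               # append, so earlier lines are kept
        close(out)                    # keep the number of open files low
    }
}' infile

ls | grep -c '^field'
cat field2_file1
```

For this input the script creates 3*3=9 files, and field2_file1 holds the second field of records 1 and 4. Closing after every write is slow for large inputs, but it keeps awk well under the open file limit even with thousands of output files.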