Hi,
I'm using Kali Linux, which I believe is Debian-based.
I'm trying to create a folder containing 256 folders, each of which in turn contains 256 folders.
Then in each leaf folder I want to create 4096 files.
It will look like /dir/aa/aa/aaa.txt, /dir/aa/aa/aab.txt, and so on.
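To make the layout concrete, here is a minimal sketch in Perl. It assumes the folder and file names are hexadecimal digits taken from a hash of each line (256 = 16^2 names per folder level, 4096 = 16^3 file names). The helper name path_for and the choice of SHA-512 via Digest::SHA are illustrative assumptions, not the actual PHP code.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha512_hex);

# Hypothetical helper: map a line to <root>/xx/yy/zzz.txt using the first
# seven hex digits of its SHA-512 digest (2 + 2 + 3).
sub path_for {
    my ($root, $line) = @_;
    my $hex = sha512_hex($line);
    return join '/', $root,
        substr($hex, 0, 2),
        substr($hex, 2, 2),
        substr($hex, 4, 3) . '.txt';
}

print path_for('/dir', 'example line'), "\n";
```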
The problem comes when I try to fill the files. I'm using PHP (which may not be the best idea, but I need some functions such as sha512): I open a big text file and read it, dispatching every line to its specific file. When I only have the 4096 files (i.e. without all the directories above them), reading and writing are no problem and go quite fast.
When I try with all the directories, the read/write process takes a very long time, around one minute to write a single line to a file.
Have I reached a limit on the number of files? I thought I could create up to 2^32 files in a directory.
Is PHP struggling? Would Perl or bash be better for this? If so, I need hashing functions that are not in bash.
If you have any ideas, I'd be glad to hear them.
Thank you in advance.
Hi,
Could you describe your algorithm and show the code that writes to the files?
Regards.
Do you have all 4096 files open at the same time?
There are several possibilities, but as disedorgue mentioned, some code would clear things up.
Thank you for answering.
I should have posted some code; that would certainly have made things easier.
I'm not trying to open 4096 files at the same time: I open a file, append a line, close it, and so on.
I decided to try with fewer directories, since my previous example would have produced roughly 268 million files (256 x 256 x 4096). So now I only have one level of directories, each named with a single hexadecimal digit. In each directory I have 4096 files, which makes 65536 files in total.
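As a quick check of those counts, here is the arithmetic as a short Perl sketch (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $original = 256 * 256 * 4096;   # two levels of 256 folders, 4096 files each
my $reduced  = 16 * 4096;          # 16 one-digit folders, 4096 files each

print "original layout: $original files\n";   # 268435456
print "reduced layout:  $reduced files\n";    # 65536
```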
I also switched to Perl as I thought it would be faster, but even then the writing takes ages.
Here's some code:
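#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $file = 'list';
# 'or' binds more loosely than '||', so the die really fires when open fails
open(my $info, '<', $file) or die "Could not open $file: $!";
while (my $line = <$info>) {
    $line =~ s/\r|\n//g;             # strip line endings
    my $md5  = md5_hex($line);
    my $add  = substr $md5, 0, 1;    # first hex digit: directory name
    my $add2 = substr $md5, 1, 3;    # next three digits: file name
    my $add3 = substr $md5, 4, 3;    # next three digits: written before the line
    my $outfile = "md5hash__/" . $add . "/md5" . $add2 . ".txt";
    # The output file is reopened in append mode for every input line
    open(my $out, '>>', $outfile) or die "problem opening $outfile: $!\n";
    print $out $add3 . "\n" . $line . "\n";
    close($out);
}
close($info);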
Don't mind the code; it was just a quick test to check the speed.
Thank you for your help
EDIT: I should mention that the file I'm opening for reading is about 25 GB. But that wasn't a problem when I only had 4096 files to write, so I thought it would be the same with 65536.
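One variation that may be worth testing, offered as a sketch rather than a diagnosis: instead of opening and closing an output file for every input line, collect lines in memory per output file and append them in batches, so each file is opened once per batch. Whether the per-line open/close is what actually slows things down is not established here. The names queue_line, flush_pending and run, and the batch size, are assumptions made for illustration; the path scheme is the one from the script above.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $MAX_BUFFERED = 100_000;   # assumed batch size; tune to available memory
my %pending;                  # output path => text waiting to be appended
my $buffered = 0;             # number of lines currently held in %pending

# Append everything buffered so far, opening each output file once per batch.
sub flush_pending {
    for my $path (sort keys %pending) {
        open(my $out, '>>', $path) or die "problem opening $path: $!\n";
        print $out $pending{$path};
        close($out) or die "problem closing $path: $!\n";
    }
    %pending  = ();
    $buffered = 0;
}

# Queue one input line: directory from hex digit 0, file name from digits 1-3,
# and digits 4-6 written on the line before the text itself.
sub queue_line {
    my ($root, $line) = @_;
    my $md5  = md5_hex($line);
    my $path = "$root/" . substr($md5, 0, 1) . "/md5" . substr($md5, 1, 3) . ".txt";
    $pending{$path} .= substr($md5, 4, 3) . "\n" . $line . "\n";
    flush_pending() if ++$buffered >= $MAX_BUFFERED;
}

# Read the input file and dispatch every line; the 16 directories must exist.
sub run {
    my ($file, $root) = @_;
    open(my $in, '<', $file) or die "Could not open $file: $!";
    while (my $line = <$in>) {
        $line =~ s/\r|\n//g;
        queue_line($root, $line);
    }
    close($in);
    flush_pending();   # write whatever is left after the last full batch
}

# Only runs when called with two arguments: input file and output root.
run(@ARGV) if @ARGV == 2;
```

It would be called as, for example, "perl script.pl list md5hash__" (the script name is hypothetical). The output should match the per-line version as long as nothing else writes to the same files in the meantime, because lines bound for a given file keep their relative order.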