Finding largest files takes too long

Good evening.

Because the folder has thousands of files, it takes too long and I am having trouble getting the largest files and then compressing or deleting them. For instance:

find . -size +10000000c -exec ls -ld {} \; | sort -k5n | grep -v .gz

The above command took an hour and I had to cancel it.

du -sk * | sort -nr | more

Secondly, the above command took too much time and then gave me "args too long", so it failed.

I tried to compress some files, but it took too long:

nohup gzip log_HistoricoRecargas*201703*.log &

Are there more efficient methods or commands to do this? I have failed with these three approaches.
The operating system is SunOS.

I appreciate your help in advance.

If there are many, many files, then any search through them will take time, potentially a lot of time. If this is on a filesystem that is mounted over the network, then this time will be much greater. Far better to do the work where the disk really is.

Perhaps this may be slightly more efficient for you though:-

find . -type f -size +10000000c ! -name "*.gz" -ls | sort -bnk 7
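
If you only want to see the biggest few, a reversed sort trimmed with head should do it. I don't have a Solaris box to hand to test on, so treat this as a sketch:-

find . -type f -size +10000000c ! -name "*.gz" -ls | sort -rbnk 7 | head -20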

Can you tell us more about what hardware you have?

Kind regards,
Robin

Yes, you are right, this is a filesystem mounted over a network:

df -k 
10.80.1.83:/SCEL/logs1
                     37748736 29987224 7742352    88%    /SCEL/logs1

I don't have much info about the hardware, just this:

SCEL /SCEL/logs1 # uname -a
SunOS prosclxxc 5.10 Generic_144488-01 sun4u sparc SUNW,SPARC-Enterprise

But if I run the suggested command, there is a risk of compressing today's logs. I only want to compress logs from a few days ago, not today's.

I'd appreciate your help in advance.

Hi,

You can modify rbatte1's command to:-

find . -type f -mtime +2 -size +10000000c ! -name "*.gz" -ls | sort -bnk 7

This will only deal with files older than two days.
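
Once you are happy with the list it produces, the same tests can drive the compression directly, so only the files you have just reviewed get touched. This is just a sketch - do a dry run with -ls first before swapping in gzip:

find . -type f -mtime +2 -size +10000000c ! -name "*.gz" -exec gzip {} +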

Regards

Gull04

You may create a list of filenames and sizes before taking a closer look with different criteria. That way the long-running part - reading all the file sizes - is only done once.

Example

1) Read the sizes

find / -type f -exec stat -c "%n %s" "{}" + >$HOME/file_sizes.txt

2) Find the big files

awk '$2 > 1000000 { print $0 }' $HOME/file_sizes.txt
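
3) Optionally, sort the saved list by size to see the biggest files first (this assumes the file names contain no spaces, as the awk above does):

sort -nr -k2,2 $HOME/file_sizes.txt | head -20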

I have no experience with SunOS. The above commands may have slightly different syntax there.

Can you sign on to the server at address 10.80.1.83? If you can, running your code there will be significantly faster than running it over the network. This applies not just to the searching, but to the actual compression too. If you compress over the network, then you have to read the file across the network to your local memory, compress it and then write the resulting file back across the network to the server disk.

It really could be a massive difference in performance.
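
If it turns out that you can, then something along these lines (the user name here is just a placeholder) would run the whole search and compression on the server itself:

ssh youruser@10.80.1.83 'cd /SCEL/logs1 && find . -type f -mtime +2 -size +10000000c ! -name "*.gz" -exec gzip {} +'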

Robin

Stomp - there is no stat command on vanilla SunOS, AFAIK. Since Oracle took over, the Solaris freeware site died as well.

I think the OP also has another problem - Solaris 10 file systems (not ZFS) and earlier all had a problem: if there are large numbers of files in a single directory, some file-related commands, notably ls, bog down. A lot.

We had a directory with >30K small files in it. I fixed the performance problems by moving files off the primary directory every day in a cron job, but still kept them on the same file system. With about 5000 files, performance was acceptable.
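
For illustration only - the paths, schedule and age here are made up - the cron job looked something like this:

# daily at 02:00, move files older than a day into a sibling archive directory on the same file system
0 2 * * * find /path/to/busydir -type f -mtime +1 -exec mv {} /path/to/busydir_archive/ \;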

alexcol - please post the output of a command that gives the physical size in bytes of the exact directory with the problem.

Since I do not know the name of the directory, here is an example; note the lowercase "d" in the command:

ls -ld  /path/to/directory

Please post the result so we can help.

And if you happen to have too many individual files, regardless of size, on a file system, then you can run out of inodes as well. This is pretty hard to do, but if the filesystem was created with unusual parameters it can happen.
To see used inodes, try:

df -i /path/to/mountpoint

where mountpoint is the place in the file system where your interesting directory is mounted.

This is the output for 3 subdirectories of filesystem /SCEL/logs1:

SCEL /SCEL/logs1 #ls -ld xpbatch  
drwxrwsrwx   3 oracle   explotacion 86179840 Jun 12 09:47 xpbatch
SCEL /SCEL/logs1 #ls -ld xpfactur
drwxrwxrwx   9 oracle   explotacion 1327104 Jun 12 09:45 xpfactur
SCEL /SCEL/logs1 #ls -ld xpconpag
drwxrwxrwx   2 xpconpag explotacion 2596864 Jun 12 09:00 xpconpag
SCEL /SCEL/logs1 #

I appreciate your help once again in advance.

If your bottleneck is the many fork()s then it helps to replace

-exec ls -ld {} \;

with

-exec ls -ld {} +

The + bundles the arguments and forks/execs only a few ls commands, each with the maximum number of arguments.
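
Applied to your original command, and letting find itself skip the .gz files instead of grep, that would look roughly like this:

find . -size +10000000c ! -name "*.gz" -exec ls -ld {} + | sort -k5n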

Thanks again to all of you for your help. It was very useful for me.