Weird 'find' results

bodisha · February 16, 2017, 1:25pm

Hello, and thanks in advance for any help anyone can offer me.

I'm trying to learn the find command and thought I was understanding it... Apparently I was wrong. I was doing compound searches and I started getting weird results with the -size test. I was trying to do a search on a 1G file owned by the user database. I was expecting to get a single file back, but for some reason the find returns not only the 1G file but the scripting files owned by the database user. I've been messing with this for a while trying to understand it. I can filter it out by using a -not -name ".*" but that's not the point. I want to understand why it's including the start up scripts & what I'm doing wrong. Here's the command and the results... If someone could tell me what I'm doing wrong I would greatly appreciate it!!!

find /home -type f -user database -size 1G -ls
13774125 4 -rw-r--r-- 1 database database 18 Nov 20 2015 /home/database/.bash_logout
13774135 4 -rw-r--r-- 1 database database 193 Nov 20 2015 /home/database/.bash_profile
13774136 4 -rw-r--r-- 1 database database 231 Nov 20 2015 /home/database/.bashrc
13774141 1048576 -rw-r--r-- 1 database root 1073741824 Feb 16 11:10 /home/database/large1.log

RudiC · February 16, 2017, 2:24pm

Not so weird if you read the man page meticulously. man find :

Don_Cragun · February 16, 2017, 5:33pm

Hi RudiC,
I don't understand your comments on this issue. The command in this case is:

find /home -type f -user database -size 1G -ls

which is looking for regular files owned by user database that contain exactly 1073741824 bytes. I don't see that any of the first three lines of the -ls output provided by the above find command meets those criteria.

I agree that if the command had been:

find /home -type f -user database -size -1G -ls

then the output shown might be expected. But with 1G as the -size primary's argument (not -1G ), I don't understand the output shown.

bakunin · February 16, 2017, 5:54pm

I don't think so. The size (which is a small fraction of a GB) is rounded up to the next whole unit, which is GB here, so it counts as 1GB. Therefore all files of 1GB or less (but at least 1c) are shown.

I hope this helps.

bakunin

bodisha · February 16, 2017, 7:41pm

Thanks for the reply!

The fact that the first three lines (Bash startup scripts) appear even though they don't meet the criteria of my find command, when I'm explicitly using the -size 1G test, is why I'm posting. With the find criteria I'm using, I would expect ONLY large1.log to show up. I'm trying to figure out why the Bash startup scripts are appearing when they shouldn't be.

Don_Cragun · February 16, 2017, 8:05pm

Hi bakunin,
No. When no units are specified, such as with -size 2 , it is looking for a file that has a size that fits in 2 512-byte blocks which corresponds to a file with a file size that is 513 through 1024 bytes. But when units are specified, an unsigned number is looking for a file with the exact size specified (at least with a BSD-based find utility which is also used on macOS systems). Note that the POSIX standard's find utility's -size primary does not include a units modifier except c (which specifies that the number is counting bytes instead of 512-byte blocks); it just has negative numbers (meaning less than number), unsigned numbers (meaning exactly that number), and positive numbers (with a leading + meaning more than number).
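As a rough illustration of the unsuffixed case described above, the byte range covered by -size N can be worked out with plain shell arithmetic. This is only a sketch: block_range is a made-up helper name, and it assumes the 512-byte blocks and rounding up stated above.

```shell
# Sketch only: block_range is a made-up helper, not part of find.
# Prints the smallest and largest byte size that an unsuffixed
# "-size N" should match, assuming 512-byte blocks and rounding up.
block_range() {
    n=$1
    if [ "$n" -eq 0 ]; then
        # Only an empty file occupies zero blocks.
        echo "0 0"
    else
        echo "$(( (n - 1) * 512 + 1 )) $(( n * 512 ))"
    fi
}

block_range 2    # prints: 513 1024
```

So -size 2 covers 513 through 1024 bytes, matching the numbers given above.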

If some other system's find utility treats unit modifiers as block size multipliers instead of just numbers of bytes, that difference in behavior from BSD might be a reason why POSIX hasn't standardized modifiers other than c .

Hi bodisha,
What operating system are you using?

Don_Cragun · February 16, 2017, 8:17pm

Hi bodisha,
Guessing that the find that you're using behaves differently than the macOS/BSD find utility I'm using and that you really do only want to select files that are exactly of size 1G bytes, try:

find /home -type f -user database -size 1073741824c -ls

If you're looking for files that are at least 1G bytes, try:

find /home -type f -user database -size +1073741823c -ls
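If you would rather not type the byte counts by hand, something like the following might do. It is only a sketch: to_bytes is a hypothetical helper (not a find feature), it assumes k, M and G mean powers of 1024, and the find commands are only echoed here rather than run.

```shell
# Sketch only: to_bytes is a hypothetical helper, not a find feature.
# Converts a count plus a k/M/G suffix into bytes (powers of 1024).
to_bytes() {
    case $2 in
        k) echo "$(( $1 * 1024 ))" ;;
        M) echo "$(( $1 * 1024 * 1024 ))" ;;
        G) echo "$(( $1 * 1024 * 1024 * 1024 ))" ;;
        *) echo "unknown unit: $2" >&2; return 1 ;;
    esac
}

exact=$(to_bytes 1 G)
# Exactly 1G bytes, then at least 1G bytes (the same two forms as above).
# The commands are echoed so they can be inspected before being run.
echo find /home -type f -user database -size "${exact}c" -ls
echo find /home -type f -user database -size "+$(( exact - 1 ))c" -ls
```

Because the c suffix counts single bytes, there is nothing left to round, which is why these forms should behave the same on both kinds of find.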

bodisha · February 16, 2017, 10:54pm

CentOS 7, kernel 3.10.0-327.el7.x86_64. I've got multiple instances running on both VMware & VirtualBox. I've just tested it on a different CentOS 7 guest and I get similar results: when I use the exact "-size" option, it returns files that don't meet the criteria I specified. P.S. thanks for the assistance!

---------- Post updated at 09:54 PM ---------- Previous update was at 07:37 PM ----------

Here's a screenshot of the problem. It's the same version of CentOS 7 but on a different laptop. As you can see, when I use the "-size 1M" option I get more files than expected. When I use "-size 1000k" I get the results I expected.

Don_Cragun · February 16, 2017, 11:15pm

I have no idea what is going on with the Centos (presumably GNU) find utility. With BSD and macOS find , -size 1024k and -size 1M should produce identical results.

bakunin · February 17, 2017, 2:06am

@Don Cragun: thanks for the explanation.

@bodisha: your posted screen shot suggests that the file is indeed 1000k (which is NOT 1MB, 1024k=1M) in size, which might add to the confusion.

After Don so concisely refuted my attempted explanation, I am at a loss myself as to what is going on.

I hope this helps.

bakunin

durden_tyler · February 17, 2017, 3:36am

I am using Debian 8 "Jessie" that has GNU find 4.4.2
After running your commands on similarly sized files, here's my guess about what is happening.

When you say "-size nS", where "n" is an integer specifying "units of space" and "S" is the suffix (M, k etc.), then the find command searches for files that have a rounded up size of "nS".
That effectively means that the size of the file is > (n-1)S and <= nS.

So, as per this theory, "-size 1M" means files with "rounded up size of 1M", or sizes > 0M and <= 1M. 0M bytes = 0 bytes. Hence files with sizes 183, 2112 etc are displayed.

$ find . -type f -size 1M -ls
6029496 1000 -rw-r--r--   1 r2d2     r2d2      1024000 Feb 17 02:32 ./large1.log
6029492    4 -rw-r--r--   1 r2d2     r2d2          183 Feb 17 02:29 ./graphical.txt
6029493    4 -rw-r--r--   1 r2d2     r2d2         2112 Feb 17 02:29 ./strace.out
6029494    8 -rw-r--r--   1 r2d2     r2d2         4659 Feb 17 02:29 ./vmstat.out
6029495    8 -rw-r--r--   1 r2d2     r2d2         4660 Feb 17 02:30 ./vmstat1.out
$

If you say "-size 2M", then it would mean files with "rounded up size of 2M", or sizes > 1M and <= 2M. That will not display anything, since there is no file with size > 1M or 1048576 bytes and <= 2M or 2097152 bytes.

$ find . -type f -size 2M -ls
$

Similar case could be argued for size 3M.

$ find . -type f -size 3M -ls
$

Now, in case of "-size 1000k", notice that k = 1024 bytes, so it searches for files with a rounded up size of 1000k, i.e. sizes > 999k or (999 * 1024 =) 1022976 bytes and <= 1000k or (1000 * 1024 =) 1024000 bytes.
That displays your one file.

$ find . -type f -size 1000k -ls
6029496 1000 -rw-r--r--   1 r2d2     r2d2      1024000 Feb 17 02:32 ./large1.log
$

To test this logic further, I created three files with sizes:

(1) 999k bytes - 1 byte = 1022975 bytes
(2) 999k bytes          = 1022976 bytes
(3) 999k bytes + 1 byte = 1022977 bytes

using the following commands:

perl -e 'foreach (1..1022){foreach (1..999){print chr(97+int(rand(26)))}; print "\n"}'  >large1_v1.log
perl -e 'foreach (1..1)   {foreach (1..974){print chr(97+int(rand(26)))}; print "\n"}' >>large1_v1.log

perl -e 'foreach (1..1022){foreach (1..999){print chr(97+int(rand(26)))}; print "\n"}'  >large1_v2.log
perl -e 'foreach (1..1)   {foreach (1..975){print chr(97+int(rand(26)))}; print "\n"}' >>large1_v2.log

perl -e 'foreach (1..1022){foreach (1..999){print chr(97+int(rand(26)))}; print "\n"}'  >large1_v3.log
perl -e 'foreach (1..1)   {foreach (1..976){print chr(97+int(rand(26)))}; print "\n"}' >>large1_v3.log
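Since find only looks at the size here and not the content, the same three test files could probably be made more simply. The following is just a sketch: make_file is a made-up helper, the files are filled with zero bytes instead of random letters, and it assumes head accepts -c and that mktemp -d is available (true for the GNU and BSD tools, as far as I know).

```shell
# Sketch only: make_file is a made-up helper name.
# Writes SIZE zero bytes to FILE; assumes head supports the -c option.
make_file() {
    head -c "$2" /dev/zero > "$1"
}

# Work in a scratch directory so nothing existing is overwritten.
dir=$(mktemp -d)
make_file "$dir/large1_v1.log" 1022975   # 999k - 1 byte
make_file "$dir/large1_v2.log" 1022976   # 999k
make_file "$dir/large1_v3.log" 1022977   # 999k + 1 byte
wc -c "$dir"/large1_v?.log
```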

My pwd now looks like this:

$ ls -l
total 4024
-rw-r--r-- 1 r2d2 r2d2     183 Feb 17 02:29 graphical.txt
-rw-r--r-- 1 r2d2 r2d2 1024000 Feb 17 02:32 large1.log
-rw-r--r-- 1 r2d2 r2d2 1022975 Feb 17 02:55 large1_v1.log
-rw-r--r-- 1 r2d2 r2d2 1022976 Feb 17 02:55 large1_v2.log
-rw-r--r-- 1 r2d2 r2d2 1022977 Feb 17 02:56 large1_v3.log
-rw-r--r-- 1 r2d2 r2d2    2112 Feb 17 02:29 strace.out
-rw-r--r-- 1 r2d2 r2d2    4660 Feb 17 02:30 vmstat1.out
-rw-r--r-- 1 r2d2 r2d2    4659 Feb 17 02:29 vmstat.out
$

Now specifying "-size 1000k" should display files large1.log and large1_v3.log since they both have sizes > 1022976 (999k) and <= 1024000 (1000k)

$ find . -type f -size 1000k -ls
6029496 1000 -rw-r--r--   1 r2d2     r2d2      1024000 Feb 17 02:32 ./large1.log
6029500 1000 -rw-r--r--   1 r2d2     r2d2      1022977 Feb 17 02:56 ./large1_v3.log
$

And "-size 999k" should display files large1_v1.log and large1_v2.log since they both have sizes > 1021952 (998k) and <= 1022976 (999k)

$ find . -type f -size 999k -ls
6029497 1000 -rw-r--r--   1 r2d2     r2d2      1022975 Feb 17 02:55 ./large1_v1.log
6029498 1000 -rw-r--r--   1 r2d2     r2d2      1022976 Feb 17 02:55 ./large1_v2.log
$

##################
More tests follow:

$ 
$ # size 1k = sizes in the range (0k, 1k] or (0, 1024]
$ find . -type f -size 1k -ls
6029492    4 -rw-r--r--   1 r2d2     r2d2          183 Feb 17 02:29 ./graphical.txt
$ 
$ # size 2k = sizes in the range (1k, 2k] or (1024, 2048]
$ find . -type f -size 2k -ls
$ 
$ # size 3k = sizes in the range (2k, 3k] or (2048, 3072]
$ find . -type f -size 3k -ls
6029493    4 -rw-r--r--   1 r2d2     r2d2         2112 Feb 17 02:29 ./strace.out
$ 
$ # size 4k = sizes in the range (3k, 4k] or (3072, 4096]
$ find . -type f -size 4k -ls
$ 
$ # size 5k = sizes in the range (4k, 5k] or (4096, 5120]
$ find . -type f -size 5k -ls
6029494    8 -rw-r--r--   1 r2d2     r2d2         4659 Feb 17 02:29 ./vmstat.out
6029495    8 -rw-r--r--   1 r2d2     r2d2         4660 Feb 17 02:30 ./vmstat1.out
$ 
$

So essentially, the size is always rounded up to the next whole unit. If a file is using up:
(a) 10.3 blocks, i.e. 10 blocks + a fraction of the next block, then its size is considered to be 11 blocks
(b) four 1k units + a fraction of the next 1k unit, then its size is considered to be 5k
(c) two 1M units + a fraction of the next 1M unit, then its size is considered to be 3M
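The rounding described in this post can be modelled with integer arithmetic. This is only a sketch of the theory, not something taken from the findutils source, and units_rounded_up is a made-up name.

```shell
# Sketch only: units_rounded_up is a made-up name that models the theory
# above; it is not taken from the findutils source.
# Prints how many whole units a file of SIZE bytes occupies, rounding up.
units_rounded_up() {
    size=$1 unit=$2
    echo "$(( (size + unit - 1) / unit ))"
}

M=$(( 1024 * 1024 ))
# The first three sizes appear in the listings above; the last one is
# one byte over 1M, to show where the result changes.
for size in 183 2112 1024000 1048577; do
    echo "$size bytes -> $(units_rounded_up "$size" "$M")M"
done
```

Under this model an unsigned "-size nS" matches when the rounded-up value equals n, which is why everything from 1 byte up to 1048576 bytes lands in the 1M bucket.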

RudiC · February 17, 2017, 5:35am

@durden_tyler: Thanks, this is exactly (admittedly not that detailed) what I found when testing with my find (GNU findutils) 4.7.0-git on linux, hence my highlighting of the "rounding up to unit size" in the man page citation (commented by Don Cragun in post#3).
I see that with other versions on other systems, the size test is handled differently. In FreeBSD, for instance, rounding is done only for 512 byte blocks, and 1k means exactly 1024 bytes, -1k includes 1023 bytes but excludes 1024, +1k shows 1025 but not 1024.
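To put the two behaviours side by side, here is a small model in shell. It is a sketch only: gnu_match, bsd_match and compare are made-up names that encode the rules as described in this thread (round up first versus compare bytes directly), not code from either find implementation.

```shell
# Sketch only: these functions are made-up models of the behaviour
# described in this thread, not code from either find implementation.
# Exit status 0 means "the file would match".

compare() {   # compare VALUE OP TARGET, where OP is "-", "+" or empty
    case $2 in
        -) [ "$1" -lt "$3" ] ;;
        +) [ "$1" -gt "$3" ] ;;
        *) [ "$1" -eq "$3" ] ;;
    esac
}

# Arguments for both: SIZE in bytes, OP, N, UNIT in bytes.
# GNU-style: round the size up to whole units first, then compare with N.
gnu_match() { compare "$(( ($1 + $4 - 1) / $4 ))" "$2" "$3"; }

# FreeBSD-style: compare the byte size with N * UNIT directly.
bsd_match() { compare "$1" "$2" "$(( $3 * $4 ))"; }

for size in 0 1023 1024 1025; do
    for op in - "" +; do
        gnu=no; gnu_match "$size" "$op" 1 1024 && gnu=yes
        bsd=no; bsd_match "$size" "$op" 1 1024 && bsd=yes
        echo "size=$size -size ${op}1k gnu=$gnu bsd=$bsd"
    done
done
```

One thing the model makes visible: with the round-up rule, -1k can only match empty files, because any non-empty file already rounds up to at least one unit.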

bakunin · February 17, 2017, 7:30pm

Now, this is funny - this is exactly what I thought to be the case, until Don said it can't be that way. The same reasoning led me to think that any file sized >0c is selected by -size 1G - because it is "rounded up" to the next full GB.

Now completely confused.

bakunin

Don_Cragun · February 17, 2017, 11:07pm

Hi Bakunin,
Don't be confused. What we see here is another case where GNU utilities and BSD utilities behave differently. (And, some UNIX systems don't offer the extension at all.) You get exactly the same behavior on BSD, Linux, and UNIX systems for:

find file... ... -size [+|-]number[c] ...

which are the -size primary argument formats required by the POSIX standards, but the behavior of:

find file... ... -size [+|-]number[k|M|G|T|P] ...

where one of the optional size multipliers is supplied, varies. It is likely to give you a syntax error on some UNIX-branded systems, the rounding-up behavior discussed in this thread on Linux systems (and maybe on some UNIX-branded systems), and the exact-size behavior on BSD-based systems and at least one UNIX-branded system.
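If you want the "rounded-up bucket" selection without depending on how a given find treats the multipliers, one option is to spell the range out in bytes using only the c form. A minimal sketch, where bucket_args is a made-up helper name:

```shell
# Sketch only: bucket_args is a made-up helper. It prints -size tests
# that use only byte counts (the c suffix) and select files larger than
# (N-1)*UNIT bytes and no larger than N*UNIT bytes.
bucket_args() {
    n=$1 unit=$2
    echo "-size +$(( (n - 1) * unit ))c -size -$(( n * unit + 1 ))c"
}

bucket_args 1000 1024   # prints: -size +1022976c -size -1024001c

# Possible use (left unquoted on purpose so it splits into arguments):
#   find . -type f $(bucket_args 1000 1024) -ls
```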

bodisha · February 20, 2017, 2:38pm

I'd like to thank everyone for their help figuring this out. I can't say I understand why 'find' works the way it does with tests that use M & G sizes... but it seems to work OK if I break the size down into smaller units. So I'll just have to remember that.