Help with command to move files X at a time to separate directories

Hello,

I need help finding a script that will move files from one directory to another, 10k files at a time.

I have a directory with 100k files in it. I need those 100k files split into separate directories, each holding 10k files.

Here is a diagram of what I'm trying to do.

Dir1
100000 files

This needs to be split into groups of 10000 files, each group in a new directory:

NEWDIRA = 10000 files
NEWDIRB = 10000 files

If anyone knows of a command that can do this, it would help greatly. Thanks

Geo_Bean

The most important thing to take into account is to avoid using a for loop: you would run into an "argument list too long" error. I have no directory with that many files at hand to test it, but the following should work (I still can't tell you about execution time; test it carefully):

#! /bin/ksh
typeset iCnt=0                          # files moved into the current part so far
typeset fSrcDir="/path/to/source-dir"
typeset fTgtDir="/path/to/target-dir"
typeset iPart=1                         # number of the current target subdirectory
typeset fFile=""

ls "$fSrcDir" | while read fFile ; do
     mv "$fSrcDir/$fFile" "${fTgtDir}/Part${iPart}"
     if [ $? -gt 0 ] ; then
          exit $?
     fi
     (( iCnt += 1 ))
     if [ $iCnt -ge 10000 ] ; then      # after 10000 files, start the next part
          iCnt=0
          (( iPart += 1 ))
     fi
done

exit 0

I hope this helps.

bakunin

#!/bin/ksh
cd dirA
ls >../list                  # snapshot of the file names, kept outside dirA
x=1                          # files moved into the current directory so far
y=1                          # number of the current target directory
mkdir ../dir$y
while read file
do
   mv $file ../dir$y
   x=`expr $x + 1`
   if [ $x -gt 10000 ]
   then
       x=1
       y=`expr $y + 1`
       mkdir ../dir$y
   fi
done < ../list

---------- Post updated at 09:16 AM ---------- Previous update was at 09:07 AM ----------

Bakunin,
No fair, I saw it first, but had to go answer the phone.

Interesting that we both had the same solution, but totally different coding style.

Jack

I like the above solutions much better than mine, but I want to show mine anyway, just to show that there are many possible solutions.

#!/usr/bin/ksh
input_dir="/home/temp/in"
output_dir="/home/temp/out"
amount_files_input=`ls -1 $input_dir | wc -l`

if [ "$amount_files_input" -ge 100000 ] ; then
   for i in {1..10000}
   do
      # move whichever file is last in the listing, one file per iteration
      file_to_move=`ls -1 $input_dir | tail -1`
      mv $input_dir/$file_to_move $output_dir
   done
fi

and of course I'm a newbie still :frowning:

What if in a folder we have thousands of files, let's say from the past 2 years...

How can I find the files from 2009 and just move them to a separate folder?

The date format being: 2009-03-25....

Can 'xargs' be used here? If yes, how?

Since this is presumably a one-time job, I would use something like:
ls -ltr >filelist
Then edit filelist and remove all the file names that you do not want to move.
Then write a script to read the edited file and move the files.
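
A minimal sketch of that last step, assuming the edited list has been trimmed down to bare file names (one per line) and that the target directory (dir2009 here is just a placeholder name) already exists:

#!/bin/sh
# Move every file named in the hand-edited list into dir2009.
while IFS= read -r f ; do
    mv "$f" dir2009/ || exit 1
done < filelist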

---------- Post updated at 11:20 AM ---------- Previous update was at 11:01 AM ----------

Yes, it will work with fewer than 250 entries in the directory, but....
With thousands of entries in the directory, ls might well take significant time to execute.
I tried the "ls -1 $input_dir | tail -1" line on a directory with 168000 files.
Real time was 0.50 seconds*. You execute this command 10000 times, which works out to roughly 5000 seconds of listing alone.

*on a dual processor quad core system with serial SCSI RAID10

It's probably best to start your own thread for this question. Aside from having to deal with a lot of files, it really has nothing to do with the original poster's problem and will only muddle the discussion, in my opinion.

Regards,
Alister

---------- Post updated at 04:48 PM ---------- Previous update was at 11:25 AM ----------

Hello, bakunin:

That's actually bad advice in general (there is no need to avoid a for loop merely because the list may be very long), and it's terrible advice in this particular case, where a glob in a for loop's list is by far the simplest and safest way to handle a directory of files. Field splitting is not a concern, since the glob is expanded during the penultimate step of shell command line processing; only quote removal follows it.

Your warning regarding "argument list too long" scenarios does not apply to a shell expanding a wildcard, which is done internally and does not require an exec system call. Nor does it apply to the for loop, since that is also internal. Nor does it apply to any commands within the for loop, since they are fed the list items one at a time. For more info regarding ARG_MAX issues, the thread "The maximum length of arguments for a new process" may be helpful.
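
To sketch the distinction (the exact error text varies by shell), the limit only bites when the expanded list has to be handed to an exec'd external command:

$ for f in *; do :; done    # fine: the glob and the loop are handled inside the shell
$ /bin/ls * >/dev/null      # with enough long names, this can fail with an
                            # "Argument list too long" error, since ls is exec'd
                            # with the entire expanded list as its argument vector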

To test for yourself, you can execute the following (if your system has jot; if not, perhaps you can tweak it to use seq, or even brace expansion):

$ for f in $(jot -w '%0100d' 100000); do touch "$f"; done
$ for f in *; do :; done

That will create 100,000 files, each with a 100 character filename, and then run a do-nothing loop which nevertheless has to expand the * wildcard.
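
If jot is unavailable, a GNU seq equivalent of the file-creation step might look like this (a sketch; it assumes GNU seq, whose -f option takes a printf-style floating point format):

$ for f in $(seq -f '%0100g' 100000); do touch "$f"; done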

Regarding your solution, there are some caveats: it will not properly handle any file names which contain leading whitespace, embedded newlines, or a trailing backslash. The whitespace and the trailing backslash can be fixed by tweaking IFS and using read's -r option. The embedded newline, however, cannot be worked around (at least not with any POSIX-compliant functionality in ls/read that I'm aware of).
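
Concretely, that tweak amounts to clearing IFS for the read and adding -r so that backslashes are taken literally (a sketch of just the loop, without the error handling):

ls "$fSrcDir" | while IFS= read -r fFile ; do
     mv "$fSrcDir/$fFile" "${fTgtDir}/Part${iPart}"
done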

On an unrelated note, the "if [ $? -gt 0 ] ; then exit $? ; fi" idiom in your script will always return a 0 exit status: when the exit command executes, the value of $? is the exit status of the [ command, which must have succeeded if the exit has been reached.

A simple, correct way would be:

mv "$fSrcDir/$fFile" "${fTgtDir}/Part${iPart}" || exit

exit will only execute if mv fails, and it will return mv's exit status (exit without an argument returns the status of the last command run).

The solution you provided also assumes that all the target directories exist, since there is no mkdir anywhere (that may be intentional; I'm just pointing it out to save the original poster some time :wink:)

Please don't take the above criticisms personally; they are intended to be helpful. If my analysis is erroneous, I would appreciate being corrected.

Regards,
Alister

---------- Post updated at 04:57 PM ---------- Previous update was at 04:48 PM ----------

Hello, jgt:

This solution is broken with regard to whitespace: $file in the mv should be double-quoted. There are also problems with the way read is used, which will mangle file names with leading whitespace, embedded newlines, or a trailing backslash; the same IFS/read -r tweak mentioned above applies here as well.

I mention it only in case the original poster's monster directory has some susceptible filenames.

Regards,
Alister

---------- Post updated at 05:23 PM ---------- Previous update was at 04:57 PM ----------

My attempt at a solution. It should handle any file names without issue. The only downside is that it must execute mv once per file. However, I'll take that performance hit over possible breakage, given that moving 100K files one at a time only takes about 5 minutes on a 3-year-old laptop with a slow drive. It operates on the current working directory and also creates the necessary destination directories (001, 002, ...) in the current working directory.

#!/bin/sh

i=0
for f in *; do
    # every 10000 files, create the next destination directory (001, 002, ...)
    if [ $((i%10000)) -eq 0 ]; then
        dest=$(printf '%03u' $((i/10000+1)))
        mkdir $dest || exit 2
    fi
    mv "$f" $dest || exit 1
    : $((++i))
done

Creating 100,000 files with 100 character long filenames:

$ for f in $(jot -w '%0100d' 100000); do touch "$f"; done

A test run, followed by some simple checks:

$ ../mv10k.sh 

$ ls
001     002     003     004     005     006     007     008     009     010

$ for d in *; do echo $d: $(ls $d | wc -l) files; ls $d | head -n2; echo ...snip...; ls $d | tail -n2; echo =================; done
001: 10000 files
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002
...snip...
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000009999
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000
=================
002: 10000 files
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010001
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010002
...snip...
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000019999
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020000
=================
[...snip: directories 003 through 009, each likewise containing 10000 files in sequence...]
=================
010: 10000 files
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000090001
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000090002
...snip...
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000099999
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000
=================

Regards,
Alister

Only a Microsoft person would create a file with a space in the file name. :smiley:

I think that is the wrong attitude: as long as something is *legal*, it should be covered by a script handling it - or at least be clearly stated as a limitation. The attitude of "nobody in his right mind would ever ..." is what accounts for roughly 100% of all the injection-type problems with web interfaces. The programmers of said interfaces simply didn't believe "anybody in his right mind" would ever enter SQL code instead of data somewhere.

You are right about the glob expansion and field splitting, but wrong about the line length limitation. I may not have termed the problem quite correctly with "arguments too long", though. There is a maximum length of input line that a shell can handle. The glob will expand before the shell even attempts to execute the command, and so you might eventually hit the shell's maximum line length. I have tested it (not using jot, though, as I don't have that tool) on AIX 5.x and a CentOS system (I don't know which kernel version; it was a 32-bit system), and the results with Korn Shell 88 are consistent (and as I expected).

The same phenomenon can be observed when using a command like "ls *" or "rm *" against a directory that big: the expansion of the glob will create an effective command line too long for the shell to digest.
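
For reference, the limit those external commands run into can be checked directly on any POSIX system:

$ getconf ARG_MAX    # upper bound, in bytes, on the argument list
                     # (plus environment) that exec() will accept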

This is correct for the most part. My script was suggested as a sketch towards a solution, not as a production-strength application. I should have said so, though, so your criticism is legitimate. The same goes for the target directories not being created as they are filled.

This is correct, an error on my part. I should have caught the return code in a variable, as I usually do in my scripts.
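
That is, something along these lines:

mv "$fSrcDir/$fFile" "${fTgtDir}/Part${iPart}"
rc=$?
if [ $rc -gt 0 ] ; then
     exit $rc
fi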

You are welcome, and no offense taken. It is the spirit of the board to learn collectively from one another, and I'm neither perfect nor sacrosanct. In fact, I welcome the opportunity to gain insight from discussions like this one; it is a prominent reason why I write here.

bakunin