Split file based on file size in Korn script

ssemple2000 · June 25, 2012, 2:23pm

I need to split a file if it is over 2GB in size (or any size), preferably split on the lines. I have figured out how to get the file size using awk, and I can split the file based on the number of lines (which I got with wc -l) but I can't figure out how to connect them together in the script.

So the command

ls -l>mylist.txt

gives me the file listing in a file, and the command

awk < mylist.txt '{if ($5>75000000) print $5 " " $NF}'

gives me a list of all the files that are larger than that size, along with their sizes, and the command

wc -l myfile.txt

gives me the number of lines in the file (call it 50000), and if I manually put them together, the command

split -l 25000 myfile.txt myfile.txt

gives me two files, myfile.txtaa and myfile.txtab, each with 25000 lines.

The problem is how to get them together in one script....
Thank you.

ctsgnb · June 25, 2012, 2:53pm

man split

see -l option

ssemple2000 · June 25, 2012, 3:18pm

Thanks, but I already know how to use split -l. What I'm looking for is how to get the size of the file and pass the number of lines (or half the number of lines) to the split -l command in the script.

I was thinking this might work, but it doesn't:
awk < mylist.txt '{if ($5>78000000) "split -l $5/2 $NF $NF"; rm $NF}'

So I need to know how to get the $5 and $NF values into the shell script so I can run it....

Thank you.
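One way to get the awk fields into shell variables is to have awk print them and read them back in a while loop. This is only a sketch: split_large_files and its limit argument are made-up names, it assumes an awk that accepts -v (nawk on some older systems), and it assumes file names without spaces, since $NF only holds the last word of the line. It does not remove the original file.

```shell
#!/bin/ksh
# Sketch: split every regular file in the current directory that is
# larger than a byte threshold into two pieces by line count.
# split_large_files and "limit" are illustrative names.
split_large_files() {
    limit=$1
    # awk prints "size name" for each oversized regular file (lines of
    # ls -l that start with "-"); the loop reads both fields back.
    ls -l | awk -v limit="$limit" '/^-/ && $5 > limit { print $5, $NF }' |
    while read -r size name
    do
        lines=$(wc -l < "$name")
        half=$(( (lines + 1) / 2 ))   # round up so an odd count gives two pieces
        [ "$half" -gt 0 ] || continue # nothing to split in a file with no lines
        split -l "$half" "$name" "$name"
    done
}
```

Calling split_large_files 78000000 would then do what the awk one-liner above was trying to do, using the same myfile.txtaa / myfile.txtab naming as the manual split command.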

in2nix4life · June 25, 2012, 4:14pm

This may help get you started:

#!/bin/ksh
#
#

# declare an array and populate it with files larger
# than 2GB in the current directory (-size counts 512-byte
# blocks, so 2GB is 4194304 blocks)
set -A files $(find . -maxdepth 1 -type f -size +4194304 | sed 's/\.\///')

# set counter
counter=0

# get number of files in the array
numfiles=${#files[*]}

# set linecount
linecount=0

# set number of lines
numlines=0

# iterate through the array files, retrieve the line count,
# divide it by 2 (rounding up, so an odd count still yields
# two pieces) and feed everything to the split command
while [ $counter -lt $numfiles ]
do
    linecount=$(wc -l < "${files[$counter]}")
    numlines=$(( (linecount + 1) / 2 ))
    split -l $numlines "${files[$counter]}" "${files[$counter]}"
    ((counter=counter+1))
done

# done
exit 0
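Note that cutting a file in half only gets each piece under the limit when the file is less than twice the limit. A possible variation is to work out the number of pieces from the byte size first. The sketch below assumes lines of roughly similar length and a shell with 64-bit arithmetic (ksh93 or bash), since sizes over 2GB overflow 32-bit integers; split_under_limit, file and limit are made-up names.

```shell
#!/bin/ksh
# Sketch: split a file on line boundaries into enough pieces that each
# piece should come out under a byte limit.
# split_under_limit, "file" and "limit" are illustrative names.
split_under_limit() {
    file=$1
    limit=$2
    size=$(wc -c < "$file" | tr -d ' ')
    lines=$(wc -l < "$file" | tr -d ' ')
    [ "$size" -gt "$limit" ] || return 0       # already small enough
    pieces=$(( (size + limit - 1) / limit ))   # ceiling of size / limit
    per_piece=$(( lines / pieces ))            # rounded down, so pieces err on the small side
    [ "$per_piece" -gt 0 ] || per_piece=1      # very long lines: one line per piece
    split -l "$per_piece" "$file" "$file"
}
```

Because the lines-per-piece figure is rounded down, split may produce one more piece than the computed count. A file with a few very long lines can still yield an oversized piece, so the result is worth checking with ls -l.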

ssemple2000 · June 25, 2012, 5:36pm

Thank you. However, when I run the script, I get the error
find: bad option -maxdepth
What does the -maxdepth option do? I don't see it when I man find, but there is a -depth option.

Thanks again.

---------- Post updated at 04:36 PM ---------- Previous update was at 03:36 PM ----------

I think what -maxdepth 1 is supposed to do is to keep find from searching sub-directories. If this is the case, the version of find on the version of Unix that I am using does not have that option. At any rate, I was able to remove the -maxdepth parameter and the script is working. Thank you, thank you, thank you!
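Without -maxdepth, find goes back to searching sub-directories as well. On a find that lacks the option, the usual substitute is the "! -name . -prune" idiom, which only works when the starting point is ".". A sketch, where list_large_files is a made-up name and the size is given in 512-byte blocks (4194304 blocks for 2GB):

```shell
#!/bin/ksh
# Sketch: list regular files directly inside a directory that are larger
# than a given number of 512-byte blocks, without using -maxdepth.
# list_large_files is an illustrative name.
list_large_files() {
    dir=$1
    blocks=$2
    # The subshell keeps the cd from affecting the caller.
    # "! -name . -prune" prunes everything except the starting point ".",
    # so find never descends into sub-directories.
    ( cd "$dir" &&
      find . ! -name . -prune -type f -size +"$blocks" -print |
      sed 's|^\./||' )
}
```

For example, list_large_files . 4194304 would print the names that the find command in the script is meant to collect.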

Chubler_XL · June 25, 2012, 6:00pm

Why not just use -n 2 with split?

split -n 2 myfile.txt myfile.txt

ssemple2000 · June 25, 2012, 6:20pm

I guess because the man page for split on my version of Unix doesn't list an option of -n for split. It would be more convenient, but the option isn't there.

Thanks anyway.