I need to split a file if it is over 2GB in size (or any size), preferably split on the lines. I have figured out how to get the file size using awk, and I can split the file based on the number of lines (which I got with wc -l) but I can't figure out how to connect them together in the script.
So the command
ls -l > mylist.txt
gives me the file listing in a file, and the command
awk < mylist.txt '{if ($5>75000000) print $5 " " $NF}'
gives me a list of all the files that are larger than that size, along with their sizes, and the command
wc -l myfile.txt
gives me the number of lines in the file (call it 50000), and if I manually put them together, the command
split -l 25000 myfile.txt myfile.txt
gives me two files, myfile.txtaa and myfile.txtab, each with 25000 lines.
The problem is how to get them together in one script....
Thank you.
Thanks, but I already know how to use split -l. What I'm looking for is how to get the size of the file and pass the number of lines (or half the number of lines) to the split -l command in the script.
I was thinking this might work, but it doesn't:
awk < mylist.txt '{if ($5>78000000) "split -l $5/2 $NF $NF"; rm $NF}'
So I need to know how to get the $5 and $NF values into the shell script so I can run it....
Thank you.
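One way to connect the pieces, as a minimal sketch: let awk only select and print the size and name, and let a shell loop read those two values and run split. The function name split_large_from_listing is made up for illustration, and the sketch assumes file names without spaces and an ls -l layout where the size is field 5.

```shell
#!/bin/sh
# Hypothetical helper: split every file in an "ls -l" listing whose size
# (field 5) exceeds a byte threshold into two line-based pieces.
# Assumption: file names contain no spaces, as with the awk $NF approach.
split_large_from_listing() {
    listing=$1      # file produced by: ls -l > mylist.txt
    threshold=$2    # size limit in bytes

    # awk prints "size name"; the shell loop receives them as variables.
    awk -v max="$threshold" '$5 > max { print $5, $NF }' "$listing" |
    while read -r size name
    do
        [ -f "$name" ] || continue          # skip directories and stale entries
        lines=$(wc -l < "$name")
        half=$(( (lines + 1) / 2 ))         # round up so an odd count gives two pieces
        [ "$half" -gt 0 ] || continue       # nothing to split on if there are no newlines
        echo "splitting $name ($size bytes, $lines lines)"
        split -l "$half" "$name" "$name"
    done
}
```

It would be called as, for example, split_large_from_listing mylist.txt 75000000 from the directory the listing was taken in.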
This may help get you started:
#!/bin/ksh
#
#
# declare an array and populate it with files larger
# than 2 GB in the current directory
# (find -size counts 512-byte blocks, so 2 GB is 4194304 blocks)
set -A files $(find . -maxdepth 1 -size +4194304 -type f | sed 's/\.\///')
# set counter
counter=0
# get number of files in the array
numfiles=${#files[*]}
# set linecount
linecount=0
# set number of lines
numlines=0
# iterate through the array files, retrieve the line count,
# halve it (rounding up, so an odd count still yields two pieces)
# and feed everything to the split command
while [ $counter -lt $numfiles ]
do
linecount=$(wc -l < "${files[$counter]}")
numlines=$(expr \( $linecount + 1 \) / 2)
split -l $numlines "${files[$counter]}" "${files[$counter]}"
((counter=$counter+1))
done
# done
exit 0
Thank you. However, when I run the script, I get the error
find: bad option -maxdepth
What does the -maxdepth option do? I don't see it when I man find, but there is a -depth option.
Thanks again.
---------- Post updated at 04:36 PM ---------- Previous update was at 03:36 PM ----------
I think what -maxdepth 1 is supposed to do is to keep find from searching sub-directories. If this is the case, the version of find on the version of Unix that I am using does not have that option. At any rate, I was able to remove the -maxdepth parameter and the script is working. Thank you, thank you, thank you!
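For a find without -maxdepth, a commonly used portable substitute is to prune everything except the starting directory. A minimal sketch, with a made-up function name (list_large_files_here) and the size given in 512-byte blocks as find -size expects:

```shell
#!/bin/sh
# Hypothetical helper: list regular files in the current directory only
# (no subdirectories) that are larger than the given number of 512-byte blocks.
list_large_files_here() {
    blocks=$1
    # "! -name . -prune" prunes every entry except the starting point ".",
    # so find never descends into subdirectories.
    find . ! -name . -prune -type f -size +"$blocks" | sed 's|^\./||'
}
```

For example, list_large_files_here 4194304 would list files over 2 GB. Simply dropping -maxdepth also works, but then find searches subdirectories as well.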
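Why not just use -n with split (GNU split only)? Use l/2 rather than a bare 2, since -n 2 cuts by byte count and can break a line in the middle:
split -n l/2 myfile.txt myfile.txt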
I guess because the man page for split on my version of Unix doesn't list an option of -n for split. It would be more convenient, but the option isn't there.
Thanks anyway.
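Where split has no -n option, something similar can be approximated with wc -l and split -l. This is only a sketch: the function name split_into_pieces is invented here, and unlike GNU's -n l/N it balances the pieces by line count rather than by byte size, so it may produce fewer than N pieces for very short files.

```shell
#!/bin/sh
# Hypothetical helper: split FILE into at most N pieces of roughly
# equal line count, using only "wc -l" and "split -l".
split_into_pieces() {
    file=$1
    pieces=$2
    lines=$(wc -l < "$file")
    # ceiling division: lines per piece, rounded up
    per_piece=$(( (lines + pieces - 1) / pieces ))
    [ "$per_piece" -gt 0 ] || return 1      # no complete lines to split on
    split -l "$per_piece" "$file" "$file"
}
```

With a 50000-line myfile.txt, split_into_pieces myfile.txt 2 would run split -l 25000 myfile.txt myfile.txt.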