arg list too long error

Hello,

I'm trying to search through 30,000 files in 1 directory, and am getting the "arg list too long" error. I've searched this forum and have been playing around with xargs and can't get that to work either. I'm using ksh on Solaris.

Here's my original code:

nawk "/Nov 21/{_=2}_&&_--" $LOGDIR/*.log > $NEWLOGFILE

Error:

ksh: /usr/bin/nawk: arg list too long

Any help is appreciated!
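
For what it's worth, the limit behind "arg list too long" is the kernel's ARG_MAX (the maximum combined size of the argument list and environment). You can check it with getconf - purely a diagnostic, not a fix:

getconf ARG_MAX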

One way (untested). I assume you want each hit to appear in ${NEWLOGFILE}.
The brackets "()" are important: they group the loop so a single redirection collects all of its output.

(
find "${LOGDIR}" -type f -name '*.log' -print | while read FILENAME
do
        nawk "/Nov 21/{_=2}_&&_--" "${FILENAME}"
done
) > "${NEWLOGFILE}" 2>&1

ls $LOGDIR/*.log | xargs nawk "/Nov 21/{_=2}_&&_--" > $NEWLOGFILE

Thanks for your responses!

rdcwayx - I got the same error:

/usr/bin/ls: arg list too long

methyl - I got this to work, however it runs very slowly. Is there a way this would work as a one-liner and not in a loop? I don't know if that would speed it up, though, since it's looking through 30,000 files.

(
find "${LOGDIR}" -type f -name '*.log' -print | while read FILENAME
do
        nawk "/Nov 21/{_=2}_&&_--" "${FILENAME}"
done
) >> "${NEWLOGFILE}" 2>&1

No.

That approach is serial - one file at a time. This version backgrounds the nawk jobs:

cnt=0
find "${LOGDIR}" -type f -name \*\.log -print | 
while read FILENAME
do
  nawk "/Nov 21/{_=2}_&&_--" "${FILENAME}" >> $NEWLOGFILE &
  cnt=$(( cnt + 1 ))
  if [ $(( $CNT % 10  )) -eq 0 ] ; then
   wait
  fi
done

This runs 10 files at one time.

Try this:

: > $NEWLOGFILE
find $LOGDIR -name "*.log" -type f | xargs nawk "/Nov 21/{_=2}_&&_--" >> $NEWLOGFILE

If you have GNU findutils and you are worried about any of these situations:

files with spaces in their names
find descending into subdirectories
nawk still being run once, with no files on the command line, when no .log files exist

You can also do:

( find "$LOGDIR" -maxdepth 1 -name "*.log" -print0 | xargs -r0 nawk "/Nov 21/{_=2}_&&_--" ) > $NEWLOGFILE

Jim - I got this error:

% 10  : syntax error

Rdc - Thanks, that worked this time, although it runs in the same amount of time as the other solution.

Chubler - Thanks, but we don't have that installed on our server. Fortunately, we don't have to worry about any of those situations.
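
For reference, the "% 10 : syntax error" above most likely comes from the uppercase $CNT in Jim's loop: the counter was initialised as lowercase cnt, so $CNT expands to nothing inside the arithmetic. A sketch with the variable name made consistent (untested):

cnt=0
find "${LOGDIR}" -type f -name '*.log' -print |
while read FILENAME
do
  # run each nawk in the background and wait after every 10
  nawk "/Nov 21/{_=2}_&&_--" "${FILENAME}" >> "${NEWLOGFILE}" &
  cnt=$(( cnt + 1 ))
  if [ $(( cnt % 10 )) -eq 0 ] ; then
    wait
  fi
done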

Please quantify in hours and minutes.

If the time is substantial we can no doubt improve but we need some disc workspace (or maybe a memory disc) and some half-decent hardware.

This is a good moment to reveal what Operating System you are running this "ksh" on and to give us some feeling for the power of the hardware.

I'm on Solaris (said that in the first post). I don't know much about the hardware other than it's on slower disks than production. I can find out more if needed. It takes about 7-8 minutes for one day's worth of data. I have to run this for 7 days at one time, so it should take about 49-56 minutes to complete. I realize that since I'm searching so many files there might not be a way to make this run faster, and that's fine. Just thought I'd ask :) I'm still fairly new to unix and scripting.

I appreciate your help and solution, thanks again.

Try this:

ls ${LOGDIR} | grep '\.log$' | sed "s|^|${LOGDIR}/|" | xargs nawk "/Nov 21/{_=2}_&&_--" >> $NEWLOGFILE

Sorry to be pedantic, but Solaris is a brand name which encompasses many operating systems:
Solaris (operating system) - Wikipedia, the free encyclopedia
Please post the output from "uname -a", blacking out anything confidential like server names.

First impression of this task is that we are dealing with historical logs.
Knowledge of the data is paramount. If individual logs become static according to a rule we can avoid analysing logs which we have already analysed.
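
One way to act on that, assuming a log counts as "static" once it stops being written to, is to keep a timestamp file and let find's -newer pick up only logs modified since the previous run. The stamp file name here (lastrun.stamp) is just an illustration:

# create ${LOGDIR}/lastrun.stamp once by hand before the first run
find "${LOGDIR}" -type f -name '*.log' -newer "${LOGDIR}/lastrun.stamp" -print |
while read FILENAME
do
  nawk "/Nov 21/{_=2}_&&_--" "${FILENAME}" >> "${NEWLOGFILE}"
done
touch "${LOGDIR}/lastrun.stamp"   # record this run for next time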

Agreed. Another thought that comes to mind: once the "Nov 21" entries have been found in a file, can we move on to the next file straight away? For example, if the records in each file are sorted by date, we can ignore the rest of the file, as no more Nov 21 records will exist after the first group is found:

nawk '/Nov 21/{_=3}_==1{exit}_&&_--'

(Since exit ends the whole nawk run, this suits invoking nawk once per file, as in the find | while loop, rather than handing many files to a single nawk via xargs.)

Sorry for leaving that out, methyl - it's Solaris 10 on SPARC (SunOS 5.10).

The server I've been testing on has logs that were copied over and all have the same modified date. I just discovered that only some of the logs are changed daily on the production server, not all of them like I originally thought. So I won't need to search through 30,000 files; I will only search for files that have been modified within a specific 7-day date range. Thanks for the tip Chubler, I'll test that line and see if it reduces the run time even more.

Thanks again everyone for your help.
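
Since the plan above is to restrict the search to logs modified within the last 7 days, a possible sketch is to let find do that selection with -mtime (adjust the day count as needed; this assumes modification time is what marks a log as current):

# only look at .log files modified within the last 7 days
find "${LOGDIR}" -type f -name '*.log' -mtime -7 -print |
while read FILENAME
do
  nawk "/Nov 21/{_=2}_&&_--" "${FILENAME}" >> "${NEWLOGFILE}"
done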

How about...?

find $LOGDIR -name \*.log -exec nawk "/Nov 21/{_=2}_&&_--" {} \; > $NEWLOGFILE
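
If your find supports the POSIX "+" terminator for -exec (worth checking in the Solaris 10 man page - this is an assumption), it batches many files onto each nawk invocation, much like xargs, instead of starting one nawk per file:

find $LOGDIR -name \*.log -exec nawk "/Nov 21/{_=2}_&&_--" {} + > $NEWLOGFILE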