This is the first time I have posted to this forum, so please bear with me. Thanks also in advance for any help or guidance.
For a project I need to do the following.
There are multiple files in multiple locations, so I need to find each file and its location. I had planned to use
cd LOCATION;
find . -name "FILENAME.TXT" -type f -print > $HOME/list_of_locations.txt
this gives me paths in this format: ./dir1/dir2/dir3/FILENAME.txt
Each one of these files is of a different format, and the only way to work out which format is to count the number of occurrences of the "|" character in each file.
I can either use head -1 to take the first row and count the number of occurrences of the "|" character, or else grep for "|" in all rows and divide by wc -l (the number of lines). My preference is for whichever is most efficient.
I want to produce a new file listing the full path and the number of occurrences of the "|" character, so that I can process each .txt file later. The count could either be appended to each line of list_of_locations.txt, or a new file could be created with this information.
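For example, on a hypothetical file a.txt whose first line is abc|def|efg, the two counting approaches might look like this (a sketch; the filename is made up):

```shell
# Approach 1: look at the first line only - cheapest if every line
# of a given file has the same number of delimiters:
count=$(head -1 a.txt | awk -F'|' '{print NF - 1}')

# Approach 2: count every "|" in the whole file, then divide by the
# number of lines (more work, since the entire file is read):
total=$(tr -dc '|' < a.txt | wc -c)
lines=$(wc -l < a.txt)
avg=$((total / lines))
```

Approach 1 only reads one line per file, so it should be noticeably cheaper on large files.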
So what I am asking:
Is there a quick way of doing this?
Using find . -name is very slow - but it looks like there is no other way, as I am doing a recursive search across subdirectories.
Is there a better way to interrogate my .txt file to find out how many "|" characters there are?
Is there a better way to put all of this into a UNIX script?
Thanks in advance for any help you can give, either a code snippet or advice.
This will look through the list of locations for the filename(s) you specified and print them out, separated by a NUL ("\0") character. xargs will collect them all and run awk on this list. awk will open each file and print the full path and field count from the first line. Redirect as desired.
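A sketch of the pipeline as described (LOCATION and FILENAME.TXT stand in for your names, and the exact awk body is an assumption):

```shell
# Print matches NUL-separated so pathnames with spaces survive xargs:
find LOCATION -name "FILENAME.TXT" -type f -print0 |
xargs -0 awk -F'|' 'FNR == 1 { print FILENAME, NF }'
# FNR == 1 fires on the first line of each file; NF is the field
# count on that line. awk still reads every remaining line of each
# file, which is the inefficiency discussed below.
```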
As I am not aware of how to skip the remainder of the file and go on to the next one, there is some optimization potential. Trials with close("-") right after the print statement showed a little improvement in execution time, but I'm not sure if it does the right thing. EDIT: It does not; returns -1 error code.
Anybody out there knowing about skipping to the next file in awk's argument list?
RudiC's suggestion is close, but misses on a couple of points. Since no pathname operands are given to awk, all of the filenames printed by awk will be an empty string. And, if there are x field separators on a line, there are x+1 fields.
The -print0 find primary and the -0 option to xargs are not defined by the standards, so they might not be available on your implementation.
A portable way to do what I believe was requested is:
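Something along these lines (a sketch; LOCATION and FILENAME.TXT are placeholders, and it prints the separator count, NF - 1, rather than the field count):

```shell
# -exec ... {} + batches pathnames into as few awk invocations as
# possible, and both -exec and FILENAME are standard:
find LOCATION -name 'FILENAME.TXT' -type f -exec awk -F'|' '
    FNR == 1 { print FILENAME, NF - 1 }
' {} +
```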
Some implementations of awk have a nextfile statement (like next, but while next restarts processing on the next line, nextfile restarts processing on the first line of the next file). If your awk has this non-standard extension, the following will be much more efficient for long input files:
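For example (a sketch, assuming your awk supports the non-standard nextfile):

```shell
# After printing the count for the first line, nextfile abandons the
# rest of the current file instead of reading every remaining line:
find LOCATION -name 'FILENAME.TXT' -type f -exec awk -F'|' '
    { print FILENAME, NF - 1; nextfile }
' {} +
```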
-------------------------------
Note that the comment I made about Rudi's proposal not printing pathnames is totally bogus. The xargs utility will add the pathname operand to awk as it invokes awk. :o
At least with the combination of find and awk implemented on my Linux system, there's a full path listing available, including filenames containing spaces:
Hi Rudi,
Yes, but note that by skipping the -print (or -print0) and the invocation of xargs, awk is still given the full pathname as an operand (even if there are spaces, tabs, or newlines included in the pathname).
Agreed. But it wasn't what Charlie6742 asked for.
Not surprising since what you timed runs awk once for each input file.
But note that I specified:
find . . . -exec awk -F\| '. . .' {} +
not:
find . . . -exec awk -F\| '. . .' {} \;
With the + instead of the \; find shouldn't execute awk any more times than xargs would and we avoid needing to start xargs at all.
Thanks guys. I have played with all the methods you suggested, but none of them seems to give me any output. They run without errors - just no output. I should have said I am using the bash shell - could some of these commands not be working properly on my setup? Is there a way I can set things up so they work as you have them?
If it helps - this is the message it gives me for one of the options that doesn't work.
What system are you using? If it is a Solaris system, try using /usr/xpg4/bin/awk or nawk instead of /usr/bin/awk.
If it is a Solaris system, there is also a good chance that nextfile isn't supported, but that should have generated a clearer error message. Have you tried the other form (the one using FNR == 1 instead of nextfile)?
Hi guys, thanks again for the swift responses on this :). I managed to solve my problem, so I thought I would share - feel free to comment. I also have another question. For everyone's benefit, in the code below I am:
Doing a recursive find of the .txt files
e.g. /folder1/folder2/folder3/a.txt
Pulling out the first row of each of the .txt files
e.g. from a.txt taking only the first row: abc|def|efg
Counting the number of | characters into the temp2 variable
Counting the number of / characters in the path name into the temp3 variable
I only want to output paths which have a set number of / characters (only 9 and 10)
SO HERE'S THE NEW QUESTION:
So for the check on the number of / characters I have the OR clause, but what actually happens is that it finds paths which have 10 in them and outputs those, but does not look at any with 9 in them. Equally, if I swap the code around, it finds the paths with 9 in them but does not look at any with 10. Is there a better way to do this other than splitting up the if statement?
find . -name .snapshot -prune -o -name "*.txt" -print | while read i
do
    temp1=`grep "|" "${i}" | head -1`
    temp2=`echo ${temp1} | awk -F"|" '{c += NF - 1} END {print c}'`
    temp3=`echo ${i} | awk -F"/" '{c += NF - 1} END {print c}'`
    echo $temp3
    if [ "$temp3" = "10" -o "&temp3" = "9" ]; then
        echo "${i} , ${temp3} , ${temp2}"
        echo
    fi
done
Does your find provide the -mindepth and -maxdepth options? That would help in the first place. If it does not, why don't you filter find's output before reading it into the i variable?
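A sketch of that idea (assuming GNU find; -mindepth/-maxdepth are not POSIX). Since find's relative paths start with ./, a path containing 9 slashes is at depth 9, so the two depths of interest map directly. Note one caveat: GNU find does not apply tests or actions below -mindepth, so the -name .snapshot -prune from the script above would need separate handling:

```shell
# Only report files at depths 9 and 10, instead of counting "/"
# in each pathname afterwards:
find . -mindepth 9 -maxdepth 10 -name '*.txt' -type f -print
```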