No. of Fields using Awk

pchegoor · July 31, 2011, 3:20am

Hi,
Could someone please tell me how I can use an awk command to print the number of fields in each file in a directory? Suppose the directory has 5 text files, and the first record in each file contains fields separated by '|'. I need an awk command that displays the number of fields for each of these .txt files as follows:

Name1.txt 30
Name2.txt 12
Name3.txt 10
Name4.txt 21
Name5.txt 0

Also, some of these .txt files may be empty, and I need to print '0' as the number of fields in that case, as shown above. Is this possible? Please suggest.

Thanks.

yazu · July 31, 2011, 3:28am

Try:

awk -F'|'  'FNR == 1 { print FILENAME, NF}'  DIRECTORY/*

pchegoor · July 31, 2011, 3:41am

Thanks Yazu. Your command worked well, except that it does not print the file name and '0' for a file that is empty. Is there any way to get this?

yazu · July 31, 2011, 4:09am

From GNU awk manual:
"All known awk implementations silently skip over zero-length files. This is a by-product of awk's implicit read-a-record-and-match-against-the-rules loop: when awk tries to read a record from an empty file, it immediately receives an end of file indication, closes the file, and proceeds on to the next command-line data file, without executing any user-level awk program code."

GNU awk can deal with this, but the workaround is specific to GNU awk. I believe it's better to write a shell script.
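A minimal sketch of the shell-script approach, assuming the files end in .txt and use '|' as the separator. The function name count_fields is made up for illustration; it tests each file for emptiness in the shell and only hands the first line to awk:

```shell
# count_fields DIR: print "<file name> <field count>" for every *.txt file in DIR.
# count_fields is a hypothetical helper name; '|' is assumed as the field separator.
count_fields() {
  dir=$1
  for f in "$dir"/*.txt; do
    [ -f "$f" ] || continue        # skip when the glob matches nothing
    if [ -s "$f" ]; then           # file exists and is not empty
      # Pass only the first line to awk and let it count the fields.
      n=$(head -n 1 "$f" | awk -F'|' '{ print NF }')
    else
      n=0                          # empty file: report 0 fields
    fi
    printf '%s %s\n' "${f##*/}" "$n"
  done
}
```

Calling it as count_fields DIRECTORY should give one line per file, including the empty ones.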

itkamaraj · July 31, 2011, 4:42am

for i in *.txt; do wc -l $i; done

mirni · July 31, 2011, 5:15am

That will print the number of lines, not the number of fields, as the OP wanted.

You could do it with awk, with a little hack: wrap it in a shell loop and call getline in the BEGIN section, that is, before the processing of the file starts:

for i in *.txt; do
  awk -F'|' -v f="$i" 'BEGIN {
    # getline returns 1 on success, 0 at end of file (empty file), -1 on error
    if ((getline < f) > 0)
      print f, NF
    else
      print f, 0
  }'
done

Scrutinizer · July 31, 2011, 1:03pm

Try:

awk -F\| 'BEGIN{for(i=1;i<ARGC;i++){f=ARGV[i];print f,(getline<f)?NF:0}}' *.txt

agama · July 31, 2011, 1:37pm

I think it is important to note that Scrutinizer's solution is the most efficient, as it needs to read only one line from each file. Any solution that uses something like

FNR == 1 { action }

will read all lines from all files. If the files being tested are large, this could take a significant amount of time to do an awful lot of unneeded I/O.

The only suggestion I'd make is to close the file after getting the first line. If there are a lot of files in the parameter list, not closing the file after use could result in exceeding the max open file descriptors limit for the process.

awk -F\| 'BEGIN{for(i=1;i<ARGC;i++){f=ARGV[i];print f,(getline<f)?NF:0; close(f);}}' *.txt
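For readability, here is the same idea spread over several lines, as a sketch rather than a definitive version. The wrapper name first_line_fields is made up for illustration. One assumed refinement: getline returns -1 when a file cannot be read, and -1 counts as true in awk, so this version tests for a result greater than 0 and reports 0 for unreadable files as well as empty ones.

```shell
# first_line_fields FILE...: print "<file> <field count of its first line>".
# first_line_fields is a hypothetical wrapper name; '|' is assumed as the separator.
first_line_fields() {
  awk -F'|' '
    BEGIN {
      for (i = 1; i < ARGC; i++) {
        f = ARGV[i]
        # getline returns 1 on success, 0 at end of file, -1 on error;
        # only a successful read updates $0 and NF.
        n = ((getline < f) > 0) ? NF : 0
        print f, n
        close(f)    # release the descriptor before moving to the next file
      }
    }' "$@"
}
```

Usage would be first_line_fields *.txt. Because the program has only a BEGIN block, awk exits after the loop and never reads the rest of any file.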

pchegoor · July 31, 2011, 7:24pm

Thanks Scrutinizer. This was a very helpful awk command (or rather a combination of commands).

---------- Post updated at 06:24 PM ---------- Previous update was at 06:22 PM ----------

Thanks Agama for the helpful addition.