Shell script: returning the file with the most lines

Breakology · May 16, 2009, 3:27pm

Hey, I am relatively new to Linux and shell scripting and am looking for a spot of help with a script I am working on.

I am writing a script that counts the number of lines in each file in a directory, sorts the files by line count, and then returns ONLY the file with the most lines.

Right now I can accomplish the first part with

wc -l|sort -rn

but I do not know how to return only the file with the most lines in the output.

cfajohnson · May 16, 2009, 3:39pm

wc -l | sort -rn | head -1

Breakology · May 17, 2009, 2:03am

Thanks!

That almost does what I need, but since the first line of the sorted list is the total line count, it displays 'XXX total' instead of the actual file.

Is there a command that lets me pass a condition, so that the file with the most lines is picked by its actual line count rather than by its position in the list? I was also considering using some Perl, but I am still new to that as well. Thanks again.

Edit:

I also found that

wc -l | sort -rn | head -2 | tail -1

will give me the result I want, just not by the method I would prefer.
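
One way to pick the winner by value rather than by position is to let awk skip the summary line by its content and remember the largest count it sees. This is only a sketch, not something posted in the thread: the function name max_by_value and its directory argument are made up for illustration, and it assumes no file name contains a newline.

```shell
# max_by_value DIR (hypothetical helper): print "<count> <path>" for the
# entry in DIR with the most lines, ignoring the summary line from wc.
max_by_value() {
    wc -l "$1"/* 2>/dev/null |
    awk '/^ *[0-9]+ total$/ { next }                  # skip the summary by content
         $1 + 0 >= max { max = $1 + 0; line = $0 }    # keep the largest count seen
         END { sub(/^ +/, "", line); if (line != "") print line }'
}
```

Called as max_by_value /some/directory, it prints one line no matter where the summary ends up after sorting, because no sorting is involved.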

ghostdog74 · May 17, 2009, 2:29am

If you have Python, here's an alternative solution:

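#!/usr/bin/env python
import os

temp = 0      # highest line count seen so far
final = None  # path of the file with that count
path = os.path.join("/home", "path1", "path2")
for r, d, f in os.walk(path):
    for name in f:
        full = os.path.join(r, name)
        try:
            o = open(full, "rb")  # bytes mode, so binary files cannot raise decode errors
        except (IOError, OSError):
            continue  # skip files that cannot be opened
        i = 0
        for line in o:
            i = i + 1  # count lines
        o.close()
        if i >= temp:
            temp = i
            final = full
print("File  %s  has  %s  lines" % (final, temp))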

output:

File  /home/path1/path2/file  has  13545528  lines

ghostdog74 · May 17, 2009, 2:30am

Are you sure that will give you results? You have not given wc -l any file arguments.

cfajohnson · May 17, 2009, 3:02am

Don't use ls:

wc -l * | sort -rn | head -1

ghostdog74 · May 17, 2009, 3:11am

Using wc -l * has the problem of running wc on the directories as well, so the final output will not be accurate.

find . -type f -exec wc -l {} \;| sort -rn |head -1

However, with many files, the sort can slow the process down.
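
If the sort is a concern, a single awk pass can track the maximum instead. The sketch below is an assumption-laden variation on the find command above, not the poster's code: the function name longest_recursive is made up, it uses the "+" form of -exec so that find hands many files to each wc run, and it skips the "N total" lines that wc then prints for each batch. File names containing newlines are not handled.

```shell
# longest_recursive DIR (hypothetical helper): recurse into DIR and print
# the "<count> <path>" line with the highest count, without sorting.
longest_recursive() {
    find "$1" -type f -exec wc -l {} + |
    awk '/^ *[0-9]+ total$/ { next }                  # one summary per wc batch
         $1 + 0 >= max { max = $1 + 0; line = $0 }    # remember the largest count
         END { sub(/^ +/, "", line); if (line != "") print line }'
}
```

Because find is restricted to -type f, directories never reach wc, so there are no error messages to discard.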

cfajohnson · May 17, 2009, 3:30am

wc prints an error message for each directory, so discard those messages:

wc -l * 2>/dev/null | ...

It reports a count of 0 for directories, so nothing will be out of kilter.
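
The 2>/dev/null part affects only the error stream, which is why the counts still reach the pipe. A small illustration, using a made-up function named emit that writes one line to each stream:

```shell
# emit (hypothetical helper): one line on standard output (file descriptor 1),
# one line on standard error (file descriptor 2).
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

emit 2>/dev/null    # shows only the standard output line
emit >/dev/null     # shows only the standard error line
```

Only standard output travels through a pipe, so in the wc command the error messages are dropped while the counts go on to the next stage.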

Breakology · May 17, 2009, 11:30am

ghostdog74, I think your Python script is what I need, but written as a bash script or in Perl. The end result is that I need to be able to feed it a directory and have it output the file with the most lines.
I was trying to do it with a sort, but I don't think that is enough.
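
A rough shell counterpart of the Python script might look like the sketch below. It is an illustration only: the function name report_longest is made up, the output wording is simplified, and paths containing newlines are not handled. Like the Python version it walks the directory recursively and lets the last file win a tie.

```shell
# report_longest DIR (hypothetical helper): walk DIR recursively and report
# the regular file with the most lines.
report_longest() {
    find "$1" -type f | {
        max=-1
        best=
        while IFS= read -r f; do
            [ -r "$f" ] || continue          # skip unreadable files
            n=$(( $(wc -l < "$f") ))         # arithmetic expansion drops any padding
            if [ "$n" -ge "$max" ]; then     # >= mirrors the Python script
                max=$n
                best=$f
            fi
        done
        if [ -n "$best" ]; then
            printf 'File %s has %s lines\n' "$best" "$max"
        fi
    }
}
```

With the function defined, report_longest /home/directory prints a single line for the winning file, or nothing if the directory holds no readable files.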

Breakology · May 17, 2009, 11:37am

What is the '2' doing in that command? Is it a switch for the wc command?

ghostdog74 · May 17, 2009, 11:40am

If you need to pass the directory in as a parameter:

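import os
import sys

temp = 0      # highest line count seen so far
final = None  # path of the file with that count
path = sys.argv[1]  # directory given on the command line
for r, d, f in os.walk(path):
    for name in f:
        full = os.path.join(r, name)
        try:
            o = open(full, "rb")  # bytes mode, so binary files cannot raise decode errors
        except (IOError, OSError):
            continue  # skip files that cannot be opened
        i = 0
        for line in o:
            i = i + 1  # count lines
        o.close()
        if i >= temp:
            temp = i
            final = full
print("File  %s  has  %s  lines" % (final, temp))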

On the command prompt:

# python myscript.py /home/directory

If you want it as a shell script, then why don't you try cfa's method?

cfajohnson · May 17, 2009, 11:54am

No, it is not a switch. 2 is the file descriptor for the standard error stream, and 2>/dev/null redirects that stream to the "bit bucket".

wc -l "$dir"/* 2>/dev/null |
  sed '$d' |   ## remove last line (total)
  sort -rn | head -1
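
One caveat with sed '$d': wc prints the total line only when it is given more than one file operand, so in a directory holding a single file the sed would delete the only result. A guarded variant could look like the following sketch, where the function name longest_in is made up for illustration and the final sed merely trims the padding that some wc versions put in front of the count.

```shell
# longest_in DIR (hypothetical helper): same pipeline as above, but the last
# line is dropped only when wc was given more than one operand.
longest_in() {
    set -- "$1"/*                      # expand the glob into the positional parameters
    if [ "$#" -gt 1 ]; then
        wc -l "$@" 2>/dev/null | sed '$d'    # several operands: drop the total
    else
        wc -l "$@" 2>/dev/null               # one operand: wc prints no total
    fi | sort -rn | head -1 | sed 's/^ *//'
}
```

Used as longest_in "$dir", it prints "<count> <path>" for the longest file, and nothing at all if the directory is empty.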