how to find files

ravi_agarwalla · March 26, 2011, 1:41pm

1st file

1 4 7
c b 8
3 6 a

2nd file

z 6 1
q g w
3 t 5

Suppose I have 1000 files that are expected to have 3 fields. I don't know whether any of those 1000 files has a value in a 4th field. I want to print the names of the files that do have a value in the 4th field.

Can anyone help me with this requirement?

sk1418 · March 26, 2011, 2:01pm

Suppose that all your files are named *.txt and that the field delimiter is a space " ".

cd yourDir
grep -El ".*\s.*\s.*\s.*" *.txt

This will list the names of all files that contain at least one line with 4 or more columns.
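
One caveat: the pattern above matches any line that contains at least three whitespace characters, so a line with a trailing space or a doubled space would also be reported. A slightly stricter sketch follows. It uses POSIX character classes instead of \s (which is a GNU extension), and the function name list_four_field_files is only a placeholder for illustration:

```shell
# Hypothetical helper: list the files that have at least one line with four
# or more whitespace-separated fields.
list_four_field_files() {
    f='[^[:space:]]+'   # one field: a run of non-whitespace characters
    s='[[:space:]]+'    # one separator: a run of whitespace characters
    # Four fields with separators between them; blanks alone never count as a field.
    grep -El -- "$f$s$f$s$f$s$f" "$@"
}
```

Usage would be the same as before: cd yourDir, then list_four_field_files *.txt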

ravi_agarwalla · March 26, 2011, 2:23pm

Can you please explain how the script works?

sk1418 · March 26, 2011, 2:26pm

It goes through all the files, looking for a line that contains a 4th field. As soon as one is found in a file, it prints only that file's name.

ravi_agarwalla · March 26, 2011, 2:30pm

I am not able to understand what
-El ".*\s.*\s.*\s.*" means.

sk1418 · March 26, 2011, 2:33pm

-E, --extended-regexp
              Interpret PATTERN as an extended regular expression (ERE, see below).  (-E is specified by POSIX.)

 -l, --files-with-matches
              Suppress  normal  output; instead print the name of each input file from which output would normally have been printed.
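              The scanning will stop on the first match.  (-l is specified by POSIX.)

".*\s.*\s.*\s.*" is the regular expression that finds a line with 4 fields in your case: \s matches one whitespace character and .* matches any text, so the line must contain at least three whitespace characters.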

If you read the man page of the grep command, you can get all this information.
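
To make the effect of -l more concrete, here is a rough sketch of the same behaviour written as an explicit loop. The function name files_with_match is made up for illustration; it is not part of grep:

```shell
# Hypothetical helper that mimics `grep -El PATTERN FILE...` with a loop.
files_with_match() {
    pattern=$1
    shift
    for f in "$@"; do
        # -q prints nothing, stops at the first matching line, and succeeds
        # only if such a line exists; then we print just the file name.
        if grep -Eq -- "$pattern" "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

With GNU grep, files_with_match '.*\s.*\s.*\s.*' *.txt should print the same names as the grep -El command.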

cgkmal · March 26, 2011, 6:07pm

Another possible option, assuming you look for txt files:

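# Keep the full paths from find so files in subdirectories can be opened.
# (Word splitting means this still assumes no spaces in file names.)
files=$(find . -type f -name '*.txt')

for i in $files
do
awk 'NF>3{print "Files with more than 3 columns-->",FILENAME}' "$i" | uniq
done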

Or, faster than the code in my first reply, because awk stops at the first matching line:

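# Keep the full paths from find so files in subdirectories can be opened.
files=$(find . -type f -name '*.txt')

for i in $files
do
# NF is the number of fields on the current line; exit stops reading this file.
awk 'NF>3{print "File with at least one line with " NF " columns-->",FILENAME;exit}' "$i"
done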

Regards

drl · March 26, 2011, 8:37pm

Hi.

Similarly:

#!/usr/bin/env bash

# @(#) s1	Demonstrate identifying lines with more than N fields.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
pe() { for i;do printf "%s" "$i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for i;do printf "%s" "$i";done; printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && . $C awk

pl " Data files:"
for f in data*
do
  pe
  cat "$f"
done

N=3
pl " Results, looking for files with more than $N fields:"
awk -v N="$N" '
NF > N	{ print " File", FILENAME, "has more than", N, "fields."; nextfile }
' data*

exit 0

producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.7 (lenny) 
GNU bash 3.2.39
GNU Awk 3.1.5

-----
 Data files:

1 4 7
c b 8
3 6 a b

z 6 1
q g w
3 t 5

z 6 1
q g w f
3 t 5

1 4 7
c b 8
3 6 a

-----
 Results, looking for files with more than 3 fields:
 File data1 has more than 3 fields.
 File data3 has more than 3 fields.

This shows that awk itself can take a number of files as arguments, so no shell loop is needed ... cheers, drl
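
For the original case of about 1000 files, possibly spread over subdirectories, the same idea can be combined with find. This is only a sketch under some assumptions: the files are named *.txt, and the function name files_with_extra_fields is invented for illustration. It avoids nextfile, so it should also work with awks that lack that statement, at the cost of reading each file to the end:

```shell
# Hypothetical helper: print each *.txt file under DIR (default: .) that has
# at least one line with more than N fields (default: 3).
files_with_extra_fields() {
    dir=${1:-.}
    n=${2:-3}
    # "-exec ... {} +" hands many file names to each awk invocation.
    find "$dir" -type f -name '*.txt' -exec awk -v N="$n" '
        FNR == 1 { reported = 0 }    # first line of a new file: reset the flag
        NF > N && !reported { print FILENAME; reported = 1 }
    ' {} +
}
```

For example, files_with_extra_fields yourDir 3 would print one path per file that has a line with a 4th field.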