Counting the number of pipes in a line

Hi,

I'm using the ksh shell.

The scenario:

I have a couple of directories

/home/fd
/home/fd/prsd
/home/fd/stg

Now I have a number of files in each of these directories.

Some of the files are zipped using gzip, so their extension is .gz.

The content of the files is as follows:

D|abc|1324|ba92|adfds||324 -1 1 | | bcd |||||
D|as|cdsa|235|as|gf=12|sdf34|$||||sas|a5#|

Basically, each record is on its own line and its fields are delimited with pipes.

All the files have the same structure (remember, some are zipped),

and there can be anywhere between 40K and 200K records in a file.

So if you count the pipes in a record (line), there are 13 pipes in each line. All the files have 13 pipes per line.

But I have one file which has 14 pipes instead of 13 (meaning each record in that file has 14 pipes),

and I need to find this file with 14 pipes.

It may be in any of these directories and may or may not be zipped.

Please help me out.....

Many Thanks

I hope you don't mind a quick answer that can certainly be improved. I'm afraid I've been a bit unwell, so please pardon the rather brutish nature of the script, and the tcsh. Hopefully, since this can be placed in its own script file, the particular shell won't matter. First, note this relationship:

(# of | in a line) = (number of fields in the line) - 1

So, for example, there are 3 fields and 2 pipes in:

a|b|c

So, simply expect values 1 higher than the terms in which you expressed your problem; a metric boatload of files with 14 fields, and 1 with 15.
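If you want a quick sanity check of that relationship, awk will print the field count of any sample line (just a throwaway illustration, not part of the script below):

echo 'a|b|c' | awk -F'|' '{print NF}'     # prints 3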

#!/bin/tcsh
# report the field count of the first record of every file in the
# current directory, skipping this script itself
foreach file ( `ls | grep -v fieldcounter` )
    # grab the extension so gzipped files can be handed to zcat
    set ext = `echo $file | awk -F. '{print $NF}'`
    echo -n $file
    if ( $ext == "gz" ) then
        set fields = `zcat $file | head -n 1 | awk -F\| '{print NF}'`
    else
        set fields = `head -n 1 $file | awk -F\| '{print NF}'`
    endif
    echo " has fieldcount:  $fields"
end

I named this script "fieldcounter". You can call it whatever you like, but make sure to change "fieldcounter" in the foreach line accordingly. There are a few assumptions. First, I assume that everything in the directory is a file (as opposed to a directory, say); second, that you want to examine everything in the directory; third, that the first line of each file is just data, and isn't a special, differently-formatted header line. Each of these assumptions is easy enough to change and account for, but they are there nonetheless.

So, give it a shot... if there are, in fact, errors in my assumptions, or some other factor, let us know. I'm certain we can piece something together that'll work.

Last of all, this is written as an infrequently-applied solution. If you're going to do this often, something faster would be beneficial.
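Also, since you said you're on ksh, here is the same idea as a rough ksh sketch (untested on my end, so treat it as a starting point). The three directory paths are taken from your post, so adjust them if your layout differs:

#!/bin/ksh
# print the field count of the first record of every regular file
# under the three directories mentioned in the question
for file in /home/fd/* /home/fd/prsd/* /home/fd/stg/*; do
    [ -f "$file" ] || continue            # skip subdirectories and anything else odd
    if [[ "$file" = *.gz ]]; then
        fields=$(zcat "$file" | head -n 1 | awk -F'|' '{print NF}')
    else
        fields=$(head -n 1 "$file" | awk -F'|' '{print NF}')
    fi
    echo "$file has fieldcount: $fields"
done

Like the tcsh version, it only looks at the first record of each file and uses the extension to decide whether zcat is needed.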

One more option

head -1 <filename> | sed 's/[^|]//g' | tr -d '\n' | wc -c

You need to add logic to loop through multiple files; if the command gives 14 for a file, print that file's name (a rough loop is sketched below). The head -1 restricts the check to the first record, and tr -d '\n' drops the trailing newline so that wc -c counts only the pipes.
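One way to wrap that in a loop (a ksh sketch over the current directory; the .gz handling is my assumption about how you'd want the zipped files treated):

for f in *; do
    [ -f "$f" ] || continue
    # count the pipes on the first record, going through zcat for gzipped files
    case "$f" in
        *.gz) pipes=$(zcat "$f" | head -1 | sed 's/[^|]//g' | tr -d '\n' | wc -c) ;;
        *)    pipes=$(head -1 "$f" | sed 's/[^|]//g' | tr -d '\n' | wc -c) ;;
    esac
    [ "$pipes" -eq 14 ] && echo "$f"
done

Anything it prints is a candidate whose first record carries 14 pipes.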

This will give all the file names which have 14 pipes (from the current dir only):

for i in *
do
    # read just the first record of each file, then break out of the while loop
    while read line
    do
        echo "$line" | awk -F\| -v v="$i" 'END{if(NF==14){print v}}'
        break
    done < "$i"
done

14 pipes would be 15 fields (NF=15). This is simpler:

awk -F'|' 'NF==15 { print "Found:", FILENAME; nextfile }' *
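That one-liner won't look inside the gzipped files, though, and nextfile may not exist in very old awks. A gzip-aware variant, sketched against the directories from the original post (adjust the paths and the zcat call as needed):

for f in /home/fd/* /home/fd/prsd/* /home/fd/stg/*; do
    [ -f "$f" ] || continue
    # pull the first record, decompressing on the fly for .gz files
    case "$f" in
        *.gz) first=$(zcat "$f" | head -n 1) ;;
        *)    first=$(head -n 1 "$f") ;;
    esac
    # 14 pipes on a record means 15 fields
    nf=$(echo "$first" | awk -F'|' '{print NF; exit}')
    [ "$nf" -eq 15 ] && echo "Found: $f"
done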