AWK print number of records, divide this number

programmerc · June 28, 2012, 5:48pm

I would like to print the number of records in each of 2 files and divide one number by the other:

awk '{print NR}' file1 > output1
awk '{print NR}' file2 > output2
paste output1 output2 > output
awk '{print $1/$2}' output > output_2

Is there a faster way?

Corona688 · June 28, 2012, 6:26pm

That wouldn't do what you think it does, since for a file with 5 lines awk '{print NR}' would print the numbers 1 through 5, one per line, rather than the single count 5.
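The difference between printing NR for every record and printing it once in an END block can be seen on a small sample. This is only a minimal sketch; the scratch file created with mktemp is a hypothetical stand-in for a real input file.

```sh
# Hypothetical five-line sample file (assumes mktemp is available).
sample=$(mktemp)
printf '%s\n' a b c d e > "$sample"

# The action runs for every record, so NR is printed once per line.
per_line=$(awk '{ print NR }' "$sample")

# The END action runs once, after the last record, so only the total is printed.
total=$(awk 'END { print NR }' "$sample")

printf '%s\n' "$per_line"
printf 'total: %s\n' "$total"
rm -f "$sample"
```

Only the END form yields a single number that can be used directly in a division.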

How about:

awk 'NR==1{F=FILENAME}; FILENAME != F { A[++FNUM]=LNR; F=FILENAME } { LNR=FNR } END { A[++FNUM]=LNR; print A[1]/A[2] }' file1 file2
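For readability, the same logic can be spread over several lines with comments. This is only a restatement of the one-liner above; the wrapper name ratio_by_filename is made up for the sketch, and it assumes two different, non-empty files.

```sh
# Hypothetical wrapper: prints (lines in $1) / (lines in $2).
ratio_by_filename() {
    awk '
        NR == 1       { F = FILENAME }       # remember the name of the first file
        FILENAME != F { A[++FNUM] = LNR      # name changed: store the finished count
                        F = FILENAME }       #   and start tracking the new name
                      { LNR = FNR }          # line number within the current file
        END           { A[++FNUM] = LNR      # store the count of the last file
                        print A[1] / A[2] }
    ' "$1" "$2"
}
```

Usage: ratio_by_filename file1 file2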

drl · June 28, 2012, 10:08pm

Hi.

With some help from bash:

#!/usr/bin/env bash

# @(#) s1	Demonstrate command substitution, awk arithmetic.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C awk

pl " Input data files data1 data2:"
paste data1 data2

pl " Results:"
awk -v n1=$(wc -l <data1) -v n2=$(wc -l <data2) 'BEGIN	{print n1/n2; exit}'

exit 0

producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5

-----
 Input data files data1 data2:
foo	grault
bar	garble
baz	warg
qux	fred
quux	plugh
corge	xyzzy
	thud

-----
 Results:
0.857143

See man pages for details.

Best wishes .... cheers, drl

max_hammer · June 29, 2012, 1:32am

 
#!/bin/ksh
record_count_1=`wc -l < file1`
record_count_2=`wc -l < file2`
 
((result= ${record_count_1} / ${record_count_2} ))

drl · June 30, 2012, 6:47am

Hi.

In my experience with ksh, the variables have to be declared as a floating-point type rather than left as the default; otherwise the division is done in integer arithmetic and the result is truncated:

#!/usr/bin/env ksh

# @(#) user1	Demonstrate ksh (( arithmetic )), typeset.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && . $C 

pl " Input data files data1 data2:"
paste data1 data2

# record_count_1=`wc -l < file1`
# record_count_2=`wc -l < file2`
record_count_1=`wc -l < data1`
record_count_2=`wc -l < data2`
 
pl " Results of $record_count_1 / $record_count_2:"
((result=${record_count_1} / ${record_count_2} ))
printf "%d\n" "$result"
printf "%f\n" "$result"

typeset -F3 t1 t2 ratio
t1=$record_count_1
t2=$record_count_2

pl " Results of $t1 / $t2:"
((ratio=$t1 / $t2))
printf "%f\n" "$ratio"
printf "%s\n" "$ratio"

exit 0

producing:

% ./user1 

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
ksh 93s+

-----
 Input data files data1 data2:
foo	grault
bar	garble
baz	warg
qux	fred
quux	plugh
corge	xyzzy
	thud

-----
 Results of 6 / 7:
0
0.000000

-----
 Results of 6.000 / 7.000:
0.857000
0.857

See man ksh for details.
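Where typeset -F is not available (plain POSIX sh, for example), the same truncation shows up in $(( )) arithmetic, and awk can do the floating-point division instead. A minimal sketch, with the counts 6 and 7 hard-coded to match the sample data above:

```sh
n1=6
n2=7

# Shell arithmetic is integer-only here, so the quotient is truncated.
int_ratio=$(( n1 / n2 ))

# awk does the division in floating point; %.3f mirrors typeset -F3.
float_ratio=$(awk -v a="$n1" -v b="$n2" 'BEGIN { printf "%.3f\n", a / b }')

echo "$int_ratio"      # 0
echo "$float_ratio"    # 0.857
```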

Best wishes ... cheers, drl

Franklin52 · June 30, 2012, 7:13am

Another one:

awk 'NR==FNR{n=NR}END{print n/FNR}' file1 file2

drl · June 30, 2012, 7:36am

Hi.

This variation may be slightly faster in real time:

#!/usr/bin/env bash

# @(#) s2    Demonstrate command substitution, awk arithmetic.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C awk

pl " Input data files data1 data2:"
paste data1 data2

pl " Results:"
n1=$(wc -l <data1 &)
n2=$(wc -l <data2 &)
wait

awk -v n1="$n1" -v n2="$n2" 'BEGIN    {print n1/n2; exit}'

exit 0

producing:

% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5

-----
 Input data files data1 data2:
foo	grault
bar	garble
baz	warg
qux	fred
quux	plugh
corge	xyzzy
	thud


-----
 Results:
0.857143

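The intent is for the 2 counting processes to run simultaneously in the background (as far as can be done in parallel on any particular system). In practice each command substitution waits until its wc has finished writing, so the counts may still run one after the other, and the wait has nothing left to wait for. Either way the same computational time is used (possibly adding a bit for overhead).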
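One way to let the two counts genuinely overlap is to have the background jobs write to scratch files, so the shell is not held up reading a pipe. This is a sketch only: the function name ratio_parallel is invented here, and it assumes mktemp is available and that the second file is not empty.

```sh
# Hypothetical helper: counts both files concurrently, then divides.
ratio_parallel() {
    tmp1=$(mktemp)
    tmp2=$(mktemp)

    # Both wc processes are started before either result is needed.
    wc -l < "$1" > "$tmp1" &
    wc -l < "$2" > "$tmp2" &
    wait    # block until both background jobs have finished

    awk -v n1="$(cat "$tmp1")" -v n2="$(cat "$tmp2")" 'BEGIN { print n1 / n2 }'
    rm -f "$tmp1" "$tmp2"
}
```

Usage: ratio_parallel data1 data2. Any gain is likely to be small unless the files are large and sit on storage that can serve both reads at once.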

Best wishes ... cheers, drl

elixir_sinari · June 30, 2012, 10:51am

awk 'FNR<NR && FNR==1{lcnt=NR-1} END{print lcnt/FNR}' file1 file2

---------- Post updated at 09:51 AM ---------- Previous update was at 09:25 AM ----------

Actually, I hadn't seen Franklin52's code before posting, sorry about that.
Now that I've seen it, mine differs slightly in that it executes the action only once, when FNR is reset to 1 at the change of input file.
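For comparison, here are the two one-liners side by side as shell functions. The function names are made up for this sketch, and both versions assume two non-empty files.

```sh
# Franklin52's version: the assignment runs on every line of the first file,
# where NR and FNR are still equal; n ends up as that file's line count.
ratio_every_line() {
    awk 'NR == FNR { n = NR } END { print n / FNR }' "$1" "$2"
}

# elixir_sinari's version: the assignment runs once, on the first line of the
# second file, where NR - 1 is the line count of the first file.
ratio_once() {
    awk 'FNR < NR && FNR == 1 { lcnt = NR - 1 } END { print lcnt / FNR }' "$1" "$2"
}
```

Usage: ratio_once file1 file2. In both, FNR in the END block holds the line count of the second file.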

drl · June 30, 2012, 10:59am

Hi.

elixir_sinari:

awk 'FNR<NR && FNR==1{lcnt=NR-1} END{print lcnt/FNR}' file1 file2

Actually, I hadn't seen Franklin52's code before posting, sorry about that.
Now that I've seen it, mine differs slightly in that it executes the action only once, when FNR is reset to 1 at the change of input file.

Yes, I agree, and on long files that would save some time.

Thanks for looking at them both ... cheers, drl