awk array problem

Hi,

I'm trying to count bats flying through an infrared beam array. One of the experts here helped me a few months ago, but now I have a problem that is stumping me.
Here is the original code that works (with two different patterns in the array):

# this has been changed to operate under the assumption that the 1st beam
# opens and closes before next beam opens due to spacing between beams being increased.

BEGIN { OT=0 # Time of previous measurement
		MAX=1	# Max num of seconds between valid events
		DAY="";	# Current day
		CA=0		;	CB=0
		# Running total of bats leaving and entering
		TOTALBATS=0;
		# The highest TOTALBATS has ever been
		MAXBATS=0;
		# Length of the patterns
		L=4
		# Patterns to check against
		# Block 1	unBlock 1	block 0	        Unblock 0
		A[0]="1,1";	A[1]="1,0";	A[2]="0,1";	A[3]="0,0";

		# Block 0 	Unblock 0       Block 1		Unblock 1
		B[0]="0,1";	B[1]="0,0";	B[2]="1,1";	B[3]="1,0";
 }

function print_daily(day,total,max,min,maxtime)
{
	I=total;	if(I<0)	I=-I;
	MX="no maximum"
	if(maxtime > 0)
		MX=sprintf("peak was at %s", strftime("%H:%M:%S",maxtime));

#	printf("COUNT@%s COUNT %+d RET %d LEFT %d GUESS %d (%s)\n",
	printf("Date @%s COUNT %+d IN %d OUT %d Estimate %d bats(%s)\n",
		day, total, max, -min, I, MX) > "/dev/stderr";

	# Reset daily counts
	TOTALBATS=0;	MAXBATS=0;	MINBATS=0;	MAXTIME=0;
	MINTIME=0;
}

{	# Calculate timestamp from date string
	T=mktime($1 " " $2 " " $3 " " $5 " " $6 " " $7);
         T+=(60*60*16); # Add sixteen hours
	$1=strftime("%Y", T);	# Put these back in the strings
	$2=strftime("%m", T);
	$3=strftime("%d", T);
	$5=strftime("%H", T);
	$6=strftime("%M", T);
	$7=strftime("%S", T);

	# When the year, month, and/or day changes, time to print daily counts
	if((DAY != $1 "-" $2 "-" $3) && (DAY != ""))
		print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	DAY=$1 "-" $2 "-" $3;

	if($8 == "pv")	# Ignore anything but PV lines.
	{
		# If too much time has passed since the last event, start over.
		if((T-OT) > MAX)	# Blank the array
			for(N=0; N<(L-1); N++)	C[N]="";
		else	# Shift elements toward the front
			for(N=0; N<(L-1); N++)	C[N]=C[N+1];

		OT=T	# Set prev time to this one.

		C[L-1]=$9 "," $10;	# Set the latest event in the array

		# Search for events in the array.
		FOUNDA=1;	FOUNDB=1;
		for(N=0; N<L; N++)
		{
			if(A[N] != C[N]) FOUNDA=0;
			if(B[N] != C[N]) FOUNDB=0;
		}

		# Count the events and mark the hour they occurred in
		if(FOUNDA)
		{
			printf("A@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			CA++;	AH[$5]++;
			TOTALBATS++;
		}

		if(FOUNDB)
		{
			printf("B@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			CB++;	BH[$5]++;
			TOTALBATS--;
		}

		# Update our maximum daily counts
		if(MAXBATS < TOTALBATS)
		{
			MAXBATS=TOTALBATS;
			MAXTIME=T;
		}

		if(MINBATS > TOTALBATS)
		{
			MINBATS=TOTALBATS;
			MINTIME=T;
		}
	}
}
END {	# The final statistics will be printed to stderr, to easily
	# separate them from the event times printed to stdout.

	# The last daily count
	print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	# Print the event counts
	printf("A %2d\nB %2d\nT %2d\n", CA, CB, CA+CB) > "/dev/stderr";

	# Print a list of hours from 1-23
	STR="H";
	for(N=1; N<=23; N++)	STR=STR sprintf(" %2d", N);;
	print STR > "/dev/stderr";

	# Print hourly counts for event A
	STR="A";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", AH[sprintf("%02d", N)]);
	print STR > "/dev/stderr";

	# Hourly counts for event B
	STR="B";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", BH[sprintf("%02d",N)]);
	print STR > "/dev/stderr";
}

I have added two more patterns to the array below (C and D). I need to add the C matches to the pattern A counts and the D matches to the pattern B counts. How can I do this?

	# additional Patterns to check against
		# Block 1	Block 0   	unblock 1	 Unblock 0
		A[0]="1,1";	A[1]="0,1";	A[2]="1,0";	A[3]="0,0";

           	# Block 1	Unblock 1	Block 0		Unblock 0
		C[0]="1,1";	C[1]="1,0";	C[2]="0,1";	C[3]="0,0";

		# Block 0 	block 1        unBlock 0	unblock 1
		B[0]="0,1";	B[1]="1,1";	B[2]="0,0";	B[3]="1,0";

		# Block 0 	Unblock 0	Block 1		Unblock 1
		D[0]="0,1";	D[1]="0,0";	D[2]="1,1";	D[3]="1,0";

here is a small sample of the input file:

2011,07,19,Rx,00,10,28,pv,1,1
2011,07,19,Rx,00,10,28,pv,1,0
2011,07,19,Rx,00,10,28,pv,0,1
2011,07,19,Rx,00,10,28,pv,0,0
2011,07,19,Rx,00,10,28,pv,0,1
2011,07,19,Rx,00,10,28,pv,1,1
2011,07,19,Rx,00,10,28,pv,0,0
2011,07,19,Rx,00,10,28,pv,1,0
2011,07,19,Rx,00,10,29,pv,1,1
2011,07,19,Rx,00,10,29,pv,0,1
2011,07,19,Rx,00,10,29,pv,1,0
2011,07,19,Rx,00,10,29,pv,0,0
2011,07,19,Rx,00,36,57,pv,0,1
2011,07,19,Rx,00,36,57,pv,0,0
2011,07,19,Rx,00,36,57,pv,1,1
2011,07,19,Rx,00,36,57,pv,1,0
2011,07,19,Rx,00,37,10,pv,1,1
2011,07,19,Rx,00,37,10,pv,0,1
2011,07,19,Rx,00,37,10,pv,1,0
2011,07,19,Rx,00,37,10,pv,0,0
2011,07,19,Rx,00,41,31,pv,0,1
2011,07,19,Rx,00,41,31,pv,1,1
2011,07,19,Rx,00,41,31,pv,0,0
2011,07,19,Rx,00,41,31,pv,1,0

thanks

Here is the original thread.

Taking a look at the updated problem.

hi, just thought I should explain why I had to add two more array patterns:
Small bats break and un-break the first infrared beam before breaking the second beam, whereas large bats break both the first and second beams before un-breaking the first. Thus there are a total of 4 patterns to detect across the two flight directions. I know it would make most sense to space the beams further apart so that I only have two patterns (I will, but the site is far away and inaccessible), but I already have months of data in the existing format that I need to make sense of.
thanks:
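
For readers following along, the four patterns described above could also be written as one lookup table instead of four parallel arrays. This is only a rough sketch and not the script used in this thread: the function name classify_transits and the "small"/"large" labels are made up for illustration, the timing check (MAX) is left out, and it assumes field 9 is the beam number and field 10 the beam state, as in the sample data.

```sh
#!/bin/sh
# Hypothetical helper: reads comma-separated logger lines on stdin and
# counts transits by direction (A or B) and by assumed bat size.
classify_transits() {
    awk -F, '
    BEGIN {
        # Each key is four consecutive "beam,state" events joined by spaces.
        KIND["1,1 1,0 0,1 0,0"] = "A small"   # beam 1 closes and reopens first
        KIND["1,1 0,1 1,0 0,0"] = "A large"   # both beams blocked at once
        KIND["0,1 0,0 1,1 1,0"] = "B small"
        KIND["0,1 1,1 0,0 1,0"] = "B large"
    }
    $8 == "pv" {
        # Slide a four-event window along the input.
        W1 = W2; W2 = W3; W3 = W4; W4 = $9 "," $10
        KEY = W1 " " W2 " " W3 " " W4
        if (KEY in KIND) COUNT[KIND[KEY]]++
    }
    END {
        printf("A small %d\nA large %d\nB small %d\nB large %d\n",
            COUNT["A small"], COUNT["A large"],
            COUNT["B small"], COUNT["B large"])
    }'
}
```

Fed the first twelve rows of the sample input above, this should report one small A, one large A and one large B transit.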


I guess I can adapt this after all:

# this has been changed to operate under the assumption that the 1st beam
# opens and closes before next beam opens due to spacing between beams being increased.

BEGIN { OT=0 # Time of previous measurement
		MAX=1	# Max num of seconds between valid events
		DAY="";	# Current day
		CA=0		;	CB=0
		# Running total of bats leaving and entering
		TOTALBATS=0;
		# The highest TOTALBATS has ever been
		MAXBATS=0;
		# Length of the patterns
		L=4
		# Patterns to check against
		# Block 1	unBlock 1	block 0	        Unblock 0
		A[0]="1,1";	A[1]="1,0";	A[2]="0,1";	A[3]="0,0";

		# Block 1	Block 0   	unblock 1	 Unblock 0
		CC[0]="1,1";	CC[1]="0,1";	CC[2]="1,0";	CC[3]="0,0";

		# Block 0 	Unblock 0       Block 1		Unblock 1
		B[0]="0,1";	B[1]="0,0";	B[2]="1,1";	B[3]="1,0";

		# Block 0 	block 1        unBlock 0	unblock 1
		D[0]="0,1";	D[1]="1,1";	D[2]="0,0";	D[3]="1,0";

            
function print_daily(day,total,max,min,maxtime)
{
	I=total;	if(I<0)	I=-I;
	MX="no maximum"
	if(maxtime > 0)
		MX=sprintf("peak was at %s", strftime("%H:%M:%S",maxtime));

#	printf("COUNT@%s COUNT %+d RET %d LEFT %d GUESS %d (%s)\n",
	printf("Date @%s COUNT %+d IN %d OUT %d Estimate %d bats(%s)\n",
		day, total, max, -min, I, MX) > "/dev/stderr";

	# Reset daily counts
	TOTALBATS=0;	MAXBATS=0;	MINBATS=0;	MAXTIME=0;
	MINTIME=0;
}

{	# Calculate timestamp from date string
	T=mktime($1 " " $2 " " $3 " " $5 " " $6 " " $7);
         T+=(60*60*16); # Add sixteen hours
	$1=strftime("%Y", T);	# Put these back in the strings
	$2=strftime("%m", T);
	$3=strftime("%d", T);
	$5=strftime("%H", T);
	$6=strftime("%M", T);
	$7=strftime("%S", T);

	# When the year, month, and/or day changes, time to print daily counts
	if((DAY != $1 "-" $2 "-" $3) && (DAY != ""))
		print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	DAY=$1 "-" $2 "-" $3;

	if($8 == "pv")	# Ignore anything but PV lines.
	{
		# If too much time has passed since the last event, start over.
		if((T-OT) > MAX)	# Blank the array
			for(N=0; N<(L-1); N++)	C[N]="";
		else	# Shift elements toward the front
			for(N=0; N<(L-1); N++)	C[N]=C[N+1];

		OT=T	# Set prev time to this one.

		C[L-1]=$9 "," $10;	# Set the latest event in the array

		# Search for events in the array.
		FOUNDA=1;	FOUNDB=1;
		FOUNDC=1;	FOUNDD=1;
		for(N=0; N<L; N++)
		{
			if(A[N] != C[N]) FOUNDA=0;
			if(B[N] != C[N]) FOUNDB=0;
			if(CC[N] != C[N]) FOUNDC=0;
			if(D[N] != C[N]) FOUNDD=0;
		}

		# Count the events and mark the hour they occurred in
		if(FOUNDA || FOUNDC)
		{
			printf("A@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			CA++;	AH[$5]++;
			TOTALBATS++;
		}

		if(FOUNDB || FOUNDD)
		{
			printf("B@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			CB++;	BH[$5]++;
			TOTALBATS--;
		}

		# Update our maximum daily counts
		if(MAXBATS < TOTALBATS)
		{
			MAXBATS=TOTALBATS;
			MAXTIME=T;
		}

		if(MINBATS > TOTALBATS)
		{
			MINBATS=TOTALBATS;
			MINTIME=T;
		}
	}
}
END {	# The final statistics will be printed to stderr, to easily
	# separate them from the event times printed to stdout.

	# The last daily count
	print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	# Print the event counts
	printf("A %2d\nB %2d\nT %2d\n", CA, CB, CA+CB) > "/dev/stderr";

	# Print a list of hours from 1-23
	STR="H";
	for(N=1; N<=23; N++)	STR=STR sprintf(" %2d", N);;
	print STR > "/dev/stderr";

	# Print hourly counts for event A
	STR="A";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", AH[sprintf("%02d", N)]);
	print STR > "/dev/stderr";

	# Hourly counts for event B
	STR="B";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", BH[sprintf("%02d",N)]);
	print STR > "/dev/stderr";
}

There's already a variable named C, which may have fouled up your attempts to modify this yourself. I named the new pattern CC.
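
To make the role of C clearer, here is a stripped-down sketch of just the sliding window, with the pattern matching removed. It is an illustration only: window_trace is a made-up name, and it uses seconds since midnight instead of mktime(), so it ignores date changes. It prints the window after every pv line so you can watch events shift toward the front and get blanked after a gap longer than MAX seconds.

```sh
#!/bin/sh
# Hypothetical helper: window_trace [MAX]
# Prints the contents of the four-slot event window C after each pv line.
window_trace() {
    awk -F, -v MAX="${1:-1}" '
    BEGIN { L = 4; OT = -1 }
    $8 == "pv" {
        T = $5 * 3600 + $6 * 60 + $7    # seconds since midnight
        if (OT < 0 || (T - OT) > MAX)
            for (N = 0; N < L - 1; N++) C[N] = ""        # gap too long, forget old events
        else
            for (N = 0; N < L - 1; N++) C[N] = C[N + 1]  # shift toward the front
        OT = T
        C[L - 1] = $9 "," $10           # newest event goes in the last slot
        printf("[%s] [%s] [%s] [%s]\n", C[0], C[1], C[2], C[3])
    }'
}
```

Because C holds this window, storing a pattern in an array also called C overwrites the very events the patterns are compared against.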

many thanks,
it's almost 10pm and the missus is getting tetchy:eek:, so I'll try it 1st thing tomorrow.:b:

Reading the original thread I seem to have missed some of your questions. Sorry about that.

You wanted to exclude data from pigeons between certain hours but weren't specific about whether 'real time' meant datalogger time.


Hi,

thanks for pointing out the existing C var, but I couldn't figure out enough about how the arrays worked to progress further anyway:(

you may remember: there are 2 types of loggers in use. The 1st type creates separate directories for each day of data; the 2nd type logs to an MS Access database. You wrote scripts for the 1st type (with gawk, in windoze).

I was able to figure out enough of your original scripts to modify to use with the 2nd type (a single large txt file exported from Access). I did a query to remove data in daytime before export to minimize bird count transits but have no solution for the non-access data. So the answer is yes, it would be great if you could script elimination of daytime bird data. I need to be able to change the time of "daytime" - winter would be ~ 8am to 5pm, summer ~ 6am to 9pm; that is actual time, not modified logger time of -16 hours (PC time is minus 16 hours due to daily folders being created at midnight, which does not coincide with a bat "day")

If you feel really ambitious, it would be "nice to have" additional daily stats for "large bats" and "small bats" based on the array criteria - this would enable me to claim my design of the logger was actually correct since beams are the correct width apart to give added value data:D
Another "nice to have" feature would be csv daily stats output that could be easily imported to spreadsheet and graphed.

There is a bit of a problem when 2 bats transit in the same second of the day (see the 1st 8 rows of sample data). This is not common but does happen. I think I was able to get around it by changing the MAX var from 1 to 0 to compensate for two transits in one second, but I probably broke something else doing this!

I was getting errors so I changed CC var to X to eliminate any chance of conflict but was still getting errors. changed these lines, was it necessary?

	CA=0		;	CB=0         ; CX=0 ; CD=0

and

		FOUNDA=1;	FOUNDB=1;
		FOUNDCX=1;	FOUNDD=1;
		for(N=0; N<L; N++)
		{
			if(A[N] != C[N]) FOUNDA=0;
			if(B[N] != C[N]) FOUNDB=0;
			if(X[N] != C[N]) FOUNDX=0;
			if(D[N] != C[N]) FOUNDD=0;

I found the errors- missing curly brackets.
here is working code:

# this has been changed to operate under the assumption that the 1st beam
# opens and closes before next beam opens due to spacing between beams being increased.

BEGIN { OT=0 # Time of previous measurement
		MAX=0	# Max num of seconds between valid events
		DAY="";	# Current day
		CA=0		;	CB=0         ; CX=0 ; CD=0
		# Running total of bats leaving and entering
		TOTALBATS=0;
		# The highest TOTALBATS has ever been
		MAXBATS=0;
		# Length of the patterns
		L=4
		# Patterns to check against
		# Block 1	unBlock 1	block 0	        Unblock 0
		A[0]="1,1";	A[1]="1,0";	A[2]="0,1";	A[3]="0,0";

		# Block 1	Block 0   	unblock 1	 Unblock 0
		X[0]="1,1";	X[1]="0,1";	X[2]="1,0";	X[3]="0,0";

		# Block 0 	Unblock 0       Block 1		Unblock 1
		B[0]="0,1";	B[1]="0,0";	B[2]="1,1";	B[3]="1,0";

		# Block 0 	block 1        unBlock 0	unblock 1
		D[0]="0,1";	D[1]="1,1";	D[2]="0,0";	D[3]="1,0";
 }
            
function print_daily(day,total,max,min,maxtime)
{
	I=total;	if(I<0)	I=-I;
	MX="no maximum"
	if(maxtime > 0)
		MX=sprintf("peak was at %s", strftime("%H:%M:%S",maxtime));

#	printf("COUNT@%s COUNT %+d RET %d LEFT %d GUESS %d (%s)\n",
	printf("Date @%s COUNT %+d IN %d OUT %d Estimate %d bats(%s)\n",
		day, total, max, -min, I, MX) > "/dev/stderr";

	# Reset daily counts
	TOTALBATS=0;	MAXBATS=0;	MINBATS=0;	MAXTIME=0;
	MINTIME=0;
}

{	# Calculate timestamp from date string
	T=mktime($1 " " $2 " " $3 " " $5 " " $6 " " $7);
         T+=(60*60*16); # Add sixteen hours
	$1=strftime("%Y", T);	# Put these back in the strings
	$2=strftime("%m", T);
	$3=strftime("%d", T);
	$5=strftime("%H", T);
	$6=strftime("%M", T);
	$7=strftime("%S", T);

	# When the year, month, and/or day changes, time to print daily counts
	if((DAY != $1 "-" $2 "-" $3) && (DAY != ""))
		print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	DAY=$1 "-" $2 "-" $3;

	if($8 == "pv")	# Ignore anything but PV lines.
	{
		# If too much time has passed since the last event, start over.
		if((T-OT) > MAX)	# Blank the array
			for(N=0; N<(L-1); N++)	C[N]="";
		else	# Shift elements toward the front
			for(N=0; N<(L-1); N++)	C[N]=C[N+1];

		OT=T	# Set prev time to this one.

		C[L-1]=$9 "," $10;	# Set the latest event in the array

		# Search for events in the array.
		FOUNDA=1;	FOUNDB=1;
		FOUNDX=1;	FOUNDD=1;
		for(N=0; N<L; N++)
		{
			if(A[N] != C[N]) FOUNDA=0;
			if(B[N] != C[N]) FOUNDB=0;
			if(X[N] != C[N]) FOUNDX=0;
			if(D[N] != C[N]) FOUNDD=0;
		}

		# Count the events and mark the hour they occurred in
		if(FOUNDA || FOUNDX)
		{
			printf("A@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			CA++;	AH[$5]++;
			TOTALBATS++;
		}

		if(FOUNDB || FOUNDD)
		{
			printf("B@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			CB++;	BH[$5]++;
			TOTALBATS--;
		}

		# Update our maximum daily counts
		if(MAXBATS < TOTALBATS)
		{
			MAXBATS=TOTALBATS;
			MAXTIME=T;
		}

		if(MINBATS > TOTALBATS)
		{
			MINBATS=TOTALBATS;
			MINTIME=T;
		}
	}
}
END {	# The final statistics will be printed to stderr, to easily
	# separate them from the event times printed to stdout.

	# The last daily count
	print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	# Print the event counts
	printf("A %2d\nB %2d\nT %2d\n", CA, CB, CA+CB) > "/dev/stderr";

	# Print a list of hours from 1-23
	STR="H";
	for(N=1; N<=23; N++)	STR=STR sprintf(" %2d", N);;
	print STR > "/dev/stderr";

	# Print hourly counts for event A
	STR="A";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", AH[sprintf("%02d", N)]);
	print STR > "/dev/stderr";

	# Hourly counts for event B
	STR="B";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", BH[sprintf("%02d",N)]);
	print STR > "/dev/stderr";
	}

===============================================

ADDITIONAL INFO:

Reminder: this runs from zsh. You made a batch file which calls two zsh scripts; I modified these to use the output.txt file as input (same format as the small sample data above).
1) bat file:

.\zsh.exe .\IpEther1.zsh

echo ====================
echo Done.  
PAUSE

2) 1st zsh script:

# because windows CMD doesn't actually expand * into arguments for you,
# we need to run this in sh to do so.  Bleh.
exec ./zsh.exe IpEther2.zsh * > events.txt 2> stats.txt

# operates on any dir name starting between year 2000 and 2999
# "2>" redirects stderr to stats.txt

3) 2nd zsh script:

#!/bin/zsh

# this script is to be used with data exported from Ipether Access database

./gawk.exe -F "," -f batmon5.awk < output.txt    #  calls gawk using output.txt as the input to awk

Hi,

here are stats for 4 days, they do not seem right. The numbers "in" and "out" should be quite similar but they are not.

Date @2011-07-17 COUNT -323 IN 7 OUT 334 Estimate 323 bats(peak was at 22:01:32)
Date @2011-07-18 COUNT +107 IN 401 OUT 1 Estimate 107 bats(peak was at 08:07:31)
Date @2011-07-19 COUNT +158 IN 463 OUT 1 Estimate 158 bats(peak was at 22:20:57)
Date @2011-07-20 COUNT +451 IN 453 OUT 0 Estimate 451 bats(peak was at 08:37:30)
A 6737
B 6344
T 13081
H  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
A 533 479 539 412 1179 314 82 49 29 27 20 16 12 11  9  3  7 43 83 141 278 1414 385
B 464 535 525 455 419 130 68 49 33 31 26 17 11 10  5  3 11 51 87 152 265 2136 535

I have attached the relevant files, if you wouldn't mind having a look?
many thanks

Getting what errors? Do you mean the inaccuracy you mentioned earlier or something else?

You didn't include most of the files this needed to run in your zip, hunting for them.

---------- Post updated at 01:46 PM ---------- Previous update was at 01:32 PM ----------

I converted your data into one huge, long line and did a rough-and-ready count with grep and wc:

$ awk -v FS=, '{ printf(" %s,%s", $9, $10); }' < output.txt > output2.txt
$ grep -o "1,1 0,1 1,0 0,0" output3.txt  | wc -l
2603
$ grep -o "0,1 1,1 0,0 1,0" output3.txt  | wc -l
746

Do these numbers look reasonable? If not, these patterns aren't working.

---------- Post updated at 02:00 PM ---------- Previous update was at 01:46 PM ----------

I think I see inconsistencies in the patterns.

---------- Post updated at 02:20 PM ---------- Previous update was at 02:00 PM ----------

I think the A pattern in the program you attached was subtly wrong. You'd also set MAX to 0, which may have made it too picky. I think the problem with two events in the same second was due to problems with the A pattern and not because of MAX -- trying it with MAX=1 on your sample data it sees an A event immediately followed by a B event in the same second.

BEGIN { OT=0 # Time of previous measurement
		MAX=1	# Max num of seconds between valid events
		DAY="";	# Current day
		CA=0		;	CB=0         ; CX=0 ; CD=0
		# Running total of bats leaving and entering
		TOTALBATS=0;
		# The highest TOTALBATS has ever been
		MAXBATS=0;
		# Length of the patterns
		L=4
		# Patterns to check against
		# Block 1	unBlock 1	block 0	        Unblock 0
		A[0]="1,1";	A[1]="1,0";	A[2]="0,1";	A[3]="0,0";

		# Block 1	Block 0   	unblock 1	 Unblock 0
		X[0]="1,1";	X[1]="0,1";	X[2]="1,0";	X[3]="0,0";

		# Block 0 	Unblock 0       Block 1		Unblock 1
		B[0]="0,1";	B[1]="0,0";	B[2]="1,1";	B[3]="1,0";

		# Block 0 	block 1        unBlock 0	unblock 1
		D[0]="0,1";	D[1]="1,1";	D[2]="0,0";	D[3]="1,0";
 }
            
function print_daily(day,total,max,min,maxtime)
{
	I=total;	if(I<0)	I=-I;
	MX="no maximum"
	if(maxtime > 0)
		MX=sprintf("peak was at %s", strftime("%H:%M:%S",maxtime));

#	printf("COUNT@%s COUNT %+d RET %d LEFT %d GUESS %d (%s)\n",
	printf("Date @%s COUNT %+d IN %d OUT %d Estimate %d bats(%s)\n",
		day, total, max, -min, I, MX) > "/dev/stderr";

	# Reset daily counts
	TOTALBATS=0;	MAXBATS=0;	MINBATS=0;	MAXTIME=0;
	MINTIME=0;
}

{	# Calculate timestamp from date string
	T=mktime($1 " " $2 " " $3 " " $5 " " $6 " " $7);
         T+=(60*60*16); # Add sixteen hours
	$1=strftime("%Y", T);	# Put these back in the strings
	$2=strftime("%m", T);
	$3=strftime("%d", T);
	$5=strftime("%H", T);
	$6=strftime("%M", T);
	$7=strftime("%S", T);

	# When the year, month, and/or day changes, time to print daily counts
	if((DAY != $1 "-" $2 "-" $3) && (DAY != ""))
		print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	DAY=$1 "-" $2 "-" $3;

	if($8 == "pv")	# Ignore anything but PV lines.
	{
		# If too much time has passed since the last event, start over.
		if((T-OT) > MAX)	# Blank the array
			for(N=0; N<(L-1); N++)	C[N]="";
		else	# Shift elements toward the front
			for(N=0; N<(L-1); N++)	C[N]=C[N+1];

		OT=T	# Set prev time to this one.

		C[L-1]=$9 "," $10;	# Set the latest event in the array

		# Search for events in the array.
		FOUNDA=1;	FOUNDB=1;
		FOUNDX=1;	FOUNDD=1;
		for(N=0; N<L; N++)
		{
			if(A[N] != C[N]) FOUNDA=0;
			if(B[N] != C[N]) FOUNDB=0;
			if(X[N] != C[N]) FOUNDX=0;
			if(D[N] != C[N]) FOUNDD=0;
		}

		# Count the events and mark the hour they occurred in
		if(FOUNDA || FOUNDX)
		{
                        if(FOUNDX) CX++;
                        else       CA++;
			printf("A@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			AH[$5]++;
			TOTALBATS++;
		}

		if(FOUNDB || FOUNDD)
		{
                        if(FOUNDD) CD++;
                        else       CB++;

			printf("B@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			BH[$5]++;
			TOTALBATS--;
		}

		# Update our maximum daily counts
		if(MAXBATS < TOTALBATS)
		{
			MAXBATS=TOTALBATS;
			MAXTIME=T;
		}

		if(MINBATS > TOTALBATS)
		{
			MINBATS=TOTALBATS;
			MINTIME=T;
		}
	}
}
END {	# The final statistics will be printed to stderr, to easily
	# separate them from the event times printed to stdout.

	# The last daily count
	print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	# Print the event counts
	printf("A %2d\nB %2d\nX %2d\nD %2d\nT %2d\n", CA, CB, CX, CD, CA+CB+CX+CD) > "/dev/stderr";

	# Print a list of hours from 1-23
	STR="H";
	for(N=1; N<=23; N++)	STR=STR sprintf(" %2d", N);;
	print STR > "/dev/stderr";

	# Print hourly counts for event A
	STR="A";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", AH[sprintf("%02d", N)]);
	print STR > "/dev/stderr";

	# Hourly counts for event B
	STR="B";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", BH[sprintf("%02d",N)]);
	print STR > "/dev/stderr";
	}

Does this look reasonable?

Date @2011-07-17 COUNT -261 IN 9 OUT 273 Estimate 261 bats(peak was at 22:01:32)
Date @2011-07-18 COUNT +217 IN 380 OUT 1 Estimate 217 bats(peak was at 08:07:31)
Date @2011-07-19 COUNT +266 IN 461 OUT 2 Estimate 266 bats(peak was at 22:20:49)
Date @2011-07-20 COUNT +428 IN 430 OUT 0 Estimate 428 bats(peak was at 08:37:30)
A 4956
B 6163
X 2603
D 746
T 14468
H  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
A 573 536 582 449 1225 321 84 50 29 27 21 16 12 11  9  3  7 46 85 152 300 1865 442
B 499 562 555 471 579 155 69 49 33 31 26 17 12 10  6  3 11 53 90 160 273 2313 561

It is matching the X and D patterns correctly if grep is to be believed: the X and D counts (2603 and 746) agree with the grep totals above.

---------- Post updated at 02:34 PM ---------- Previous update was at 02:20 PM ----------

You haven't detailed most of your requirements at all. If I wrote a CSV export it'd be almost guaranteed to not be the layout or even the data you wanted. :wink:

And your data about the bird rejection times is still too vague to use. Is that 8am to 5pm raw datalogger time, or 8am to 5pm in the "corrected" time? What about spring and fall?


hi, this looks much better since the ins are roughly equal to the outs,
as one would expect.
I'm driving to Germany on holiday with wife and kids tomorrow so can't look at it in detail for a few days, but like I said, it seems ok.
thanks so much for the help.
will let you know more as soon as I can:):b:

hi,
the bird rejection time is not raw datalogger time, since that is minus 16 hours from the actual time; it is 8am to 5pm corrected time. I suppose we could get really fancy and reference an array or lookup table of daily sunrise/sunset at a particular latitude, but it is probably easier to change the times manually for every 4 month data run...
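
A rough sketch of how such a daytime filter might look, run as a separate pass before the counting script. Everything here is an assumption for illustration: the name drop_daytime is made up, the start and end hours are passed in by hand, and the corrected hour is taken to be the logged hour plus 16, wrapped at midnight.

```sh
#!/bin/sh
# Hypothetical helper: drop_daytime START_HOUR END_HOUR
# Drops records whose corrected hour falls in [START_HOUR, END_HOUR).
drop_daytime() {
    awk -F, -v DAYSTART="$1" -v DAYEND="$2" -v OFFSET=16 '
    {
        H = ($5 + OFFSET) % 24          # corrected (actual) hour of this record
        if (H >= DAYSTART && H < DAYEND)
            next                        # daytime, probably a bird: skip it
        print                           # night-time record: pass through unchanged
    }'
}
```

For a winter run this might be used as `drop_daytime 8 17 < output.txt | ./gawk.exe -F "," -f batmon5.awk`, and for summer with 6 and 21 instead.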

something like this for csv would be sufficient for graphing:
2011-07-17, 261, 22:01:32
2011-07-18, 217, 08:07:31

presumably that is just a modification of the output format, no?
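
One low-risk way to get that layout, sketched here under assumptions, is to leave the counting script alone and convert the "Date @..." lines it already writes to stats.txt. The function name stats_to_csv is made up, and the field positions assume the exact "COUNT ... IN ... OUT ... Estimate ..." line format shown earlier in the thread.

```sh
#!/bin/sh
# Hypothetical helper: turns the daily "Date @..." lines into
# "date, estimate, peak time" CSV rows and ignores every other line.
stats_to_csv() {
    awk '
    /^Date @/ {
        DAY = substr($2, 2)             # strip the leading "@"
        PEAK = $NF                      # last word, e.g. "22:01:32)"
        sub(/\)$/, "", PEAK)            # remove the closing parenthesis
        if (PEAK !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/)
            PEAK = ""                   # "no maximum" days get an empty time
        printf("%s, %d, %s\n", DAY, $10, PEAK)
    }'
}
```

Usage would be something like `stats_to_csv < stats.txt > daily.csv`, where daily.csv is just an example output name.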

however, how do I go about adding A+X and B+D for the final in/out daily stats?

hi,

just had a chance to look at this now. the stats look ok for each of the individual letters but how would I go about adding them - to get total in, total out and total max bats out by adding a+x and b+d?

Hi,
I have modified the code as follows, however I cannot work out how to get the correct stats. In and out counts are correct but maximum/estimate should be 3 since there are 3 out at one time before they start to return.:wall:

any help appreciated... thanks

BEGIN { OT=0 # Time of previous measurement
	MAX=1 # Max num of seconds between valid events
	DAY=""; # Current day
	CA=0 ; CB=0 ; CX=0 ; CD=0 # Counters: A and X are one direction, B and D are the other
	# Running total of bats leaving and entering
	TOTALBATS=0;
	# The highest TOTALBATS has ever been
	MAXBATS=0;
	# Length of the patterns
	L=4
	# Patterns to check against
	# Block 1 unBlock 1 block 0 Unblock 0
	#A[0]="1,1"; A[1]="1,0"; A[2]="0,1"; A[3]="0,0";
	A[0]="1,1"; A[1]="1,0"; A[2]="0,1"; A[3]="0,0";
	# Block 1 Block 0 unblock 1 Unblock 0
	X[0]="1,1"; X[1]="0,1"; X[2]="1,0"; X[3]="0,0";
	# Block 0 Unblock 0 Block 1 Unblock 1
	B[0]="0,1"; B[1]="0,0"; B[2]="1,1"; B[3]="1,0";
	# Block 0 block 1 unBlock 0 unblock 1
	D[0]="0,1"; D[1]="1,1"; D[2]="0,0"; D[3]="1,0";
}
function print_daily(day,total,max,min,maxtime)
{
	I=total; if(I<0) I=-I;
	MX="no maximum"
	if(maxtime > 0)
		MX=sprintf("peak was at %s", strftime("%H:%M:%S",maxtime));
	# printf("COUNT@%s COUNT %+d RET %d LEFT %d GUESS %d (%s)\n",
	printf("Date @%s IN %+d OUT %d Max %d Estimate %d bats(%s)\n",
		#day, total, max, -min, I, MX) > "/dev/stderr";
		day, CA, CB, MAXBATS, I, MX) > "/dev/stderr";

	# Reset daily counts
	TOTALBATS=0; MAXBATS=0; MINBATS=0; MAXTIME=0;
	MINTIME=0;  CA=0; CB=0;
}
{	# Calculate timestamp from date string
	T=mktime($1 " " $2 " " $3 " " $5 " " $6 " " $7);
	T+=(60*60*16); # Add sixteen hours
	$1=strftime("%Y", T); # Put these back in the strings
	$2=strftime("%m", T);
	$3=strftime("%d", T);
	$5=strftime("%H", T);
	$6=strftime("%M", T);
	$7=strftime("%S", T);
	# When the year, month, and/or day changes, time to print daily counts
	if((DAY != $1 "-" $2 "-" $3) && (DAY != ""))
		print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);
	DAY=$1 "-" $2 "-" $3;
	if($8 == "pv") # Ignore anything but PV lines.
	{
		# If too much time has passed since the last event, start over.
		if((T-OT) > MAX) # Blank the array
			for(N=0; N<(L-1); N++) C[N]="";
		else # Shift elements toward the front
			for(N=0; N<(L-1); N++) C[N]=C[N+1];
		OT=T # Set prev time to this one.
		C[L-1]=$9 "," $10; # Set the latest event in the array
		# Search for events in the array.
		FOUNDA=1; FOUNDB=1;
		FOUNDX=1; FOUNDD=1;
		for(N=0; N<L; N++)
		{
			if(A[N] != C[N]) FOUNDA=0;
			if(B[N] != C[N]) FOUNDB=0;
			if(X[N] != C[N]) FOUNDX=0;
			if(D[N] != C[N]) FOUNDD=0;
		}
		# Count the events and mark the hour they occurred in
		if(FOUNDA || FOUNDX)
		{
			# if(FOUNDX) CX++;
			# else
			CA++;
			printf("A@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			AH[$5]++;
			TOTALBATS++;
		}
		if(FOUNDB || FOUNDD)
		{
			# if(FOUNDD) CD++;
			# else
			CB++;
			printf("B@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			BH[$5]++;
			TOTALBATS--;
		}
		# Update our maximum daily counts
		if(MAXBATS < TOTALBATS)
		{
			MAXBATS=TOTALBATS;
			MAXTIME=T;
		}
		#if(MINBATS > TOTALBATS)
		#{
		#MINBATS=TOTALBATS;
		#MINTIME=T;
		#}
	}
}
END {	# The final statistics will be printed to stderr, to easily
	# separate them from the event times printed to stdout.
	# The last daily count
	print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);
	# Print the event counts
	printf("A %2d\nB %2d\nX %2d\nD %2d\nT %2d\n", CA, CB, CX, CD, CA+CB+CX+CD) > "/dev/stderr";
	# Print a list of hours from 1-23
	STR="H";
	for(N=1; N<=23; N++) STR=STR sprintf(" %2d", N);;
	print STR > "/dev/stderr";
	# Print hourly counts for event A
	STR="A";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", AH[sprintf("%02d", N)]);
	print STR > "/dev/stderr";
	# Hourly counts for event B
	STR="B";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", BH[sprintf("%02d",N)]);
	print STR > "/dev/stderr";
}

the result I get is this:

Date @2011-07-19 IN +6 OUT 6 Max 1 Estimate 0 bats(peak was at 16:10:28)
A  0
B  0
X  0
D  0
T  0
H  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
A  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  6  0  0  0  0  0  0  0
B  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  6  0  0  0  0  0  0  0

using this as input data:

2011,07,19,Rx,00,10,28,pv,1,1
2011,07,19,Rx,00,10,28,pv,1,0
2011,07,19,Rx,00,10,28,pv,0,1
2011,07,19,Rx,00,10,28,pv,0,0
2011,07,19,Rx,00,10,28,pv,0,1
2011,07,19,Rx,00,10,28,pv,1,1
2011,07,19,Rx,00,10,28,pv,0,0
2011,07,19,Rx,00,10,28,pv,1,0
2011,07,19,Rx,00,10,29,pv,1,1
2011,07,19,Rx,00,10,29,pv,0,1
2011,07,19,Rx,00,10,29,pv,1,0
2011,07,19,Rx,00,10,29,pv,0,0
2011,07,19,Rx,00,36,57,pv,0,1
2011,07,19,Rx,00,36,57,pv,0,0
2011,07,19,Rx,00,36,57,pv,1,1
2011,07,19,Rx,00,36,57,pv,1,0
2011,07,19,Rx,00,37,10,pv,1,1
2011,07,19,Rx,00,37,10,pv,0,1
2011,07,19,Rx,00,37,10,pv,1,0
2011,07,19,Rx,00,37,10,pv,0,0
2011,07,19,Rx,00,41,30,pv,0,1
2011,07,19,Rx,00,41,30,pv,1,1
2011,07,19,Rx,00,41,30,pv,0,0
2011,07,19,Rx,00,41,30,pv,1,0
2011,07,19,Rx,00,41,31,pv,0,1
2011,07,19,Rx,00,41,31,pv,1,1
2011,07,19,Rx,00,41,31,pv,0,0
2011,07,19,Rx,00,41,31,pv,1,0
2011,07,19,Rx,00,41,27,pv,0,1
2011,07,19,Rx,00,41,27,pv,1,1
2011,07,19,Rx,00,41,27,pv,0,0
2011,07,19,Rx,00,41,27,pv,1,0
2011,07,19,Rx,00,41,28,pv,0,1
2011,07,19,Rx,00,41,28,pv,1,1
2011,07,19,Rx,00,41,28,pv,0,0
2011,07,19,Rx,00,41,28,pv,1,0
2011,07,19,Rx,00,41,29,pv,1,1
2011,07,19,Rx,00,41,29,pv,0,1
2011,07,19,Rx,00,41,29,pv,1,0
2011,07,19,Rx,00,41,29,pv,0,0
2011,07,19,Rx,00,41,31,pv,1,1
2011,07,19,Rx,00,41,31,pv,0,1
2011,07,19,Rx,00,41,31,pv,1,0
2011,07,19,Rx,00,41,31,pv,0,0
2011,07,19,Rx,00,41,32,pv,1,1
2011,07,19,Rx,00,41,32,pv,0,1
2011,07,19,Rx,00,41,32,pv,1,0
2011,07,19,Rx,00,41,32,pv,0,0

thanks

just nudging the post in hope the last question gets answered:D

---------- Post updated 19-08-11 at 09:11 AM ---------- Previous update was 18-08-11 at 07:08 PM ----------

For debugging, I added this at lines 86 and 96:

TOTALBATS++;
  print TOTALBATS > "/dev/stderr"; 

TOTALBATS--;
  print TOTALBATS > "/dev/stderr"; 

the resulting stats output shows what is happening:

-1
0
-1
0
-1
0
-1
-2
-3
-2
-3
-2
-3
-2
-3
-2
-1
-2
-1
-2
-1
-2
-3
-4
-5
-6
-5
-4
-3
-2
Date: @2011-07-19 IN: +14 OUT: 16 Maximum count: 0 mismatch= 2 (no maximum)
-1

If there is a negative maximum (this would happen if the logger was installed in reverse, which is easily possible in the field), the result is that no maximum count is recorded; it should be 6 above.
Any idea how I can achieve this?
thanks:confused:
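
For anyone with the same problem, one possible approach is to track the peak of the absolute running total, so that it does not matter which way round the logger was installed. The sketch below is not the fix that was eventually used in this thread. It post-processes the event lines the script writes to stdout (events.txt), assuming they look like "A@2011-07-19 16:10:28"; the name peak_by_day and the CSV-style output are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: reads A@/B@ event lines and prints, per day,
# the largest absolute running total and the time it was first reached.
peak_by_day() {
    awk '
    function emit_day() {
        if (DAY != "") printf("%s, %d, %s\n", DAY, PEAK, PEAKTIME)
        TOTAL = 0; PEAK = 0; PEAKTIME = ""   # reset for the next day
    }
    {
        split($1, P, "@")                # P[1] is "A" or "B", P[2] is the date
        if (P[2] != DAY) { emit_day(); DAY = P[2] }
        if (P[1] == "A") TOTAL++
        else if (P[1] == "B") TOTAL--
        ABS = (TOTAL < 0) ? -TOTAL : TOTAL
        if (ABS > PEAK) { PEAK = ABS; PEAKTIME = $2 }
    }
    END { emit_day() }'
}
```

With three B events followed by an A event on the same day, the reported peak is 3 at the time of the third B, whichever direction counts as "out".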

Hi corona,

I finally muddled through the code and modified it to do everything I needed.
So, I suppose I should say, thanks for teaching me something about awk -the hard way:b:

however I now have a problem which I think may be due to gawk on win (32bit win xp).

when the file output.txt is over 1 million lines, the thing crashes. Would it make any difference if I used Linux?

thanks,

ps. Is it the electrically charged plasma or the beer?
:confused:

Good, I'd hoped to give you an outline you or someone else could modify how you pleased. I just couldn't keep up with your constantly increasing requirements. :o I'd hoped someone else would help, but it looks like they left it all to me :mad:

I don't know why it's crashing. Unless it runs out of memory -- unlikely -- it just shouldn't crash. Running it in Linux may help, but your datafiles will all be Windows text and need conversion (strip out the carriage returns):

tr -d '\r' < wingarbage.txt > normaltext.txt

---------- Post updated at 09:13 AM ---------- Previous update was at 09:07 AM ----------

You could also try the awk from busybox.exe

http://dl.dropbox.com/u/5943991/busybox-w32/busybox.exe

it's a multiple executable with lots of programs inside. To get awk you do 'busybox.exe awk ...'