Check whether one of multiple input files is empty using awk

I am passing multiple files to awk. Because one of the files (say file3) is empty, it gets skipped and the logic goes for a toss. I need suggestions on how to check for this and add handling for it.

awk -F,  'FNR==1 {++filecounter}
filecounter==1 {KRL[$1]=$2;next}
filecounter==2 {GUJ[$1]=$2;next}
filecounter==3 {DEL[$1]=$2;next}
filecounter==4 {UPW[$1]=$2;next}
{
if($1 in KRL)
print "FOUND IN KRL"
else if($1 in GUJ)
print "FOUND IN GUJ"
else if($1 in DEL)
print "FOUND IN DEL"
else if($1 in UPW)
print "FOUND IN UPW"
else
print "Not Found" 
}
' File1 file2 file3 file4 mainfile
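
The skipping described above can be reproduced in isolation: awk never reads a record from an empty file, so a rule guarded by FNR==1 does not fire for it and the counter stays one behind. A minimal sketch, where the function name count_started_files is hypothetical and not part of the original script:

```shell
# Hypothetical helper: counts how many input files triggered FNR == 1.
# An empty file yields no records, so it is never counted.
count_started_files() {
    awk 'FNR == 1 { ++filecounter } END { print filecounter + 0 }' "$@"
}
```

Called with three files of which the middle one is empty, this reports two files instead of three, so every later file ends up handled by the rule meant for the file before it.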

Why don't you use the FILENAME variable for identifying the file being read?
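
A minimal sketch of that idea, assuming the lookup files are always named exactly File1, file2, file3 and file4 as on the command line in the question (the wrapper name lookup_by_name is hypothetical). Each rule tests FILENAME instead of a counter, so an empty file simply contributes no records and cannot shift the other files:

```shell
# Hypothetical wrapper; the file names are the ones from the question.
lookup_by_name() {
    awk -F, '
    FILENAME == "File1" { KRL[$1] = $2; next }
    FILENAME == "file2" { GUJ[$1] = $2; next }
    FILENAME == "file3" { DEL[$1] = $2; next }
    FILENAME == "file4" { UPW[$1] = $2; next }
    {   # Any other file (here: mainfile) is looked up in the arrays.
        if ($1 in KRL)
            print "FOUND IN KRL"
        else if ($1 in GUJ)
            print "FOUND IN GUJ"
        else if ($1 in DEL)
            print "FOUND IN DEL"
        else if ($1 in UPW)
            print "FOUND IN UPW"
        else
            print "Not Found"
    }' File1 file2 file3 file4 mainfile
}
```

This only works while the names are fixed; if they change from run to run, the names have to be matched against the command-line arguments instead.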

If the filenames vary from run to run, you could try something like:

#!/bin/ksh
awk -F, '
FNR == 1 {
	for(filecounter = 1; filecounter < ARGC; filecounter++)
		if(FILENAME == ARGV[filecounter])
			break
	printf("***Processing file %s, filecounter=%d\n", FILENAME, filecounter)
}
filecounter==1 {KRL[$1]=$2;next}
filecounter==2 {GUJ[$1]=$2;next}
filecounter==3 {DEL[$1]=$2;next}
filecounter==4 {UPW[$1]=$2;next}
{	if($1 in KRL)
		print "FOUND IN KRL"
	else if($1 in GUJ)
		print "FOUND IN GUJ"
	else if($1 in DEL)
		print "FOUND IN DEL"
	else if($1 in UPW)
		print "FOUND IN UPW"
	else	print "Not Found"
}' "$@"

and invoke this script with:

./scriptname File1 file2 file3 file4 mainfile

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.

Note, however, that if you want to pass variable assignments to your script as operands alongside the file names, as in:

./scriptname OFS="," File1 file2 file3 file4 mainfile

you'd have to make the filename matching code more complex.
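
One way that more complex matching might look, as a sketch only: a BEGIN block walks ARGV once, skips operands that look like var=value assignments, and records the position of each remaining file name. The wrapper name classify_skipping_assignments and the arrays ordinal and nfiles are hypothetical, and the sketch assumes no file name itself has the form name=value.

```shell
# Hypothetical wrapper; the BEGIN block is an assumption, not code from this thread.
classify_skipping_assignments() {
    awk -F, '
    BEGIN {
        # Number the file operands, ignoring var=value operands.
        for (i = 1; i < ARGC; i++) {
            if (ARGV[i] ~ /^[A-Za-z_][A-Za-z0-9_]*=/)
                continue
            ++nfiles
            if (!(ARGV[i] in ordinal))
                ordinal[ARGV[i]] = nfiles   # first position of this name
        }
    }
    FNR == 1 { filecounter = ordinal[FILENAME] }
    filecounter == 1 { KRL[$1] = $2; next }
    filecounter == 2 { GUJ[$1] = $2; next }
    filecounter == 3 { DEL[$1] = $2; next }
    filecounter == 4 { UPW[$1] = $2; next }
    {
        if ($1 in KRL)
            print "FOUND IN KRL"
        else if ($1 in GUJ)
            print "FOUND IN GUJ"
        else if ($1 in DEL)
            print "FOUND IN DEL"
        else if ($1 in UPW)
            print "FOUND IN UPW"
        else
            print "Not Found"
    }' "$@"
}
```

It would be invoked just like the script, for example: classify_skipping_assignments OFS="," File1 file2 file3 file4 mainfile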

Or, you could leave out the FNR == 1 clause in your script and invoke it with:

./scriptname filecounter=1 File1 filecounter=2 file2 filecounter=3 file3 filecounter=4 file4 filecounter=5 mainfile
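
A sketch of what that variant could look like when wrapped in a shell function (the name classify_with_assignments is hypothetical, and it assumes exactly five file names are passed). awk processes each filecounter=N operand when it reaches it in the argument list, whether or not the following file contains any records, so an empty file cannot throw the numbering off:

```shell
# Hypothetical wrapper: takes four lookup files followed by the main file.
classify_with_assignments() {
    awk -F, '
    filecounter == 1 { KRL[$1] = $2; next }
    filecounter == 2 { GUJ[$1] = $2; next }
    filecounter == 3 { DEL[$1] = $2; next }
    filecounter == 4 { UPW[$1] = $2; next }
    {
        if ($1 in KRL)
            print "FOUND IN KRL"
        else if ($1 in GUJ)
            print "FOUND IN GUJ"
        else if ($1 in DEL)
            print "FOUND IN DEL"
        else if ($1 in UPW)
            print "FOUND IN UPW"
        else
            print "Not Found"
    }' filecounter=1 "$1" filecounter=2 "$2" filecounter=3 "$3" \
       filecounter=4 "$4" filecounter=5 "$5"
}
```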

Thanks, Don. I am trying to run the script you shared with the following files.

file1, file2 and file3 are empty (blank files), and file4 has the following content:

9868108600|1025107385|1|9868|108600|20120109|9480
1024344800|1023976099|1|9136|006388|20130814|9429
8459103270|1025429705|1|8459|103270|20130924|9480

mainfile has the following content:

1024344800|1|919136004959|004959|31639136001714|20150729105905105905|79|0|0|2|60|NO_EXCHG|7|900|0|||0|9136|702|0|20150729120737||79|3163|0|702|0|0|0|0|0|0|0|0|0|0|24|24|702|0|0|9431|0|SSJPR1MS201507291102556714_PR.PRC|11:00:24|404009875008997
1024344801|1|919136004959|004959|31639136001716|20150729110247110247|50|0|0|1|60|NO_EXCHG|7|900|0|||0|9136|702|0|20150729120737||50|3163|0|702|0|0|0|0|0|0|0|0|0|0|24|24|702|0|0|9431|0|SSJPR1MS201507291105486715_PR.PRC|11:03:37|404009875008998

But when I run the script you provided, it only reports processing file4 and mainfile, and there is no output saying "FOUND IN DEL". Can you please suggest a fix?

Apologies, it should give "FOUND IN UPW". Any of the input files from file1 to file4 can be empty, so the same logic must also work when file1, file2 or file3 is empty.

You said your field separator was a comma (-F,), so with your sample input file file4, field 1 is:

1024344800|1023976099|1|9136|006388|20130814|9429

(not 1024344800), and

1024344800|1023976099|1|9136|006388|20130814|9429

in file4 is not the same as field 1 in mainfile:

1024344800|1|919136004959|004959|31639136001714|20150729105905105905|79|0|0|2|60|NO_EXCHG|7|900|0|||0|9136|702|0|20150729120737||79|3163|0|702|0|0|0|0|0|0|0|0|0|0|24|24|702|0|0|9431|0|SSJPR1MS201507291102556714_PR.PRC|11:00:24|404009875008997

If your field separator is the vertical bar (or pipe symbol), you need to change -F, in your script to -F'|'.
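
To see the difference directly, here is a small sketch that prints field 1 of the second line of file4 under a given separator (the function name first_field is hypothetical):

```shell
# Hypothetical helper: $1 of the shell function is the separator to try.
# Inside the single-quoted awk program, $1 is awk's first field.
first_field() {
    printf '%s\n' '1024344800|1023976099|1|9136|006388|20130814|9429' |
        awk -F"$1" '{ print $1 }'
}
```

first_field ',' prints the whole line, because the line contains no comma, while first_field '|' prints 1024344800.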

Yup, I had changed that in my own code but had not updated it in the code you shared.

#!/bin/ksh
awk -F'|' '
FNR == 1 {
	for(filecounter = 1; filecounter < ARGC; filecounter++)
		if(FILENAME == ARGV[filecounter])
			break
	printf("***Processing file %s, filecounter=%d\n", FILENAME, filecounter)
}
filecounter==1 {KRL[$1]=$2;next}
filecounter==2 {GUJ[$1]=$2;next}
filecounter==3 {DEL[$1]=$2;next}
filecounter==4 {UPW[$1]=$2;next}
{	if($1 in KRL)
		print "FOUND IN KRL"
	else if($1 in GUJ)
		print "FOUND IN GUJ"
	else if($1 in DEL)
		print "FOUND IN DEL"
	else if($1 in UPW)
		print "FOUND IN UPW"
	else	print "Not Found"
}' "$@"

Hi Don, also, if you look at the content of file4, I am storing field 1 as the array key, with field 2 from file4 as its value:

filecounter==4{UPW[$1]=$2;next}

and that key is being matched against mainfile. I also highlighted this in my earlier post. Can you please suggest how to handle this.

Taken from the Stack Overflow question "How to check if a file is empty using awk in a bash script?":

awk 'END{print(NR>2)?"NOT EMPTY":"EMPTY"}'

You must put this part in an END block; otherwise it will be executed for every line.

I understand that your code is setting values in your four arrays that are never used. If you change the four lines setting the arrays from:

filecounter==1 {KRL[$1]=$2;next}
...

to:

filecounter==1 {KRL[$1];next}
...

you would get exactly the same results. I left that seemingly extraneous initialization in place assuming that your real code would do something with the values that had been saved in the arrays instead of just saying "FOUND IN arrayname".
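
A small sketch of why the two forms behave the same for the "in" test: merely referencing an array element creates its key, with or without a value. The function name membership_demo and the array name seen are hypothetical.

```shell
# Hypothetical demo: keys "a" and "b" are created without assigning a value.
membership_demo() {
    printf 'a,1\nb,2\n' |
        awk -F, '{ seen[$1] }
                 END { a = ("a" in seen); c = ("c" in seen); print a, c }'
}
```

This prints "1 0": the key a exists even though nothing was stored under it, and c was never seen.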

But I have no idea what "this" is referring to in:

Can you please suggest how to handle this.

Please explain what you are trying to change and show us what output you are trying to produce.

I do not see how this is related to this thread??? Adding this END clause to the code being used in this thread will print "NOT EMPTY" if at least one of the five files being processed by these scripts is not empty, and will print "EMPTY" if there are no lines in any of the five input files.

Sorry, I should have gotten more sleep last night. Ignore what I said above...
I do not see how this is related to this thread. Adding this END clause to the code being used in this thread will print "NOT EMPTY" if at least three lines were found in the five files being processed by these scripts, and will print "EMPTY" if there are fewer than three lines in all of the five input files combined.
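
Since NR counts records across all input files, that END test cannot say which file is empty. If a per-file answer is wanted, one possible sketch (an assumption on my part, not code from this thread) tries to read a single line from each operand in a BEGIN block. The function name report_empty is hypothetical, and a file containing only a newline counts as not empty here.

```shell
# Hypothetical helper: reports for each file operand whether a record can be read.
report_empty() {
    awk '
    BEGIN {
        for (i = 1; i < ARGC; i++) {
            # getline returns 1 for a record, 0 at end of file, -1 on error.
            status = (getline line < ARGV[i])
            if (status < 0)
                print ARGV[i] ": unreadable"
            else if (status == 0)
                print ARGV[i] ": EMPTY"
            else
                print ARGV[i] ": NOT EMPTY"
            close(ARGV[i])
        }
    }' "$@"
}
```

Because the program consists only of a BEGIN block, awk does not go on to read the files as regular input.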