Validate csv file

Hi guys,
i want to validate the no.of colums in a csv file ,but if there is a comma(,) in any of the data values it should skip and count only valid (,) commas.
e.g

1.abc,pqrs,1234,567,hhh

result :4

2.abc,pqrs,1234,567,hhh,"in,valid",end12

result:6
here script should skip the comma inside the{ "in,valid" }.
please advice how to get this count?

thanks

You said you wanted to count columns, but your sample output seems to be counting commas instead of columns. To count columns, you might try:

awk -F'"' '
{	nf = gsub(/,/, ",") + 1
	for(i = 2; i <= NF; i += 2)
		nf -= gsub(/,/, ",", $i)
	printf("%s\nresult:%d\n", $0, nf)
}' file

which, if file contains your two sample input lines, produces the output:

bc,pqrs,1234,567,hhh
result:5
2.abc,pqrs,1234,567,hhh,"in,valid",end12
result:7
1 Like

Thanks ! Don..it will also work..

---------- Post updated 04-22-16 at 01:25 AM ---------- Previous update was 04-21-16 at 07:38 AM ----------

Hi Don,
can i get total count of columns in out put result .
if a file has 2 rows and 19 colums.
then total colums is 19*2.

thanks

Do you mean: The file consists of records which spread over two lines, and you want the element count per record?

Hi RudiC,
Below code gives the out put as below

bc,pqrs,1234,567,hhh result:5 2.abc,pqrs,1234,567,hhh,"in,valid",end12 result:7

i want the total_count as 12(5+7) and so on ..if more records as there.

This is not the concept of "columns" (as to my understanding). Columns are repeating structures of records (or rows) in a (DB or spread sheet or ...) table or file. E.g. Col1 has "Last Name", Col2 has "First Name", Col3 "CustomerID", etc.

Do you want just the count of data fields/elements in a file? Then try an adaption of Don Cragun's proposal:

awk -F'"' '
        {nf += gsub(/,/, ",") + 1
         for(i = 2; i <= NF; i += 2)
         nf -= gsub(/,/, ",", $i)
        }
END     {printf("result:%d\n", nf)
        }
' file
result:12
1 Like

Your desired output format isn't clear, but maybe something like:

awk -F'"' '
{	nf = gsub(/,/, ",") + 1
	for(i = 2; i <= NF; i += 2)
		nf -= gsub(/,/, ",", $i)
	printf("%s\tresult:%d\n", $0, nf)
	tf += nf
}
END {	printf("Summary: fields:%d rows:%d average fields/row:%.2f\n",
		tf, NR, tf / NR)
}' OFS='"' file

producing the output:

bc,pqrs,1234,567,hhh	result:5
2.abc,pqrs,1234,567,hhh,"in,valid",end12	result:7
Summary: fields:12 rows:2 average fields/row:6.00

will give you what you want from your two sample input lines???

1 Like

Hi RudiC,
This below code i am using as part of a script ,now how can i use this nf value for further processing.

awk -F'"' '         {nf += gsub(/,/, ",") + 1          for(i = 2; i <= NF; i += 2)          nf -= gsub(/,/, ",", $i)         } END     {printf("result:%d\n", nf)         } ' file result:12
if [ 'expr $nf  % 19 ' -gt 0]
then 
  'echo "file is invalid"
please suggest.!

---------- Post updated 04-25-16 at 12:18 AM ---------- Previous update was 04-24-16 at 11:58 PM ----------

apologies..Don..i will take care .

First: You can't remove all of the carriage-return characters from an awk script to make a one-liner and get a working awk script.

Second: Please start a new thread for a new problem. You have now changed your requirements three times in four posts.

Third: You can't access awk variables after the awk script exits.

Fourth: What is your script doing? Why not just have the awk script use the variables that are available in the awk script to produce your additional output? For example:

awk -F'"' '
	{nf += gsub(/,/, ",") + 1
	 for(i = 2; i <= NF; i += 2)
	 nf -= gsub(/,/, ",", $i)
	}
END	{printf("result:%d\n", nf)
	 if(nf % 19)
		print FILENAME, "is invalid"
	}' file

And, to add to all the right points the venerable Don already made:

This is simply a (bunch of) syntax error(s):

Fifth: The single quote in:

  'echo "file is invalid"

is not closed.

Sixth: This:

if [ 'expr $nf  % 19 ' -gt 0]

misses a space near the end, like this: ...-gt 0 ] .

Seventh: The comparison of a string value ( 'expr $nf % 19 ' is a literal string) with an integer ( -gt 0 is an integer comparison) will never work.

I hope this helps.

bakunin

Once the awk script runs error free, assign its result (which is printed to stdin) to a shell variable using "command substitution":

RES=$(awk '...' ...)