Validate csv file

harry123 · April 21, 2016, 2:14am

Hi guys,
i want to validate the no.of colums in a csv file ,but if there is a comma(,) in any of the data values it should skip and count only valid (,) commas.
e.g

1.abc,pqrs,1234,567,hhh

result :4

2.abc,pqrs,1234,567,hhh,"in,valid",end12

result:6
here script should skip the comma inside the{ "in,valid" }.
please advice how to get this count?

thanks

Don_Cragun · April 21, 2016, 3:36am

You said you wanted to count columns, but your sample output seems to be counting commas instead of columns. To count columns, you might try:

awk -F'"' '
{	nf = gsub(/,/, ",") + 1
	for(i = 2; i <= NF; i += 2)
		nf -= gsub(/,/, ",", $i)
	printf("%s\nresult:%d\n", $0, nf)
}' file

which, if file contains your two sample input lines, produces the output:

bc,pqrs,1234,567,hhh
result:5
2.abc,pqrs,1234,567,hhh,"in,valid",end12
result:7

harry123 · April 22, 2016, 2:25am

Thanks ! Don..it will also work..

---------- Post updated 04-22-16 at 01:25 AM ---------- Previous update was 04-21-16 at 07:38 AM ----------

Hi Don,
can i get total count of columns in out put result .
if a file has 2 rows and 19 colums.
then total colums is 19*2.

thanks

RudiC · April 22, 2016, 3:04am

Do you mean: The file consists of records which spread over two lines, and you want the element count per record?

harry123 · April 22, 2016, 3:17am

Hi RudiC,
Below code gives the out put as below

bc,pqrs,1234,567,hhh result:5 2.abc,pqrs,1234,567,hhh,"in,valid",end12 result:7

i want the total_count as 12(5+7) and so on ..if more records as there.

RudiC · April 22, 2016, 3:46am

This is not the concept of "columns" (as to my understanding). Columns are repeating structures of records (or rows) in a (DB or spread sheet or ...) table or file. E.g. Col1 has "Last Name", Col2 has "First Name", Col3 "CustomerID", etc.

Do you want just the count of data fields/elements in a file? Then try an adaption of Don Cragun's proposal:

awk -F'"' '
        {nf += gsub(/,/, ",") + 1
         for(i = 2; i <= NF; i += 2)
         nf -= gsub(/,/, ",", $i)
        }
END     {printf("result:%d\n", nf)
        }
' file
result:12

Don_Cragun · April 22, 2016, 5:55am

Your desired output format isn't clear, but maybe something like:

awk -F'"' '
{	nf = gsub(/,/, ",") + 1
	for(i = 2; i <= NF; i += 2)
		nf -= gsub(/,/, ",", $i)
	printf("%s\tresult:%d\n", $0, nf)
	tf += nf
}
END {	printf("Summary: fields:%d rows:%d average fields/row:%.2f\n",
		tf, NR, tf / NR)
}' OFS='"' file

producing the output:

bc,pqrs,1234,567,hhh	result:5
2.abc,pqrs,1234,567,hhh,"in,valid",end12	result:7
Summary: fields:12 rows:2 average fields/row:6.00

will give you what you want from your two sample input lines???

harry123 · April 25, 2016, 1:18am

Hi RudiC,
This below code i am using as part of a script ,now how can i use this nf value for further processing.

awk -F'"' '         {nf += gsub(/,/, ",") + 1          for(i = 2; i <= NF; i += 2)          nf -= gsub(/,/, ",", $i)         } END     {printf("result:%d\n", nf)         } ' file result:12

if [ 'expr $nf  % 19 ' -gt 0]
then 
  'echo "file is invalid"
please suggest.!

---------- Post updated 04-25-16 at 12:18 AM ---------- Previous update was 04-24-16 at 11:58 PM ----------

apologies..Don..i will take care .

Don_Cragun · April 25, 2016, 1:52am

First: You can't remove all of the carriage-return characters from an awk script to make a one-liner and get a working awk script.

Second: Please start a new thread for a new problem. You have now changed your requirements three times in four posts.

Third: You can't access awk variables after the awk script exits.

Fourth: What is your script doing? Why not just have the awk script use the variables that are available in the awk script to produce your additional output? For example:

awk -F'"' '
	{nf += gsub(/,/, ",") + 1
	 for(i = 2; i <= NF; i += 2)
	 nf -= gsub(/,/, ",", $i)
	}
END	{printf("result:%d\n", nf)
	 if(nf % 19)
		print FILENAME, "is invalid"
	}' file

bakunin · April 25, 2016, 8:36am

And, to add to all the right points the venerable Don already made:

This is simply a (bunch of) syntax error(s):

Fifth: The single quote in:

  'echo "file is invalid"

is not closed.

Sixth: This:

if [ 'expr $nf  % 19 ' -gt 0]

misses a space near the end, like this: ...-gt 0 ] .

Seventh: The comparison of a string value ( 'expr $nf % 19 ' is a literal string) with an integer ( -gt 0 is an integer comparison) will never work.

I hope this helps.

bakunin

RudiC · April 25, 2016, 8:49am

Once the awk script runs error free, assign its result (which is printed to stdin) to a shell variable using "command substitution":

RES=$(awk '...' ...)