Print records which do not have expected number of fields in a comma delimited file

Hi,

I have a comma (,) delimited file in which a few fields are enclosed in double quotes " ". I have to print, with the line number, the records in the file which do not have the expected number of fields.

 
File1
====
name,desgnation,doj,project #header#
asath,se,15-06-2010,asc,"india,mumbai"
rif,sse,12-05-2010,asc
si,tl,01-12-2009,asc
mthr,"ase,trans",15-09-2010
sdu,ase,15-09-2010,bench
 
Here, let's say the expected number of fields is 4. In the above input, record 2 has 5 fields and record 5 has 3 fields. I have to print these two records with their line numbers.
 
Expected output
 
File1_fldmismatchrcrds
==================
2 asath,se,15-06-2010,asc,"india,mumbai"
5 mthr,"ase,trans",15-09-2010

I tried the below code

 awk -F "," '{gsub(/"[^"]*"/,x);if (NF != '4'){print NR,$0}}' File1

This gives me the output as

2 asath,se,15-06-2010,asc,
5 mthr,,15-09-2010

 
Can someone please help on this issue? 
:wall:

awk -F "," '/"/{print NR,$0}' file
This will help you.

Output:
2 asath,se,15-06-2010,asc,"india,mumbai"
5 mthr,"ase,trans",15-09-2010

Hi, could you please explain how this works?
This returns the rows that contain double quotes (" "), but what I need is to print any record (with or without double-quoted data in it) if its number of fields differs from the expected number of fields.
The above code prints only the records with double-quoted data, irrespective of the number of fields.

Try preserving the record first:

awk -F, '{p=$0;gsub(/"[^"]*"/,x)} NF!=4{print NR,p}' file
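
A minimal sketch of why this works, assuming the sample File1 from the question (with the "#header#" annotation left out): calling gsub() without a target modifies $0, which makes awk split the record again, so NF then counts the fields with the quoted sections emptied, while p still holds the untouched line for printing. The output file name is the one used in the question.

```shell
# Recreate the sample file from the question.
cat > File1 <<'EOF'
name,desgnation,doj,project
asath,se,15-06-2010,asc,"india,mumbai"
rif,sse,12-05-2010,asc
si,tl,01-12-2009,asc
mthr,"ase,trans",15-09-2010
sdu,ase,15-09-2010,bench
EOF

awk -F, '
{
    p = $0                  # keep the original record
    gsub(/"[^"]*"/, "")     # empty every quoted section; $0 is re-split
}
NF != 4 { print NR, p }     # NF now ignores commas that were inside quotes
' File1 > File1_fldmismatchrcrds

cat File1_fldmismatchrcrds
```

This simple approach assumes a quoted section never spans lines and never contains an escaped double quote.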
/pattern/ {statements}

is a pattern match followed by the statement or series of statements to run if the pattern matches,
i.e., if the pattern gives a boolean result of true (rather than false)

# expected number is 4
# this prints the record number and the record if the field count is not 4
# and if there is a quote somewhere on the line
awk -F ',' '/"/ && NF!=4 {print NR,$0}' file

I do not get exactly what the criteria for printing are supposed to be. So this is just giving you what it seems you want....

Thanks. It worked!! :smiley:

awk -F, '{p=$0;gsub(/"[^"]*"/,x)} NF!=4{print NR,p}' file

Hi machomaddy,
Are you not getting the output you want? If not, please show me the output you want.

@parthmittal2007, that returns linenrs + records that have at least one double quote
@jim, that returns linenrs + records that have at least one double quote and other than 3 commas
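
To make the difference concrete, here is a rough sketch that runs the three commands from this thread on a small test file. File2 and its records are hypothetical, made up for illustration: line 3 has too few fields and no quotes, and line 4 has a quoted comma but the correct 4 fields.

```shell
cat > File2 <<'EOF'
name,desgnation,doj,project
asath,se,15-06-2010,asc,"india,mumbai"
abc,def
sdu,"ase,trans",15-09-2010,bench
EOF

# Quote filter only: reports lines 2 and 4
awk -F, '/"/ {print NR, $0}' File2 > out_quote

# Quote filter plus raw comma count: also reports lines 2 and 4
awk -F, '/"/ && NF != 4 {print NR, $0}' File2 > out_quote_nf

# Field count after emptying quoted sections: reports lines 2 and 3
awk -F, '{p = $0; gsub(/"[^"]*"/, "")} NF != 4 {print NR, p}' File2 > out_fields

cat out_quote out_quote_nf out_fields
```

Only the last command catches the unquoted short record on line 3 and skips line 4, whose extra comma sits inside quotes.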