Reading CSV file

Hi experts,

I have a CSV file with a few columns, which should contain data as shown below. I want to check the following: if a value in column 3 is duplicated across rows (e.g. 9876, 9876), then column 2 of those rows should contain the text "tax" and must not contain the text "non".
The word "non" is allowed only when the column 3 value is not duplicated: if a value appears in just one row, say 7623, then column 2 of that row may contain "non".

I need to print an error along with the line number if these conditions are not met in the file, i.e. if col2 contains "non" for a col3 value that is duplicated.

CSV Data:

col1   col2   col3
inv    tax    9876
inv    tax    9876
inv    non    7623
inv    tax    1234
inv    tax    1234

Bad data:
The file is considered incorrect if it has a last row with col2 value "non" and col3 value 1234:

col1   col2   col3
inv    tax    9876
inv    tax    9876
inv    non    7623
inv    tax    1234
inv    tax    1234
inv    non    1234

Not fully understanding what you're after, how about

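awk '
                {++CNT[$3]              # count how often each col3 value occurs
                }

$2 == "non"     {NON[$3] = NR           # remember the line number of the "non" row
                }

NON[$3] &&
(CNT[$3] > 1)   {print "error ", NON[$3]   # "non" was used for a duplicated col3 value
                }
' file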
error  7

Hi Rudic,

With reference to the above query, below is the objective I want to achieve when checking the CSV file.

Identify the rows where column 1 is INV, and then, within that filtered list, identify the rows with identical values in the document number field (column 3 in the sample data below). For each such group, check the supplyType field (column 2 in the sample data below): if any row has the value TAX, then none of the other rows in the group should have the value NON.

For example:
If column 3 contains duplicate values, then column 2 of those rows may contain values such as "tax", "SEZ", "ISD", etc., but not "non". If a column 3 value is not duplicated (it appears in a single row only), then column 2 of that row may contain "non".

Sample data:

col1   col2   col3
inv    tax    9876
inv    tax    9876
inv    non    7623
inv    tax    1234
inv    tax    1234
inv    non    1234

Here the last group of duplicate values, "1234" in column 3, is considered wrong because column 2 contains "non" for one of its rows (the last row, value 1234).
It should be "tax" in column 2 of the last row instead.

For this error we need to print an "error code" with the line number.
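
The rule described above could be sketched in awk roughly as follows. This is only a sketch under some assumptions: the fields are separated by whitespace as in the sample (a truly comma-separated file would need -F, added), the values in col1 and col2 are compared case-insensitively, and the function name check_supply_type and the wording of the error message are made up for illustration. It reads the file twice, so a "non" row is reported once, whether it comes before or after the other rows with the same col3 value.

```shell
# Hypothetical helper: check_supply_type FILE
# Pass 1 counts each col3 value on "inv" rows; pass 2 reports every
# "non" row whose col3 value occurs more than once.
check_supply_type() {
    awk '
        # Only rows whose first column is "inv" take part (this also skips the header).
        tolower($1) != "inv" { next }

        # Pass 1: NR == FNR holds only while the first copy of the file is read.
        NR == FNR { count[$3]++; next }

        # Pass 2: FNR is the line number within the file.
        tolower($2) == "non" && count[$3] > 1 {
            printf "error: line %d: \"non\" used with duplicated value %s\n", FNR, $3
            bad = 1
        }

        # Exit status 1 if at least one error was reported, 0 otherwise.
        END { exit bad + 0 }
    ' "$1" "$1"
}
```

Calling it as check_supply_type file on the sample data above should flag only line 7 (the header counts as line 1); the single "non" row for 7623 is not reported.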

---------- Post updated at 12:38 AM ---------- Previous update was at 12:14 AM ----------

Hi Rudic,

Thanks for the code.
I will run it and let you know if there are any issues.
Thank you once again :)