Delimiter Count in a Flat File

Hi,

I have the script below to count the delimiters on each line of a file (here the file is test.csv and the delimiter is ","):

awk '{gsub(/"[^"]*"/,x);print gsub(/,/,x)}' test.csv

It returns the following output for each line:

2
2
cat test.csv:
abc,xyz
"abc,zxyz",1

I need help with the following:

  • Is there a way to pass the delimiter as a variable? I am using the command below, but it is not working:
awk -vs="," '{gsub(/"[^"]*"/,x);print gsub(/s/,x)}' test.csv
  • Can I check whether the file has the same delimiter count on every line, rather than printing the count for each line?

Please help! Thanks a lot
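
On the first question: /s/ is a regex literal that matches the letter "s"; awk does not expand variables inside /.../. A variable used where a regex is expected is treated as a dynamic regex, so passing the variable itself to gsub should work. Below is a minimal sketch of that idea. The function name count_delims and the sample lines are only illustrative, and it assumes a single-character delimiter other than ^, ] or \ (the bracket expression keeps characters such as | or . literal).

```shell
# count_delims FILE DELIM   (hypothetical helper name)
# Prints, for each line of FILE, the number of DELIM characters
# that appear outside double-quoted fields.
count_delims() {
    awk -v s="$2" '{
        gsub(/"[^"]*"/, "")          # drop quoted fields and any delimiters inside them
        print gsub("[" s "]", "")    # gsub returns the number of replacements it made
    }' "$1"
}

# Illustrative sample data: one plain line, one line with a quoted comma
printf '%s\n' 'abc,xyz' '"abc,zyz",xyz' > test.csv
count_delims test.csv ","
```

Note the space after -v; some awk versions accept -vs="," but not all of them do.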

awk '{FS=del; if(cnt && cnt != NF-1) {a=1;exit;} cnt=NF-1;}END{if (a==1) print "delim count is not same in all lines"; else print "delim count is same in all lines"}' del=$delim test.csv

Hi Anuraj,

The command does not do what I need.

I need to pass the delimiter as a variable and, at the same time, get an output of 1 or 0 depending on whether all rows have the same delimiter count (ignoring any delimiters inside quotes, which my awk command already handles).

Modified command above.

Thanks Anuraj.

But the posted command does not give the desired result. It always takes the if (a==1) branch and prints "delim count is not same in all lines", even though the delimiter count is the same across the file. It also does not ignore a delimiter that appears inside quotes.

Example of such a case is:
"abc,zyz",xyz

In this case the delimiter count is 1.
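
For reference, the requirement described here (delimiter passed as a variable, quoted delimiters ignored, a single 1 or 0 for the whole file) could be sketched in awk alone roughly as follows. The function name same_delim_count and the sample files are hypothetical, and the sketch assumes a single-character delimiter other than ^, ] or \, with quoted fields that do not span lines.

```shell
# same_delim_count FILE DELIM   (hypothetical helper name)
# Prints 1 if every line of FILE has the same number of unquoted
# DELIM characters, otherwise prints 0.
same_delim_count() {
    awk -v s="$2" '
        BEGIN { same = 1 }
        {
            gsub(/"[^"]*"/, "")        # ignore delimiters inside double quotes
            n = gsub("[" s "]", "")    # delimiters remaining on this line
            if (NR == 1) first = n     # remember the count from the first line
            else if (n != first) { same = 0; exit }   # exit jumps straight to END
        }
        END { print same }
    ' "$1"
}

# Illustrative sample data
printf '%s\n' '"abc,zyz",xyz' 'abc,xyz' > test.csv
same_delim_count test.csv ","    # prints 1

printf '%s\n' 'a,b,c' 'a,b' > bad.csv
same_delim_count bad.csv ","     # prints 0
```

Comparing against the first line by NR, rather than testing whether the saved count is non-zero, means a first line with zero delimiters is still compared against the rest.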

sed 's/"\([^,]*\),\([^"]*\)"/"\1\2"/g' test.csv | awk '{FS=del; if(cnt && cnt != NF-1) {a=1;exit;} cnt=NF-1;}END{if (a==1) print "delim count is not same in all lines"; else print "delim count is same in all lines"}' del=$delim

delim is a shell variable; its value must be set before running the command above.
If it doesn't work, please post the command along with the result you got.

Hi Anurag,

Thanks for your replies.

Here is the list of commands that I ran:

delim=,
echo $delim
output is ,

sed 's/"\([^,]*\),\([^"]*\)"/"\1\2"/g" test.csv | awk '{FS=del; if(cnt && cnt != NF-1) {a=1; exit} cnt=NF-1;}END{if (a==1)
  print "delim count is not same in all lines" else print "delim count is same in all lines"}' del=$delim

-bash: syntax error near unexpected token `{a=1'

There was a missing semicolon in the if statement. Also, the sed expression must end with a single quote, not a double quote. I have corrected the earlier post (#6); please check it now.

Since you only care about the number of delimiters, why not just remove everything except the delimiters, with:

sed -e 's/"\([^"]\|""\)*"/<>/g' -e 's/[^,]//g'

You can then compare lengths of $0 in awk.
And since you don't care about the actual number of fields:

X=`sed -e 's/"\([^"]\|""\)*"/<>/g' -e 's/[^,]//g' test.csv | sort -u | wc -l`

if [[ ${X} -eq 1 ]]; then
    echo delim count is the same in all lines
else
    echo delim count is not the same in all lines
fi

Note that wc on some systems prints the count with leading spaces, so you have to make an arithmetic comparison (-eq), not a string comparison, where " 1" is not the same as "1".
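
The other suggestion, comparing the lengths of $0 in awk once only the delimiters are left, might look like the sketch below. It is not the exact sed expression from this post: it swaps in the simpler pattern "[^"]*", which avoids the GNU-only \| and should still cope with doubled quotes, because each "" pair is consumed as part of adjacent quoted runs. The function name check_delims is hypothetical, and the delimiter is assumed to be a single character other than /, ^, ] or \.

```shell
# check_delims FILE [DELIM]   (hypothetical helper name)
# Strips quoted fields, then everything that is not the delimiter,
# and lets awk compare the length of each remaining line.
check_delims() {
    d=${2:-,}    # delimiter, defaults to a comma
    sed -e 's/"[^"]*"//g' -e "s/[^$d]//g" "$1" |
    awk 'NR == 1 { len = length($0) }             # delimiter count on the first line
         length($0) != len { diff = 1; exit }     # stop at the first mismatch
         END { print (diff ? "delim count is not the same in all lines" : "delim count is the same in all lines") }'
}

# Illustrative sample data
printf '%s\n' 'abc,xyz' '"abc,zyz",xyz' > test.csv
check_delims test.csv ","
```

Because awk does the comparison itself, no wc output has to be parsed, so the leading-space issue does not come up.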