Delimiter Count in a Flat File

venkatajay_18 · December 24, 2010, 4:19am

Hi,

I have the script below to count the delimiters on each line of a file (here the file is test.csv and the delimiter is ","):

awk '{gsub(/"[^"]*"/,x);print gsub(/,/,x)}' test.csv

It returns the following output for each line:

2
2

cat test.csv:
abc,xyz
"abc,zxyz",1

I need help with the following:

Is there a way to pass the delimiter as a variable? I am using the command below, but it is not working:

awk -vs="," '{gsub(/"[^"]*"/,x);print gsub(/s/,x)}' test.csv

Can I check whether the file has the same delimiter count on every line, rather than printing the count for each line?

Please help! Thanks a lot

anurag.singh · December 24, 2010, 4:30am

awk '{FS=del; if(cnt && cnt != NF-1) {a=1;exit;} cnt=NF-1;}END{if (a==1) print "delim count is not same in all lines"; else print "delim count is same in all lines"}' del="$delim" test.csv
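
On the first question: /s/ is a regex literal that matches the letter s, not the contents of the awk variable s. A minimal sketch of one way around that is to pass the variable itself to gsub as a dynamic regex. This assumes the delimiter contains no regex metacharacters (a comma is fine; something like | or . would need escaping). The function name count_delims is only illustrative:

```shell
# Print the number of delimiters on each line, ignoring any inside double quotes.
# $1 = delimiter (assumed to contain no regex metacharacters), $2 = file
count_delims() {
    awk -v s="$1" '{
        gsub(/"[^"]*"/, "")   # drop quoted fields so their delimiters are not counted
        print gsub(s, "")     # s is used as a dynamic regex; gsub returns the match count
    }' "$2"
}
```

For example, count_delims , test.csv would print one count per line of test.csv.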

venkatajay_18 · December 24, 2010, 5:04am

Hi Anurag,

The command does not serve the purpose.

I need to pass the delimiter as a variable and, at the same time, get an output of 1 or 0 depending on whether all the rows have the same delimiter count (ignoring any delimiters inside quotes, which my awk command already handles).

anurag.singh · December 24, 2010, 5:54am

Modified command above.

venkatajay_18 · December 27, 2010, 3:58am

Thanks, Anurag.

But the posted command is not giving the desired result. It always takes the if (a==1) branch and prints "delim count is not same in all lines", even though the delimiter count is the same across the file. It also does not ignore a delimiter that appears inside quotes.

Example of such a case is:
"abc,zyz",xyz

In this case the delimiter count is 1.

anurag.singh · December 27, 2010, 4:58am

sed 's/"\([^,]*\),\([^"]*\)"/"\1\2"/g' test.csv | awk -v del="$delim" 'BEGIN{FS=del} {if(NR>1 && cnt != NF-1) {a=1;exit;} cnt=NF-1;}END{if (a==1) print "delim count is not same in all lines"; else print "delim count is same in all lines"}'

delim is a shell variable; its value should be set before running the command above.
If it doesn't work, please post the command along with the result you got.
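
For the 1-or-0 output requested earlier in the thread, here is a rough sketch that does everything in one awk call. It assumes quoted fields never contain escaped quotes, and it treats the delimiter as a literal string by counting with index() rather than a regex. The function name same_delim_count is hypothetical:

```shell
# Print 1 if every line has the same number of delimiters outside quotes, else 0.
# $1 = delimiter (treated as a literal string), $2 = file
same_delim_count() {
    awk -v d="$1" '
    BEGIN { same = 1 }
    {
        line = $0
        gsub(/"[^"]*"/, "", line)              # ignore delimiters inside quotes
        n = 0
        while ((i = index(line, d)) > 0) {     # count literal occurrences of d
            n++
            line = substr(line, i + length(d))
        }
        if (NR == 1) first = n                 # remember the count of the first line
        else if (n != first) { same = 0; exit }
    }
    END { print same }' "$2"
}
```

With delim=, set as above, same_delim_count "$delim" test.csv prints a single 1 or 0.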

venkatajay_18 · December 27, 2010, 6:40am

Hi Anurag,

Thanks for your replies.

Here is the list of commands that I ran:

delim=,
echo $delim
output is ,

sed 's/"\([^,]*\),\([^"]*\)"/"\1\2"/g" test.csv | awk '{FS=del; if(cnt && cnt != NF-1) {a=1; exit} cnt=NF-1;}END{if (a==1)
  print "delim count is not same in all lines" else print "delim count is same in all lines"}' del=$delim

-bash: syntax error near unexpected token `{a=1'

anurag.singh · December 27, 2010, 6:45am

I missed a semicolon in the if statement. Also, the sed expression must end with a single quote, not a double quote. I have corrected the earlier post #6; please check it now.

m.d.ludwig · January 1, 2011, 5:35pm

Since you only care about the number of delimiters, why not just remove everything but the delimiters, with:

sed -e 's/"\([^"]\|""\)*"/<>/g' -e 's/[^,]//g'

You can then compare lengths of $0 in awk.
And since you don't care about the actual number of fields:

X=$(sed -e 's/"\([^"]\|""\)*"/<>/g' -e 's/[^,]//g' test.csv | sort -u | wc -l)

if [[ ${X} -eq 1 ]]; then
    echo delim count is the same in all lines
else
    echo delim count is not the same in all lines
fi

Note that some wc implementations print the count with leading spaces, so you have to make an arithmetic comparison (-eq), not a string comparison, where " 1" is not the same as "1".
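
The length-comparison variant mentioned above could be sketched roughly as follows. It assumes a comma delimiter, as in the sed commands in this post, and uses sed -E (extended regex syntax) in place of the \| alternation; the function name delims_consistent is only illustrative:

```shell
# Print 1 if every line of the file has the same number of commas outside quotes, else 0.
# $1 = file
delims_consistent() {
    # First expression collapses each quoted field (including "" escapes) to <>,
    # the second strips everything except commas, leaving one comma per delimiter.
    sed -E -e 's/"([^"]|"")*"/<>/g' -e 's/[^,]//g' "$1" |
    awk 'NR == 1 { len = length($0) }              # length of first line = its comma count
         length($0) != len { bad = 1; exit }       # any differing line fails the check
         END { print (bad ? 0 : 1) }'
}
```

Because each remaining line consists only of commas, its length is the delimiter count, so no sort or wc step is needed.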