Count the delimiter (~|*) in each row of a file and return 1 or 0

Hi

I want to check the delimiters in a file. The delimiter in my file is ~|*

Sample of the file:

ABC~|*edgf~|*T1J333~|*20121130
ABC~|*sdaf~|*T1J333~|*20121130
ABC~|*fsdg~|*T1J333~|*20121130
ABC~|*dfsg~|*T1J333~|*20121130

In this file I want to check that the delimiter occurs 3 times in each row (i.e. each row has 4 fields). If the count is lower or higher in any row, the command should return 1; otherwise 0.

My command is:

delim_flag=0
awk -F '~|*' 'BEGIN{delim_flag=0} NF != 4 {delim_flag=1} END{print delim_flag}' "filename.txt"

But this command does not work because the delimiter is ~|*. The same command works when the delimiter is |, ~, or ~|, but with * there is a problem.
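Since the title asks to count the delimiters, here is a minimal sketch (not from the thread) that prints how many times the literal string ~|* occurs on each line. It assumes a POSIX awk; the helper name count_delims and the temporary file path are hypothetical, chosen only for the illustration. Putting each of ~, | and * in its own bracket expression keeps any of them from acting as a regex operator.

```shell
#!/bin/sh
# Hypothetical temp file holding the sample data from the question.
sample="${TMPDIR:-/tmp}/delim_sample.$$"
trap 'rm -f "$sample"' EXIT

cat > "$sample" <<'EOF'
ABC~|*edgf~|*T1J333~|*20121130
ABC~|*sdaf~|*T1J333~|*20121130
ABC~|*fsdg~|*T1J333~|*20121130
ABC~|*dfsg~|*T1J333~|*20121130
EOF

# count_delims FILE (hypothetical helper): print "line_number count" per line.
# gsub() returns how many matches it replaced; replacing each match with
# itself ("&") leaves $0 unchanged, so the call only counts.
count_delims() {
    awk '{ n = gsub(/[~][|][*]/, "&"); print NR, n }' "$1"
}

count_delims "$sample"
```

Each sample line reports a count of 3 (3 delimiters, hence 4 fields); any line reporting a different count is one you would flag.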

Hello Mohanp12,

Welcome to the forums. Please use code tags for the commands, code, and input in your posts, as per forum rules. The following may help you.

awk -F"~|*" 'BEGIN{print "Line Number \t status"} {if(NF>4){print NR OFS OFS 1} else {print NR OFS OFS 0}}' OFS="\t"  Input_file

Output will be as follows.

Line Number 	 status
1		1
2		1
3		1
4		1

I print 1 in the output when the number of fields (NF) is greater than 4, taking ~|* as the delimiter. You can change the status values as needed. Hope this helps. Enjoy learning.

Thanks,
R. Singh


The problem is the | as it has a special meaning in regexes (alternation).
Try

awk '{print NR, gsub("~\|*", "&")!=3}' file3
1 0
2 0
3 0
4 0

Ravinder,

Thanks in advance

The code you provided gives the same error. There is an issue with *.

[Error]

awk: 0602-521 There is a regular expression error.
        ?*+ not preceded by valid expression.

The input line number is 1. The file is cerf/TGT_OUTBND_20151012_122417DUP.txt.
 The source line number is 1.
DELFLADF=Line Number     status
+ echo Line Number       status
Line Number      status

Hello MOHANP12,

You haven't mentioned your OS. On a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk. Let us know if that helps; otherwise, please give us complete details of your OS and the errors you get. I am using bash on Linux and it works fine for me.

Thanks,
R. Singh

The asterisk has special meaning in an ERE too.

Assuming you aren't on a Solaris/SunOS system, try:

awk -F '~\|\*' 'NF != 4 {delim_flag=1; exit} END{printf("%d\n", delim_flag)}' "filename.txt"
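Another way to sidestep regex escaping altogether (a different technique from the ones in this thread) is to search for the delimiter as a plain string with awk's index(), which treats no character specially. A minimal sketch, assuming a POSIX awk; the helper name delim_flag_literal and the temp file are hypothetical. Note that -v values go through backslash-escape processing, so a delimiter containing \ would need its backslashes doubled; ~|* contains none.

```shell
#!/bin/sh
# delim_flag_literal FILE DELIM WANT (hypothetical helper):
# print 1 if any line contains DELIM a number of times other than WANT, else 0.
# DELIM must be non-empty.
delim_flag_literal() {
    awk -v d="$2" -v want="$3" '
    {
        n = 0
        rest = $0
        # index() is a plain substring search: ~, | and * are ordinary characters.
        while ((i = index(rest, d)) > 0) {
            n++
            rest = substr(rest, i + length(d))
        }
        if (n != want) { bad = 1; exit }   # exit in an action still runs END
    }
    END { print bad + 0 }' "$1"
}

# Hypothetical temp file with the first two sample lines.
file="${TMPDIR:-/tmp}/delim_literal.$$"
trap 'rm -f "$file"' EXIT
printf '%s\n' 'ABC~|*edgf~|*T1J333~|*20121130' 'ABC~|*sdaf~|*T1J333~|*20121130' > "$file"

delim_flag_literal "$file" '~|*' 3    # prints 0
```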

Hi RavinderSingh13

My OS is AIX.

[Error] 

+ + awk BEGIN{FS="~|*"} {if(NF!=30){print NR OFS OFS 1} else {print NR OFS OFS 0}} OFS=\t TGT_OUTBND_20151012_122417DUP.txt
awk: 0602-521 There is a regular expression error.
        ?*+ not preceded by valid expression.

 The input line number is 1. The file is TGT_OUTBND_20151012_122417DUP.txt.
 The source line number is 1.
DELFLADF=
+ echo

#!/usr/bin/ksh
set -x
#IFS="~|*" "126 124 42"
DELFLADF=`awk 'BEGIN{FS="~|*"} {if(NF!=30){print NR OFS OFS 1} else {print NR OFS OFS 0}}' OFS="\t" "TGT_OUTBND_20151012_122417DUP.txt"`

#DELFLADF=`nawk -F "~|*" 'BEGIN{delim_flag=0} NF != 30 {delim_flag=1} END{print delim_flag}' "TGT_OUTBND_20151012_122417DUP.txt"`
#DELFLADF=`awk '{print NR, gsub("~\|*", "&")!=30}' "TGT_OUTBND_20151012_122417DUP.txt"`
echo "$DELFLADF"

I have already tried every variant, e.g. FS="~|\*", and escaping the special characters with \.

As I said before, try:
FS='~\|\*'
You need to escape both the pipe symbol and the asterisk!

Actually, depending on where you put the above code, you might need different quotes and double escapes. I suggested:

awk -F '~\|\*' ...

which should work. You could also use:

awk -v FS='~\|\*' ...

Inside double quotes depending on where you place the statement:

FS="~\|\*"

you might need:

FS="~\\|\\*"

instead.
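A minimal sketch comparing three ways of supplying the same literal separator, assuming a POSIX awk (the variable names are only for illustration). POSIX specifies that -F and -v values are processed like awk string literals, so backslashes there can be consumed before the regex sees them; the bracket form [~][|][*] needs no backslashes at all, which makes it the least fragile choice across awk implementations. Inside a double-quoted string in the awk program, each backslash has to be doubled.

```shell
#!/bin/sh
line='ABC~|*edgf~|*T1J333~|*20121130'

# 1) Bracket expressions on the command line: no backslashes at all.
v1=$(printf '%s\n' "$line" | awk -F '[~][|][*]' '{ print NF }')

# 2) The same regex passed as a -v assignment to FS.
v2=$(printf '%s\n' "$line" | awk -v FS='[~][|][*]' '{ print NF }')

# 3) FS set inside the program as a string literal: each backslash is doubled,
#    because the string literal consumes one level before the regex sees it.
v3=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "~\\|\\*" } { print NF }')

echo "$v1 $v2 $v3"    # prints: 4 4 4
```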

Hello MOHANP12,

Could you please try following and let me know if this helps.

awk 'BEGIN{gsub("~|*",FS,$0);print "Line Number \t status"} {if(NF>4){print NR OFS OFS 1} else {print NR OFS OFS 0}}' OFS="\t" Input_file

This assumes your columns don't contain spaces. Also, I am testing NF>4 here; adjust the condition to your needs. Let us know how it goes.

Thanks,
R. Singh

Thanks, but I have already tried all of these and none of them work. My OS is AIX.

---------- Post updated at 06:45 AM ---------- Previous update was at 05:21 AM ----------

Hi RavinderSingh13

I tried your code too but got the same error.

[ERROR]
+ delim_flag=0
+ + awk BEGIN{gsub("~|*",FS,$0);print "Line Number \t status"} {if(NF!=30){print NR OFS OFS 1} else {print NR OFS OFS 0}} OFS=\t /tmps/data/TPMS_DEV5/TPMS/FM_HOME/inbound/remit/cerf/TGT_OUTBND_20151012_122417DUP.txt
awk: 0602-521 There is a regular expression error.
        ?*+ not preceded by valid expression.

 The source line number is 1.
 The error context is
                 >>> BEGIN{gsub("~|*",FS,$0) <<<
DELFLADF=
+ echo

Let me try one more time...

It isn't clear to me whether you are trying to print a single value indicating whether or not any line in the input file does not have exactly 3 field delimiters (0 if every input line has three delimiters; otherwise, 1); or you are trying to print an indication for each line in the input file (line # and 0 if that line has three delimiters; otherwise line # and 1). For the former, try:

awk -F '[~][|][*]' 'NF!=4{x=1;exit}END{print x+0}' file

and for the latter, try:

awk -F '[~][|][*]' '{print NR, (NF!=4)}' file
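To see both of the commands above in action, here is a minimal sketch that wraps them in shell functions and runs them on a good file and on a file whose second line has lost one * (so that delimiter reads ~| instead of ~|*). The function names any_bad and per_line, the temp files, and the broken line are hypothetical test data, not from the thread.

```shell
#!/bin/sh
# any_bad FILE (hypothetical): 1 as soon as a line does not have exactly 4 fields, else 0.
any_bad() {
    awk -F '[~][|][*]' 'NF != 4 { x = 1; exit } END { print x + 0 }' "$1"
}
# per_line FILE (hypothetical): "line_number 0" or "line_number 1" for every line.
per_line() {
    awk -F '[~][|][*]' '{ print NR, (NF != 4) }' "$1"
}

good="${TMPDIR:-/tmp}/good.$$"
bad="${TMPDIR:-/tmp}/bad.$$"
trap 'rm -f "$good" "$bad"' EXIT

printf '%s\n' 'ABC~|*edgf~|*T1J333~|*20121130' 'ABC~|*sdaf~|*T1J333~|*20121130' > "$good"
# Line 2 is hypothetically broken: "~|T1J333" is missing its *, so it has 3 fields.
printf '%s\n' 'ABC~|*edgf~|*T1J333~|*20121130' 'ABC~|*sdaf~|T1J333~|*20121130' > "$bad"

any_bad "$good"    # 0
any_bad "$bad"     # 1
per_line "$bad"    # 1 0, then 2 1
```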

Hi Don Cragun

Thanks

delim_flag=0
awk -F '[~][|][*]' 'BEGIN{delim_flag=0} NF != 30 {delim_flag=1} END{print delim_flag}' "filname"`

This code works fine. It tests that the file uses ~|* as the delimiter: if any character of the delimiter is missing on any line (e.g. ~| or ~* or |*), it returns 1; otherwise 0.

Thanks, you saved my day.

Hi Ravinder,
In the code

awk -F"~|*" 'BEGIN{print "Line Number \t status"} {if(NF>4){print NR OFS OFS 1} else {print NR OFS OFS 0}}' OFS="\t"  Input_file

in the highlighted part, for the first record the condition would be

(1>4)

which is false, so it should print the else part. Kindly explain.

Hello looney,

There are 2 points here.
1st: NF holds the total number of fields in the current line. So the condition you mentioned doesn't work that way: awk doesn't compare field by field, it uses the total number of fields in the line. If you want to print the number of fields per line, the following may help:

awk -F '[~][|][*]' '{print NF}' Input_file

Output will be as follows.

4
4
4
4

This means each line has 4 fields. Thanks to Don for this field separator:

-F '[~][|][*]'

That's the correct one.

2nd: If you want to see each field with its respective value, the following may help:

awk -F '[~][|][*]' '{for(i=1;i<=NF;i++){print i OFS $i}}'  Input_file

You will then see each field number and its value as follows:

1 ABC
2 edgf
3 T1J333
4 20121130
1 ABC
2 sdaf
3 T1J333
4 20121130
1 ABC
2 fsdg
3 T1J333
4 20121130
1 ABC
2 dfsg
3 T1J333
4 20121130
 

So above we can see each field with its respective field position.
Hope this helps.

Thanks,
R. Singh


Note that your code above has an unmatched backquote at the end of the awk command. And, the awk command and the assignment to the shell variable delim_flag are completely independent and do not affect each other in any way.

Furthermore, the awk command I suggested (modified to look for 29 field delimiters instead of 3):

awk -F '[~][|][*]' 'NF!=30{x=1;exit}END{print x+0}' file

produces the same results as your awk command, but runs faster if any lines are found in your input file that do not contain the desired number of field delimiters. (Your code reads every line in the input file; my suggested code stops reading the input file as soon as it finds a line that does not contain the specified number of fields.)
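Because the expected count has changed during this thread (4, then 30), here is a minimal sketch of passing it in from the shell with -v, so the awk program itself never needs editing. The helper name check_fields and the temp file are hypothetical; the logic is the same early-exit check as the command above.

```shell
#!/bin/sh
# check_fields FILE EXPECTED (hypothetical helper): print 1 as soon as a line
# does not split into EXPECTED fields on the literal ~|*, otherwise print 0.
check_fields() {
    awk -F '[~][|][*]' -v want="$2" 'NF != want { x = 1; exit } END { print x + 0 }' "$1"
}

file="${TMPDIR:-/tmp}/check_fields.$$"
trap 'rm -f "$file"' EXIT
printf '%s\n' 'ABC~|*edgf~|*T1J333~|*20121130' > "$file"

check_fields "$file" 4     # prints 0
check_fields "$file" 30    # prints 1
```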

If you are trying to set a shell variable to a value to indicate whether or not an error (a line with the wrong number of delimiters) was found, the way to do that would be something like:

delim_flag="$(awk -F '[~][|][*]' 'NF!=30{x=1;exit}END{print x+0}' file)"
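If the value is only used to decide what the script does next, a variation (not from the thread) is to let awk report through its exit status instead of printing 0 or 1, so no command substitution is needed. A minimal sketch; fields_ok and the temp file are hypothetical names.

```shell
#!/bin/sh
# fields_ok FILE EXPECTED (hypothetical helper): exit status 0 when every line
# splits into EXPECTED fields on ~|*, status 1 at the first line that does not.
# awk also exits non-zero if FILE cannot be read, which lands in the else branch.
fields_ok() {
    awk -F '[~][|][*]' -v want="$2" 'NF != want { exit 1 }' "$1"
}

file="${TMPDIR:-/tmp}/fields_ok.$$"
trap 'rm -f "$file"' EXIT
printf '%s\n' 'ABC~|*edgf~|*T1J333~|*20121130' 'ABC~|*sdaf~|*T1J333~|*20121130' > "$file"

if fields_ok "$file" 4; then
    echo "delimiter count OK"
else
    echo "wrong delimiter count (or unreadable file)"
fi
```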

My sample data:

ABC|edgf|T1J333|201211304
ABC|edgf|T1J333|201211303
ABC|edgf|T1J333|201211302
ABC|edgf|T1J333|20121131
TRL00004

There is a new requirement: a trailer is added to the file.

delim_flag=0
	delim_flag_val=`awk -F "|" 'BEGIN{delim_flag=0} NF != 38 {delim_flag=1} END{print dlim_flag}' "FILENAME"`

Please suggest how I can ignore the trailer in the same command.

Please use code tags as required by forum rules!

Actually, regardless of the trailer being present or not, your code applied to the sample data will yield a "1" printed, as there won't be any lines with 38 fields.

And, in order to present a halfway reasonable suggestion to solve your problem, additional info is needed:

  • is the trailer always one line?
  • is the trailer always one field?

My sample data:

ABC|edgf|T1J333|201211304
ABC|edgf|T1J333|201211303
ABC|edgf|T1J333|201211302
ABC|edgf|T1J333|20121131
TRL00004

There is a new requirement: a trailer is added to the file.

delim_flag=0
delim_flag_val=`awk -F "|" 'BEGIN{delim_flag=0} NF != 4 {delim_flag=1} END{print dlim_flag}' "FILENAME"`

Please suggest how I can ignore the trailer in the file. The trailer is always one line and one field, and it starts with TRL.

Please use code tags as required by forum rules!

---------- Post updated at 15:39 ---------- Previous update was at 15:36 ----------

A few options:

awk -F "|" 'BEGIN {LAST = 4} LAST != 4 {delim_flag=1; exit} {LAST = NF} END{print delim_flag+0}' file
0
awk -F "|" '/^TRL/ {next} NF != 4 {delim_flag=1; exit} END{print delim_flag+0}' file
0
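If you also want the per-line report from earlier in the thread for this |-delimited format, here is a minimal sketch combining it with the /^TRL/ skip above. The helper name report and the temp file are hypothetical, and it assumes (as stated in the post) that the trailer is a single line starting with TRL.

```shell
#!/bin/sh
# report FILE (hypothetical helper): for every data line print
# "line_number 0" (4 fields) or "line_number 1" (any other field count).
# Lines starting with TRL are treated as the trailer and skipped.
# A single-character -F value such as | is taken literally, not as a regex.
report() {
    awk -F '|' '/^TRL/ { next } { print NR, (NF != 4) }' "$1"
}

file="${TMPDIR:-/tmp}/trailer.$$"
trap 'rm -f "$file"' EXIT
cat > "$file" <<'EOF'
ABC|edgf|T1J333|201211304
ABC|edgf|T1J333|201211303
ABC|edgf|T1J333|201211302
ABC|edgf|T1J333|20121131
TRL00004
EOF

report "$file"
```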

Please let me know how to use code tags.