Regex question

boncuk · September 26, 2018, 5:08pm

I want to match all occurrences of 01, 03, 05, 07, 10 and 11 at the 9th and 10th positions of a string.
I tried the following, but it's also matching characters like 33 or 11 in the 9th and 10th positions.

sed  "/^[0-9]\{8\}00[01,03,05,07,10,11]/d" A.TXT
000000001000
433483433339  <<< wrong
121121211100  <<< wrong 
167710000110
167735250310
167735260510
167735280710
167735301010
167735431010
167735451010
167710101110
167730691110
167730611111

RudiC · September 26, 2018, 5:17pm

You said you wanted the 11 ? Why, then, is the second "wrong" wrong?

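In your regex, you want 00 in positions 9 and 10. And you may want to reread the regex documentation (man regex): a bracket expression matches any single character from its list.

So your bracket expression [01,03,05,07,10,11] is just a list of single characters, and a much larger one than intended, including 0 and the comma. On top of that, you are deleting the matching lines, so reverse the effect. Try instead: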

sed  '/^[0-9]\{8\}\(0[1357]\|1[01]\)/!d' file

or, for better readability

sed  -E '/^[0-9]{8}(0[1357]|1[01])/!d' file
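A quick way to see the bracket-expression point is this small sketch, assuming a POSIX shell and grep; the helper name in_bracket and the test strings are hypothetical. Because [01,03,05,07,10,11] is a single bracket expression, it matches exactly one character from the set 0, 1, comma, 3, 5, 7, and never a two-digit string such as 01.

```shell
# Hypothetical helper: succeeds if the WHOLE string is matched by the
# original bracket expression from the question.
in_bracket() {
    printf '%s\n' "$1" | grep -q '^[01,03,05,07,10,11]$'
}

# [01,03,05,07,10,11] is one bracket expression: it matches ONE character
# from the set {0, 1, ',', 3, 5, 7}; repeated characters add nothing.
for s in 0 , 7 9 01 11; do
    if in_bracket "$s"; then
        echo "$s: matches"
    else
        echo "$s: no match"
    fi
done
```

Only 0, the comma and 7 match; 9, 01 and 11 do not, because the two-character strings are longer than the single character the bracket expression stands for.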

boncuk · September 26, 2018, 9:12pm

Hi Rudi, sorry, that was a typo: the 2nd "wrong" is not wrong, but the 1st one is.
I ran your command, but it selects nothing.

------ Post updated at 09:12 PM ------

No, I don't want 00 in the 9th and 10th positions; I was not sure what 00 meant there.
What I want is simple: I want to detect any records that contain 01, 03, 05, 07, 10 or 11 in the 9th and 10th positions.


Don_Cragun · September 26, 2018, 9:35pm

Your statement of what you're trying to do is ambiguous. Writing a regular expression to match 01 , 03 , 05 , 07 , 10 , or 11 in character positions 9 and 10 is easy (and RudiC has shown you REs that do that). But what you mean by "detect any records that contain" those strings is not at all clear. If you match one of those strings, what do you want to do?

Do you want to delete all lines that match those strings in that position?

Do you want to delete all lines that DO NOT match those strings in that position?

Do you want to delete those strings from lines where they occur in that position, leaving the rest of the characters on the matched lines unchanged?
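The three readings listed above can each be sketched with standard sed and BRE features only. The function names are hypothetical, and this is an illustration rather than anything posted in the thread. Each "." matches one character, so ^........ skips positions 1 to 8. The third reading uses sed's t command so that, after the first substitution removes a matching pair, the second substitution is not applied to the characters that have shifted into positions 9 and 10.

```shell
# 1. Delete lines whose 9th-10th characters are 01, 03, 05, 07, 10 or 11.
delete_matching() {
    sed -e '/^........0[1357]/d' -e '/^........1[01]/d' "$@"
}

# 2. Delete lines that do NOT have one of those pairs (print only matches).
#    The two patterns cannot both match, so no line is printed twice.
keep_matching() {
    sed -n -e '/^........0[1357]/p' -e '/^........1[01]/p' "$@"
}

# 3. Remove only the matching pair, leaving the rest of the line unchanged.
#    "t" with no label jumps to the end of the script once a substitution
#    has been made, so the second s/// never sees the shifted characters.
strip_matching() {
    sed -e 's/^\(........\)0[1357]/\1/' -e t -e 's/^\(........\)1[01]/\1/' "$@"
}

printf '%s\n' 167710000110 433483433339 | strip_matching
```

With that input, strip_matching prints 1677100010 and 433483433339: the 01 in positions 9 and 10 of the first line is removed, and the second line is left alone.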

And, this is a prime example where it is crucial that you tell us what operating system (or at least which version of sed ) you're using. Some versions of sed accept non-standard RE forms that are used in some of RudiC's suggestions; other versions of sed will interpret those REs in a different way.

Please explain clearly the operating environment you're using and show us the output you're hoping to produce from the sample input you provided.

boncuk · September 26, 2018, 10:19pm

OK, let me explain it more clearly. Let's say I have a file a.txt:

$ type a.txt
000000001000
433483433339 record 2
121121211100
167710000110
167735250310
167735260510
167735280710
167735431010
167750000010 record 9
167710101110

I am trying to find any line that does NOT have 01, 03, 05, 07, 10 or 11 in the 9th and 10th positions.
If you look at record 2 above, it has "33" in the 9th and 10th positions, which is not in my list. Similarly, record 9 doesn't meet my criteria either, since it has "00" in the 9th and 10th positions.
Other than these two, all records meet my criteria.
So in this case I want sed to produce the following output, which are the violating lines:

433483433339
167750000010

------------------------------------------------------------------
Let's take another example. Say my file a.txt contains:

167750000110 
888881881188

In this case all of the records match my criteria, so sed should output nothing.
Can you modify my sed command to achieve this?
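For comparison, the requirement in the two examples above can be sketched with grep -v instead of sed. This swaps in a different tool from the one asked about, and the function name find_violations is hypothetical: each -e pattern describes an allowed pair at positions 9 and 10, and -v prints only the lines that match none of them.

```shell
# Allowed pairs at positions 9-10: 0[1357] covers 01/03/05/07 and
# 1[01] covers 10/11. -v prints only lines matching neither pattern.
find_violations() {
    grep -v -e '^........0[1357]' -e '^........1[01]' "$@"
}

# First example: prints 433483433339 and 167750000010.
printf '%s\n' 000000001000 433483433339 121121211100 167710000110 \
    167735250310 167735260510 167735280710 167735431010 \
    167750000010 167710101110 | find_violations

# Second example: every record is valid, so grep prints nothing and
# exits with status 1, which triggers the message.
printf '%s\n' 167750000110 888881881188 | find_violations || echo "no violations found"
```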

Don_Cragun · September 26, 2018, 10:25pm

I repeat: What operating system are you using? What version of sed are you using?

boncuk · September 26, 2018, 11:47pm

Rudi's sed command is not returning anything:

$ type a.txt
000000001000
433483433339
121121211100
167710000110
167735250310
167735260510
167735280710
167735301010
167735431010
167735451010
167750000010
167710101110
167730691110
167730611111
$ sed  -E "/^[0-9]{8}(0[1357]|1[01])/!d" a.txt
$

------ Post updated at 10:39 PM ------

open VMS .. almost same to other other operating systems.

------ Post updated at 10:41 PM ------

Sorry, I said it wrong earlier. I want sed to notify me of any violating record; that's the goal.

------ Post updated at 10:43 PM ------

Sorry for the mistake in my original explanation; I have corrected it. I just want sed to output the violating records, if any.
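The "notify me" requirement can be sketched as a small wrapper whose exit status a job script can test. This is a minimal sketch assuming a POSIX shell; it reuses the portable two-command sed script that Don_Cragun posts later in this thread, and the function name check_records is hypothetical. sed itself exits 0 whenever it can read its input, so the wrapper decides based on whether sed printed anything.

```shell
# Hypothetical wrapper around the portable two-command sed script.
# Prints the violating records and returns:
#   0 - sed printed nothing (no violating records)
#   1 - sed printed violating records
#   2 - sed itself failed (e.g. unreadable input file)
# Command substitution drops trailing newlines, so blank lines alone
# do not count as violations. "violations" is a global variable
# because POSIX sh has no "local".
check_records() {
    violations=$(sed -e '/^........0[1357]/d' -e '/^........1[01]/d' "$@") || return 2
    if [ -n "$violations" ]; then
        printf '%s\n' "$violations"
        return 1
    fi
    return 0
}

# Example usage (a.txt is the sample file from this thread):
#   check_records a.txt > violations.txt || echo "violations found or sed failed" >&2
```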

------ Post updated at 10:55 PM ------

Another example to clear things up: from this file I want sed to output

000000000000
433483433339

because neither of them has 01, 03, 05, 07, 10 or 11 in the 9th and 10th positions.

$ type a.txt
000000001000
000000000000
433483433339
121121211100
167710000110
167735250310
167735260510
167735280710
167735301010
167735431010
167735451010
167750000010
167710101110
167730691110
167730611111

------ Post updated at 11:00 PM ------

I am stuck with a production issue and need help urgently.
Don, you said finding 01, 03, 05, 07, 10 or 11 in the 9th and 10th positions of a string is easy. Can you give me the sed command for it, please?

------ Post updated at 11:08 PM ------

I tried Rudi's command on Sun Solaris and Linux; it's not working there either.
If you can give me a solution for Solaris or Linux, that's also fine for me.

oracle$ sed  -E '/^[0-9]{8}(0[1357]|1[01])/!d'
sed: illegal option -- E
oracle$ uname -a
SunOS  5.11 11.3 sun4v sparc sun4v
$

[oracle ~]$ uname -a
Linux 004160PZ000 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[oracle ~]$ sed  -E '/^[0-9]{8}(0[1357]|1[01])/!d' a.txt
sed: invalid option -- E

------ Post updated at 11:47 PM ------

RudiC:

In your regex, you want 00 in positions 9 and 10. And, you may want to reread the regex documentation (man regex). So your bracket expr. seems to have too large a list, including 0 and , . On top, you are deleting the matching lines, so reverse the effect. Try instead:

sed  '/^[0-9]\{8\}\(0[1357]\|1[01]\)/!d' file

or, for better readability

sed  -E '/^[0-9]{8}(0[1357]|1[01])/!d' file


I tried your command on Unix, and it returns the strings that have valid numbers in the 9th and 10th positions, e.g. "01", "03", "05", "07", "10", "11".
I want only the three records from this file that have "33", "17" and "00" in the 9th and 10th positions.

The operating systems are Sun Solaris and Linux.

oracle:$ sed  '/^[0-9]\{8\}\(0[1357]\|1[01]\)/d' a.txt
000000001000
433483433339  <<< want sed to print this: 33 at 9th and 10th pos
121121211100
167710001710  <<< want sed to print this: 17 at 9th and 10th pos
167735250310
167735260510
167735280710
167735301010
167735431010
167735451010
167710101110
167730691110
167730600000  <<< want sed to print this: 00 at 9th and 10th pos

RudiC · September 27, 2018, 2:54am

Now, OpenVMS definitely is NOT "almost same to other other operating systems." Refer to its help system (unfortunately it's been quite some time, so I can't remember the correct syntax) for which options and regexes it accepts.

Do you have access to GNU sed on your system?

You could try the -r option, which on several systems is equivalent to -E .
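Whether a given sed takes -E, -r, or neither can be probed from a script instead of guessed. A minimal sketch, assuming a POSIX shell; the variable name ERE_FLAG is hypothetical, and the probe only checks that the option is accepted and that an ERE alternation actually works with it:

```shell
# Try -E first (BSD and newer GNU sed), then -r (older GNU sed).
# A sed that rejects the option prints an error (discarded here) and
# produces no "x", so that flag is skipped.
ERE_FLAG=
for flag in -E -r; do
    if [ "$(echo ab | sed "$flag" 's/(a|b)+/x/' 2>/dev/null)" = x ]; then
        ERE_FLAG=$flag
        break
    fi
done
echo "ERE option: ${ERE_FLAG:-none, use a BRE-only script}"
```

If neither option is accepted, as with the Solaris sed shown above that rejected -E, a script using only standard BRE features is the fallback.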

Applied to your a.txt file from post #6 on Ubuntu 18.04:

sed  -E '/^[0-9]{8}(0[1357]|1[01])/d' file
000000000000
433483433339
167750000010
sed  '/^[0-9]\{8\}\(0[1357]\|1[01]\)/d' file
000000000000
433483433339
167750000010

EDIT: on FreeBSD 9.0-RELEASE, the BRE sed fails, but

sed  -E '/^[0-9]{8}(0[1357]|1[01])/d' file
000000000000
433483433339
167750000010

This works on my FreeBSD:

sed  '/^[0-9]\{8\}0[1357]/d; /^[0-9]\{8\}1[01]/d' file

Don_Cragun · September 27, 2018, 3:47am

Just using standard sed features and avoiding cases where systems allowing EREs or BREs make a difference, both of the following seem to also do what you want:

sed -e '/^........0[1357]/d' -e '/^........1[01]/d' a.txt

and:

sed '/^........0[1357]/d;/^........1[01]/d' a.txt

But, I have no experience with openVMS, so I can't say whether or not either of these will work there.
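One design difference between these portable commands and RudiC's FreeBSD variant: "." matches any character, while [0-9]\{8\} requires eight digits before the pair. Both forms use only standard BRE features (\{8\} is a standard interval expression). A sketch comparing them on a made-up line with a non-numeric prefix; the function names are hypothetical, and the second function is equivalent to RudiC's single-argument version with two -e options:

```shell
# Don_Cragun's form: positions 1-8 may hold any characters.
violations_any_prefix() {
    sed -e '/^........0[1357]/d' -e '/^........1[01]/d' "$@"
}

# RudiC's FreeBSD form: positions 1-8 must be digits, so a line with a
# non-numeric prefix is never deleted and is reported as a violation.
violations_digit_prefix() {
    sed -e '/^[0-9]\{8\}0[1357]/d' -e '/^[0-9]\{8\}1[01]/d' "$@"
}

printf '%s\n' ABCDEFGH01xx 167710000110 | violations_any_prefix    # prints nothing
printf '%s\n' ABCDEFGH01xx 167710000110 | violations_digit_prefix  # prints ABCDEFGH01xx
```

For files that contain only 12-digit records, as in this thread, both forms give the same result; the digit form is stricter if malformed records can occur.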

boncuk · September 27, 2018, 10:08am

Sorry, the browser is not putting code tags around the code when I click the button, so I am adding them manually; I hope it works.
Can someone help me with my sed command, please?

I am using Sun Solaris and Linux. What I want is for sed to print any string (or, preferably, write it to a file) that does not have "01", "03", "05", "07", "10" or "11" in the 9th and 10th positions.
E.g., from the file below I only want these three lines:

433483433339
167710001710
167730600000

$ cat a.txt
000000001000
433483433339 <<< print this since 33 is at 9th and 10th pos
121121211100
167710001710 <<< print this since 17 is at 9th and 10th pos
167735250310
167735260510
167735280710
167730600000 <<< print this since 00 is at 9th and 10th pos
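Since the goal is to write the violating lines to a file, sed's standard w command can do that directly instead of shell redirection. A sketch, assuming a POSIX sed; the function name write_violations and the output path are hypothetical. With -n nothing goes to standard output; any line not deleted by the first two commands reaches w and is written to the file, which sed creates (empty) before processing even when there are no violations.

```shell
# Usage: write_violations OUTFILE [INPUT_FILE...]
# Lines with 01/03/05/07/10/11 in positions 9-10 are deleted; every other
# line reaches "w" and is written to OUTFILE. The file name must be the
# last thing in its -e argument.
write_violations() {
    out=$1
    shift
    sed -n -e '/^........0[1357]/d' -e '/^........1[01]/d' -e "w $out" "$@"
}

out=${TMPDIR:-/tmp}/violations.txt
printf '%s\n' 000000001000 433483433339 121121211100 167710001710 \
    167735250310 167735260510 167735280710 167730600000 |
    write_violations "$out"
cat "$out"    # 433483433339, 167710001710, 167730600000
```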

------ Post updated at 10:07 AM ------

Don_Cragun:

Just using standard sed features and avoiding cases where systems allowing EREs or BREs make a difference, both of the following seem to also do what you want:
sed -e '/^........0[1357]/d' -e '/^........1[01]/d' a.txt
and:
sed '/^........0[1357]/d;/^........1[01]/d' a.txt
But, I have no experience with openVMS, so I can't say whether or not either of these will work there.

Awesome, this one worked!

------ Post updated at 10:08 AM ------

How does this command work, though? I understand each part, but how does the output of one get piped to the other?

sed -e '/^........0[1357]/d' -e '/^........1[01]/d' a.txt

Don_Cragun · September 27, 2018, 10:38am

boncuk:

... ... ...
Originally Posted by Don Cragun
Just using standard sed features and avoiding cases where systems allowing EREs or BREs make a difference, both of the following seem to also do what you want:
sed -e '/^........0[1357]/d' -e '/^........1[01]/d' a.txt
and:
sed '/^........0[1357]/d;/^........1[01]/d' a.txt
But, I have no experience with openVMS, so I can't say whether or not either of these will work there.
awesome this one worked

------ Post updated at 10:08 AM ------

how does this command works though .. I understand each parts but how the output of one gets piped to the other?
sed -e '/^........0[1357]/d' -e '/^........1[01]/d' a.txt

There is a single sed command invocation here containing two editing commands. No piping is involved. The first editing command deletes every line that contains 01, 03, 05, or 07 as the 9th and 10th characters on a line. If that command didn't delete the line, the 2nd editing command deletes every line that contains 10 or 11 as the 9th and 10th characters on a line. If neither of those editing commands deleted the input line, the default sed action is to copy the input line to standard output.

When there are two editing commands to be performed by one invocation of sed , each of those editing commands can be introduced as separate -e option arguments (as shown in the first sed command above).

Many sed commands (including the delete command) can be entered as a single -e option argument by separating them with a semicolon. And, if only one editing command argument is needed, including the actual -e option is optional (as shown in the second sed command above).
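To make this concrete, here is a sketch (function names and sample lines are hypothetical) showing that two -e arguments, a semicolon-separated script, and a single script with the commands on separate lines are three spellings of one sed program. Each input line is read once and passed through both editing commands in order, so nothing is piped between them.

```shell
# Each input line is read once; the first /.../d may delete it, otherwise
# the second may, and any line that survives both is printed by sed's
# default output.
violations_two_e() {
    sed -e '/^........0[1357]/d' -e '/^........1[01]/d' "$@"
}

violations_semicolon() {
    sed '/^........0[1357]/d;/^........1[01]/d' "$@"
}

# A newline can also separate editing commands inside one script argument.
violations_newline() {
    sed '/^........0[1357]/d
/^........1[01]/d' "$@"
}

sample() { printf '%s\n' 000000001000 433483433339 167710001710 167730600000; }

sample | violations_two_e    # same output from all three functions
```

All three print 433483433339, 167710001710 and 167730600000; 000000001000 has 10 in positions 9 and 10 and is deleted by the second command.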