extract unique pattern from large text file

Hi All,

I am trying to extract data from a large text file. I want to extract the lines that contain a five-digit number followed by a hyphen, like

12345-. I tried with egrep, e.g.: egrep "[0-9]+[-]" text.txt

but that returns all the lines that contain any number of digits followed by a hyphen, e.g. 1-, 123-, 12345-

How can I modify this to extract only the lines that start with exactly 5 digits followed by a hyphen?

Any suggestions in this regard are highly appreciated.

Thanks
Shiju V.Joseph

Try this:

'[0-9]{5}-'

Shiju, you can use what John suggested, but you should probably anchor it to the start of the line, like this:

'^[0-9]{5}-'

Otherwise it would also match lines with more than 5 digits before the '-' symbol, and you might end up extracting data like
123456-
1234567-
...
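
To see the difference the anchor makes, here is a minimal sketch. The sample data and the variable names are made up for illustration and are not taken from the original text.txt; grep -E is used because it accepts the same patterns as egrep.

```shell
# Hypothetical sample data, one candidate per line.
sample=$(mktemp)
printf '%s\n' '1-a' '123-b' '12345-c' '123456-d' '1234567-e' > "$sample"

# Unanchored: any five digits directly before a hyphen match,
# so longer digit runs such as 123456- are selected as well.
unanchored=$(grep -E '[0-9]{5}-' "$sample")

# Anchored: the line must begin with exactly five digits, then a hyphen.
anchored=$(grep -E '^[0-9]{5}-' "$sample")

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
rm -f "$sample"
```

The anchored pattern only helps when the number really is the first thing on the line; leading spaces or other characters before the digits would need a different pattern.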

Hi johnbach,

Thanks very much for replying.
I tried this method, but I am getting
12345-
123456- etc.

I am not getting only the lines that have exactly 5 digits followed by a hyphen.

Thanks
Shiju

---------- Post updated at 03:52 AM ---------- Previous update was at 03:50 AM ----------

Hi ilan,

Thank you very much for replying. I tried

egrep '[0-9]{5}-' text.txt

but it still returns
12345-
123456- etc.

I am not getting only the lines that have exactly 5 digits followed by a hyphen, like 12345-

Thanks
Shiju

Try this dirty solution:

egrep  '[0-9]{5}-'  file |egrep -v '[0-9]{6}'

Hi John,

I tried that, but unfortunately it didn't work; it returned nothing.

egrep '[0-9]{5}-' text.txt gave output
egrep '[0-9]{5}-' text.txt |egrep -v '[0-9]{6}' didn't give any output

shiju.joseph@linux-kmy7:~/Desktop> egrep '[0-9]{5}-' text.txt
123456-123213sdfsdfsdsdfsdfsd
654331-2342342342342342342342
454545-4353453453453453453453
345345-34534534534534534534534
57645756-32542352345235235234523
4234324157-2314234234234234234
shiju.joseph@linux-kmy7:~/Desktop> egrep '[0-9]{5}-' text.txt |egrep -v '[0-9]{6}'
MEA\shiju.joseph@linux-kmy7:~/Desktop>

Thanks
Shiju

grep -vw "[0-9]\{5\}-" filename

Try the v and w options together (-vw); it will work.

Hi Pritish,

It didn't work.

egrep  '[0-9]{5}-'  file |egrep -v '[0-9]{6}-'

Which OS and flavor are you using?
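
The two exclusion filters tried so far behave differently, which the following rough sketch illustrates. The sample lines and variable names are assumptions for illustration only. Filtering out '[0-9]{6}' drops any line with six consecutive digits anywhere, including after the hyphen, while '[0-9]{6}-' only drops lines with six or more digits directly before the hyphen.

```shell
# Hypothetical sample: two wanted lines and one with six digits before the hyphen.
sample=$(mktemp)
printf '%s\n' '12345-123213sdf' '65433-2342342342' '576457-3254235' > "$sample"

# Loose filter: the long digit runs after the hyphen also trigger -v,
# so every line is discarded. '|| true' keeps the empty result from
# being treated as an error.
loose=$(grep -E '[0-9]{5}-' "$sample" | grep -Ev '[0-9]{6}' || true)

# Tight filter: only lines with six digits right before the hyphen are discarded.
tight=$(grep -E '[0-9]{5}-' "$sample" | grep -Ev '[0-9]{6}-' || true)

printf 'loose: [%s]\ntight:\n%s\n' "$loose" "$tight"
rm -f "$sample"
```

If a file contains no line with exactly five digits before the hyphen, both pipelines correctly print nothing, so an empty result does not by itself mean the command is broken.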

Hi John,

I am really sorry to say it didn't work either...
I don't know what is wrong...
I searched for other alternatives too, but haven't found a feasible workaround yet.

Thanks
Shiju

123-
123456-
12345-
23456-
123-
12-
212222-
Put all of these in a file and then run
grep -wv "[0-9]\{5\}-" filename

Try running it again in your shell.

It's openSUSE 11.

perl -nl -e ' print if ( /^[0-9]{5}-/ ); ' test

Sorry, bro.

Just omit the -v option from the command

and use only
grep -w "[0-9]\{5\}-" filename

Done, and this is the output:

linux-kmy7:/home/shiju.joseph/Desktop # grep -wv "[0-9]\{5\}-" text11.txt
123-
123456-
123-
12-
212222-
linux-kmy7:/home/shiju.joseph/Desktop #

I wanted 12345- and 23456- to appear in the result, but they didn't.

Why do you need to increase the font size? If that is intentional, it's not encouraged here.

Yes, it worked for the set of data you gave:

linux-kmy7:/home/shiju.joseph/Desktop # grep -w "[0-9]\{5\}-" text11.txt
12345-
23456-
linux-kmy7:/home/shiju.joseph/Desktop #

But it didn't work with my test file, which has these contents:

12345-123213sdfsdfsdsdfsdfsd
65433-2342342342342342342342
45454-4353453453453453453453
34534-34534534534534534534534
576457-32542352345235235234523
42343241-2314234234234234234
2345234523-4523523523452345234523453
23452345234-52345324532452345235
234523452345-234523523452345234523
2345342523452-35234523534252345234
32452345324532-45324523453452345234

and I wanted these lines to appear in the result:
12345-123213sdfsdfsdsdfsdfsd
65433-2342342342342342342342
45454-4353453453453453453453
34534-34534534534534534534534

Whether the tips are working or not, I am really happy to see the helping minds in the community. That is the real power of community. Thanks to everyone who responded.

Thanks
Shiju
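
A likely explanation for the difference between the two files is how -w works: GNU grep only accepts a match that is bounded on both sides by a non-word character or a line edge. The sketch below uses made-up sample lines and assumes GNU grep, as shipped with openSUSE; other grep implementations may treat -w differently.

```shell
# Hypothetical sample lines covering the three cases seen in the thread.
sample=$(mktemp)
printf '%s\n' '12345-' '123456-' '12345-123213sdf' > "$sample"

# '12345-'          : match ends at the end of the line, so -w accepts it.
# '123456-'         : '23456-' is preceded by the digit 1, so -w rejects it.
# '12345-123213sdf' : '12345-' is followed by a digit, so -w rejects it.
with_w=$(grep -w '[0-9]\{5\}-' "$sample" || true)

printf '%s\n' "$with_w"
rm -f "$sample"
```

So the pattern itself is fine; it is the character right after the hyphen in the real data that makes -w discard the match.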

I am using openSUSE too; I tried it and it's working fine. Just add a * after the hyphen (-):

grep -w "[0-9]\{5\}-*" filename

Hey..that syntax worked...

linux-kmy7:/home/shiju.joseph/Desktop # grep -w "[0-9]\{5\}-*" text.txt
12345-123213sdfsdfsdsdfsdfsd
65433-2342342342342342342342
45454-4353453453453453453453
34534-34534534534534534534534
linux-kmy7:/home/shiju.joseph/Desktop #

Thanks Pritish, thanks for your support.

Shiju
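
One caveat worth noting: '-*' means zero or more hyphens, so with -w the pattern can match a bare five-digit word and the hyphen is no longer required. The sketch below, with invented sample lines and assuming GNU grep, compares it with a stricter alternative that insists on the hyphen and on no digit directly before the five digits. The stricter pattern is a suggestion, not something proposed in the thread.

```shell
# Hypothetical sample: the third line has a five-digit number but no hyphen.
sample=$(mktemp)
printf '%s\n' '12345-123213sdf' '576457-3254235' 'note 12345 only' 'id 23456-x' > "$sample"

# With '-*' the hyphen is optional, so -w settles on the bare word 12345
# and the hyphen-less third line is selected too.
star=$(grep -w '[0-9]\{5\}-*' "$sample" || true)

# Stricter: five digits followed by a hyphen, preceded by the line start
# or by a non-digit character.
strict=$(grep -E '(^|[^0-9])[0-9]{5}-' "$sample" || true)

printf 'star:\n%s\n\nstrict:\n%s\n' "$star" "$strict"
rm -f "$sample"
```

For data where every line begins with the number, as in the test file above, both approaches select the same lines.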