Extracting records using egrep

vegasluxor · October 9, 2013, 6:32am

Hello,

Need help using egrep

file format:
test.gz

Date, time, , number 1, number 2, counter...
20130618,912154, ,009912345678,911111111111,10000, , ,abc
20130618,982148, ,009101373810,034028791952294,8999, , ,gh

I want to extract the records where number 1 or number 2 starts with a given two- or three-digit country code.
Example:
country code: 91
The output should then be every line where number 1 or number 2 starts with 91 (cases 0091<number>, 091<number>, +91<number>, 91<number>).

How do I use regular expressions in egrep? I need to look at the position after the 3rd or 4th comma and check for the country code. Please help. Thanks!

rbatte1 · October 11, 2013, 6:35am

You could try an expression similar to this:-

grep "^[0-9]*,[0-9]*, ,009[19]" inputfile

I think that this translates as:-

^ - from start of line
[0-9] - the digits
* - repeated as many times as needed
, - the literal character which is your field separator
This is shown twice, then a space, another comma and the string 009 followed by either a 1 or a 9.
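
The expression above can be narrowed to a single country code. A minimal sketch, assuming the sample layout from the first post (number 1 is the fourth field) and that only the prefixes 00, 0 and + need to be accepted; the function name match_number1 is made up for illustration:

```shell
# Hypothetical helper: print records whose number 1 (4th field) starts with
# the given country code, optionally preceded by 00, 0 or +.
match_number1() {
    cc=$1
    # Two digit fields, the blank third field, then optional prefix and code.
    grep -E "^[0-9]*,[0-9]*, ,(00|0|\+)?${cc}"
}

printf '%s\n' \
    '20130618,912154, ,009912345678,911111111111,10000, , ,abc' \
    '20130618,982148, ,009101373810,034028791952294,8999, , ,gh' |
    match_number1 91
```

With country code 91 this keeps only the second record: 0091... qualifies, while 0099... does not, which the bracket [19] would have let through.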

Does that get you started? You may need to use grep -E or egrep with the character | as a logical or between them if you want to have multiple options, e.g.:-

grep -E "^[0-9]*,[0-9]*, ,009[19]|^[0-9]*,[0-9]*, ,9[19]|^[0-9]*,[0-9]*, ,[0-9]*,009[19]|^[0-9]*,[0-9]*, ,[0-9]*,9[19]" inputfile

I hope that this helps. I might have missed the point, so please show me some more input & expected results along with what you have tried & the output and I will see if we can refine it a bit.
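
The four alternatives above differ only in which field is inspected and in the prefix, so they can probably be folded into one expression with groups. A sketch, assuming fields 4 and 5 hold the numbers and that the accepted prefixes are 00, 0 and +; match_either is a hypothetical name:

```shell
# Hypothetical helper: match the country code at the start of number 1
# (4th field) or number 2 (5th field).
match_either() {
    cc=$1
    # ([^,]*,)? optionally skips number 1 so the test lands on number 2.
    grep -E "^[0-9]*,[0-9]*, ,([^,]*,)?(00|0|\+)?${cc}"
}

printf '%s\n' \
    '20130618,912154, ,009912345678,911111111111,10000, , ,abc' \
    '20130618,912154, ,009912345678,921111111111,10000, , ,abc' \
    '20130618,982148, ,009101373810,034028791952294,8999, , ,gh' |
    match_either 91
```

For country code 91 the first record is kept because of number 2 and the third because of number 1; the second record has neither.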

Robin
Liverpool/Blackburn
UK

ctsgnb · October 11, 2013, 6:59am

Why not use awk?

$ cat tst2
Date, time, , number 1, number 2, counter...
20130618,912154, ,009912345678,911111111111,10000, , ,abc
20130618,912154, ,009912345678,921111111111,10000, , ,abc
20130618,982148, ,009101373810,034028791952294,8999, , ,gh
$ CountryCode=92
$ awk -F, -vC="$CountryCode" '($4~"^[ 0]*"C)||($5~"^[ 0]*"C)' tst2
20130618,912154, ,009912345678,921111111111,10000, , ,abc
$ CountryCode=91
$ awk -F, -vC="$CountryCode" '($4~"^[ 0]*"C)||($5~"^[ 0]*"C)' tst2
20130618,912154, ,009912345678,911111111111,10000, , ,abc
20130618,982148, ,009101373810,034028791952294,8999, , ,gh
$
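
The awk approach can be wrapped in a function that reads standard input, which makes it easy to feed it the compressed file from the first post. The sketch below also accepts the +91 form, which the "^[ 0]*" prefix above does not cover. It assumes the prefix may be any run of spaces, zeros or plus signs; filter_cc is a hypothetical name:

```shell
# Hypothetical wrapper: keep records whose 4th or 5th field starts with the
# country code after any run of spaces, zeros or plus signs.
filter_cc() {
    awk -F, -v C="$1" '
        BEGIN { re = "^[ 0+]*" C }   # "+" is literal inside a bracket expression
        $4 ~ re || $5 ~ re
    '
}

printf '%s\n' \
    '20130618,912154, ,+9112345678,921111111111,10000, , ,abc' \
    '20130618,912154, ,009912345678,921111111111,10000, , ,abc' |
    filter_cc 91
```

For the real data the call would be along the lines of: gzip -dc test.gz | filter_cc 91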

vegasluxor · October 11, 2013, 8:22am

@Robin
Thanks a lot! Yes, this will definitely help me get started.

@ctsgnb
Will try it... thanks a lot!

RudiC · October 11, 2013, 8:32am

How about

grep -E "^(.*,){3}[ 0+,]*91" file

rbatte1 · October 11, 2013, 8:36am

Whoa! RudiC, what's all that?

It looks neat, but I will have to work it out. :o

It could be wonderful shorthand that I will be pleased to learn/use.

Thanks,
Robin

RudiC · October 11, 2013, 8:59am

This may be even better, as it makes the leading comma check compulsory for both fields:

grep -E "^(.*,){2}.*, *[0+]{,2}91" file

rbatte1 · October 18, 2013, 8:53am

^(.*,){2}.*, *[0+]{,2}91

Please tell me if I have understood correctly. I have split the expression into its parts to make it obvious what I'm looking at. The above expression equates to:-

^ - From the start of line
(.*,) - Zero or more characters finishing in a comma (a field to us)
{2} - Twice
.*, - Another field
" *" - Zero or more spaces
[0+] - A zero or a plus character
{,2} - okay I'm stuck on this one.
91 - Literal string 91

For the one I'm stuck on, I have something telling me about {3}, {3,} or {3,6}, but not the format you have. I could guess that it means 'less than or equal to 2', but my simple testing doesn't seem to get very far.

Robin

RudiC · October 18, 2013, 1:39pm

Actually, the more I look at my code, the more I see it needs some improvements/corrections, as I didn't test it against ALL possible input constellations. It worked for the file given, also for a handful of other test cases, but there may be uncovered cases.
Anyhow, my grep (GNU grep 2.14) did not complain about that {,2}, but you may be right that this is not a usual extended regex.
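
For reference, the GNU grep manual lists {,n} as a GNU extension meaning "at most n times"; {0,2} expresses the same bound in a form that POSIX extended regular expressions define. One input the expression does not cover: .* can run across commas, so a later field that happens to start with 91 (the counter, for example) also matches. A sketch of a tighter variant, assuming fields never contain commas; the sample record and the variable names are made up for illustration:

```shell
# Made-up record: neither number starts with 91, but the counter (9100) does.
rec='20130618,982148, ,004412345678,034028791952294,9100, , ,gh'

loose='^(.*,){2}.*, *[0+]{0,2}91'                # .* may swallow commas
tight='^([^,]*,){3}([^,]*,)? *[0+]{0,2}91'       # fields cannot cross a comma

printf '%s\n' "$rec" | grep -Ec "$loose"          # counts the false positive
printf '%s\n' "$rec" | grep -Ec "$tight" || true  # grep exits 1 when the count is 0
```

The first count is 1 and the second is 0, because [^,]* pins the check to the fourth and fifth fields.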