Extract values only for certain tags

fretagi · December 10, 2014, 2:47am

Hi,

Please, I need help extracting the values of certain tags, MSISDN: and IMSI:, from the following text file:

entryDS: 1
nodeId: 11
MSISDN: 258827475309
IMSI: 643012111658984
NAM: 0
CDC: 41
IMEISV:: U3URIGF2hoc=
AUTHINFO: 0
TSMO: 0
CSP: 27
SUBSCSPVERS: 7
PDPCP: 11
SUBSPDPCPVERS: 5
RSA: 1
SUBSRSAVERS: 4
dn: IMSI=643012112814555,dc=imsi,ou=identities,dc=mcel
objectClass: alias
objectClass: IMSI
structuralObjectClass: alias
entryDS: 0
IMSI: 643012112814555
aliasedObjectName: mscId=bbbbbbbbbbbbbbbbb643012112814555,ou=multiSCs,dc=mcel
CAMP: 0
serv: CSPS
CSLOC: 5
VLRADD: 1925882200060
PSLOC: 0
SGSNNUM: 1925882200054
GSMMSRNMSCN:: AJFSiCIAYPA=
SCLOCSTATE: 6
SZONELOCSTATE: 6
PURGEDATECS:: DgsT
RVLRI: 0
RSGSNI: 0
GMSCADDRESS:: kVKIIgCQ9Q==
CSIVLRSUPP: 1
GSMMAPVERS: 3
GSMUEFEAT: 0
GSMDUALNUMSUP: 1
GPRSRELSUPP: 1
OBOPRI: 1
OBOPRE: 1
SCHAR:: BAA=
CAT: 10
DBSG: 1
OFA: 0
SOCB: 0
PWD: 0000
PWDC: 0
SOCFB: 0
SOCFNRC: 0
SOCFNRY: 0
SOCFU: 0
SODCF: 0
SOSDCF: 7
SOCLIP: 0
SOCLIR: 2
SOCOLP: 0
BS3G: 1
TS11: 1
TS21: 1
TS22: 1
CAW: 1
HOLD: 1
BAIC: 1
BAOC: 1
BICRO: 1
BOIC: 1
BOIEXH: 1
CFB: 1
CFNRC: 1
CFNRY: 1
CFU: 1
CLIP: 1
CLIR: 1
CAWTS10ST: 8
CFBTS10ST: 8
CFUTS10ST: 8
CFNRCTS10ST: 8
CFNRYTS10ST: 8
BAICTS10ST: 8
BAOCTS10ST: 8
BICROTS10ST: 8
BOICTS10ST: 8
BOIEXHTS10ST: 8
BAICTS20ST: 8
BAOCTS20ST: 8
BICROTS20ST: 8
BOICTS20ST: 8
BOIEXHTS20ST: 8
CAWBS30ST: 8
CFBBS30ST: 8
CFUBS30ST: 8
CFNRCBS30ST: 8
CFNRYBS30ST: 8
BAICBS30ST: 8
BAOCBS30ST: 8
BICROBS30ST: 8
BOICBS30ST: 8
BOIEXHBS30ST: 8
SMSCADD32:: kVKIIgAw8A==
SMSCEXPDATE32:: DgwF
MNRF: 1


dn: serv=Identities,mscId=bbbbbbbbbbbbbbbbb643012111658984,ou=multiSCs,dc=mcel
structuralObjectClass: CUDBService
objectClass: CUDBService
objectClass: mscIdentities
entryDS: 1
nodeId: 11
serv: Identities
CDC: 0
IMSI: 643012111658984
imsiMask: '0000000000010001'B
MSISDN: 258827475309
msisdnMask: '0000000000000001'B

I am trying to use either sed or awk, but with no success.

Don_Cragun · December 10, 2014, 3:29am

What awk and sed commands did you try?

fretagi · December 10, 2014, 3:55am

Hi!

I tried the following to print only the lines that start with a capital M or a capital I:

sed -n '/^M\|^I/p' imsi

and the following to print the lines containing the word "IMSI":

sed -n '/[IMSI]$/p' imsi

where imsi is the filename.

The first example output

objectClass:IMSI

and the second example returned nothing.

Don_Cragun · December 10, 2014, 4:12am

fretagi:

Hi!

I try the following to print only lines with capital M and capital I
sed -n '/^M\|^I/p' imsi
and to print me the lines with word "IMSI" by running
sed -n '/[IMSI]$/p' imsi
where
imsi
is the filename

in the first example the output was
objectClass:IMSI
and the second example, returned nothing

I would have expected that output from your second example and no output from your first example. The second example should print any line whose last character is I, M, or S, because [IMSI] is a bracket expression that matches a single character from that set.

To print only lines whose first character is an M or an I, try:

sed -n '/^[MI]/p' imsi

and to print lines containing IMSI, try:

sed -n '/IMSI/p' imsi
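
To see the difference between the two patterns, here is a small sketch run on three lines copied from the file in the question. The variable names are only for illustration. [IMSI]$ selects lines that end in one of the characters I, M, or S, while IMSI selects lines that contain that literal string anywhere.

```shell
# Three sample lines taken from the file in the question.
sample='objectClass: IMSI
IMSI: 643012111658984
serv: CSPS'

# Bracket expression: lines whose LAST character is I, M, or S.
ends_with=$(printf '%s\n' "$sample" | sed -n '/[IMSI]$/p')

# Literal string: lines containing IMSI anywhere.
contains=$(printf '%s\n' "$sample" | sed -n '/IMSI/p')

printf 'last character is I, M or S:\n%s\n\n' "$ends_with"
printf 'contains IMSI:\n%s\n' "$contains"
```

The first command picks up "serv: CSPS" (it ends in S) and skips "IMSI: 643012111658984" (it ends in a digit), which is why the bracket form is not a search for the word IMSI.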

fretagi · December 10, 2014, 4:28am

Your last command worked fine. How can I extend it to also include the word MSISDN?

Don_Cragun · December 10, 2014, 4:41am

sed -n -e '/IMSI/p' -e '/MSISDN/p' imsi
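
As a possible alternative, the two patterns can be combined into one extended regular expression. This is only a sketch and assumes a sed that accepts the -E option (GNU and BSD sed do; older implementations may not). The sample lines come from the file in the question, except for the last test line, which is hypothetical and only there to show one behavioral difference.

```shell
# Sample lines taken from the file in the question.
sample='MSISDN: 258827475309
IMSI: 643012111658984
NAM: 0
objectClass: IMSI'

# Two separate expressions, as in the command above.
two_exprs=$(printf '%s\n' "$sample" | sed -n -e '/IMSI/p' -e '/MSISDN/p')

# One extended regular expression with alternation (assumes -E support).
one_expr=$(printf '%s\n' "$sample" | sed -n -E '/IMSI|MSISDN/p')

# Hypothetical line containing both words: each matching p prints it once.
both='IMSI and MSISDN on one line'
printed_by_two=$(printf '%s\n' "$both" | sed -n -e '/IMSI/p' -e '/MSISDN/p')
printed_by_one=$(printf '%s\n' "$both" | sed -n -E '/IMSI|MSISDN/p')

printf '%s\n' "$two_exprs"
```

On the sample both forms select the same three lines. The only difference shows up on a line that contains both words: the two-expression form prints it twice, the alternation prints it once.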

fretagi · December 10, 2014, 4:44am

Thank you very much

RavinderSingh13 · December 10, 2014, 4:47am

Hello fretagi,

You can also use the following.

awk -F":" '($1 == "MSISDN" || $1 == "IMSI") {print}'  Input_file

The output will be as follows.

MSISDN: 258827475309
IMSI: 643012111658984
IMSI: 643012112814555
IMSI: 643012111658984
MSISDN: 258827475309

Thanks,
R. Singh

fretagi · December 10, 2014, 4:51am

thank you Ravinder, neat output

RavinderSingh13 · December 10, 2014, 4:55am

Hello fretagi,

The code above looks for the keywords only in the first column. If you need to look for the keywords anywhere in the line, use the following instead.

awk '($0 ~ /MSISDN/ || $0 ~ /IMSI/) {print}'  Input_file

The output will be as follows.

MSISDN: 258827475309
IMSI: 643012111658984
dn: IMSI=643012112814555,dc=imsi,ou=identities,dc=mcel
objectClass: IMSI
IMSI: 643012112814555
IMSI: 643012111658984
MSISDN: 258827475309

Thanks,
R. Singh

fretagi · December 10, 2014, 5:04am

Your first option is better, because I just want the numbers. Could you please explain the syntax?

RavinderSingh13 · December 10, 2014, 5:07am

Hello fretagi,

Here is the explanation.

awk -F":" '($1 == "MSISDN" || $1 == "IMSI") {print}'  Input_file

I set the field delimiter to : and then check the first field. If the first field is exactly MSISDN or IMSI, the whole line is printed; otherwise nothing is done.

Hope this helps. Let me know if you have any more questions.

Thanks,
R. Singh
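
To make the field splitting visible, the sketch below prints the first two fields of a few lines from the file in brackets, then runs the command explained above on the same lines. The variable names are only for illustration.

```shell
# Sample lines taken from the file in the question.
sample='MSISDN: 258827475309
dn: IMSI=643012112814555,dc=imsi,ou=identities,dc=mcel
objectClass: IMSI
IMSI: 643012112814555'

# Show what awk sees as field 1 and field 2 when ":" is the delimiter.
fields=$(printf '%s\n' "$sample" | awk -F":" '{ printf "[%s] [%s]\n", $1, $2 }')

# Print only the lines whose first field is exactly MSISDN or IMSI.
matched=$(printf '%s\n' "$sample" | awk -F":" '($1 == "MSISDN" || $1 == "IMSI") {print}')

printf '%s\n\n%s\n' "$fields" "$matched"
```

The dn: and objectClass: lines are skipped because their first fields are "dn" and "objectClass", even though IMSI appears later on those lines. The bracketed output also shows that field 2 starts with the space that follows the colon.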

fretagi · December 10, 2014, 5:10am

Thank you

junior-helper · December 10, 2014, 5:38am

It should work if you use {print $2} instead of {print} (print only field 2 instead of the whole line), but there will be a leading space.
If you don't want the leading space, try this:
{sub(/^ /,"",$2); print $2}
This removes a single leading space from field 2 and then prints field 2. More precisely, it substitutes the leading space with nothing ("").
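
Put together with the earlier command, that suggestion might look like the sketch below. The sample lines come from the file in the question and the variable names are only for illustration.

```shell
# Sample lines taken from the file in the question.
sample='MSISDN: 258827475309
IMSI: 643012111658984
NAM: 0'

# Split on ":", keep only MSISDN/IMSI lines, strip one leading space
# from field 2, and print just that field.
values=$(printf '%s\n' "$sample" |
  awk -F":" '($1 == "MSISDN" || $1 == "IMSI") { sub(/^ /, "", $2); print $2 }')

printf '%s\n' "$values"
```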

Don_Cragun · December 10, 2014, 2:26pm

If just the values associated with the keywords were wanted (instead of keyword and value), it would be simpler to use the default awk field delimiters and look for the keywords with the colons added:

awk '$1 == "MSISDN:" || $1 == "IMSI:" {print $2}' Input_file

or:

awk '$1 ~ "^(MSISDN|IMSI):" {print $2}' Input_file

or, if there might be lines with no whitespace or with multiple whitespace characters after the colon, you could use:

awk -F':[[:blank:]]*' '$1 ~ "^(MSISDN|IMSI)$" {print $2}' Input_file

or:

awk -F':[[:blank:]]*' '$1 == "MSISDN" || $1 == "IMSI" {print $2}' Input_file
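
A quick way to compare the variants is to feed them lines with irregular spacing. In the sketch below the first two sample lines are hypothetical re-spacings of values from the file, and it assumes an awk that supports POSIX character classes such as [[:blank:]] (some older awk builds do not).

```shell
# Hypothetical spacing: no blank after the first colon, several after the second.
sample='MSISDN:258827475309
IMSI:    643012111658984
IMEISV:: U3URIGF2hoc='

# Default field splitting: requires whitespace between "KEY:" and the value.
default_fs=$(printf '%s\n' "$sample" |
  awk '$1 == "MSISDN:" || $1 == "IMSI:" {print $2}')

# Colon plus any run of blanks as the separator: tolerates both spacings.
blank_fs=$(printf '%s\n' "$sample" |
  awk -F':[[:blank:]]*' '$1 == "MSISDN" || $1 == "IMSI" {print $2}')

printf 'default FS:\n%s\n\n' "$default_fs"
printf 'colon-and-blanks FS:\n%s\n' "$blank_fs"
```

With default splitting the first line is missed, because "MSISDN:258827475309" is a single field. The second form finds both values, and the IMEISV:: line is ignored either way.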

junior-helper · December 11, 2014, 5:17am

Thank you, Don.
Those are very useful tips and I will try to learn them.

PS: I think there are two typos in the 2 awk commands in the middle ( }'}' )

Don_Cragun · December 11, 2014, 5:27am

You're correct. I'll fix those two typos in post #15 in this thread.