Extract values only for certain tags

Hi

Please, I need help extracting the values of certain tags, MSISDN: and IMSI:, from the following text file:

entryDS: 1
nodeId: 11
MSISDN: 258827475309
IMSI: 643012111658984
NAM: 0                                                                
CDC: 41
IMEISV:: U3URIGF2hoc=
AUTHINFO: 0
TSMO: 0
CSP: 27
SUBSCSPVERS: 7
PDPCP: 11
SUBSPDPCPVERS: 5
RSA: 1
SUBSRSAVERS: 4
dn: IMSI=643012112814555,dc=imsi,ou=identities,dc=mcel
objectClass: alias
objectClass: IMSI
structuralObjectClass: alias
entryDS: 0
IMSI: 643012112814555
aliasedObjectName: mscId=bbbbbbbbbbbbbbbbb643012112814555,ou=multiSCs,dc=mcel
CAMP: 0
serv: CSPS
CSLOC: 5
VLRADD: 1925882200060
PSLOC: 0
SGSNNUM: 1925882200054
GSMMSRNMSCN:: AJFSiCIAYPA=
SCLOCSTATE: 6
SZONELOCSTATE: 6
PURGEDATECS:: DgsT
RVLRI: 0
RSGSNI: 0
GMSCADDRESS:: kVKIIgCQ9Q==
CSIVLRSUPP: 1
GSMMAPVERS: 3
GSMUEFEAT: 0
GSMDUALNUMSUP: 1
GPRSRELSUPP: 1
OBOPRI: 1
OBOPRE: 1
SCHAR:: BAA=
CAT: 10
DBSG: 1
OFA: 0
SOCB: 0
PWD: 0000
PWDC: 0
SOCFB: 0
SOCFNRC: 0
SOCFNRY: 0
SOCFU: 0
SODCF: 0
SOSDCF: 7
SOCLIP: 0
SOCLIR: 2
SOCOLP: 0
BS3G: 1
TS11: 1
TS21: 1
TS22: 1
CAW: 1
HOLD: 1
BAIC: 1
BAOC: 1
BICRO: 1
BOIC: 1
BOIEXH: 1
CFB: 1
CFNRC: 1
CFNRY: 1
CFU: 1
CLIP: 1
CLIR: 1
CAWTS10ST: 8
CFBTS10ST: 8
CFUTS10ST: 8
CFNRCTS10ST: 8
CFNRYTS10ST: 8
BAICTS10ST: 8
BAOCTS10ST: 8
BICROTS10ST: 8
BOICTS10ST: 8
BOIEXHTS10ST: 8
BAICTS20ST: 8
BAOCTS20ST: 8
BICROTS20ST: 8
BOICTS20ST: 8
BOIEXHTS20ST: 8
CAWBS30ST: 8
CFBBS30ST: 8
CFUBS30ST: 8
CFNRCBS30ST: 8
CFNRYBS30ST: 8
BAICBS30ST: 8
BAOCBS30ST: 8
BICROBS30ST: 8
BOICBS30ST: 8
BOIEXHBS30ST: 8
SMSCADD32:: kVKIIgAw8A==
SMSCEXPDATE32:: DgwF
MNRF: 1


dn: serv=Identities,mscId=bbbbbbbbbbbbbbbbb643012111658984,ou=multiSCs,dc=mcel
structuralObjectClass: CUDBService
objectClass: CUDBService
objectClass: mscIdentities
entryDS: 1
nodeId: 11
serv: Identities
CDC: 0
IMSI: 643012111658984
imsiMask: '0000000000010001'B
MSISDN: 258827475309
msisdnMask: '0000000000000001'B

I am trying to use either sed or awk, but with no success.

What awk and sed commands did you try?

Hi!

I tried the following to print only lines starting with a capital M or a capital I:

sed -n '/^M\|^I/p' imsi

and the following to print the lines containing the word "IMSI":

sed -n '/[IMSI]$/p' imsi

where imsi is the filename.

The first example printed

objectClass:IMSI

and the second example returned nothing.

I would have expected that output from your second example and no output from your first example. The second example should print any line whose last character is I, M, or S.

To print only lines whose first character is an M or an I, try:

sed -n '/^[MI]/p' imsi

and to print lines containing IMSI, try:

sed -n '/IMSI/p' imsi
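
To make the difference between the bracket expression and the plain string concrete, here is a small sketch. It assumes a three-line sample taken from the data above; the temporary file and the variable names are made up for illustration.

```shell
# sample is a hypothetical temp file holding three lines from the data above.
sample=$(mktemp)
printf '%s\n' 'objectClass: IMSI' 'IMSI: 643012112814555' 'serv: CSPS' > "$sample"

# [IMSI]$ is a bracket expression anchored at end of line:
# it matches lines whose last character is I, M, or S.
bracket_hits=$(sed -n '/[IMSI]$/p' "$sample")

# IMSI without brackets is a literal string:
# it matches lines containing the characters I, M, S, I in that order.
literal_hits=$(sed -n '/IMSI/p' "$sample")

printf 'bracket expression:\n%s\n' "$bracket_hits"
printf 'literal string:\n%s\n' "$literal_hits"
rm -f "$sample"
```

On this sample the bracket expression picks up "objectClass: IMSI" and "serv: CSPS", while the literal string picks up "objectClass: IMSI" and "IMSI: 643012112814555".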

Your last command worked fine, so how can I extend it to also include the word MSISDN?

sed -n -e '/IMSI/p' -e '/MSISDN/p' imsi

Thank you very much

Hello fretagi,

You can also use the following.

awk -F":" '($1 == "MSISDN" || $1 == "IMSI") {print}'  Input_file

Output will be as follows.

MSISDN: 258827475309
IMSI: 643012111658984
IMSI: 643012112814555
IMSI: 643012111658984
MSISDN: 258827475309

Thanks,
R. Singh


Thank you, Ravinder. Neat output.

Hello fretagi,

The code above looks for the keywords only in the first column. If you need to look for them anywhere in the line, please use the following instead.

awk '($0 ~ /MSISDN/ || $0 ~ /IMSI/) {print}'  Input_file

Output will be as follows.

MSISDN: 258827475309
IMSI: 643012111658984
dn: IMSI=643012112814555,dc=imsi,ou=identities,dc=mcel
objectClass: IMSI
IMSI: 643012112814555
IMSI: 643012111658984
MSISDN: 258827475309

Thanks,
R. Singh

Your first option is better, because I just want the numbers, but could you please explain the syntax to me?

Hello fretagi,

Here is the explanation.

awk -F":" '($1 == "MSISDN" || $1 == "IMSI") {print}'  Input_file

I set the field delimiter to : and then check the first field. If it is exactly the keyword MSISDN or IMSI, the line is printed; otherwise nothing is done.
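
As a rough illustration of how the : delimiter splits a line, the sketch below prints the two fields in brackets and then runs the same test on a few lines from the data. The variable names are made up for the example.

```shell
# With -F":" the line "MSISDN: 258827475309" splits into
#   $1 = "MSISDN"  and  $2 = " 258827475309"  (note the leading space).
fields=$(printf '%s\n' 'MSISDN: 258827475309' |
  awk -F":" '{print "[" $1 "][" $2 "]"}')
printf '%s\n' "$fields"

# Only lines whose first field is exactly MSISDN or IMSI pass the test.
# "objectClass: IMSI" has $1 = "objectClass" and "IMEISV:: ..." has
# $1 = "IMEISV", so neither is printed.
matched=$(printf '%s\n' 'objectClass: IMSI' 'IMSI: 643012112814555' 'IMEISV:: U3URIGF2hoc=' |
  awk -F":" '($1 == "MSISDN" || $1 == "IMSI") {print}')
printf '%s\n' "$matched"
```

The first command prints [MSISDN][ 258827475309], and the second prints only the IMSI: 643012112814555 line.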

Hope this helps; let me know if you have any more questions.

Thanks,
R. Singh


Thank you

It should work if you use {print $2} instead of {print} (that prints only field 2 instead of the whole line), but the value will have a leading space.
If you don't want the leading space, try this:
{sub(/^ /,"",$2); print $2}
This removes a single leading space from field 2 (by substituting it with the empty string "") and then prints field 2.
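
Put together with the earlier command, that suggestion might look like the sketch below. The three input lines are copied from the data above, and the variable name is only for illustration.

```shell
# Same test as before, but print only the value with its leading space removed.
values=$(printf '%s\n' 'MSISDN: 258827475309' 'CDC: 41' 'IMSI: 643012111658984' |
  awk -F":" '($1 == "MSISDN" || $1 == "IMSI") {
    sub(/^ /, "", $2)   # drop one leading space from field 2
    print $2            # print the cleaned value only
  }')
printf '%s\n' "$values"
```

This prints 258827475309 and 643012111658984 on separate lines, with no leading space.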

If just the values associated with the keywords were wanted (instead of keyword and value), it would be simpler to use the default awk field delimiters and look for the keywords with the colons added:

awk '$1 == "MSISDN:" || $1 == "IMSI:" {print $2}' Input_file

or:

awk '$1 ~ "^(MSISDN|IMSI):" {print $2}' Input_file

or, if some lines might have no whitespace, or several whitespace characters, after the colon, you could use:

awk -F':[[:blank:]]*' '$1 ~ "^(MSISDN|IMSI)$" {print $2}' Input_file

or:

awk -F':[[:blank:]]*' '$1 == "MSISDN" || $1 == "IMSI" {print $2}' Input_file
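
To see why the field separator matters, here is a small comparison on made-up input where the spacing after the colon is irregular. The input lines and variable names are assumptions for illustration; the real file above always has exactly one space.

```shell
# Hypothetical input: no space after the first colon, three after the second.
input=$(printf '%s\n' 'MSISDN:258827475309' 'IMSI:   643012111658984' 'objectClass: IMSI')

# Default whitespace splitting: "MSISDN:258827475309" is a single field,
# so $1 == "MSISDN:" fails for it and only the IMSI value is printed.
default_fs=$(printf '%s\n' "$input" |
  awk '$1 == "MSISDN:" || $1 == "IMSI:" {print $2}')

# Colon plus any run of blanks as the separator handles both spacings.
colon_fs=$(printf '%s\n' "$input" |
  awk -F':[[:blank:]]*' '$1 == "MSISDN" || $1 == "IMSI" {print $2}')

printf 'default FS:\n%s\n' "$default_fs"
printf 'colon FS:\n%s\n' "$colon_fs"
```

With the default separator only 643012111658984 comes out; with the colon-based separator both values are printed.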

Thank you, Don.
Those are very useful tips and I will try to learn them.

PS: I think there are two typos in the 2 awk commands in the middle ( }'}' )


You're correct. I'll fix those two typos in post #15 in this thread.