Extract values in a line using awk

hello all,

I need your help in extracting values of some parameter within a line using awk.

for example:

i have the below line available in a file and i want to extract the values of only CustomerId, s_PackageId and s_HZINumbers in order the result to be as

967666666666|"HomeZone"|"17103","10803","100000","200000","300000"

the file called test.file which has the contents as below:

Read:RPP(CustomerId="967666666666",OfferProfileKey=1,Key=7,category=ONLINE,prefetchFilter=-1,s_ActivationEndTime=32532613199999,s_ActivationStartTime=1459112400000,s_CRMTitle="-",s_CanBeSharedByMultipleRops=FALSE,s_InsertedViaBatch=TRUE,s_PackageId="HomeZone",s_PeriodStartPoint=0,vValidFrom=2016-03-28 00:00:00,vInvalidFrom=4000-01-01 00:00:00,s_FreeOfChargeModifyTasks=0,s_HZINumbers=["17103","10803","100000","200000","300000"],s_PeriodicBonus={CreditLimit=0,CreditMax=1000000},s_Touched=TRUE,bCategory=ONLINE,bSeriesId=0,bValidFrom=2009-01-01 00:00:00,bInvalidFrom=MAX_DATEANDTIME,s_Active=TRUE,s_ExpireDate=3000-01-01,s_FreeOfChargePeriods=0,s_NextPeriodAct=32532613199999,s_OnTouchDate=32532613199999,s_StartDate=1970-01-01,s_Valid=TRUE);

i used the below command

awk '{if (!match($0, /CustomerId=([-0-9a-zA-Z]*)/)) next; cust = (substr($0, RSTART + 12, RLENGTH + 1));if (!match($0, /s_PackageId="HomeZone"/)) next; pkg = (substr($0, RSTART + 12, RLENGTH -12));if (!match($0, /s_HZINumbers=([-0-9a-zA-Z,]*)/)) next; hzin = (substr($0, RSTART + 14, RLENGTH - 14));printf("%s|%s|%s\n", cust,pkg,hzin); }' test.file

i get the result for customerid, and package only. the result for s_HZINumbers is not appearing.

967666666666|"HomeZone"|

i would appreciate it if you get it solved for me,

Thank you

You can try this with sed:

$ sed 's/.*CustomerId="\([^"]\+\)".*s_PackageId=\([^,]\+\).*s_HZINumbers=\[\([^]]\+\)\].*/\1|\2|\3/g' infile
967666666666|"HomeZone"|"17103","10803","100000","200000","300000"
1 Like

i have so many lines inside the file with different customerid and packageid values. i want to specify that if the packageid="HomeZone" then i need s_HZINumbers values to be listed along with the customerid, packageid and s_HZINumbers.

Hello nael_najib,

Could you please try following and let me know if this helps you.

awk '{match($0,/CustomerId=/);VAL=substr($0,RSTART+12,RLENGTH);match($0,/s_PackageId=/);VAL=VAL "|" substr($0,RSTART+12,RLENGTH-2);match($0,/HZINumbers=\[[^\]]*/);print VAL "|" substr($0,RSTART+12,RLENGTH-12)}'   Input_file

Output will be as follows.

96766666666|"HomeZone"|"17103","10803","100000","200000","300000"

EDIT: Adding a non-one liner form of solution.

awk '{
        match($0,/CustomerId=/);
        VAL=substr($0,RSTART+12,RLENGTH);
        match($0,/s_PackageId=/);
        VAL=VAL "|" substr($0,RSTART+12,RLENGTH-2);
        match($0,/HZINumbers=\[[^\]]*/);
        print VAL "|" substr($0,RSTART+12,RLENGTH-12)
     }
    '   Input_file

Thanks,
R. Singh

Thank you RavinderSingh13. i actually need to make (if) condition when specifying the ( packageid="HomeZone" ) or otherwise it will list lots of entries.

Hello nael_najib,

Your welcome, could you please try following and let me know how it goes then.

awk '{match($0,/CustomerId=/);VAL=substr($0,RSTART+12,RLENGTH);match($0,/s_PackageId=/);Q=substr($0,RSTART+12,RLENGTH-2);VAL=VAL "|" Q;match($0,/HZINumbers=\[[^\]]*/);if(Q == "\"HomeZone\""){print VAL "|" substr($0,RSTART+12,RLENGTH-12)}}'   Input_file

EDIT: Adding a non-one liner form of above solution too now.

awk '{
        match($0,/CustomerId=/);
        VAL=substr($0,RSTART+12,RLENGTH);
        match($0,/s_PackageId=/);
        Q=substr($0,RSTART+12,RLENGTH-2);
        VAL=VAL "|" Q;
        match($0,/HZINumbers=\[[^\]]*/);
                if(Q == "\"HomeZone\""){
                                        print VAL "|" substr($0,RSTART+12,RLENGTH-12)
                                       }
     }
    '     Input_file
 

Thanks,
R. Singh

1 Like

It worked now. I really appreciate the good and fast response RavinderSingh13. you are really great

Try this small adaption to zaxxon's file proposal:

sed -n '/HomeZone/ s/.*CustomerId="\([^"]\+\)".*s_PackageId=\([^,]\+\).*s_HZINumbers=\[\([^]]\+\)\].*/\1|\2|\3/p' file3
967666666666|"HomeZone"|"17103","10803","100000","200000","300000"
1 Like

awk with a logic differing slightly from RavinderSingh13's (and, it works also on IDs longer or shorter than 12 digits...):

awk '
match ($0, /s_PackageId="HomeZone"/)    {PI = substr ($0, RSTART+12, RLENGTH-12)
                                         match ($0, /CustomerId="[^"]*/)
                                         CI = substr ($0, RSTART+12, RLENGTH-12)
                                         match ($0, /HZINumbers=\[[^]]*/)
                                         HN = substr ($0, RSTART+12, RLENGTH-12)
                                         print CI, PI, HN
                                        }
' OFS="|" file
967666666666|"HomeZone"|"17103","10803","100000","200000","300000"
1 Like

Hello All again,

The codes above actually worked properly but i encountered another problem. sometimes, the field HZINumbers doesn't appear in the line and i want the default value for such case to be "0". how can i do that?

---------- Post updated at 04:05 AM ---------- Previous update was at 03:37 AM ----------

The below is an example:

Read:RPP(CustomerId="961666666666",OfferProfileKey=1,Key=7,category=ONLINE,prefetchFilter=-1,s_ActivationEndTime=32532613199999,s_ActivationStartTime=1459112400000,s_CRMTitle="-",s_CanBeSharedByMultipleRops=FALSE,s_InsertedViaBatch=TRUE,s_PackageId="HomeZone",s_PeriodStartPoint=0,vValidFrom=2016-03-28 00:00:00,vInvalidFrom=4000-01-01 00:00:00,s_FreeOfChargeModifyTasks=0,s_HZINumbers=["17103","10803","100000","200000","300000"],s_PeriodicBonus={CreditLimit=0,CreditMax=1000000},s_Touched=TRUE,bCategory=ONLINE,bSeriesId=0,bValidFrom=2009-01-01 00:00:00,bInvalidFrom=MAX_DATEANDTIME,s_Active=TRUE,s_ExpireDate=3000-01-01,s_FreeOfChargePeriods=0,s_NextPeriodAct=32532613199999,s_OnTouchDate=32532613199999,s_StartDate=1970-01-01,s_Valid=TRUE);


Read:RPP(CustomerId="961666666662",OfferProfileKey=1,Key=7,category=ONLINE,prefetchFilter=-1,s_ActivationEndTime=32532613199999,s_ActivationStartTime=1459112400000,s_CRMTitle="-",s_CanBeSharedByMultipleRops=FALSE,s_InsertedViaBatch=TRUE,s_PackageId="HomeZone",s_PeriodStartPoint=0,vValidFrom=2016-03-28 00:00:00,vInvalidFrom=4000-01-01 00:00:00,s_FreeOfChargeModifyTasks=0,,s_PeriodicBonus={CreditLimit=0,CreditMax=1000000},s_Touched=TRUE,bCategory=ONLINE,bSeriesId=0,bValidFrom=2009-01-01 00:00:00,bInvalidFrom=MAX_DATEANDTIME,s_Active=TRUE,s_ExpireDate=3000-01-01,s_FreeOfChargePeriods=0,s_NextPeriodAct=32532613199999,s_OnTouchDate=32532613199999,s_StartDate=1970-01-01,s_Valid=TRUE);

as you can see, in the first line the HZINumbers field is appearing but in the second line its not appearing and for that i don't get any result for the second line although the CustomerId and s_packageid are there.

and i want the output to be:

961666666666|"HomeZone"|"17103","10803","100000","200000","300000"
961666666662|"HomeZone"|0

Hello nael_najib,

Could you please try following and let me know how it goes then.

awk -vs1="CustomerId=" -vs2="PackageId" -vs3="s_HZINumbers" '!/^$/{match($0,/CustomerId=\"[0-9]+/);Q=substr($0,RSTART+length(s1)+1,RLENGTH-length(s1)-1);match($0,/PackageId[^"]*"/);P=substr($0,RSTART+length(s2)+1,RLENGTH-2);Q=substr($0,RSTART+length(s2)+1,RLENGTH-2)?Q OFS substr($0,RSTART+length(s2)+1,RLENGTH-1):Q OFS 0;match($0,/s_HZINumbers[^]]*/);Q=substr($0,RSTART+length(s3)+2,RLENGTH-length(s3)-2)?Q OFS substr($0,RSTART+length(s3)+2,RLENGTH-length(s3)-2):Q OFS 0;print Q}' OFS="|"   Input_file

Output will be as follows.

961666666666|"HomeZone"|"17103","10803","100000","200000","300000"
961666666662|"HomeZone"|0
 

Thanks,
R. Singh

1 Like

Thank you Ravinder, it worked. but if i want to add another field example (s_NextPeriodAct) and to have (0 zero) value if its not available in the line. how would the code look like?

I appreciate your patience and good response

Wouldn't it be nice if you posted some efforts by yourself to be discussed? Adapt the solution given before:

awk '
match ($0, /s_PackageId="HomeZone"/)    {PI = substr ($0, RSTART+12, RLENGTH-12)
                                         match ($0, /CustomerId="[^"]*/)
                                         CI = substr ($0, RSTART+12, RLENGTH-12)
                                         if (match ($0, /HZINumbers=\[[^]]*/))
                                                HN = substr ($0, RSTART+12, RLENGTH-12)
                                           else HN = 0
                                         print CI, PI, HN         
                                        }
' OFS="|" file
961666666666|"HomeZone"|"17103","10803","100000","200000","300000"
961666666662|"HomeZone"|0
1 Like

Now, the code is clearer and can add more fields. Thank you all for the good efforts