Need help with a script

green_k · October 1, 2019, 11:10pm

Hi Gurus,
I have the requirement below and have no idea how to achieve it.
The input file looks like the sample below. There are multiple sections in the file, and each section has multiple lines. I need to find certain lines (value1, value2, value3 are the keywords used to find the lines) and generate another file. Some sections may be missing some of these lines; for each missing line, the output should contain "MISSING".
My OS is SunOS 5.10 Generic_150400-64 sun4v sparc sun4v

[section_abc]
xxxxx
xxxx
CITY=ABC
REGION=CDE
STREET=EFG
[section_cde]
xxxxx
xxxx
CITY=xyz
REGION=123
STREET=345
[section_efg]
xxxxx
xxxx
CITY=900
REGION=200
[section_ghi]
xxxxx
xxxx
REGION=500
STREET=600

The expected output file looks like this:

[section_abc] CITY=ABC REGION=CDE STREET=EFG
[section_cde] CITY=xyz REGION=123 STREET=345
[section_efg] CITY=900 REGION=200 MISSING
[section_ghi] MISSING REGION=500 STREET=600

Thanks in advance.

Yoda · October 1, 2019, 11:31pm

Here is one approach using awk:

awk -F= '
        /section/ {
                sc = $1
                S[sc]
                next
        }
        /^VALUE/ {
                A[sc FS $1] = $0
        }
        END {
                for ( k in S )
                        print k, A[k FS "VALUE1"] ? A[k FS "VALUE1"] : "MISSING", A[k FS "VALUE2"] ? A[k FS "VALUE2"] : "MISSING", A[k FS "VALUE3"] ? A[k FS "VALUE3"] : "MISSING"
        }
' file

Chubler_XL · October 1, 2019, 11:41pm

And another awk approach:

awk -v want="VALUE1,VALUE2,VALUE3" -F'[=\\][]' '
function prnsection(i) {
   if(length(section)) {
     printf "%s",section;
     for(i=1;i in keypos;i++) {
       printf " %s", keys[keypos[i]]
       keys[keypos[i]]="MISSING"
     }
     printf "\n"
   }
}
BEGIN {
   for(i=split(want, keypos, ",");i;i--) {
       keys[keypos[i]]="MISSING";
   }
}
NF>2 { prnsection(); section=$0 }
$1 in keys { keys[$1]=$0 };
END { prnsection() }' infile

green_k · October 2, 2019, 12:08am

Thanks, Yoda, for your quick response. The code works with my sample data. My bad: I didn't provide the sample data correctly. value1, value2 and value3 have no relation to each other; they are, for example, CITY, REGION and STREET. I have updated my post.

Thanks, Chubler_XL, for your quick response; the code works as I expected.

green_k · October 2, 2019, 7:24pm

Hi Yoda,
I modified your code to match multiple patterns and it works fine. I have the questions below.

What is the purpose of FS in the array index A[sc FS $1]? My understanding is that when a match is found, $0 is assigned to array A under the index "section and $1".
Is it possible to use pattern matching for value1, value2 in the code below? In the file these key names are not identical. For example, given the lines below, we consider the two city keys to be the same.

[section_abc]
CITY_1=ABC
[section_cde]
CITY_new=xyz

print k, A[k FS "VALUE1"] ? A[k FS "VALUE1"] : "MISSING", A[k FS "VALUE2"] ? A[k FS "VALUE2"] : "MISSING", A[k FS "VALUE3"] ? A[k FS "VALUE3"] : "MISSING"

green_k · October 3, 2019, 8:53am

Hi Chubler_XL, thanks for your answer; it works fine. Since I am relatively new to unix/awk scripting, I am not able to fully understand the code. Below is my understanding of it; for some parts I don't know how they work and have questions. Could you please review it and give me a brief explanation?

Thanks in advance.

awk -v want="RECORD_COUNT,VALUE2,VALUE3" -F'[=\\][]' '  --- -F'[=\\][]': I need to understand how this regular expression works
function prnsection(i) {                                --- the function takes argument i
   if(length(section)) {                                --- if section is not empty, do the following
     printf "%s",section;                               --- print section
     for(i=1;i in keypos;i++) {                         --- for loop; max i is the number of elements in keypos: keypos[value1]=1, keypos[value2]=2, keypos[value3]=3
       printf " %s", keys[keypos[i]]                    --- the keys elements are: keys[1]=value1, keys[2]=value2, keys[3]=value3
       keys[keypos[i]]="MISSING"                        --- if the keys element doesn't have a value, assign "MISSING"
     }
     printf "\n"
   }
}
BEGIN {
   for(i=split(want, keypos, ",");i;i--) {              --- create the keypos elements from variable want
       keys[keypos[i]]="MISSING";                       --- create array keys; if an element is empty, assign "MISSING"
   }
}
NF>2 { prnsection(); section=$0 }                       --- if NF>2, call the function and assign $0 to section. The function has one
                                                        --- argument, but the call here passes none:
                                                        --- how is the value passed in?
                                                        --- what is the purpose of calling this function here?
$1 in keys { keys[$1]=$0 };                             --- my understanding is that $1 is VALUE1, VALUE2, ...; but when I tried -F'[=\\][]'
                                                        --- as the delimiter I got NF=1, so I am not sure how this works
END { prnsection() }' file                              --- call the function here to print the result

Chubler_XL · October 3, 2019, 6:38pm

Glad to explain what is going on in this code. Working through it until you understand it is a great way to improve your awk skills.

Field separator RE [=\\][]

This is a single bracket expression; it matches any one of the characters = , ] and [ as a field separator.
The ] character needs to be escaped in the RE to stop it being interpreted as the close bracket of the list.
The escape itself is doubled because awk processes escape sequences in the -F string before compiling it as an RE; inside single quotes the shell passes both backslashes through untouched.
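
A minimal sketch of what that separator does to the three kinds of input line, assuming a POSIX awk (gawk, mawk, nawk or /usr/xpg4/bin/awk); the variable name counts is only for illustration:

```shell
# Feed one section header, one key=value line and one filler line through
# the same -F setting and report how many fields awk sees on each.
counts=$(printf '%s\n' '[section_abc]' 'CITY=ABC' 'xxxxx' |
    awk -F'[=\\][]' '{ print NF ": " $0 }')
printf '%s\n' "$counts"
```

With a POSIX awk the header should report 3 fields (an empty field before [ and another after ]), the key=value line 2, and the filler line 1, which is why NF>2 can be used to spot a section header.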

After the BEGIN block has run, the two arrays are populated as follows:

keypos[1]=RECORD_COUNT
keypos[2]=VALUE2
keypos[3]=VALUE3

keys[RECORD_COUNT]="MISSING"
keys[VALUE2]="MISSING"
keys[VALUE3]="MISSING"

The main use of keypos is to ensure the output is ordered the same as the want list.
If we just iterated through keys, the order would be arbitrary and could change between implementations of awk.
In prnsection() we use a for loop starting at i=1 and finishing when i is no longer in keypos (the test i in keypos).
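
As a small sketch of that ordering idea on its own (the variable names ordered and n are illustrative, not part of the script above):

```shell
# split() numbers the pieces 1..n in list order, so walking the numeric
# indexes reproduces the order given in want.
ordered=$(awk -v want="CITY,REGION,STREET" 'BEGIN {
    n = split(want, keypos, ",")      # keypos[1]="CITY", keypos[2]="REGION", keypos[3]="STREET"
    for (i = 1; i in keypos; i++)     # stops at the first index that is not in keypos
        printf "%s%s", keypos[i], (i < n ? " " : "\n")
}')
printf '%s\n' "$ordered"
```

This prints CITY REGION STREET, whereas a for (k in keypos) loop would be free to visit the same three elements in any order.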

The keys array is initialized to "MISSING" at the start and reset to "MISSING" each time a section is printed, that is, at each new section header.

$1 in keys { keys[$1]=$0 };
This code updates the keys array when $1 (the part in front of the = sign) is in keys.

The parameter list of an awk function serves two purposes: the first is to pass input, the second is to define local variables.
Actual arguments should be specified first, followed by any local variables.
Here there are no arguments, and i is simply a local variable of prnsection().
It's a good habit to always use local variables in functions unless there is a reason for them to be
global. Imagine you had a for loop using a counter i and i was not local in prnsection():
the i would be changed by the function call.
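
A short sketch of the difference, using two hypothetical functions (with_local and without_local are made up for this illustration, as is the calls counter):

```shell
report=$(awk '
# i is listed after the real parameter n, so it is local to this function.
function with_local(n,    i) { for (i = 1; i <= n; i++) calls++ }
# No extra parameter here: the loop counter is the global i.
function without_local(n)    { for (i = 1; i <= n; i++) calls++ }
BEGIN {
    i = 42
    with_local(3);    print "after with_local: i=" i
    without_local(3); print "after without_local: i=" i
}')
printf '%s\n' "$report"
```

The first line shows i still at 42; the second shows i=4, because the function without a local counter overwrote the caller's variable. The extra spaces before i in the parameter list are only a convention to separate real arguments from locals.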

green_k · October 3, 2019, 9:51pm

Thanks for your explanation. I first thought -F'[=\\]' was a multi-character delimiter too, but after running the commands below I am lost.

:/apps >echo "[abcde]"|awk -F'[=\\][]' '{print NF}'
2
:/apps >echo "abc=123"|awk -F'[=\\][]' '{print NF}'
1
:/apps >uname -a
SunOS  5.10 Generic_150400-64 sun4v sparc sun4v

Chubler_XL · October 3, 2019, 9:55pm

On SunOS you need to use nawk or /usr/xpg4/bin/awk, as the legacy Solaris awk is missing many POSIX features.
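
One possible way to make a script pick a suitable awk by itself is sketched below. The AWK and fields variable names are illustrative, and the /usr/bin/nawk path is an assumption about where nawk is installed:

```shell
# Prefer the XPG4 awk, then nawk, and fall back to whatever "awk" is on PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif [ -x /usr/bin/nawk ]; then
    AWK=/usr/bin/nawk
else
    AWK=awk
fi

# Repeat the test from the previous post with the selected awk.
fields=$(echo "abc=123" | "$AWK" -F'[=\\][]' '{ print NF }')
echo "$AWK reports NF=$fields"
```

With a POSIX-capable awk the reported value should be 2 rather than 1.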

MadeInGermany · October 4, 2019, 5:26am

Some awk versions have a problem with parsing a complicated FS.
The following variant takes a simple FS:

awk -v want="CITY,REGION,STREET" '
function prnsection() {
   if (length(section)) {
     printf "%s",section
# quick loop in random order:
#    for (i in keypos)
# keep the order:
     for (i=1;i in keypos;i++) {
       printf " %s", keys[keypos[i]]
       keys[keypos[i]]="MISSING"
     }
     printf "\n"
   }
}
BEGIN {
   FS="="
# split puts CITY,REGION,STREET to keypos[1,2,3]
# the loop creates keys[CITY,REGION,STREET]
   for (i=split(want, keypos, ",");i;i--) {
       keys[keypos[i]]="MISSING";
   }
}
/^\[/ { prnsection(); section=$0 }
($1 in keys) { keys[$1]=$0 }
END { prnsection() }' infile
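
For anyone who wants to try this end to end, here is a self-contained sketch that recreates the sample input from the first post in a temporary file and runs the simple-FS variant against it. The temporary file and the result variable are only for the demonstration, mktemp is assumed to be available, and the keys array is indexed through keypos[i]:

```shell
# Recreate the sample input from the first post.
infile=$(mktemp)
cat > "$infile" <<'EOF'
[section_abc]
xxxxx
xxxx
CITY=ABC
REGION=CDE
STREET=EFG
[section_cde]
xxxxx
xxxx
CITY=xyz
REGION=123
STREET=345
[section_efg]
xxxxx
xxxx
CITY=900
REGION=200
[section_ghi]
xxxxx
xxxx
REGION=500
STREET=600
EOF

result=$(awk -v want="CITY,REGION,STREET" '
function prnsection(    i) {
    if (length(section)) {
        printf "%s", section
        for (i = 1; i in keypos; i++) {       # keep the order of the want list
            printf " %s", keys[keypos[i]]
            keys[keypos[i]] = "MISSING"       # reset for the next section
        }
        printf "\n"
    }
}
BEGIN {
    FS = "="
    for (i = split(want, keypos, ","); i; i--)
        keys[keypos[i]] = "MISSING"
}
/^\[/        { prnsection(); section = $0 }   # a new header flushes the previous section
($1 in keys) { keys[$1] = $0 }                # remember wanted key=value lines
END          { prnsection() }                 # flush the last section
' "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

On the sample data this should print the four lines of the expected output, with MISSING in place of STREET for section_efg and in place of CITY for section_ghi. On Solaris, substitute nawk or /usr/xpg4/bin/awk for awk.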

system · May 15, 2020, 4:13pm

Moderator comments were removed during original forum migration.