Hi Gurus,
I have the requirement below and have no idea how to achieve it.
The input file looks like the one below. There are multiple sections in the file, and each section has multiple lines. I need to find certain lines (value1, value2 and value3 are the keywords used to search for lines) and generate another file. Some sections may be missing some of these lines; for a missing line, output "Missing".
My OS is SunOS 5.10 Generic_150400-64 sun4v sparc sun4v
Thanks Yoda for your quick response. The code works with my sample data. My bad, I didn't provide the sample data correctly: value1, value2 and value3 are not related to each other, for example CITY, REGION, STREET. I have updated my post.
--- Post updated at 12:08 AM ---
Thanks Chubler_XL for your quick response, the code works as I expected.
Hi Yoda,
I modified this code to match multiple patterns and it works fine. I have the questions below.
What is the purpose of FS in this array index, A[sc FS $1]? My understanding is that when a match is found, $0 is assigned to array A with the index "section and $1".
Is it possible to use pattern matching for value1, value2 in the code below? In the file these values are not always identical. For example:
There are some values like the ones below; in this case we consider these two cities to be the same.
Hi Chubler_XL, thanks for your answer, it works fine. Since I am relatively new to unix/awk scripting, I am not able to fully understand the code. Below is my understanding of it; there are some parts where I don't know how they work, and I have some questions. Could you please review it and give me a brief explanation?
Thanks in advance.
awk -v want="RECORD_COUNT,VALUE2,VALUE3" -F'[=\\][]' ' --- -F'[=\\][]': I need to understand how this regular expression works
function prnsection(i) { --- the function takes argument i
if(length(section)) { --- if section is not empty, do the following
printf "%s",section; --- print section
for(i=1;i in keypos;i++) { --- for loop; the max i is the number of elements in keypos: keypos[value1]=1, keypos[value2]=2, keypos[value3]=3
printf " %s", keys[keypos[i]] --- the elements of array keys are: keys[1]=value1, keys[2]=value2, keys[3]=value3
keys[keypos[i]]="MISSING" --- if the keys element doesn't have a value, assign the value "MISSING"
}
printf "\n"
}
}
BEGIN {
for(i=split(want, keypos, ",");i;i--) { --- create the elements of array keypos from the variable want
keys[keypos[i]]="MISSING"; --- create array keys; if an element is empty, assign the value "MISSING"
}
}
NF>2 { prnsection(); section=$0 } --- if NF>2, call the function and assign $0 to section. The function has one
--- argument, but the call here passes none.
--- How is the value passed in?
--- What is the purpose of calling this function here?
$1 in keys { keys[$1]=$0 }; --- my understanding is that $1 is VALUE1, VALUE2, ... I tried the command with -F'[=\\][]'
--- as the delimiter and got NF=1; not sure how it works.
END { prnsection() }' file --- here the function is called to print the result
Glad to explain what is going on in this code. Working through it until you understand it is a great way to improve your awk skills.
Field separator RE [=\\][]
This is a simple bracket expression [...] and matches any one of the characters =, ] and [ as a field separator.
The ] character needs to be escaped inside the brackets to stop it being interpreted as the closing bracket of the list.
The backslash is doubled because awk processes escape sequences in the -F value before using it as a regular expression; the shell leaves everything inside the single quotes untouched.
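To see what this separator does to the two kinds of line, here is a small sketch on made-up input (the section name and CITY value are hypothetical). It uses [][=], which as far as I know is an equivalent spelling of the same character set (a ] placed first inside the brackets is taken literally) and avoids the backslash altogether; treat that substitution as my assumption, not part of the original script.

```shell
# Hypothetical two-line sample: one section header, one key=value line.
# "[][=]" is assumed to be equivalent to the thread's "[=\\][]".
out=$(printf '[section1]\nCITY=Boston\n' |
    awk -F'[][=]' '{ print NF ": <" $1 "> <" $2 ">" }')
printf '%s\n' "$out"
```

A header line splits into three fields (empty, the section name, empty), which is why NF>2 picks out the section headers, while a key=value line has only two fields with the key in $1.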
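After the BEGIN section the two arrays are populated as follows:
keypos[1]="RECORD_COUNT", keypos[2]="VALUE2", keypos[3]="VALUE3"
keys["RECORD_COUNT"]="MISSING", keys["VALUE2"]="MISSING", keys["VALUE3"]="MISSING"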
The main use of keypos is to ensure the output is ordered the same as the want list.
If we just iterated through keys, the order would be arbitrary and could change between awk implementations.
In prnsection() we use a for loop starting at i=1 and finishing when i is no longer in keypos (i in keypos).
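A minimal sketch of that ordered walk on its own, assuming the CITY,REGION,STREET list mentioned earlier in the thread:

```shell
# split() fills keypos[1], keypos[2], keypos[3] in the order given in want;
# counting i upward until it is no longer an index of keypos preserves that order.
out=$(awk -v want="CITY,REGION,STREET" 'BEGIN {
    split(want, keypos, ",")
    for (i = 1; i in keypos; i++)
        printf "%s%s", (i > 1 ? " " : ""), keypos[i]
    printf "\n"
}')
printf '%s\n' "$out"
```

A plain for (k in keypos) would visit the same three elements, but awk makes no promise about the order.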
The keys array is initialized to "MISSING" at the start and reset at each new section header.
$1 in keys { keys[$1]=$0 };
This code updates the keys array when $1 (the part in front of the = sign) is in keys.
Function parameters in awk serve two purposes: the first is to pass input, the second is to define local variables.
Actual arguments should be listed first, followed by any local variables.
Here there are no arguments, and i is simply a local variable of prnsection().
It's a good habit to always use local variables in functions unless there is a reason for them to be
global. Imagine you had a for loop using a counter i and i was not local in prnsection():
the i would be changed by the function call.
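A small sketch of the difference; the two function names are made up for the illustration:

```shell
# count_global and count_local are hypothetical names used only for this demo.
out=$(awk '
function count_global()     { for (i = 1; i <= 3; i++) n++ }   # i is the global i
function count_local(    i) { for (i = 1; i <= 3; i++) n++ }   # extra parameter makes i local
BEGIN {
    i = 10; count_local();  print "after count_local: i=" i
    i = 10; count_global(); print "after count_global: i=" i
}')
printf '%s\n' "$out"
```

The caller's i survives the call to count_local but is overwritten by count_global. The extra spaces before the local parameter are only a convention to set locals apart from real arguments.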
Some awk versions have a problem with parsing a complicated FS.
The following variant takes a simple FS:
awk -v want="CITY,REGION,STREET" '
function prnsection() {
  if (length(section)) {
    printf "%s",section
    # quick loop in random order:
    # for (i in keypos)
    # keep the order:
    for (i=1;i in keypos;i++) {
      printf " %s", keys[keypos[i]]
      keys[keypos[i]]="MISSING"
    }
    printf "\n"
  }
}
BEGIN {
  FS="="
  # split puts CITY,REGION,STREET to keypos[1,2,3]
  # the loop creates keys[CITY,REGION,STREET]
  for (i=split(want, keypos, ",");i;i--) {
    keys[keypos[i]]="MISSING";
  }
}
# a line starting with [ is a section header: print the previous section, start a new one
/^\[/ { prnsection(); section=$0 }
# remember the whole line when its key (the text before =) is one we want
($1 in keys) { keys[$1]=$0 }
# print the last section
END { prnsection() }' infile
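For anyone who wants to try it, here is a sketch of a complete run. The sample data is hypothetical (the poster's real file is not shown here), the array index is written out as keys[keypos[i]], and i is declared local as suggested earlier; the second section deliberately has no REGION line.

```shell
# Sample input is made up for illustration; section2 lacks a REGION line.
out=$(awk -v want="CITY,REGION,STREET" '
function prnsection(    i) {
    if (length(section)) {
        printf "%s", section
        for (i = 1; i in keypos; i++) {      # walk the keys in the order of want
            printf " %s", keys[keypos[i]]
            keys[keypos[i]] = "MISSING"      # reset for the next section
        }
        printf "\n"
    }
}
BEGIN {
    FS = "="
    for (i = split(want, keypos, ","); i; i--)
        keys[keypos[i]] = "MISSING"
}
/^\[/        { prnsection(); section = $0 }
($1 in keys) { keys[$1] = $0 }
END          { prnsection() }' <<'EOF'
[section1]
CITY=Boston
REGION=East
STREET=Main
[section2]
CITY=Denver
STREET=Pine
EOF
)
printf '%s\n' "$out"
```

Each section comes out on one line, with MISSING in the slot of any wanted key that the section did not contain.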