Later code breaks when the value of <name> contains spaces, so I would like to replace each space with an underscore in the value of NAME in this awk code before it is written. Is there a way to do this?
# name field tag to look for
name_field='<name>'
# value to add to beginning of name string
pre='ID_'
awk -v find_name="$name_field" -v pre="$pre" ' { OUT[++CNT] = $0 }
F==1 { gsub(" ", "_"); NAME = pre $0; F = 0 }
$0 ~ find_name { F = 1 }
$0 == "$$$$" { print NAME; for(i=2; i<=CNT; i++) print OUT[i]; delete OUT; CNT = 0 }
' > "$output_file_name"
It's nice to know how to do that as I'm sure it won't be the last time it comes up. Is gsub() part of awk or a call to a different tool?
Thanks for the tip. In this case, the name line is the only one that I need to modify.
Perhaps I spoke too soon about not needing to make space replacements in other places in the code. What I need to do is use the space-replaced version of NAME on the first line, as the original code does, and also use it for the line following the <name> tag.
I think this will work but perhaps a more generalized solution would be better to allow for substitution on any requested line but not the entire input.
This works, as does the suggestion I posted above. I'm not sure which is preferable except that the suggestion of rdrtx1 does not require the conditional.
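For what it's worth, here is a minimal sketch of one way to get the space-replaced value onto both lines. It is not necessarily what was posted earlier in the thread. The idea is to do the substitution before the line is stored in the array, so that OUT[] and NAME both see the edited text. The sample record and the shell variable names (sample, result) are made up for illustration.

```shell
# Made-up sample record, for illustration only.
sample='title line
  Mrv16a3102061815532D
>  <name>
some compound name
$$$$'

result=$(printf '%s\n' "$sample" | awk -v find_name='<name>' -v pre='ID_' '
F == 1         { gsub(" ", "_"); NAME = pre $0; F = 0 }   # edit $0 before it is stored
               { OUT[++CNT] = $0 }                        # store the line, edited or not
$0 ~ find_name { F = 1 }                                  # next line holds the name value
$0 == "$$$$"   {                                          # end of record: NAME replaces line 1
    print NAME
    for (i = 2; i <= CNT; i++) print OUT[i]
    delete OUT; CNT = 0
}')
printf '%s\n' "$result"
```

Only the order of the rules changed compared with the original script: because the F == 1 rule now runs first, the line after the tag is already modified when it is copied into OUT[].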
I assume that in the above case gsub() is changing the value of $0?
Yes, the gsub() call modifies $0 if no third argument is specified. MadeInGermany already said this in post #3 in this thread. (And, you quoted it and thanked him for that tip in post #4.)
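To add to that: gsub() is a built-in awk function, not a call to an external tool. A quick sketch of the two forms, using a made-up input line and shell variable names (whole, copy) chosen just for this example:

```shell
# Without a third argument, gsub() edits $0, the whole current line.
whole=$(printf 'my name here\n' | awk '{ gsub(" ", "_"); print }')

# With a third argument, only that variable is changed; $0 is left alone.
copy=$(printf 'my name here\n' | awk '{ s = $0; gsub(" ", "_", s); print s "|" $0 }')

echo "$whole"   # my_name_here
echo "$copy"    # my_name_here|my name here
```

The second form is the one to use when the original line is still needed later in the same rule.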
I'm afraid the code you posted doesn't give the desired output in post #1, as it suppresses the Mrv16a3102061815532D line. Plus, it is a bit overcomplicated.
If I read your code correctly, each line is read and stored in an array. When a line containing find_name is found ($0 ~ find_name), the next line is read by getline, using the incremented counter as an index. The character substitution is done and the modified line is assigned to the array at that index. When the end of the record is reached ($0 == "$$$$"), the record is output.
It seems like the getline OUT[++CNT] instruction would cause the counter to be off by one. Is the changed value of ++CNT only in scope inside the {}?
In my script, it seems like the array runs from 1 to n and not from 0 to n. If I run for(i=1; i<=CNT; i++) print OUT[i] I get the entire record printed. That is why in my version I output the first line from a variable and then start the rest of the output at i=2.
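A small sketch of why the loop starts at 1 here, assuming the array is filled with OUT[++CNT] as in the scripts in this thread: awk arrays are associative, so only the subscripts that were actually assigned exist, and CNT is 0 before the first ++CNT. The three input lines and the shell variable name indices are invented for the demonstration.

```shell
indices=$(printf 'a\nb\nc\n' | awk '
{ OUT[++CNT] = $0 }                 # first stored line gets subscript 1
END {
    # "0 in OUT" tests for the subscript without creating it
    printf "%s", ((0 in OUT) ? "0:set" : "0:unset")
    for (i = 1; i <= CNT; i++) printf " %d:%s", i, OUT[i]
    print ""
}')
echo "$indices"   # 0:unset 1:a 2:b 3:c
```

Subscript 0 only comes into existence if the script assigns to OUT[0] itself.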
Do you need the line Mrv16a3102061815532D or not? The output as posted has it, but the script doesn't provide it. Or is there an empty line at the beginning of the input that is not shown in the input sample?
Your analysis is correct.
No. The getline reads a new line that needs to be inserted exactly one count above the last line read. Your desired header ("ID_" + name value) will be inserted BEFORE all the other lines, at the array's zero element. If the Mrv16a3102061815532D line is NOT wanted, assign to OUT[1] and run the loop starting from i=1.
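Since the getline version itself is not shown above, here is a rough reconstruction from the description, to be read as a sketch rather than the code that was actually posted. The function name prepend_id, the awk variable first, and the sample record are all hypothetical. Passing 0 puts the header in front of the record; passing 1 overwrites the record's first line instead.

```shell
# Made-up sample record, for illustration only.
sample='title line
  Mrv16a3102061815532D
>  <name>
some compound name
$$$$'

prepend_id() {   # $1 = array subscript that receives the header (0 or 1)
    awk -v find_name='<name>' -v pre='ID_' -v first="$1" '
    { OUT[++CNT] = $0 }                        # store every line as it is read
    $0 ~ find_name {
        if ((getline OUT[++CNT]) > 0) {        # read the value line into the next slot
            gsub(" ", "_", OUT[CNT])           # replace spaces in the stored copy only
            OUT[first] = pre OUT[CNT]          # header built from the edited value
        }
    }
    $0 == "$$$$" {                             # end of record: print and reset
        for (i = first; i <= CNT; i++) print OUT[i]
        delete OUT; CNT = 0
    }'
}

with_header=$(printf '%s\n' "$sample" | prepend_id 0)
replaced=$(printf '%s\n' "$sample" | prepend_id 1)
printf '%s\n' "$with_header"
```

CNT is a global awk variable, so the ++CNT inside getline simply advances the same counter by one for the one extra line that was read. The counter and the stored lines stay in step.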