agama,
Many thanks, I'll try your new code right away and let you know asap.
I think a group that is missing from every group block won't be a problem, because it means the only groups that exist are those that appear as unique groups in the
file.
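To make that concrete, here is a small sketch of how I understand it, not the real script. The sample input, the two-column layout and the names val, block and table are made up for illustration; only the seen[]/list[] idea and the "%d|%s" output come from your code. A group absent from one block prints as 0, and a group absent from every block never enters the list at all.

```shell
# Hypothetical input: two blocks closed by </group>; "gB" is absent from block 2.
sample='gA 5
gB 7
</group>
gA 9
</group>'

table=$(printf '%s\n' "$sample" | awk '
$1 == "</group>" { block++; next }        # end of a block
{
    if (!seen[$1]++) list[++nlist] = $1   # first sighting: add to the unique list
    val[block + 0, $1] = $2               # value of this group in this block
}
END {
    for (b = 0; b < block; b++)
        for (i = 1; i <= nlist; i++)
            printf "%d|%s\n", val[b, list[i]], list[i]   # unset entry prints as 0
}')

echo "$table"
```

The last output line is 0|gB, which is the placeholder for the group missing from the second block.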
Best regards
---------- Post updated at 09:20 PM ---------- Previous update was at 08:21 PM ----------
Tested, and it works on its own. Now I'll test it inside my main awk script, following the code structure you suggested. I'll let you know.
Many thanks.
---------- Post updated at 11:00 PM ---------- Previous update was at 09:20 PM ----------
Hi again agama,
Is it possible to process file1 first? What would the structure be?
I ask because when the awk code processes file2 it generates another array, and it does so by comparing against one of the arrays created when the code reads file1.
I've tried changing the order as follows, but it doesn't work (changes in red):
awk '
NR == FNR {
# ----------- blocks for processing file 1 ------------------------
/^<x / {
str = gensub(/(.+")([0-9]+)(">)(.+)(<\/.+)/, "\\2|\\4", "g")
split( str, a, "|" );
if( !seen[a[2]]++ ) # new group name, add it to the list
list[++nlist] = a[2];
agroup[group+0,a[2]] = a[1]; # changed to track across whole file
# your original code
B[gensub(/pattern/,"how","g")] #Storing desired data in array B
C[gensub(/pattern/,"how","g")] #Storing desired data in array C
# small change to match D with A
dgroup[group+0,a[2]] = gensub(/pattern/, "\\2|\\4", "g") # changed to track across whole file
next;
}
/^<\/group>/ {
group++;
next;
}
{
# some other processing for file2
if($0 == Arr1[d+1]) {Ln[d+1]=FNR;if(d<length(Arr1)-1){d++}} # Arr1 is created when processing file1
next;
}
END {
    asort( list );
    for( g = 0; g < group; g++ ) # build A and D with groups seen
    {
        for( i = 1; i <= nlist; i++ )
        {
            A[++aidx] = sprintf( "%d|%s", agroup[g,list[i]], list[i] );
            D[aidx] = dgroup[g,list[i]];
        }
    }
    # whatever end processing on A and D can be done here
    for( i = 1; i <= length( A ); i++ ) # my testing to ensure they align
        printf( "(%s) (%s)\n", A[i], D[i] );
}
' file1 file2
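For reference, a minimal sketch of how a two-file awk script is commonly laid out, which is not a drop-in fix for the script above. The sample data, the comment-skipping rule and the counter n are assumptions made for illustration; Arr1, Ln and d follow the names used above. Each file1 rule repeats the NR == FNR guard in its own pattern, since a pattern { action } rule cannot be nested inside another action block.

```shell
# Hypothetical sample data; the real file1/file2 formats are not shown here.
tmp=$(mktemp -d)
printf '%s\n' '# header' 'alpha' 'beta' 'gamma' > "$tmp/file1"
printf '%s\n' 'zzz' 'alpha' 'beta' 'yyy' 'gamma' > "$tmp/file2"

result=$(awk '
# NR == FNR is true only while the first file is being read.
NR == FNR && /^#/ { next }            # file1: skip comment lines
NR == FNR { Arr1[++n] = $0; next }    # file1: remember each line in order

# Only file2 lines reach the rule below.
d < n && $0 == Arr1[d + 1] { Ln[++d] = FNR }   # file2: line where Arr1[d+1] appears

END {
    for (i = 1; i <= d; i++)
        printf "%s found at line %d of file2\n", Arr1[i], Ln[i]
}
' "$tmp/file1" "$tmp/file2")

echo "$result"
rm -rf "$tmp"
```

One caveat with this idiom: if file1 is empty, NR == FNR is also true while file2 is read, so that case would need its own check.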
Thanks again for your help
---------- Post updated 10-24-11 at 05:21 AM ---------- Previous update was 10-23-11 at 11:00 PM ----------
Hi agama again,
I've been able to adapt your code and suggestions into my main code. I found it much more complicated to
generate the array inside the same awk code, so I generated it separately and stored the data in a bash array. This bash array
is the input to the main awk code.
At the beginning I had some issues, but I was able to give the array the format expected by the split() function.
The final code is below:
oldIFS=$IFS # Save the current IFS (bash default: space, tab, newline)
IFS=$'\n' # Temporarily split on newlines only
UnqGroups=( $( awk '/^<x /{print gensub(/(.+">)(.+)(<\/.+$)/,"\\2","g")}' file1 | sort -u | tr '\n' '|') ) #Unique groups
IFS=$oldIFS # Restore the original IFS
awk -v z="${UnqGroups[*]}" 'BEGIN {nlist=split(z,list,"|")-1}
NR==FNR{
if($0 ~ /j v="/){
B[gensub(/pattern/,"\\2","g")]
x[gensub(/pattern/,"","g")];asorti(x,C)
# To generate Array A
n=gensub(/.+="|".+$/,"","g")
group[gensub(/pattern/,"\\2","g")]=n;
next;
}
if($0 ~ /<\/group>/) {
for( i = 1; i <= nlist; i++ )
A[++w]=sprintf("%d|%s", group[list[i]], list[i] );
delete group; # clear for next go round
next;}
next}
{
# code to work with file2
}
END{
# print arrays info
}' file1 file2
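A stripped-down sketch of the hand-off from the shell to split(), in case it helps someone else. The group names and the variables joined, total and summary are invented for the example; here a plain string variable stands in for the bash array. It shows why the count needs the -1: the trailing "|" left by tr produces one empty element at the end.

```shell
# Hypothetical group names standing in for the output of the awk | sort -u pipeline.
joined=$(printf '%s\n' 'gB' 'gA' 'gB' 'gC' | sort -u | tr '\n' '|')
# joined now ends with a trailing separator: gA|gB|gC|

summary=$(awk -v z="$joined" 'BEGIN {
    total = split(z, list, "|")   # the trailing "|" yields one empty last element
    nlist = total - 1             # so the usable count is one less
    printf "total=%d nlist=%d first=%s last=%s\n", total, nlist, list[1], list[nlist]
}')

echo "$summary"
```

With these names the summary reads total=4 nlist=3 first=gA last=gC.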
Many thanks again to you both for all your help and time.
Best regards