Help with missing XML tag

Hello All,

I am working with many huge XML files. Each file contains many Account tags, and each Account has at least one Membership tag. Every Membership tag is supposed to contain a MembershipIdentifier tag:

......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
        <MembershipIdentifier>PB00000000212799753</MembershipIdentifier>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......

but somehow the MembershipIdentifier tag is completely missing in some of them. Without it, an entry looks like this:

......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......

How can I find which AccountIdentifier is missing its MembershipIdentifier? If possible, I also need to insert a default MembershipIdentifier such as PB00000000123456.

So far I have tried this to find the missing MembershipIdentifiers, but it didn't work:

awk '{if ($0 ~ /<//EnrollmentDate>/) {triggered=1;}if (triggered) {print; if ($0 ~ /<//Membership>/) { exit;}}}' filenames

Can somebody help me?

Thanks in advance...

This seems to work:

awk -v DMI="PB00000000123456789" '
/<Membership>/ {
	MIfound = 0
}
/<MembershipIdentifier>/ {
	MIfound = 1
}
/<\/Membership>/ && !MIfound {
	print "        <MembershipIdentifier>" DMI "</MembershipIdentifier>"
}
1' filenames

If a file named filenames contains:

......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
        <MembershipIdentifier>PB00000000212799753</MembershipIdentifier>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......
......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......

the above script produces the output:

......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
        <MembershipIdentifier>PB00000000212799753</MembershipIdentifier>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......
......
 <Account>
    <AccountIdentifier>23123</AccountIdentifier>
    <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
    <MembershipInfos>
      <Membership>
        <ParticipationStatus>1</ParticipationStatus>
        <ModificationDate>2015-11-26T13:01:22-07:00</ModificationDate>
        <EnrollmentDate>2015-11-26T13:01:22-07:00</EnrollmentDate>
        <MembershipIdentifier>PB00000000123456789</MembershipIdentifier>
      </Membership>
      <Aliases>
	  ....
      </Aliases>
    </MembershipInfos>
  </Account>
  <Account>
......

adding the MembershipIdentifier line with the default value to the second Membership block (the only difference from the input).

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .


Thank you so much, Don, for your time.

I am dealing with large files (each file contains more than 10K Accounts, and there are more than 1000 files). This script only displays the output with the added MI tag on the screen and does not change the actual files. I am not sure how to write the change back to the files.

I would also need to know which file and which AccountIdentifier (or at least which file) is missing the MembershipIdentifier tag.

Could you please help me get this done?
Thanks in advance.

Making the wild assumption that the exec family of functions on your system can handle more than 1000 filenames in an argument list, the following should do what you want:

#!/bin/ksh
IAm=${0##*/}
tmpf="$IAm.$$"
awk -v DMI="PB00000000123456789" -v tmpf="$tmpf" '
function copyback() {
	if(oldf) {
		close(tmpf)
		if(cc) {
			cc = 0
			cmd = "cp \"" tmpf "\" \"" oldf "\""
			print "Running: " cmd
			if(system(cmd))
				failed++
		}
	}
	oldf = FILENAME
}
FNR == 1 {
	copyback()
}
/<AccountIdentifier>/ {
	split($0, AI, /<|>/)
}
/<Membership>/ {
	MIfound = 0
}
/<MembershipIdentifier>/ {
	MIfound = 1
}
/<\/Membership>/ && !MIfound {
	cc++
	print "MembershipIdentifier missing for AccountIdentifier " AI[3] \
	    " in file \"" oldf "\"."
	print "        <MembershipIdentifier>" DMI \
	    "</MembershipIdentifier>" > tmpf
}
{	print > tmpf
}
END {	copyback()
	if(failed) {
		print "*** Updating contents of " failed " files failed."
		exit 1
	}
}' "$@"
exit_code=$?
rm -rf "$tmpf"
exit $exit_code

Call this script with a list of files to be processed as operands. If your system can't handle an arg list that long, use xargs to invoke this script multiple times with subsets of the argument list.
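
As a rough sketch of the xargs approach: the function below is only an illustration, and both its name (run_in_batches) and the script name fixmi.ksh are made up for this example. It assumes the XML file names contain no whitespace, quotes, or backslashes, since plain xargs splits its input on those.

```shell
#!/bin/sh
# Sketch only: run_in_batches and "fixmi.ksh" are hypothetical names.
# Usage: run_in_batches DIR BATCH_SIZE COMMAND [ARGS...]
# Finds the *.xml files under DIR and runs COMMAND on them, passing at
# most BATCH_SIZE file names per invocation.
# Assumes file names without whitespace, quotes, or backslashes.
run_in_batches() {
	dir=$1
	batch=$2
	shift 2
	find "$dir" -type f -name '*.xml' -print | xargs -n "$batch" "$@"
}

# Example call (paths are placeholders):
#   run_in_batches /path/to/xml 500 ./fixmi.ksh
```

Each invocation of the script then sees a short argument list, so the limit on the size of the argument list is never reached.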

This was written and tested using the Korn shell, but it will work with any shell that understands basic POSIX shell parameter expansions (including ash, bash, dash, ksh, and zsh). It will not work with a legacy Bourne shell or with shells based on csh syntax.

And, as always, if you want to try this on a Solaris system, change awk to /usr/xpg4/bin/awk or nawk .
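
If you would rather see what is missing before any file is changed, a report-only variant of the same awk logic might look like the sketch below. The function name report_missing is made up for this example, and it assumes one tag per line, as in your sample data.

```shell
#!/bin/sh
# Sketch only: report_missing is a hypothetical name.
# Usage: report_missing FILE...
# Prints "file: AccountIdentifier" for every <Membership> block that has
# no <MembershipIdentifier> line. No file is modified.
# Assumes one tag per line, as in the sample data.
report_missing() {
	awk '
	/<AccountIdentifier>/ {
		split($0, AI, /<|>/)	# AI[3] is the text between the tags
	}
	/<Membership>/ {
		MIfound = 0
	}
	/<MembershipIdentifier>/ {
		MIfound = 1
	}
	/<\/Membership>/ && !MIfound {
		print FILENAME ": " AI[3]
	}' "$@"
}

# Example call: report_missing *.xml
```

The same Solaris note applies here: use /usr/xpg4/bin/awk or nawk instead of awk.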


This snippet saves each .xml file as .xml.rebuilt and adds a default MembershipIdentifier where it is missing. It logs the details in a file named rebuilt.log in the current directory, reporting the file name, the line number, and the account missing the tag. Progress messages are printed to your screen.

Save as VasuKukkapalli.pl and run as perl VasuKukkapalli.pl file1.xml file2.xml file3.xml ...
or
perl VasuKukkapalli.pl *.xml

#!/usr/bin/perl

use strict;
use warnings;

sub account_id
{
    my $account_line = shift;
    my ($id) = $account_line =~ /<AccountIdentifier>(\d+)</;
    return $id;
}

sub writefile
{
    my $filename = shift || die;
    print "Creating $filename\n";
    # Use low-precedence "or": with "||" the die would never run on failure.
    open(my $fh, '>', $filename) or die "Could not create $filename: $!\n";
    return $fh;
}

my @account = ();
my $membership =
    "<MembershipIdentifier>PB00000000123456789</MembershipIdentifier>\n";

my $current_file = $ARGV[0];
my $log = writefile("rebuilt.log");
my $tmp = writefile("$current_file.rebuilt");

while(<>){
    if($current_file ne $ARGV){
        close $tmp;
        $current_file = $ARGV;
        $tmp = writefile("$current_file.rebuilt");
        $. = 1;
    }
    push @account, [] if /<Account>/;
    if(exists $account[0]){
        push @{$account[0]}, $_;
        push @{$account[1]}, $.;
    }
    else{
        print $tmp "$_";
    }
    if(/<\/Account>/){
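        # Assumes the tag, when present, is always the 9th line (index 8)
        # of the Account block, as in the sample data.
        if(!(@{$account[0]}[8] =~ /<MembershipIdentifier>/)){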
            my ($spaces) = @{$account[0]}[7] =~ /(^\s+)/;
            splice @{$account[0]}, 8, 0, "$spaces$membership";
            my $id = account_id(@{$account[0]}[1]);
            print $log "File $ARGV: ",
                       "Line @{$account[1]}[8]: ",
                       "Account $id missing MembershipIdentifier\n";
        }
        # Print the list itself; quoting it would join the lines with spaces.
        print $tmp @{$account[0]};
        @account = ();
    }
}
print "Your files have been saved with the extension .rebuilt\n";
print "For details of missing MembershipIdentifier, please ",
      "look into rebuilt.log\n";

close $log;
close $tmp;

The file rebuilt.log will have something similar to:

File v2.xml: Line 26: Account 23123 missing MembershipIdentifier
File v2.xml: Line 41: Account 23125 missing MembershipIdentifier
File v3.xml: Line 26: Account 23123 missing MembershipIdentifier
File v3.xml: Line 41: Account 23125 missing MembershipIdentifier
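
With more than 1000 files the log can get long. As a small sketch, a per-file count could be pulled out of it like this. The function name count_missing_per_file is made up for this example, and it assumes the log format shown above and file names that do not contain ": ".

```shell
#!/bin/sh
# Sketch only: count_missing_per_file is a hypothetical name.
# Usage: count_missing_per_file LOGFILE
# Prints "count filename" for each file mentioned in the log, sorted by
# file name. Assumes lines of the form
#   File NAME: Line N: Account ID missing MembershipIdentifier
# and that NAME does not contain ": ".
count_missing_per_file() {
	awk -F': ' '
	/missing MembershipIdentifier/ {
		f = $1
		sub(/^File /, "", f)	# drop the leading "File " label
		n[f]++
	}
	END {
		for (f in n)
			print n[f], f
	}' "$1" | sort -k2
}

# Example call: count_missing_per_file rebuilt.log
```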