Need help in getting count from xml file

Team,
We need help getting the customer count across multiple XML files in a directory.

My XML structure looks like this:

  
 <?xml version="1.0" encoding="ISO-8859-15"?>
<customers xmlns="Demandware - Commerce Platform | eCommerce Software">
<customer customer-no="2253998836943556"><credentials><login>susan@nealfam.com</login><password encrypted="true" encryptionScheme="scrypt">$s0$b0401$jzFfnnT2Z8SZiiJ+hT1nfA==$GdFi0r5ATbTjzRc2KdqYpxyQBqBJVgCl/E0qYdqHRws=</password><enabled-flag>true</enabled-flag><password-question></password-question><password-answer></password-answer></credentials><profile><first-name>Susan</first-name><second-name></second-name><last-name>Neal</last-name><email>susan@nealfam.com</email><phone-home></phone-home><phone-business></phone-business><phone-mobile></phone-mobile><fax></fax><creation-date>2000-04-20T02:18:36.000Z</creation-date><preferred-locale></preferred-locale><custom-attributes><custom-attribute  attribute-id="legacyTransition">notcomplete</custom-attribute><custom-attribute  attribute-id="legacyAccount">true</custom-attribute><custom-attribute  attribute-id="rewardsMemberFlag">false</custom-attribute><custom-attribute  attribute-id="rewardsID"></custom-attribute><custom-attribute  attribute-id="rewardsEmail">susan@nealfam.com</custom-attribute><custom-attribute  attribute-id="promoPref">false</custom-attribute><custom-attribute  attribute-id="rewardsPref">false</custom-attribute><custom-attribute  attribute-id="legacyID">2756</custom-attribute><custom-attribute  attribute-id="hasBrandedCard">false</custom-attribute><custom-attribute  attribute-id="hasPaypal"></custom-attribute></custom-attributes></profile><addresses><address address-id="56297194367043129" preferred="false"><first-name>Susan</first-name><second-name></second-name><last-name>Neal</last-name><address1>Gymboree</address1><address2>500 Howard St</address2><postbox></postbox><city>San Francisco</city><postal-code>94105</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>4152787561</phone><custom-attributes><custom-attribute  attribute-id="addressType">BIL</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address 
address-id="56297194380317767" preferred="true"><first-name>Susan</first-name><second-name></second-name><last-name>Neal</last-name><address1>2745 Lake St</address1><address2></address2><postbox></postbox><city>San Francisco</city><postal-code>94121-1047</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>4157501501</phone><custom-attributes><custom-attribute  attribute-id="addressType">BIL,SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194365826730" preferred="false"><first-name>Susan</first-name><second-name></second-name><last-name>Neal</last-name><address1>2745 Lake St</address1><address2></address2><postbox></postbox><city>San Francisco</city><postal-code>94121</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>650-696-7561</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194365845003" preferred="false"><first-name>Zachary</first-name><second-name></second-name><last-name>Neal</last-name><address1>19 Milldam Road</address1><address2></address2><postbox></postbox><city>Acton</city><postal-code>01720</postal-code><state-code>MA</state-code><country-code>US</country-code><phone>9782632093</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194365899569" preferred="false"><first-name>Christian  Oliver and Xander</first-name><second-name></second-name><last-name>Picot</last-name><address1>19304 Overleaf 
Lane</address1><address2></address2><postbox></postbox><city>Davidson</city><postal-code>28036</postal-code><state-code>NC</state-code><country-code>US</country-code><phone>631-283-7027</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194365923826" preferred="false"><first-name>Mr. and Mrs. Seth</first-name><second-name></second-name><last-name>Bain</last-name><address1>Small Pond Studios</address1><address2>254 Ritch St</address2><postbox></postbox><city>San Francisco</city><postal-code>94107</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>415-498-2105</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194366021537" preferred="false"><first-name>Carter</first-name><second-name></second-name><last-name>Croke</last-name><address1>5692 South Nome Street</address1><address2></address2><postbox></postbox><city>Englewood</city><postal-code>80111</postal-code><state-code>CO</state-code><country-code>US</country-code><phone>650-696-7561</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194366053342" preferred="false"><first-name>Mr. and Mrs. 
Ward</first-name><second-name></second-name><last-name>Supplee</last-name><address1>301 22nd Avenue</address1><address2></address2><postbox></postbox><city>San Mateo</city><postal-code>94403</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>650-799-0032</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194366053343" preferred="false"><first-name>Beth</first-name><second-name></second-name><last-name>sususus</last-name><address1>123 Lakdfj</address1><address2></address2><postbox></postbox><city>San Francisco</city><postal-code>94121</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>650-696-7561</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194366142219" preferred="false"><first-name>sdddds</first-name><second-name></second-name><last-name>ssss</last-name><address1>dsdsds</address1><address2></address2><postbox></postbox><city>SAN FRANCISCO</city><postal-code>94121</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>650-696-8888</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address></addresses></customer>
<customer customer-no="2253998836943557"><credentials><login>susan1@nealfam.com</login><password encrypted="true" encryptionScheme="scrypt">$s0$b0401$jzFfnnT2Z8SZiiJ+hT1nfA==$GdFi0r5ATbTjzRc2KdqYpxyQBqBJVgCl/E0qYdqHRws=</password><enabled-flag>true</enabled-flag><password-question></password-question><password-answer></password-answer></credentials><profile><first-name>Susan</first-name><second-name></second-name><last-name>Neal</last-name><email>susan@nealfam.com</email><phone-home></phone-home><phone-business></phone-business><phone-mobile></phone-mobile><fax></fax><creation-date>2000-04-20T02:18:36.000Z</creation-date><preferred-locale></preferred-locale><custom-attributes><custom-attribute  attribute-id="legacyTransition">notcomplete</custom-attribute><custom-attribute  attribute-id="legacyAccount">true</custom-attribute><custom-attribute  attribute-id="rewardsMemberFlag">false</custom-attribute><custom-attribute  attribute-id="rewardsID"></custom-attribute><custom-attribute  attribute-id="rewardsEmail">susan@nealfam.com</custom-attribute><custom-attribute  attribute-id="promoPref">false</custom-attribute><custom-attribute  attribute-id="rewardsPref">false</custom-attribute><custom-attribute  attribute-id="legacyID">2756</custom-attribute><custom-attribute  attribute-id="hasBrandedCard">false</custom-attribute><custom-attribute  attribute-id="hasPaypal"></custom-attribute></custom-attributes></profile><addresses><address address-id="56297194367043129" preferred="false"><first-name>Susan</first-name><second-name></second-name><last-name>Neal</last-name><address1>Gymboree</address1><address2>500 Howard St</address2><postbox></postbox><city>San Francisco</city><postal-code>94105</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>4152787561</phone><custom-attributes><custom-attribute  attribute-id="addressType">BIL</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address 
address-id="56297194380317767" preferred="true"><first-name>Susan</first-name><second-name></second-name><last-name>Neal</last-name><address1>2745 Lake St</address1><address2></address2><postbox></postbox><city>San Francisco</city><postal-code>94121-1047</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>4157501501</phone><custom-attributes><custom-attribute  attribute-id="addressType">BIL,SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194365826730" preferred="false"><first-name>Susan</first-name><second-name></second-name><last-name>Neal</last-name><address1>2745 Lake St</address1><address2></address2><postbox></postbox><city>San Francisco</city><postal-code>94121</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>650-696-7561</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194365845003" preferred="false"><first-name>Zachary</first-name><second-name></second-name><last-name>Neal</last-name><address1>19 Milldam Road</address1><address2></address2><postbox></postbox><city>Acton</city><postal-code>01720</postal-code><state-code>MA</state-code><country-code>US</country-code><phone>9782632093</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194365899569" preferred="false"><first-name>Christian  Oliver and Xander</first-name><second-name></second-name><last-name>Picot</last-name><address1>19304 Overleaf 
Lane</address1><address2></address2><postbox></postbox><city>Davidson</city><postal-code>28036</postal-code><state-code>NC</state-code><country-code>US</country-code><phone>631-283-7027</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194365923826" preferred="false"><first-name>Mr. and Mrs. Seth</first-name><second-name></second-name><last-name>Bain</last-name><address1>Small Pond Studios</address1><address2>254 Ritch St</address2><postbox></postbox><city>San Francisco</city><postal-code>94107</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>415-498-2105</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194366021537" preferred="false"><first-name>Carter</first-name><second-name></second-name><last-name>Croke</last-name><address1>5692 South Nome Street</address1><address2></address2><postbox></postbox><city>Englewood</city><postal-code>80111</postal-code><state-code>CO</state-code><country-code>US</country-code><phone>650-696-7561</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194366053342" preferred="false"><first-name>Mr. and Mrs. 
Ward</first-name><second-name></second-name><last-name>Supplee</last-name><address1>301 22nd Avenue</address1><address2></address2><postbox></postbox><city>San Mateo</city><postal-code>94403</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>650-799-0032</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194366053343" preferred="false"><first-name>Beth</first-name><second-name></second-name><last-name>sususus</last-name><address1>123 Lakdfj</address1><address2></address2><postbox></postbox><city>San Francisco</city><postal-code>94121</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>650-696-7561</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address><address address-id="56297194366142219" preferred="false"><first-name>sdddds</first-name><second-name></second-name><last-name>ssss</last-name><address1>dsdsds</address1><address2></address2><postbox></postbox><city>SAN FRANCISCO</city><postal-code>94121</postal-code><state-code>CA</state-code><country-code>US</country-code><phone>650-696-8888</phone><custom-attributes><custom-attribute  attribute-id="addressType">SHP</custom-attribute><custom-attribute  attribute-id="isLegacy">true</custom-attribute></custom-attributes></address></addresses></customer>
</customers>
  
  
 

From the sample XML above, we need the total number of customers in the file, that is, the total number of 'customer-no' attributes present in it.

A directory will contain many similar XML files, so we need the total number of 'customer-no' entries across all of them.

Can anyone kindly help me get the total customer count from all the files in a directory through Unix scripting? As I'm new to scripting, any help will be appreciated.

Try

grep -hc customer-no *.xml
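
For readers new to grep, here is a small sketch of what the two flags do: -c prints the number of matching lines per file, and -h suppresses the file name that grep would otherwise put in front of each count. The helper name count_per_file and the demo file names are made up for illustration.

```shell
# Hypothetical helper: per-file counts of lines that mention customer-no
# in every *.xml file of the directory given as $1.
count_per_file() {
    ( cd "$1" && grep -hc customer-no *.xml )
}

# Throwaway demo data; a.xml and b.xml are invented names.
demo_dir=$(mktemp -d)
printf '%s\n' '<customers>' '<customer customer-no="1">' \
    '<customer customer-no="2">' '</customers>' > "$demo_dir/a.xml"
printf '%s\n' '<customers>' '<customer customer-no="3">' \
    '</customers>' > "$demo_dir/b.xml"

# Without -h, each count is prefixed with its file name.
( cd "$demo_dir" && grep -c customer-no *.xml )   # prints a.xml:2 then b.xml:1

# With -h, only the counts remain.
count_per_file "$demo_dir"                        # prints 2 then 1

rm -r "$demo_dir"
```

Note that -c counts lines, not matches, so the result equals the number of customers only if every customer tag sits on a line of its own, as it appears to in the sample.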

Hi Rudy/All,
Using the command below, we get an individual count for each file:

  
  grep -hc customer-no *.xml
31220
56492
57483
59503
64170
67882
68599
69292
66854
69652
70398
34722

 

We need the total distinct count across all the XML files in the directory. Kindly help me with this.

So why the "distinct" count, all of a sudden? You didn't mention that before. How do you tell customers from each other?

As for the sum, try

echo $(( $(grep -hc customer-no *.xml | tr '\n' '+' ) 0 ))

Hi Rudy,
Thanks a lot for your help.
I understand the first part:

 grep -hc customer-no *.xml 
 

Could you kindly explain how the sum is calculated here, for my understanding?

grep gives you this

grep -hc customer-no *.xml
31220
56492
57483
59503
64170
67882
68599
69292
66854
69652
70398
34722

which is really this sequence of characters:

31220\n56492\n57483\n59503\n64170\n67882\n68599\n69292\n66854\n69652\n70398\n34722\n

The command tr '\n' '+' translates every newline into a plus sign:

31220+56492+57483+59503+64170+67882+68599+69292+66854+69652+70398+34722+

Then echo displays the sum:

echo $((31220+56492+57483+59503+64170+67882+68599+69292+66854+69652+70398+34722+0))

You have a list of integers on successive lines; tr converts the <NL> (\n, 0x0A) characters to plus signs, so you get 31220+56492+...+34722+ . The arithmetic expansion $(( ... )) that is applied next would reject the trailing + as a syntax error, so a solitary zero is appended.
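
To see the steps above in isolation, here is a minimal sketch that wraps the same tr plus $(( ... )) technique in a function and feeds it the twelve counts posted earlier. The name sum_lines is made up for this example; it assumes one plain decimal integer per line and a newline after the last one.

```shell
# Hypothetical helper: add up one integer per line read from standard input.
sum_lines() {
    expr_text=$(tr '\n' '+')      # "31220\n56492\n" becomes "31220+56492+"
    echo $(( ${expr_text} 0 ))    # the final 0 gives the trailing + an operand
}

# The per-file counts from the grep output shown earlier in the thread.
printf '%s\n' 31220 56492 57483 59503 64170 67882 \
              68599 69292 66854 69652 70398 34722 | sum_lines
# prints 716267
```

With real files the same function would be used as grep -hc customer-no *.xml | sum_lines.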

The standards say that grep and other text processing utilities produce unspecified behavior when an input file is not a text file. By definition, text files cannot have any lines longer than the LINE_MAX limit on your system. (On most systems, LINE_MAX is set to 2048 bytes, including the <newline> line terminator.) Your sample file includes lines that are more than 6950 bytes long. Unless the grep man page on your system indicates that it can process text files with unlimited line lengths (or at least lines longer than the longest line in your files), any results you get from a script using:

grep -hc customer-no file.xml

cannot be trusted.
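
One way to check where you stand is to compare the limit your system reports with the longest line in your files. The sketch below is only that, a sketch: longest_line is a made-up helper, and it assumes that awk on your system can itself read long lines (GNU awk can), which is the very thing in question on stricter systems.

```shell
# Hypothetical helper: length in bytes, newline included, of the longest
# line found in the files given as arguments.
longest_line() {
    LC_ALL=C awk '{ n = length($0) + 1; if (n > max) max = n }
                  END { print max + 0 }' "$@"
}

# Demo on a throwaway file; with real data you would pass *.xml instead.
demo_file=$(mktemp)
printf '%s\n' 'short' 'a somewhat longer line' > "$demo_file"

echo "LINE_MAX on this system: $(getconf LINE_MAX)"
echo "longest line in demo file: $(longest_line "$demo_file") bytes"

rm "$demo_file"
```

If the second number is larger than the first for your XML files, the standard gives you no guarantee about what grep will do with them, even though many implementations cope fine in practice.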

If you are trying to count unique customer numbers, you'll need something more powerful than grep . If we make the very wild assumption that the <customer customer-no="xxxxxxxxxxxxxxxxx"> tag is the first tag on any line in which it appears and that awk on your system (another text processing utility) supports line lengths at least as long as the longest lines in your XML files, you could try:

awk -F'"' '
/customer customer-no/ && !($2 in cust) {
	cust[$2]
	n++
	# print $2	# uncomment this line to list unique customer numbers
}
END {
	print n + 0	# + 0 prints 0 instead of an empty line when nothing matched
}' *.xml

If awk can't handle lines that long on your system and the <customer customer-no="xxxxxxxxxxxxxxxxx"> tag is the first tag on any line in which it appears and appears at the start of each of those lines, you could try the following:

cut -c1-50 *.xml | awk -F'"' '
/customer customer-no/ && !($2 in cust) {
	cust[$2]
	n++
	# print $2	# uncomment this line to list unique customer numbers
}
END {
	print n + 0	# + 0 prints 0 instead of an empty line when nothing matched
}'
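
If you cannot rely on the customer tag being first on its line, another possible approach is to break the input up before awk sees it. The sketch below uses tr, which is not line-oriented, to turn every > into a newline so that each tag ends up on its own short line. The function name count_distinct_customers is made up, and the sketch assumes the tag is always written as <customer customer-no="..." with a single space, as in the sample.

```shell
# Hypothetical helper: count distinct customer-no values across all the
# files given as arguments, wherever the tags sit within a line.
count_distinct_customers() {
    cat -- "$@" | tr '>' '\n' | awk -F'"' '
    /<customer customer-no=/ { seen[$2] }    # remember each number once
    END {
        n = 0
        for (k in seen) n++                   # count the remembered numbers
        print n
    }'
}
```

You would call it as count_distinct_customers *.xml from the directory that holds the files.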

Using other standard utilities, you could try something like this:

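# -h stops egrep from prefixing each match with its file name, so the
# same customer number found in two files sorts and counts together.
egrep -h "^<customer customer-no=\"" *.xml \
 | cut -f1 -d">" \
 | sort \
 | uniq -c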

It might be pretty heavy on processing, but it seems to work for me. If egrep is being unpredictable, try putting the cut first instead. That would give cut more lines to process, but I suppose egrep then has shorter lines to consider. I'm not sure which will perform better.
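
The pipeline above lists every customer tag with the number of times it occurs. To boil that down to the single distinct total that was asked for, a variation along these lines might do. The function names are made up, and like the original it assumes that each customer tag starts at the beginning of a line.

```shell
# Hypothetical helper: number of distinct customer tags in the given files.
distinct_total() {
    grep -h '^<customer customer-no="' "$@" \
     | cut -f1 -d'>' \
     | sort -u \
     | wc -l \
     | tr -d ' '     # some wc implementations pad the number with spaces
}

# Hypothetical helper: list the customer tags that occur more than once.
duplicated_customers() {
    grep -h '^<customer customer-no="' "$@" \
     | cut -f1 -d'>' \
     | sort \
     | uniq -d
}
```

Run them as distinct_total *.xml and duplicated_customers *.xml in the directory that holds the files.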

I hope that this helps,
Robin