Data extraction from .xml file

Hello,
I'm attempting to extract 13-digit numbers beginning with 978 from a data file with the following command:

awk '{ for(i=1;i<=NF;i++) if($i ~ /^978/) print $i; }' datafile > outfile

This usually works. However, the new data file is an .xml file, and I imagine that is why the command no longer works.
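That suspicion is easy to check. By default awk splits each line on runs of spaces and tabs, so an XML line with no whitespace is one single field that starts with `<`, and `/^978/` never matches it. A minimal sketch, using one line from the XML sample posted later in the thread (the `ISBN 9780028608129` line is made up for contrast):

```sh
#!/bin/sh
# A line from the thread's XML sample. It contains no whitespace, so awk
# sees a single field ($1) beginning with "<IDValue>", not with "978":
# nothing is printed.
printf '%s\n' '<IDValue>9780028608129</IDValue>' |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^978/) print $i }'

# Made-up line for contrast: here the number is its own whitespace-separated
# field, it does begin with 978, and 9780028608129 is printed.
printf '%s\n' 'ISBN 9780028608129' |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^978/) print $i }'
```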

How can I either modify this command or convert the file so that the command will function?

Thanks so much!

Without a representative sample of the contents of datafile and a clear statement of where in the file 978 followed by ten other decimal digits is supposed to be matched, we can only make wild guesses at what might meet your requirements...

Sample from the .xml file:

<PriceAmount>42.97</PriceAmount>
<CurrencyCode>USD</CurrencyCode>
</Price>
</SupplyDetail>
</Product>
<Product>
<RecordReference>9780028608129</RecordReference>
<NotificationType>03</NotificationType>
<RecordSourceType>04</RecordSourceType>
<ProductIdentifier>
<ProductIDType>15</ProductIDType>
<IDTypeName>ISBN-13</IDTypeName>
<IDValue>9780028608129</IDValue>
</ProductIdentifier>
<ProductIdentifier>
<ProductIDType>14</ProductIDType>
<IDTypeName>GTIN-14</IDTypeName>

Desired output:

9780028608129
9780028608129

Thanks again!

Are you only looking for values found between <RecordReference> tags and between <IDValue> tags, or are you looking for values between any kinds of tags?

What operating system are you using?

Does the grep utility on your system have a -o option?
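The reason for asking: `-o` is provided by GNU grep and by the BSD grep on OS X, but it is not part of POSIX grep. Where it is missing, a rough equivalent can be written in awk with `match()`, which sets `RSTART` and `RLENGTH`. A minimal sketch; `extract_978` is a hypothetical helper name, and the ten `[0-9]` brackets are built in a loop rather than written as `{10}` because some older awk implementations may not support interval expressions:

```sh
#!/bin/sh
# extract_978 is a hypothetical helper, not a standard command.
# It prints every occurrence of 978 followed by ten digits, one per line,
# including several on the same line, much like grep -Eo '978[0-9]{10}'.
extract_978() {
  awk '
    BEGIN {
      # Build 978 plus ten [0-9] brackets instead of using {10}.
      re = "978"
      for (i = 0; i < 10; i++) re = re "[0-9]"
    }
    {
      rest = $0
      # match() returns RSTART (0 when there is no match) and sets RLENGTH.
      while (match(rest, re)) {
        print substr(rest, RSTART, RLENGTH)
        # Resume scanning just past the end of this match.
        rest = substr(rest, RSTART + RLENGTH)
      }
    }
  ' "$@"
}

# Demo on two lines from the sample in this thread.
# On a real file: extract_978 datafile > outfile
printf '%s\n' '<RecordReference>9780028608129</RecordReference>' \
  '<IDValue>9780028608129</IDValue>' | extract_978
```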

I wish to extract *all* such numbers (beginning with 978) from the file, irrespective of the tags.

OS X El Capitan, XQuartz 2.7.8

Yes, it appears that grep has the -o option.

Thanks!

Try:

grep -Eo '978[0-9]{10}' datafile
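Here `-E` enables extended regular expressions (needed for `{10}`) and `-o` prints only the matched text, one match per line. One caveat, sketched below: the pattern is not anchored, so if the file also holds longer digit strings (for example a 14-digit GTIN that contains an ISBN-13), grep prints the 13 digits it finds inside them. The last sample line below is a hypothetical value for illustration, not taken from the thread. If only stand-alone 13-digit numbers are wanted, one option is to split the text into runs of digits with `tr` first:

```sh
#!/bin/sh
# Three lines from the XML sample in this thread, plus one HYPOTHETICAL
# 14-digit value (not from the thread) that contains 978 + ten digits.
sample='<PriceAmount>42.97</PriceAmount>
<RecordReference>9780028608129</RecordReference>
<IDValue>9780028608129</IDValue>
<IDValue>09780028608129</IDValue>'

# The suggested command. Prints 9780028608129 three times; the third
# match is cut out of the 14-digit value.
# On a real file: grep -Eo '978[0-9]{10}' datafile
printf '%s\n' "$sample" | grep -Eo '978[0-9]{10}'

# Stricter variant: tr -cs turns each run of non-digits into a single
# newline, so every run of digits stands on its own line; grep then keeps
# only runs that are exactly 978 plus ten digits. Prints it twice.
# On a real file: tr -cs '0-9' '\n' < datafile | grep -E '^978[0-9]{10}$'
printf '%s\n' "$sample" | tr -cs '0-9' '\n' | grep -E '^978[0-9]{10}$'
```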

Perfect!