Hi,
There's an xml file produced from a front-end tool as shown below:
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="FACT_TABLE" OWNERNAME ="DIPS">
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="DIM_TABLE" OWNERNAME ="RAGS">
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="DEPT_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_DEPT_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_DEPT_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>
.... so on
I want to extract only one of these blocks based on the NAME for e.g. If the input is "FACT_TABLE" then only the 1st block OR lines within this block should be selected that is...
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="FACT_TABLE" OWNERNAME ="DIPS">
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>
OR
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
Note: I cannot use any XML parser tool as I don't permission to download it in my office. Also the code would be a part of Korn shell script. Please suggest.
-dips
Like this?
awk '/NAME ="FACT_TABLE"/,/<\/INPUT>/' infile
v='"FACT_TABLE"'; sed -n "/NAME =$v/,/<\/INPUT>/p" infile
Because I like PERL:
use strict;
use warnings;
use Getopt::Long;
use File::Basename;
my $NAME = basename $0, '.pl';
my $name = '';
Getopt::Long::Configure(qw{ bundling no_ignore_case });
undef $/;
$\ = "\n";
$, = "\n";
unless (GetOptions('name|n=s' => \$name)) {
print STDERR "USAGE: $NAME [--name=<NAME> | -n <name> ] files...";
exit 1;
}
while (<>) {
my @M = m{<INPUT[^>]*NAME\s*=\s*"$name"[^>]*>\s*(.*?)\s*</INPUT>}gs;
print @M if 0 < @M;
}
Use:
./scriptname --name=DIM_TABLE inputfile
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="DEPT_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_DEPT_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_DEPT_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
Hi Scrutinizer,
I tried both of your commands and they work partially i.e. it selects other blocks with different tags and omits the same tag block
The whole file would look like NOTE: I have shown all the tags but shrunk the no. of lines (In the earlier post I showed only an excerpt from the XML file, so may be you didn't get the idea how the whole file would look......I thought it should suffice but I was mistaken!!)
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE POWERHOUSE SYSTEM >
<DATAMART NAME="DEVELOPMENT" VERSION="1" CODEPAGE="Latin1" DATABASETYPE="Oracle">
<FOLDER NAME="CUSTOMER_USAGE" GROUP="" OWNER="Administrator" SHARED="NOTSHARED" DESCRIPTION="" PERMISSIONS="rwx---r--" UUID="acb98070232-agaiojhawv9780">
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="FACT_TABLE" OWNERNAME ="DIPS">
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="DIM_TABLE" OWNERNAME ="RAGS">
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="DEPT_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_DEPT_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_DEPT_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>
<OUTPUT DATABASE ="Oracle" DESCRIPTION ="TABLE STRUCTURE FOR ERROR TABLE" NAME ="ERROR_TABLE" OBJECTVERSION ="1" TABLEOPTIONS ="" VERSIONNUMBER ="1">
<OUTPUTFIELD DATATYPE ="date" DESCRIPTION ="" FIELDNUMBER ="1" KEYTYPE ="NOT A KEY" NAME ="TIME_KEY" NULLABLE ="NULL" PICTURETEXT ="" PRECISION ="19" SCALE ="0"/>
<OUTPUTFIELD DATATYPE ="number(p,s)" DESCRIPTION ="" FIELDNUMBER ="2" KEYTYPE ="NOT A KEY" NAME ="ERROR_KEY" NULLABLE ="NULL" PICTURETEXT ="" PRECISION ="15" SCALE ="0"/>
</OUTPUT>
<TRANSFER DESCRIPTION ="Calculates customer's spending in a store. This is a FACT_TABLE" NAME ="CAL_CUST_SPEND" OBJECTVERSION ="1" REUSABLE ="NO" TYPE ="Expression" VERSIONNUMBER ="9">
<TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="CUSTOMER_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</TRANSFER>
<TRANSFER DESCRIPTION ="Calculates store's inventory" NAME ="LKP_FACT_TABLE" OBJECTVERSION ="1" REUSABLE ="NO" TYPE ="Expression" VERSIONNUMBER ="9">
<TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</TRANSFER>
<TRANSFER DESCRIPTION ="Tags errors" NAME ="ERROR_TYPE" OBJECTVERSION ="1" REUSABLE ="NO" TYPE ="Expression" VERSIONNUMBER ="9">
<TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="ERROR_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<TRANSFERMFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="ERROR_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="ERROR_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</TRANSFER>
<INSTANCE DESCRIPTION ="This component pulls all the records from the FACT_TABLE" NAME ="FACT_TABLE" REUSABLE ="NO" INPUT_NAME ="FACT_TABLE" INPUT_TYPE ="Lookup Procedure" TYPE ="INPUT"/>
<INSTANCE DESCRIPTION ="This component pulls all the records from the DIM_TABLE" NAME ="DIM_TABLE" REUSABLE ="NO" INPUT_NAME ="DIM_TABLE" INPUT_TYPE ="Source Qualifier" TYPE ="INPUT">
<ASSOCIATED_INSTANCE NAME ="FACT_TABLE"/>
<ASSOCIATED_INSTANCE NAME ="ANOTHER_TABLE"/>
</INSTANCE>
<CONNECTOR FROMFIELD ="FROM_FIELD_1" FROMINSTANCE ="COMP_INSERT" FROMINSTANCETYPE ="Router" TOFIELD ="TO_FIELD_1" TOINSTANCE ="FACT_TABLE" TOINSTANCETYPE ="Target Definition"/>
<CONNECTOR FROMFIELD ="FROM_FIELD_2" FROMINSTANCE ="COMP_INSERT" FROMINSTANCETYPE ="Router" TOFIELD ="TO_FIELD_1" TOINSTANCE ="ANOTHER_TABLE" TOINSTANCETYPE ="Target Definition"/>
<TARGETLOADORDER ORDER ="1" TARGETINSTANCE ="MAIN_TABLE"/>
<TARGETLOADORDER ORDER ="1" TARGETINSTANCE ="ERROR_TABLE"/>
<ERPINFO/>
<METADATAEXTENSION COMPONENTVERSION ="1111111" DATATYPE ="STRING" DESCRIPTION ="name" DOMAIN ="User Defined Metadata Domain" ISEDITABLE ="YES" ISVISIBLE ="YES" ISREUSABLE ="YES" ISSHAREREAD ="NO" ISSHAREWRITE ="NO" MAXLENGTH ="256" NAME ="kk" VALUE ="" VENDORNAME ="ABC"/>
</MAP>
</FOLDER>
</DATAMART>
</POWERHOUSE>
Results into
<INPUT DATABASE ="ORACLE" DBNAME ="UNIX" NAME ="FACT_TABLE" OWNERNAME ="DIPS">
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" NAME ="STORE_KEY" PICTURETEXT ="" PORTTYPE ="INPUT" PRECISION ="15" SCALE ="0"/>
<INPUTFIELD DATATYPE ="double" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="VAR_STORE_KEY" EXPRESSIONTYPE ="GENERAL" NAME ="VAR_PREV_STORE_KEY" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="15" SCALE ="0"/>
</INPUT>
<INSTANCE DESCRIPTION ="This component pulls all the records from the FACT_TABLE" NAME ="FACT_TABLE" REUSABLE ="NO" INPUT_NAME ="FACT_TABLE" INPUT_TYPE ="Lookup Procedure" TYPE ="INPUT"/>
<INSTANCE DESCRIPTION ="This component pulls all the records from the DIM_TABLE" NAME ="DIM_TABLE" REUSABLE ="NO" INPUT_NAME ="DIM_TABLE" INPUT_TYPE ="Source Qualifier" TYPE ="INPUT">
<ASSOCIATED_INSTANCE NAME ="FACT_TABLE"/>
<ASSOCIATED_INSTANCE NAME ="ANOTHER_TABLE"/>
</INSTANCE>
<CONNECTOR FROMFIELD ="FROM_FIELD_1" FROMINSTANCE ="COMP_INSERT" FROMINSTANCETYPE ="Router" TOFIELD ="TO_FIELD_1" TOINSTANCE ="FACT_TABLE" TOINSTANCETYPE ="Target Definition"/>
<CONNECTOR FROMFIELD ="FROM_FIELD_2" FROMINSTANCE ="COMP_INSERT" FROMINSTANCETYPE ="Router" TOFIELD ="TO_FIELD_1" TOINSTANCE ="ANOTHER_TABLE" TOINSTANCETYPE ="Target Definition"/>
<TARGETLOADORDER ORDER ="1" TARGETINSTANCE ="MAIN_TABLE"/>
<TARGETLOADORDER ORDER ="1" TARGETINSTANCE ="ERROR_TABLE"/>
<ERPINFO/>
<METADATAEXTENSION COMPONENTVERSION ="1111111" DATATYPE ="STRING" DESCRIPTION ="name" DOMAIN ="User Defined Metadata Domain" ISEDITABLE ="YES" ISVISIBLE ="YES" ISREUSABLE ="YES" ISSHAREREAD ="NO" ISSHAREWRITE ="NO" MAXLENGTH ="256" NAME ="kk" VALUE ="" VENDORNAME ="ABC"/>
</MAP>
</FOLDER>
</DATAMART>
</POWERHOUSE>
Please suggest how to select only the lines enclosed in tags <INPUT> & <\INPUT> and having NAME as "FACT_TABLE".
-dips
Try /<INPUT.*NAME ="FACT_TABLE"/
instead of /NAME ="FACT_TABLE"/
1 Like
Hi Scrutinizer,
Neat. It worked perfectly fine!! Thankyou.
This kind of "awk" construct means that the lines between FROM KEYWORD & TO KEYWORD be selected. Am I right?
awk '/FROM KEYWORD/,/TO KEYWORD/' infile
-dips
Yes, it is called a "range pattern"