we have a CDC file where we need to get the latest record of the Key columns
Key Columns will be CDC_FLAG and SRC_PMTN_I
and fetch the latest record from the CDC_PRCS_TS
Can we do it with a single awk command.
Please help.
CDC_PRCS_TS|CDC_SEQ_I|CDC_FLAG|CDC_UPDT_USER|SRC_PMTN_I|TGT_PMTN_I|PMTN_N|PMTN_DESC_T
2013-03-27 10:32:30|0|I|NOT SET |124|215|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:31|0|I|NOT SET |124|215|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:32|0|I|NOT SET |124|215|PRMO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:33|0|I|NOT SET |124|215|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:31|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:32|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:33|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|I|NOT SET |126|217|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|U|NOT SET |127|768|MDS PROJECT|UAT PHASE
2013-03-27 10:32:30|0|U|NOT SET |128|454|MDS PROJECT|UAT PHASE
2013-03-27 10:32:30|0|U|NOT SET |129|234|PROJECT|UAT PHASE
2013-03-27 10:32:30|0|D|NOT SET |130|123|PROMO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|D|NOT SET |131|212|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|D|NOT SET |132|213|PROO-2:PMI PROJECT|INITIAL PHASE
Can we assume that CDC_FLAG will always be the 3rd field, SRC_PMTN_I will always be the 5th field, and CDC_PRCS_TS will always be the 1st field; or do we have to match the strings against the header line to determine which fields to use?
Do the output records need to be in the same order as they appeared in the input file or can the output be in random order except that the 1st output line must be the 1st input line (the headings)?
Hi,
Please use CODE tags (rather than the color blue) to display sample input, output, and code.
You didn't really answer my questions. If you're trying to get a single script that can handle any of your files, how will that script know which fields need to be used to select entries, which field will be used to select the latest entry, and which field should be used to sort the output?
Is it always fields with the headers CDC_FLAG and SRC_PMTN_I as keys, always the field with the header CDC_PRCS_TS to select a line out of all lines with matching keys, and always sort the output with CDC_SEQ_I as the primary sort key and CDC_PRCS_TS as the secondary sort key?
Do you have a file that contains filenames that will be processed and the associated field numbers (or names) from which the above data can be extracted by the script?
The sample input file you showed is not sorted in the order that you said should be the output sort order. It seems to be sorted with field 5 as the primary sort key and field 1 as the secondary sort key, not field 2 as the primary sort key. Is this a mistake in your specification or is the script supposed to reorder the output as well as eliminate duplicates?