I have an input file like the example below. I want to add 2 columns to each row, taken from the header: the string after the word "CUSIP" (a 9-character code whose first 3 characters are numeric) and, on the same line, the percentage value matching the NNN.NN% pattern. (Both are marked in red.)
I started off with nawk, but with my limited knowledge of regex I am stuck. Any help would be appreciated.
Thanks
Input file:
[SourceFile]
****************************************************
* *
* THE FOLLOWING IS THE SOLICITOR MAIL FILE *
* *
****************************************************
1
THE DEPOSITORY TRUST COMPANY PAGE: 1
SPECIAL SECURITY POSITION LISTING PROGRAM: PXYD0001
FOR CUSIP / DESCRIPTION: 11617N-CD-1 / CDFS1.1%081811 BE+#
POSITIONS AS OF: 08/17/11
0--------------------------------
| PARTICIPANT | QUANTITY |
--------------------------------
| 101 |BANK OF NY| 111,000 |
| 111 |FRST CLEAR| 11,000 |
| 17 |JONES E D | 11,000 |
| 111 |JPMC CLEAR| 118,000 |
| 11 |MSSB | 11,000 |
| 116 |NFS LLC | 61,000 |
| 111 |PERSHING | 1,116,000 |
| 171 |SOUTHWEST | 111,000 |
|1111 |WFB/SAFEKP| 100,000 |
Desired output: the same rows, with 2 new columns taken from the header, the CUSIP# and the %.
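The two patterns described above can be tried on their own against the header line. This is a minimal sketch in POSIX awk wrapped in a shell function; the name cusip_and_pct is hypothetical. It assumes the CUSIP is the token right after "DESCRIPTION:" and may contain dashes (as in 11617N-CD-1), so it checks for 9 characters only after removing them. It avoids interval expressions such as [0-9]{3} because not every awk supports them.

```shell
# Hypothetical helper: print "CUSIP|percent" from the "FOR CUSIP" header line.
# Assumes the CUSIP is the field after "DESCRIPTION:" and, with dashes removed,
# is 9 characters whose first 3 are digits; the percent is the first
# "digits.digits%" on the same line. Uses only POSIX awk features.
cusip_and_pct() {
    awk '
    /CUSIP/ {
        cusip = ""; pct = ""
        # The CUSIP is the field right after "DESCRIPTION:"
        for (i = 1; i < NF; i++)
            if ($i == "DESCRIPTION:") { cusip = $(i + 1); break }
        # Validate: 9 characters once dashes are removed, first 3 numeric
        bare = cusip
        gsub(/-/, "", bare)
        if (length(bare) != 9 || bare !~ /^[0-9][0-9][0-9]/) cusip = ""
        # First NNN.NN% style number on the line, without the trailing "%"
        if (match($0, /[0-9]+\.[0-9]+%/)) pct = substr($0, RSTART, RLENGTH - 1)
        print cusip "|" pct
        exit
    }' "$@"
}

printf '%s\n' 'FOR CUSIP / DESCRIPTION: 11617N-CD-1 / CDFS1.1%081811 BE+#' | cusip_and_pct
# prints: 11617N-CD-1|1.1
```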
I know which parts you want. I just can't tell which of these lines are your labels and which are actually in the file. Things seem to be labelled twice, so at least one of the labels must be in the file, if not both.
If you wrap the file in [code] tags instead of marking it off with rows of dashes, the boundaries of the file will be obvious. Usefully, color tags like the ones you've already used still work inside [code] tags.
$ cat input.awk
BEGIN {
    # Split 'lines' (records) on the row of dashes.
    # POSIX leaves a multi-character RS unspecified; gawk and mawk accept it,
    # but some awks use only its first character.
    RS="--------------------------------"
    # Split 'columns' (fields) on newline
    FS="\n"
}
NR==1 {
    # Third-last line contains the CUSIP header; split it apart on spaces
    split($(NF-2), A, " ")
    V1=A[5]
    # Extract the NNN.NN% value with a regex, dropping the trailing %
    if (match($(NF-2), /[0-9]+\.[0-9]+%/))
        V2=substr($(NF-2), RSTART, RLENGTH-1)
    # Don't let the routines below get fed this record
    next
}
NR==2 {
    # Stop splitting on dashes, start splitting on lines
    RS="\n"
    # Start splitting fields on | instead of \n
    FS="|"
    # Use | as separator when printing too
    OFS="|"
    # Don't let the routines below be fed this record
    next
}
# Only run this code block when we're on the third 'line' or more
NR>2 {
    # On a data row the last field is the empty one after the closing |;
    # replace it with the CUSIP found earlier
    $(NF)=V1
    # Add a new field after that holding the percentage found earlier
    $(NF+1)=V2
}
# Print only from the third 'line' on, and only when there are more than
# two columns (this skips the empty leftover record right after the dashes)
(NF>2)&&(NR>2)
$ awk -f input.awk < input
| 101 |BANK OF NY| 111,000 |11617N-CD-1|1.1
| 111 |FRST CLEAR| 11,000 |11617N-CD-1|1.1
| 17 |JONES E D | 11,000 |11617N-CD-1|1.1
| 111 |JPMC CLEAR| 118,000 |11617N-CD-1|1.1
| 11 |MSSB | 11,000 |11617N-CD-1|1.1
| 116 |NFS LLC | 61,000 |11617N-CD-1|1.1
| 111 |PERSHING | 1,116,000 |11617N-CD-1|1.1
| 171 |SOUTHWEST | 111,000 |11617N-CD-1|1.1
|1111 |WFB/SAFEKP| 100,000 |11617N-CD-1|1.1
$
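Since the question mentions nawk, and POSIX leaves a multi-character RS unspecified, the script above may not behave the same everywhere. Here is a minimal one-pass sketch that sticks to POSIX awk features instead of switching RS mid-file. It assumes the header line contains "CUSIP", that data rows start with "|", and that the column-title row contains "PARTICIPANT"; add_cusip_cols is a hypothetical name.

```shell
# Hypothetical one-pass alternative: no multi-character RS, so it should
# also run under POSIX awk / nawk.
# Assumes: header line contains "CUSIP", data rows start with "|",
# and the column-title row contains "PARTICIPANT".
add_cusip_cols() {
    awk '
    BEGIN { OFS = "|" }
    /CUSIP/ {
        # (Re)read the CUSIP and percent from every header line seen
        cusip = ""; pct = ""
        for (i = 1; i < NF; i++)
            if ($i == "DESCRIPTION:") { cusip = $(i + 1); break }
        if (match($0, /[0-9]+\.[0-9]+%/)) pct = substr($0, RSTART, RLENGTH - 1)
        next
    }
    # Data rows: start with "|" but are not the column titles
    /^[|]/ && !/PARTICIPANT/ {
        # Each row already ends in "|", so append CUSIP, then "|" (OFS), then pct
        print $0 cusip, pct
    }' "$@"
}
```

Usage would be something like add_cusip_cols input > output. Matching on line content rather than on the dash rows also means that, if the listing runs to several pages and each page repeats the header, every row picks up its own page's CUSIP and percentage.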