Formatting a report using awk

thaller · December 21, 2011, 1:07pm

Our vendor produces a report that I would like to format in a particular way.

Here is the sample output from their report:

# AA.INDEX                       2    11     2      239        52        (7,2)           07 MAY 11        203.1        55
# ACCOUNT                        2 89561     2  1103300     214.9    (60793,4)           07 MAY 11        182.5       129
# ACCT.ID.XREF                   2  4643     1      496        19       (19,1)           13 MAY 11          0.7         0
# ACH.ARCHIVE.FILE               2 26591     4   677153     208.5    (36241,4)           07 MAY 11         11.0       129
# ACH.BATCH.ARCHIVE              2177533     2   435120       158    (32771,2)           07 MAY 11         13.3        18
# ACH.BATCH.FILE                 2  1361     2    30193     152.6     (2221,2)           07 MAY 11         12.9       165
# ACH.CLEAN.ARCHIVE.WK           2   503     1   435120      18.8     (13367,1)           07 MAY 11         30.4      1588

As you can see on line 5, fields 3 & 4 run into each other. I assume this is because the fields are fixed width and the value is too large. I do not know the actual field widths, but could probably work them out by counting if needed.

What I need is to ensure that fields 3 & 4 stay separated so I can perform calculations with them (e.g. $3 * $4). Field 3 should always be a single digit. So: "if the length of field 3 > 1, split field 3 into fields 3 & 4 after the first digit of field 3".

Any ideas?

Thanks,

bartus11 · December 21, 2011, 1:20pm

The easiest way is to insert a space after the 34th character (the end of the 3rd column) in all lines. This assumes fixed-width fields:

awk '{print substr($0,1,34),substr($0,35)}' file
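Since the command above depends on field 3 always sitting at the same character position, it may be worth checking that first. The following is only a sketch: `field3_col` is a hypothetical helper name, and it assumes the fields are separated by plain spaces rather than tabs.

```shell
# Hypothetical helper: print the column at which field 3 starts on each line.
# If every line reports the same number, the fixed-width assumption holds.
field3_col() {
    awk '{
        # Match field 1, field 2, and the blanks that follow each of them;
        # field 3 begins immediately after the matched text.
        if (match($0, /^[^ ]+ +[^ ]+ +/))
            print RLENGTH + 1
    }' "$@"
}
```

Running `field3_col file | sort -u` should then print a single value (34, if the offset used above is right for your report). More than one value would mean the columns are not fixed width and the `substr` offsets need rethinking.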

ctsgnb · December 21, 2011, 1:37pm

Maybe you should contact your vendor: adding a space will not tell you whether the field has been truncated or not.

I think it is up to your vendor to ensure that the report sent to you does not contain truncated values.

Of course, if you are sure that those fields never contain truncated values, then, indeed, you can separate them as per bartus11's suggestion.
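Inserting a space cannot reveal truncation, but it is at least possible to list the records where the columns ran together so they can be checked by hand. A minimal sketch, assuming (as stated in the question) that field 3 is normally a single digit; `list_fused` is a hypothetical name:

```shell
# Hypothetical helper: report the line number and file name (field 2) of
# every record whose field 3 is longer than one character, i.e. where
# two columns ran into each other.
list_fused() {
    awk 'length($3) > 1 { print FNR ": " $2 }' "$@"
}
```

For the sample above, `list_fused file` would be expected to report only line 5, the ACH.BATCH.ARCHIVE record.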

thaller · December 21, 2011, 1:53pm

Wouldn't that cause the lines without the issue to be double spaced?

Could I use something like this:

awk '{if (length($3) > 1)<more code>;}' file.txt

---------- Post updated at 01:53 PM ---------- Previous update was at 01:51 PM ----------

I'd agree with you normally, but they are a real pain, and the turnaround time would be lengthy. The end purpose would still work out (it does not need to be exact); the first few place values would suffice.

bartus11 · December 21, 2011, 2:02pm

It will shift the lines without the issue as well, but this way the column alignment is preserved. If you inserted the space only on the faulty line, all the columns after the insertion would be out of line with the unchanged lines. Compare the two outputs (I've shortened your sample input lines a bit):

[root@linux ~]# awk '{print substr($0,1,34),substr($0,35)}' a
# AA.INDEX                       2     11     2      239        52        (7,2)           07
# ACCOUNT                        2  89561     2  1103300     214.9    (60793,4)           07
# ACCT.ID.XREF                   2   4643     1      496        19       (19,1)           13
# ACH.ARCHIVE.FILE               2  26591     4   677153     208.5    (36241,4)           07
# ACH.BATCH.ARCHIVE              2 177533     2   435120       158    (32771,2)           07
# ACH.BATCH.FILE                 2   1361     2    30193     152.6     (2221,2)           07
# ACH.CLEAN.ARCHIVE.WK           2    503     1   435120      18.8     (13367,1)          09
[root@linux ~]# 
[root@linux ~]# awk 'length($3)>1{$0=substr($0,1,34)" "substr($0,35)}1' a
# AA.INDEX                       2    11     2      239        52        (7,2)           07
# ACCOUNT                        2 89561     2  1103300     214.9    (60793,4)           07
# ACCT.ID.XREF                   2  4643     1      496        19       (19,1)           13
# ACH.ARCHIVE.FILE               2 26591     4   677153     208.5    (36241,4)           07
# ACH.BATCH.ARCHIVE              2 177533     2   435120       158    (32771,2)           07
# ACH.BATCH.FILE                 2  1361     2    30193     152.6     (2221,2)           07
# ACH.CLEAN.ARCHIVE.WK           2   503     1   435120      18.8     (13367,1)          09
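If the goal is only the calculation ($3 * $4) and the layout does not matter, the split can also be done on the field itself rather than at a fixed column. This is a sketch, not a tested solution for the full report: `fix_and_multiply` is a hypothetical name, and it relies on the assumption from the question that field 3 is always a single digit. As noted earlier in the thread, it cannot tell whether the value that follows was truncated.

```shell
# Hypothetical helper: repair a fused field 3, then print the name (field 2)
# and the product of fields 3 and 4.
fix_and_multiply() {
    awk '{
        if (length($3) > 1) {
            # Keep the first digit in $3 and separate the rest with a space.
            # Assigning to $3 rebuilds $0 with single-space separators;
            # "$0 = $0" then forces awk to re-split the record, so the
            # remainder becomes the new $4 and later fields shift right.
            $3 = substr($3, 1, 1) " " substr($3, 2)
            $0 = $0
        }
        print $2, $3 * $4
    }' "$@"
}
```

On the faulty ACH.BATCH.ARCHIVE line this prints `ACH.BATCH.ARCHIVE 355066` (2 * 177533). Unaffected lines are not modified before the multiplication.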