Aligning text files by max field length

physalis2099 · August 11, 2011, 10:55am

Hello,
Is there anyway that I can align a pipe delimited text file by the maxium field length where the field is separated out by pipes for large text files with more than
100,000 rows?

So, far I have searched other forums and google about aligning text files in unix and I have noticed that several other users use the awk utility. Since I am new to awk
I have attempted in writing my own code after reading some of the awk utility syntax, but I am getting stuck.
If awk is not the best utility to achieve this is there any way to code this???

My test code:

#!/bin/ksh
awk 'BEGIN {FS = "|"}
{
for(i=1;i<=NF;i++)
{
if (length($i) > max)
max = length($i)
maxlen($i) = max
}
}
END
{
for (i in max) print (i,max)
}
' $(find . -name "testfile.txt")

Below is a sample of the text file that I have:

Pipe Delimited Text file
YEAR|NAME|PRODUCT_ID|ORDER_ID|CUSTOMER_ID
2001|Unix book|12354|01587|5487651484
2002|Programming|65487|6564548|654365146
2003|Airsoft Guns|6544888|548|65498
2004|Video Games|101100018|44|648
2010|Wayside Stories from wayside school|5487454|4|64645646
.
.
.

Desired Output:

YEAR|NAME |PRODUCT_ID|ORDER_ID|CUSTOMER_ID
2001 |Unix book |12354 |01587 |5487651484
2002 |Programming |65487 |6564548 |654365146
2003 |Airsoft Guns |6544888 |548 |65498
2004 |Video Games |101100018 |44 |648
2010 |Wayside Stories from wayside school|5487454 |4 |64645646
.
.
.

Thanks,

aigles · August 11, 2011, 11:16am

Try and adapt this awk script :

awk -v FS='|' -v OFS='|' '
    {
        if (MaxFields < NF) MaxFields = NF;
        for (i=1; i<=NF; i++) {
            Field[NR, i] = $i;
            l = length($i);
            if (Length < l) Length = l;
        };
    }
    END {
        for (i=1; i<=MaxFields; i++) Format = "%-" Length+0 "s";
        for (n=1; n<=NR; n++) {
            out = "";
            for (i=1; i<=MaxFields; i++) {
                out = out OFS sprintf(Format, Field[n, i]);
            }
            print substr(out,2);
        }
    }

' inputfile

Inputfile :

YEAR|NAME|PRODUCT_ID|ORDER_ID|CUSTOMER_ID
2001|Unix book|12354|01587|5487651484
2002|Programming|65487|6564548|654365146
2003|Airsoft Guns|6544888|548|65498
2004|Video Games|101100018|44|648
2010|Wayside Stories from wayside school|5487454|4|64645646

Output :

YEAR|NAME                               |PRODUCT_ID|ORDER_ID|CUSTOMER_ID
2001|Unix book                          |12354     |01587   |5487651484
2002|Programming                        |65487     |6564548 |654365146
2003|Airsoft Guns                       |6544888   |548     |65498
2004|Video Games                        |101100018 |44      |648
2010|Wayside Stories from wayside school|5487454   |4       |64645646

Jean-Pierre.

physalis2099 · August 11, 2011, 11:41am

Hello,

Thanks for helping me out. I have tried the code and I have received the following errors???

awk: syntax error near line 1
awk: bailing out near line 1

Thanks,

yazu · August 11, 2011, 11:44am

Use nawk instead of awk.

physalis2099 · August 11, 2011, 11:48am

Wow that is cool. nawk works!!!! Thanks so much. This is just what I needed.

tukuyomi · August 11, 2011, 12:22pm

Column can help too:

~/unix.com$ echo 'YEAR|NAME|PRODUCT_ID|ORDER_ID|CUSTOMER_ID
2001|Unix book|12354|01587|5487651484
2002|Programming|65487|6564548|654365146
2003|Airsoft Guns|6544888|548|65498
2004|Video Games|101100018|44|648
2010|Wayside Stories from wayside school|5487454|4|64645646' | column -ts'|' | sed -r 's/  ([[:alnum:]])/|\1/g'

physalis2099 · August 11, 2011, 4:11pm

Thanks for showing me the column command, as I have not heard of this command before, but it appears that I got an error out of it when I have attempted to use it.

sed: illegal option -- r
col.ksh[7]: syntax error at line 7 : `'' unmatched

Thanks,

alister · August 11, 2011, 6:30pm

You don't really need the non-portable -r option for that sed task. To avoid -r, simply use the equivalent basic regular expression syntax by backslash-escaping the parentheses.

Regards,
Alister