One Column To Two Columns

bronofski · December 30, 2012, 11:37am

Having difficulty reformatting a 1 column file into a 2 column file as shown below:

Example Input

Required Output

Field widths are always 2 in Section A and 7 in Section B.
Within each Section the field range is min 5 max 22.
Section A is in ascending numerical order
The Section titles are the only non 0-9 characters.

Any help would be much appreciated.

fuad · December 30, 2012, 12:58pm

cat filename | tr '\n' '\t' | awk ' { for (i=1; i<=NF/2; i++) { print $i, $(i+(NF/2)), "\n" } }'

Solution edited and fixed

rdrtx1 · December 30, 2012, 1:03pm

try also:

awk '
/[A-Za-z]+/ {a[c++]=$0; n=0}
!/[A-Za-z]+/ {o[a[c-1], n++]=$0}
END {
  for (i=0; i<c; i++) printf (i<c-1) ? a"\t":a"\n";
  for (j=0; j<n; j++) {
    for (i=0; i<c; i++) printf (i<c-1) ? o[a, j]"\t":o[a, j]"\n";
  }
}' input.txt

Scott · December 30, 2012, 1:07pm

If there's the same number of rows each time:

$ cat file
A
01
02
05
07
08
11
B
1193941
7231222
5192121
3221312
2211149
1783289

$ split -l7 < file

$ ll
total 24
drwxr-xr-x   5 scott  staff   170 30 Dec 19:05 .
drwxr-xr-x  79 scott  staff  2686 30 Dec 19:05 ..
-rw-r--r--   1 scott  staff    70 30 Dec 19:05 file
-rw-r--r--   1 scott  staff    20 30 Dec 19:05 xaa
-rw-r--r--   1 scott  staff    50 30 Dec 19:05 xab

$ paste x*
A	B
01	1193941
02	7231222
05	5192121
07	3221312
08	2211149
11	1783289

fuad · December 30, 2012, 2:50pm

I checked it and I find a small bug and here how I fix my first solution

cat filename | tr '\n' '\t' | awk ' { for (i=1; i<=NF/2; i++) { print $i, $(i+(NF/2)), "\n" } }'

$ cat test
A
01
02
05
07
08
11
B
1193941
7231222
5192121
3221312
2211149
1783289

$ cat test | tr '\n' '\t' | awk ' { for (i=1; i<=NF/2; i++) { print $i, $(i+(NF/2)), "\n" } }'
A B 

01 1193941 

02 7231222 

05 5192121 

07 3221312 

08 2211149 

11 1783289

Don_Cragun · December 30, 2012, 3:18pm

If you want to verify each of the conditions stated and print warnings when the conditions aren't met, try the following:

#!/bin/ksh
awk '/[^[:digit:]]/ {
        h[++hc] = $0
        next
}
{       o[hc,++oc[hc]] = $0
}
END {   if(hc != 2) {
                printf("Expected 2 heading lines; found %d: ", hc)
                for(i = 1; i <= hc; i++)
                        printf("\"%s\"%s", h, i == hc ? "\n" : " ")
                ec = 1
        }
        if(oc[1] != oc[2] || oc[1] < 5 || oc[1] > 22) {
                printf("%s\n%s %d %s \"%s\" and %d %s \"%s\".\n",
                        "Expected 5 <= x <= 22 entries under both headings;",
                        "found", oc[1], "under heading", h[1], oc[2],
                        "under heading", h[2])
                ec += 2
        }
        printf("%-3s%s\n", h[1], h[2])
        c = oc[1] > oc[2] ? oc[1] : oc[2]
        last = -1
        for(i = 1; i <= c; i++) {
                if(i <= oc[1] && o[1, i] <= last) {
                        ec += ec % 8 < 4 ? 4 : 0
                        m = sprintf(" %d not > %d.",
                                o[1, i], last)
                } else  m = ""
                if(i <= oc[1] && length(o[1, i]) != 2) {
                        ec += ec % 16 < 8 ? 8 : 0
                        m = sprintf("%s 1st field not 2 digits.", m)
                }
                if(i <= oc[2] && length(o[2, i]) != 7) {
                        ec += ec < 16 ? 16 : 0
                        m = sprintf("%s 2nd field not 7 digits.\n", m)
                } else  m = sprintf("%s\n", m)
                printf("%s %s%s", o[1, i], o[2, i], m)
                last = o[1, i]
        }
        exit ec
}' Input

Replace /bin/ksh in #!/bin/ksh with an absolute pathname of the Korn shell (or any shell that conforms to POSIX shell command language requirements) on your system.

If you're on a Solaris system, use /usr/xpg4/bin/awk or nawk instead of awk .

The exit code will be 0 if all conditions are met, or 1 through 31 depending on which set of conditions fail.

bronofski · December 30, 2012, 5:06pm

Many thanks for the very quick replies.

Scott: the number of rows varies between 5(min) and 22(max)

It's a hard choice between between fuad's second post code and rtrdx1's code.

fuad's output rows are separated by blank lines.

rdtx1's output columns are separated by a tab(?) instead of single space.

On balance I'm inclined to go with fuad's single-liner largely because I know how to delete
blank lines

---------- Post updated at 05:06 PM ---------- Previous update was at 04:44 PM ----------

Don Cragun

Thanks for a very comprehensive reply but as I'm able to verify the file meets the stated requirements before reformatting I don't have to cater for all the exception conditions.

Don_Cragun · December 30, 2012, 6:13pm

OK. If you've pre-verified the input file, you can strip my suggestion down to:

#!/bin/ksh
awk '/[^[:digit:]]/ { h[++hc] = $0; next}
{       o[hc,++oc[hc]] = $0}
END {   printf("%-3s%s\n", h[1], h[2])
        for(i = 1; i <= oc[1]; i++) printf("%s %s\n", o[1, i], o[2, i])
}' Input

and get the same output.

binlib · December 30, 2012, 7:33pm

pr -t -s' ' -2 -l 7 input

pamu · December 31, 2012, 1:21am

awk '/[a-zA-Z]/{i=0}
{i++;A=A?A"\t"$0:$0}
END{for(j=1;j<=i;j++){print A[j]}}' file

Scrutinizer · December 31, 2012, 2:23am

awk '/[^0-9]/{f=NR} f>1{print A[NR-f+1],$0; next} {A[NR]=$0}' file

fuad · December 31, 2012, 2:45am

You can just add

sed 's/ /\t/g'

to the end to replace the space to tab or you can edit the original code to be like this:

cat test | tr '\n' '\t' | awk ' { for (i=1; i<=NF/2; i++) { print $i, "\t", $(i+(NF/2)), "\n" } }'

Toiday · December 31, 2012, 3:27pm

I see people give you the codes but I think it will be more benefit to you if you're doing it yourself. So, instead of give you the code, I would rather show you one of many ways on how to do it. You can split the file into 2 files, column_a and column_b, then merge them back to a single file with two columns. You can do that first by "grep" for where column b starts with the "-n" option to show line number. Then with the combination of using "head" and "tail" with a "-number" based on your "grep" result (you cannot use that exact number, why?), you can split the file in to two files: column_a and column_b. You can then use the "paste" command to merger the two files in to one with 2 columns, separated by tab.

You can also add "pre" check stuffs to check the input file is correct such as the line where column b start is right in the middle of the file.

Have fun writing your codes

bronofski · December 31, 2012, 4:29pm

Toiday

You're quite right about the need to understand the options available rather than just using code snippets. Meanwhile I'm still at the cut-and-paste stage of my scripting learning curve so thanks again to all those who replied to my original post.