Converting tables of row data into columns of tables

justthisguy · July 13, 2007, 5:18pm

I am trying to transpose tables listed in the [Input] format into [Output] format. Any help would be greatly appreciated.

Input:

test_data_1              
1  2  90%
4  3  91%
5  4  90%
6  5  90%
9  6  90%

test_data_2              
3  5  92%
5  4  92%
7  3  93%
9  2  92%
1  1  92%
...

Output:

test_data_1           test_data_2       ...
1  2  90%             3  5  92%         ...
4  3  91%             5  4  92%         ...
5  4  90%             7  3  93%         ...
6  5  90%             9  2  92%         ...
9  6  90%             1  1  92%         ...

Thanks much,

Chris Larson
Just this guy.

reborg · July 13, 2007, 6:31pm

paste file1 file2 file3 ...
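For reference, a minimal sketch of what paste does: it writes line 1 of every file it is given on one output line, then line 2, and so on, separating the pieces with a TAB. The two sample files are hypothetical and exist only for the demo.

```sh
#!/bin/sh
# Create two small hypothetical tables in a scratch directory.
dir=$(mktemp -d)
printf 'test_data_1\n1  2  90%%\n4  3  91%%\n' > "$dir/t1"
printf 'test_data_2\n3  5  92%%\n5  4  92%%\n' > "$dir/t2"

# paste joins corresponding lines of t1 and t2, TAB-separated.
result=$(paste "$dir/t1" "$dir/t2")
printf '%s\n' "$result"

# Clean up the scratch files.
rm -r "$dir"
```

Each table has to be in its own file for this to work, so a single input file first needs to be split into one file per table.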

justthisguy · July 13, 2007, 6:47pm

Reborg,

Thanks. I should have clarified that the input data resides in a single file.

Also, I'm not sure how paste is going to help place the tables into a columnar layout. What I'm looking for is a way to take a single file in the first format and end up with a file containing the data tables side by side [by side by side etc.].

In other words, the [Output] section is an example of what I would like the output file to look like.

Chris Larson
JustThisGuy

reborg · July 13, 2007, 6:50pm

Is that the actual format of the data and not a simplified version?

justthisguy · July 13, 2007, 7:18pm

That's the actual format. There are occasional tables with 2 elements per line, rather than 3, as in the example below, but other than that, it's just as I posted above.

test_data_9
2  95%
3  94%
4  91%
5  92%
6  93%
7  90%
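Because the tables live in one file, they have to be split apart before paste can line them up. A sketch of that step in awk, assuming every table starts with a line beginning with a fixed prefix such as test_data_ and that blank lines between tables can be dropped; split_tables and the table_NN output names are made up for illustration:

```sh
#!/bin/sh
# split_tables FILE PREFIX
# Hypothetical helper: writes each table of FILE (a table starts at a line
# beginning with PREFIX) to its own file table_01, table_02, ... in the
# current directory. Blank lines and anything before the first header are dropped.
split_tables() {
    awk -v prefix="$2" '
        index($0, prefix) == 1 {            # header line starts a new table
            if (out != "") close(out)
            n++
            out = sprintf("table_%02d", n)
        }
        out != "" && NF > 0 { print > out } # copy non-blank lines
    ' "$1"
}

# Demo on a hypothetical two-table input, then paste the pieces.
dir=$(mktemp -d)
printf 'test_data_1\n1  2  90%%\n4  3  91%%\n\ntest_data_9\n2  95%%\n3  94%%\n7  90%%\n' > "$dir/data.file"
result=$(cd "$dir" && split_tables data.file test_data_ && paste table_*)
printf '%s\n' "$result"
rm -r "$dir"
```

csplit, used in the next reply, is another way to do this splitting.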

drl · July 13, 2007, 9:10pm

Hi.

Here is one approach:

#!/bin/bash

# @(#) s1       Demonstrate context splitting, pasting, adjusting columns.

set -o nounset
echo
echo "GNU bash $BASH_VERSION" >&2
csplit --version | head -1 >&2
echo

FILE=${1-data1}
csplit -k -z "$FILE" '/^test_data/' '{*}'

echo
echo " Lines per file:"
wc -l xx*

echo
paste xx* |
column -s "$(printf '\t')" -t
# Note: printf '\t' supplies the TAB delimiter for -s.

exit 0

which, on your data file (with the extra line in data set 9), produces:

% ./s1

GNU bash 2.05b.0(1)-release
csplit (coreutils) 5.2.1

76
76
54

 Lines per file:
  6 xx00
  6 xx01
  7 xx02
 19 total

test_data_1                test_data_2                test_data_9
1  2  90%                  3  5  92%                  2  95%
4  3  91%                  5  4  92%                  3  94%
5  4  90%                  7  3  93%                  4  91%
6  5  90%                  9  2  92%                  5  92%
9  6  90%                  1  1  92%                  6  93%
7  90%

See man pages for details on the options ... cheers, drl
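In the output above, the seventh row of test_data_9 ends up under the first column. paste emits empty fields for the tables that have already run out of lines, and column -t apparently treated those empty fields as missing and shifted the remaining text left. One possible workaround, not part of drl's script, is to fill every empty TAB-separated field with a placeholder before calling column; fill_empty_cells and the "-" placeholder are hypothetical choices:

```sh
#!/bin/sh
# fill_empty_cells: hypothetical filter that replaces every empty
# TAB-separated field on stdin with "-", so column has no empty cells to drop.
fill_empty_cells() {
    awk -F '\t' -v OFS='\t' '{
        for (i = 1; i <= NF; i++)
            if ($i == "") $i = "-"
        print
    }'
}

# Example: the row paste produces when only the third table has a line left.
printf '\t\t7  90%%\n' | fill_empty_cells
```

In drl's pipeline it would sit between the two commands: paste xx* | fill_empty_cells | column -s "$(printf '\t')" -t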

justthisguy · July 16, 2007, 3:52pm

Thank y'all! drl, I largely used your example, thank you much for taking the time!


I've pasted a commented version of the guts of my solution, should anyone else have the same or similar questions.

It's rough-edged (I still need to sit down and work on indenting the header for data sets that don't have three elements per line), but functional.

In the scenario I posted above, this script (let's name it 'massagetocolumn.sh') would be called as follows:

./massagetocolumn.sh data.file test_data_

Chris Larson
JustThisGuy

#!/bin/sh

# Exit if any variable is not set.
set -o nounset

# Input file:
DATA_FILE=${1}
echo "Data File: $DATA_FILE"

# Dataset Header Prefix.
HEADER_PREFIX=${2}
echo "Header Prefix: $HEADER_PREFIX"

# Make columns from the space-delimited file.
# -e introduces a command; several can be given in one sed call.
# In s/PATTERN/REPLACEMENT/FLAGS, the pattern follows the first '/',
# the replacement follows the second '/', and a final '/' closes it.
# The input file follows the list of commands, and
# '> "$DATA_FILE.temp"' sends the output to a file named like the input
# with '.temp' appended. That file is removed when the script finishes.
# 's/[[:space:]][[:space:]]*/\t/g' replaces each run of blanks with a
# single TAB (\t in the replacement is a GNU sed extension).
# The second command is in double quotes so the shell expands
# $HEADER_PREFIX; it finds each string beginning with the header prefix
# and appends two TABs to it.
# Oh, and the 'g' tells sed to replace all occurrences on a line, not just
# the first one, which is the default behavior.
sed -e 's/[[:space:]][[:space:]]*/\t/g' \
    -e "s/^${HEADER_PREFIX}[^[:space:]]*/&\t\t/" "$DATA_FILE" > "$DATA_FILE.temp"

# Cut datasets from input file into separate temporary files, named as xx##.
# The '-k' option leaves the temp files in place in the case of an error.
# The '-s' option silences the default byte counts that csplit offers.
# The '-z' option deletes any output files that are empty.
# csplit cuts the data sets based on the search string, in this case:
# whatever you put as the second argument to the script.
csplit -k -s -z "$DATA_FILE.temp" "/^$HEADER_PREFIX/" '{*}'

# Paste the temporary files into the output, piped through 'column' to create columns.
# NOTE: printf '\t' supplies the TAB delimiter for -s.
# The 'paste' command joins the corresponding lines of all files, side by side.
paste xx* | column -s "$(printf '\t')" -t > "$DATA_FILE.out"

# Remove the temporary files.
rm xx*
rm "$DATA_FILE.temp"

# Exit
exit 0
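Another way around the header indentation mentioned above is to skip csplit, paste and column altogether and let awk collect every table and print the tables side by side at a fixed width. This is an alternative sketch, not taken from either script. It keeps each row as a single cell, the way drl's version does, so 2-element and 3-element tables need no special header handling, and a short table simply leaves its column blank. The function name side_by_side and the default width of 22 characters are assumptions; rows longer than the width will run into the next column.

```sh
#!/bin/sh
# side_by_side FILE PREFIX [WIDTH]
# Hypothetical helper: collects every table of FILE (a table starts at a
# line beginning with PREFIX) and prints the tables next to each other,
# each cell left-justified in WIDTH characters (default 22).
side_by_side() {
    awk -v prefix="$2" -v width="${3:-22}" '
        index($0, prefix) == 1 { t++; rows[t] = 0 }   # start a new table
        t > 0 && NF > 0 {                             # keep non-blank lines
            sub(/[ \t]+$/, "")                        # trim trailing blanks
            cell[t, ++rows[t]] = $0
            if (rows[t] > maxrows) maxrows = rows[t]
        }
        END {
            for (r = 1; r <= maxrows; r++) {
                line = ""
                for (i = 1; i <= t; i++) {
                    c = ((i, r) in cell) ? cell[i, r] : ""
                    # pad every cell except the last one on the line
                    piece = (i < t) ? sprintf("%-" width "s", c) : c
                    line = line piece
                }
                sub(/ +$/, "", line)                  # no trailing spaces
                print line
            }
        }
    ' "$1"
}
```

With the input from the first post it could be called as side_by_side data.file test_data_ > data.file.out.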

drl · July 16, 2007, 4:42pm

Hi, Chris Larson.

You're welcome. Thanks for posting the near-final code; that may help someone in the future.

Good idea documenting it. Usually if I don't do that internally or externally, I'll forget what I did in a few weeks.

I hope it continues to work well for you ... cheers, drl