Break output file into three files

cpolikowsky · November 14, 2016, 8:30pm

Help!

I am getting an output file that looks similar to below.

EMAIL_ADDR
-----------------------------------------------------------------------------------
user@gmail.com
DATABASENAME
-----------------------------------------------------------------------------------
db1
db2
db3
db4

RoleName
----------------------------------------------------------------------------------
dbrole1
dbrole2
dbrole3
dbrole4
dbrole5

I would love to get three files:
file1.txt - with contents being:

user@gmail.com

file2.txt - with contents being:

db1
db2
db3
db4

file3.txt - with contents being:

dbrole1
dbrole2
dbrole3
dbrole4
dbrole5

I know how to cut off the top line of the file but I am not sure how to parse the files to make them look like this.

Any ideas?

senhia83 · November 14, 2016, 11:49pm

This could be a start:

awk -v x=0 '/---/{x="F"++i;next}{print > x;}' infile
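The one-liner above still leaves each column header (and the blank separator line) in the output. As a rough sketch of one way to refine it, the version below holds each line back until the next line shows whether it was a header. The input name report.txt, the temporary work directory, and the variable names held, out, and n are assumptions for illustration; the sample data is copied from the question.

```shell
#!/bin/sh
# Sketch only: report.txt and the fileN.txt names are assumed from the question.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > report.txt <<'EOF'
EMAIL_ADDR
-----------------------------------------------------------------------------------
user@gmail.com
DATABASENAME
-----------------------------------------------------------------------------------
db1
db2
db3
db4

RoleName
----------------------------------------------------------------------------------
dbrole1
dbrole2
dbrole3
dbrole4
dbrole5
EOF

awk '
/^---+$/ {                  # dashed rule: the held line was a column header
    if (out != "") close(out)
    n++
    out = "file" n ".txt"   # start the next numbered output file
    held = ""               # discard the header
    next
}
{
    # Print the previously held line now that it is known not to be a header.
    if (held != "" && out != "") print held > out
    held = $0               # a blank line empties the hold and is never printed
}
END { if (held != "" && out != "") print held > out }
' report.txt

head file*.txt
```

Holding one line back is the same idea RudiC's script uses further down; the difference here is only that the files are numbered instead of being named after the headers.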

Aia · November 15, 2016, 12:19am

Not knowing how much of your example is real, here's a generic attempt based on the structure shown.

Save as process.pl
Run as perl process.pl cpolikowsky.input

#!/usr/bin/perl
#
use strict;
use warnings;

my $ver;
my @keep;

while(<>) {
   my $current = $_;
   if(/^-+$/) {
       pop @keep;
       $current = "";
   }

   if($current !~ /^$/) {
       push @keep, $current;
   }
   elsif(@keep) {
       write_out(++$ver, @keep);
       @keep=();
   }
}
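# Flush the last block, but only if something is left (avoids an empty
# extra file when the input ends with a blank line).
write_out(++$ver, @keep) if @keep;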

sub write_out {
    my ($n, @lines) = @_;

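    # Use low-precedence "or": with "||" the die binds to the file name
    # string and never fires when open fails.
    open my $fw, '>', "file$n.txt" or die "Cannot open file$n.txt: $!";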
    for my $line (@lines) {
        print $fw $line;
    }
    close $fw;
}

Output:

for f in file*.txt; do echo "$f"; cat "$f"; echo; done
file1.txt
user@gmail.com

file2.txt
db1
db2
db3
db4

file3.txt
dbrole1
dbrole2
dbrole3
dbrole4
dbrole5

RudiC · November 15, 2016, 6:53am

Also try:

awk '
/^---/  {FN = LLINE ".txt"
         LLINE = ""
         next
        }

LLINE   {print LLINE  >  FN
        }

        {LLINE = $0
        }

END     {print LLINE  >  FN
        }
' file

drl · November 15, 2016, 10:04am

Hi.

The csplit command was designed for situations like this. The core of this solution is a single line:

csplit -b'%d.txt' -f'file' -k -z -q --suppress-matched $FILE '/^-----/' '{*}'

A sample script might be:

#!/usr/bin/env bash

# @(#) s1       Demonstrate splitting by context.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C dixf csplit
pe
dixf csplit

FILE=${1-data1}
E=expected-output.txt

pl " Input data file $FILE:"
cat "$FILE"

pl " Expected output:"
grep file. "$E"

pl " Results:"
csplit -b'%d.txt' -f'file' -k -z -q --suppress-matched "$FILE" '/^-----/' '{*}'
head file*

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.6 (jessie) 
bash GNU bash 4.3.30
dixf (local) 1.12
csplit (GNU coreutils) 8.23

csplit  split a file into sections determined by context lines (man)
Path    : /usr/bin/csplit
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help    : probably available with --help

-----
 Input data file data1:
EMAIL_ADDR
-----------------------------------------------------------------------------------
user@gmail.com
DATABASENAME
-----------------------------------------------------------------------------------
db1
db2
db3
db4

RoleName
----------------------------------------------------------------------------------
dbrole1
dbrole2
dbrole3
dbrole4
dbrole5

-----
 Expected output:
file1.txt - with contents being:
file2.txt - with contents being:
file3.txt - with contents being:

-----
 Results:
==> file0.txt <==
EMAIL_ADDR

==> file1.txt <==
user@gmail.com
DATABASENAME

==> file2.txt <==
db1
db2
db3
db4

RoleName

==> file3.txt <==
dbrole1
dbrole2
dbrole3
dbrole4
dbrole5

Some earlier versions of csplit might need a slightly different scheme for repeating the pattern. See your man page for details.

Best wishes ... cheers, drl

RudiC · November 15, 2016, 10:14am

@drl: I considered csplit as well, but noticed that each column header then ends up at the end of the preceding output file.

drl · November 15, 2016, 11:03am

Hi, RudiC.

Yes, I agree.

My take on many problems is that the user often does not have the awk, perl, etc., expertise needed to craft a solution. However, chaining a few operations in the shell can often give good, if approximate, results. If cleaner results are desired, one can interactively edit the results (for one-off problems), or use a few well-placed simple grep or sed commands to obtain the desired results.

Sometimes the input can even be cleaned up at the source, or pre-processed as an intermediate step before the approximating command(s).
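As an illustration of that kind of cleanup, here is a minimal sketch that runs the same csplit command and then uses sed to remove the trailing header and blank lines from each piece. It assumes GNU csplit (--suppress-matched is a GNU extension), and the piece prefix, the temporary work directory, and the loop are hypothetical additions rather than part of the original script.

```shell
#!/bin/sh
# Sketch only: assumes GNU csplit with --suppress-matched support.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > data1 <<'EOF'
EMAIL_ADDR
-----------------------------------------------------------------------------------
user@gmail.com
DATABASENAME
-----------------------------------------------------------------------------------
db1
db2
db3
db4

RoleName
----------------------------------------------------------------------------------
dbrole1
dbrole2
dbrole3
dbrole4
dbrole5
EOF

# Same split as in the script above, written to piece0.txt, piece1.txt, ...
csplit -b '%d.txt' -f piece -k -z -q --suppress-matched data1 '/^-----/' '{*}'

i=0
while [ -f "piece$i.txt" ]; do
    if [ -f "piece$((i + 1)).txt" ]; then
        # Not the last piece: its final line is the next section's header,
        # so delete that line along with any blank lines.
        sed -e '$d' -e '/^$/d' "piece$i.txt" > "file$i.txt"
    else
        # Last piece: only blank lines need to go.
        sed -e '/^$/d' "piece$i.txt" > "file$i.txt"
    fi
    # piece0 held nothing but the first header, so its result is empty.
    [ -s "file$i.txt" ] || rm -f "file$i.txt"
    rm -f "piece$i.txt"
    i=$((i + 1))
done

head file*.txt
```

With the sample data the surviving files happen to be file1.txt through file3.txt, because the piece that held only the first header comes out empty and is deleted. Input with a different layout would need the cleanup rules adjusted.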

However, in general, I agree with you about the csplit output in this case.

Thanks for the comments. I think discussions like this help users to see how we all approach problems.

cheers, drl