split input data file and put into same output file

rasmith · April 15, 2011, 12:22pm

Hi All,
I have two input files and need to generate a CSV file. The existing report just greps the matching records for each ID and wraps them in a header record and a trailer record, with the trailer carrying the record count.
Now I need to split the data into blocks of 25 records each, all in the same CSV file.

id_file (Input file )

227050994
232510151

report_data (Input file)

13,227050994,LALN3819959,2089851292,2085254977,20110224
.
.
.
13,227050994,LFLN3449126,2082113563,2082113396,20110224
283,232510151,LALC3914497,152469347,152466752,20110224
283,232510151,LFSD3449916,1329836600,1329836311,20110224
.
.
.
.
283,232510151,LFSL3455668,1142303778,1142301334,20110224
283,232510151,LFST3462358,1425672593,1425672226,20110224

Existing Report (output file)

Start of Report 20110224~ 
227050994 20110224 
13 227050994 LALN3819959 2089851292 2085254977 20110224
.
.
.
13 227050994 LFLN3449126 2082113563 2082113396 20110224
~End of Report 227050994 19
Start of Report 20110224~ 
232510151 20110224 
283 232510151 LALC3914497 152469347 152466752 20110224
283 232510151 LZNI0568201 2891873461 2891871770 20110224
.
.
.
.
283 232510151 LFSL3455668 1142303778 1142301334 20110224
283 232510151 LFST3462358 1425672593 1425672226 20110224
~End of Report 232510151 79

Script to process the files

OUT_FILE="report.csv"
for line in `cat id_file.dat`
do
  echo "Report,`date +%Y%m%d`~" >>$OUT_FILE
  echo "$line,`date +%Y%m%d`" >>$OUT_FILE
  grep ",$line," report_data.dat >>$OUT_FILE
  echo "~End of Report,$line,`grep -c ",$line," report_data.dat`" >>$OUT_FILE
done

Thank You,
Rasmith

DGPickett · April 15, 2011, 1:43pm

So, you want N files of 25 and a file for residue, if any? Feed the grep to an inner loop, something like:

 
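ln=0 fn=0
grep . . . | {
 # Group the loop and the final check so both see the same counters,
 # even in shells that run a piped loop in a subshell (e.g. bash).
 while read -r line2
 do
  if (( ++ln == 1 ))
  then
   # first line of a chunk: pick the next file number, then write the header
   fn=$(( fn + 1 ))
   date "+header stuff" >$OUT_FILE-$fn
  fi
  echo "$line2" >>$OUT_FILE-$fn
  if (( ln == 25 ))
  then
   echo trailer stuff $ln >>$OUT_FILE-$fn
   ln=0
  fi
 done
 # trailer for a final, partial file
 if (( ln > 0 ))
 then
  echo trailer stuff $ln >>$OUT_FILE-$fn
 fi
}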

rasmith · April 16, 2011, 11:11am

I need to split the data for each ID into blocks of 25 records, with any leftover records in a final shorter block, all in the same CSV file as shown below, and not into N separate files of 25.

Start of Report    20110224~
227050994    20110224               
13    227050994    LALN3819959    2089851292    2085254977    20110224
.
.
13    227050994    LFLN3449126    2082113563    2082113396    20110224
~End of Report    227050994    19           
Start of Report    20110224~               
232510151    20110224               
283    232510151    LALC3914497    152469347    152466752    20110224
.
.
.
283    232510151    LBWM3183936    1905799142    1905773331    20110224
~End of Report    232510151    25           
Start of Report    20110224~               
232510151    20110224               
283    232510151    LBWW2842523    1752698299    1752690888    20110224
.
.
283    232510151    LDND3387092    1474815506    1474814211    20110224
~End of Report    232510151    25           
Start of Report    20110224~               
232510151    20110224               
283    232510151    LHSS3133014    1793433382    1793513907    20110224
.
.
283    232510151    LFLV3454969    1514942972    1514279499    20110224
~End of Report    232510151    25           
Start of Report    20110224~               
232510151    20110224               
283    232510151    LFMY3455276    1943464440    1943466154    20110224
.
.
283    232510151    LFST3462358    1425672593    1425672226    20110224
~End of Report    232510151    4           
Start of Report    20110224~               
232510151    20110224
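
For reference, the block sizes in the layout above follow from integer division: the 79 records for ID 232510151 give three full blocks of 25 and one final block of 4. A minimal sketch of that arithmetic in shell, where the variable names total, size, full and rest are chosen here only for illustration:

```shell
total=79   # records found for one ID (the count in the existing report's trailer)
size=25    # records wanted per block
full=$(( total / size ))   # number of complete blocks
rest=$(( total % size ))   # records left over for the final, shorter block
echo "$full blocks of $size, then $rest"
```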

Thanks
Rasmith


Hi David,

Can you please explain your script?

Thanks,

Rasmith

DGPickett · April 19, 2011, 3:23pm

This bit of script can be put into a shell function or subshell. It processes the lines as follows:

Initialize two variables to zero: the line number and the file number.
Grep out the desired lines and pipe them to a while read loop.
Increment the line number. If it is 1, write a header to a new file, using the date command with a + format option (rather than echo and `date`) and the incremented file number in the file name.
Append the current line to that file (this needs >>, since > would overwrite the file).
If this is line 25, append a trailer (again with >>) and then reset the line counter to zero.
After the loop ends (EOF), if there is a partial file, add a trailer (using $ln as the count).

It is good to start designing from the outside in and from a high-level perspective, but then to code in layers from the inside out. This is an inner piece that chops the files up as you asked.
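
The same counting logic can be pointed at a single output file instead of numbered files, which is what the follow-up post asks for. The following is only a sketch under some assumptions: the function name write_blocks and its argument order are made up for this example, the header text "Start of Report" is taken from the sample output, the comma separators and the `date +%Y%m%d` stamp follow the original script, and the ID file is assumed to hold one ID per line.

```shell
#!/bin/sh
# Sketch: one header/trailer block per $size records, all in one output file.
# write_blocks is a hypothetical helper, not part of the original script.
write_blocks() {
  ids=$1 data=$2 out=$3 size=${4:-25}
  today=$(date +%Y%m%d)
  : > "$out"                      # start with an empty output file
  while read -r id
  do
    [ -n "$id" ] || continue      # skip blank lines in the ID file
    ln=0
    grep ",$id," "$data" | {
      while read -r rec
      do
        if [ "$ln" -eq 0 ]
        then
          # first record of a block: write the two header lines
          echo "Start of Report,$today~"
          echo "$id,$today"
        fi
        echo "$rec"
        ln=$(( ln + 1 ))
        if [ "$ln" -eq "$size" ]
        then
          # block is full: write the trailer and start counting again
          echo "~End of Report,$id,$ln"
          ln=0
        fi
      done
      # trailer for a final, partial block
      if [ "$ln" -gt 0 ]
      then
        echo "~End of Report,$id,$ln"
      fi
    } >> "$out"
  done < "$ids"
}
```

With the file names used in this thread, a call would look like `write_blocks id_file.dat report_data.dat report.csv 25`. The loop and the final trailer check sit inside one `{ ...; }` group so that they share the same counter even when the shell runs the right-hand side of the pipe in a subshell.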

summer_cherry · April 21, 2011, 1:41am

use strict;
use warnings;

# IDs to report on (the contents of id_file)
my %hash = ('227050994'=>1,'232510151'=>1);
my %result;
while(<DATA>){
	chomp;
	my @arr = split(",",$_);
	next if not exists $hash{$arr[1]};
	# group by ID, then by date (the last field); keep a count and the lines
	$result{$arr[1]}->{$arr[$#arr]}->{'CNT'}++;
	$result{$arr[1]}->{$arr[$#arr]}->{'STR'} .= "\n".$_;
}
foreach my $key(keys %result){
	foreach my $k(sort {$a <=> $b} keys %{$result{$key}}){
		print "Start of Report ",$k,"~\n";
		print $key," ", $k;
		print $result{$key}->{$k}->{'STR'},"\n";
		print "~End of Report ", $key," ", $result{$key}->{$k}->{'CNT'},"\n\n";
	}
}
__DATA__
13,227050994,LALN3819959,2089851292,2085254977,20110224
13,227050994,LFLN3449126,2082113563,2082113396,20110224
283,232510151,LALC3914497,152469347,152466752,20110224
283,232510151,LFSD3449916,1329836600,1329836311,20110224
283,232510151,LFSL3455668,1142303778,1142301334,20110225
283,232510151,LFST3462358,1425672593,1425672226,20110226