Separate output based on dates

Hi guys!

First time poster on here, was wondering if someone could help with a problem which I'm facing.

Is it possible to use awk or sed to separate the below output based on TRANSACTION_DATE? The output would then feed into DataStage.

Contents of CSV file:

TRANSACTION_DATE_TIME,TRANSACTION_DATE,TRANSACTION_ID,STATUS
4/8/2014 1:50,4/8/2014,MTS_68,COMPLETE    
4/8/2014 2:30,4/8/2014,MTS_99,COMPLETE    
5/8/2014 5:10,5/8/2014,MTS_100,COMPLETE    
5/8/2014 5:26,5/8/2014,MTS_101,COMPLETE    
5/8/2014 5:30,5/8/2014,MTS_102,COMPLETE    
6/8/2014 8:05,6/8/2014,MTS_103,COMPLETE    
6/8/2014 8:25,7/8/2014,MTS_104,COMPLETE

So
Result set 1:

TRANSACTION_DATE_TIME,TRANSACTION_DATE,TRANSACTION_ID,STATUS
4/8/2014 1:50,4/8/2014,MTS_68,COMPLETE    
4/8/2014 2:30,4/8/2014,MTS_99,COMPLETE 

Result set 2:

5/8/2014 5:10,5/8/2014,MTS_100,COMPLETE    
5/8/2014 5:26,5/8/2014,MTS_101,COMPLETE    
5/8/2014 5:30,5/8/2014,MTS_102,COMPLETE    

Result set 3:

6/8/2014 8:05,6/8/2014,MTS_103,COMPLETE   

etc...

Thank you kindly.

Do you need separate output files?- I do not know what Datastage uses.

Hi jim mcnamara,

No, just to be outputted into the unix command line.

Hello,

Could you please try following, following will help you to create files according to first column.

awk -F" |," 'BEGIN{i=1;}{if(A != $1){i++}} {print $0 >> i".txt";A=$1}' Input_file

In this case 3 files will be created as only 3 unique dates are there in given input.

cat 4.txt
6/8/2014 8:05,6/8/2014,MTS_103,COMPLETE
6/8/2014 8:25,7/8/2014,MTS_104,COMPLETE
 
cat 3.txt
5/8/2014 5:10,5/8/2014,MTS_100,COMPLETE
5/8/2014 5:26,5/8/2014,MTS_101,COMPLETE
5/8/2014 5:30,5/8/2014,MTS_102,COMPLETE
 
cat 2.txt
4/8/2014 1:50,4/8/2014,MTS_68,COMPLETE
4/8/2014 2:30,4/8/2014,MTS_99,COMPLETE

EDIT: Just saw OP wants to show on command line only so following may help.

sort -k1 Input_file | awk -F" |," 'BEGIN{i=1;}{if(A != $1 && A){print "**************************************"}} {print $0;A=$1}'

Output will be as follows.

4/8/2014 1:50,4/8/2014,MTS_68,COMPLETE
4/8/2014 2:30,4/8/2014,MTS_99,COMPLETE
**************************************
5/8/2014 5:10,5/8/2014,MTS_100,COMPLETE
5/8/2014 5:26,5/8/2014,MTS_101,COMPLETE
5/8/2014 5:30,5/8/2014,MTS_102,COMPLETE
**************************************
6/8/2014 8:05,6/8/2014,MTS_103,COMPLETE
6/8/2014 8:25,7/8/2014,MTS_104,COMPLETE

You can print new line etc in place of **** too. Hope this helps.

EDIT: Also if you need to take date as variable then following may help.

awk -F" |," -vs1="4/8/2014" '{if($1 ~ s1){print $0}}' Input_file

Output will be as follows.

4/8/2014 2:30,4/8/2014,MTS_99,COMPLETE
4/8/2014 1:50,4/8/2014,MTS_68,COMPLETE

Thanks,
R. Singh

1 Like

Try :

 awk -F'[ ,]' 'FNR==1{h=$0;next}{f=sprintf("%s.txt",$1); gsub(/\//,"_",f); if(f in F){ print >> f } else { print h RS $0 > f; F[f] } close(f) }' file

---------- Post updated at 09:09 PM ---------- Previous update was at 09:07 PM ----------

If you want to try this on a Solaris/SunOS system, change awk at the start of the script to /usr/xpg4/bin/awk or /usr/xpg6/bin/awk or nawk .

1 Like

Hi RavinderSingh13,

This was exactly what I was after, thanks a lot!