File splitting, naming file according to internal field

Leedor · September 15, 2010, 1:12pm

Hi All,

I have a rather strange set of requirements that I'm hoping someone here can help me with. We receive a file that is actually a concatenation of 4 files (I don't believe this will change, but ideally the solution would handle n files).

The super-file looks like:

FileHeader,Filename.csv,FileType,RowCount,Date,Time
ColumnHeaders....
Data.....
...
Data.....
FileHeader,Filename.csv,FileType2,RowCount2,Date2,Time2
ColumnHeaders2....
Data.....
...
Data.....
FileHeader,Filename.csv,FileType3,RowCount3,Date3,Time3
ColumnHeaders3....
Data.....
...
Data.....
FileHeader,Filename.csv,FileType4,RowCount4,Date4,Time4
ColumnHeaders4....
Data.....
...
Data.....

I would like to split that super-file into its 4 constituent files, starting a new file each time the constant "FileHeader" is seen at the start of a line, and name each file Filename-FileType.csv. Row counts, dates and times can remain unchanged in their separate files.

Additionally, if possible, I would then like to update the Filename.csv field in each sub-file to its newly allocated filename (Filename_FileType.csv).
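A minimal awk sketch covering both requests, assuming the header line should be kept in each output with its Filename field rewritten, and using the underscore naming (Filename_FileType.csv). It also assumes plain comma-separated fields (no quoted commas); the input name superfile.csv is hypothetical.

```shell
#!/bin/sh
# Hypothetical super-file (two sections for brevity).
cat > superfile.csv <<'EOF'
FileHeader,ICON_NIL_Trans.csv,Trades,2,20100818,09:50:00,,
PortfolioCode,SourceSystem,AssetCode
1,2,3
FileHeader,ICON_NIL_Trans.csv,Cash,2,20100818,09:50:00,,
PortfolioCode,SourceSystem,AccountCode
4,5,6
EOF

# Assumes plain CSV: no quoted fields containing commas.
awk 'BEGIN { FS = OFS = "," }
/^FileHeader,/ {
  if (out != "") close(out)   # close the previous section file
  base = $2
  sub(/\.csv$/, "", base)     # ICON_NIL_Trans.csv -> ICON_NIL_Trans
  out = base "_" $3 ".csv"    # -> ICON_NIL_Trans_Trades.csv
  $2 = out                    # header now names its own file
}
out != "" { print > out }     # header, column headers and data rows
' superfile.csv
```

Assigning to $2 rebuilds the line with OFS, so the trailing empty fields (",,") survive. If the same FileType appeared twice, the later section would overwrite the earlier one, because each file is reopened with ">" after close().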

Unfortunately my awking skills are extremely minimal. Can someone please help me with this?

Many many thanks in advance.

Lee

116 · September 15, 2010, 1:25pm

Try the code below:

while read line; do

  if [ $line ~  /^FileHeader/ ]; then
    filename=`sed s/^FileHeader\(.*\).csv,\([^,]*\),.*/\1_\2.csv/`
  else
    echo $line >>$filename
  fi

done < superfilename
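The [ ] (test) command has no ~ match operator, so the if condition above is rejected. The sed call also reads the rest of the loop's input instead of $line, and its script is unquoted, so the shell strips the backslashes before sed sees them. A minimal pure-shell sketch of the same loop, assuming a POSIX sh or ksh, uses case for the prefix match and parameter expansion to pull out Filename and FileType. The input name superfile.csv is hypothetical.

```shell
#!/bin/sh
# Hypothetical two-section input.
printf '%s\n' \
  'FileHeader,ICON_NIL_Trans.csv,Fx,2,20100818,09:50:00,,' \
  'PortfolioCode,SourceSystem,BuyAccSecCode' \
  '7,8,9' \
  'FileHeader,ICON_NIL_Trans.csv,Inc,2,20100818,09:50:00,,' \
  'PortfolioCode,SourceSystem,AssetCode' \
  '0,1,2' > superfile.csv

filename=
while IFS= read -r line; do
  case $line in
    FileHeader,*)
      # Pull Filename and FileType out of the header with parameter expansion.
      rest=${line#FileHeader,}      # ICON_NIL_Trans.csv,Fx,2,...
      name=${rest%%,*}              # ICON_NIL_Trans.csv
      rest=${rest#*,}               # Fx,2,...
      type=${rest%%,*}              # Fx
      filename="${name%.csv}_${type}.csv"
      : > "$filename"               # start each output file empty
      ;;
    *)
      # Lines before the first header have nowhere to go; skip them.
      if [ -n "$filename" ]; then
        printf '%s\n' "$line" >> "$filename"
      fi
      ;;
  esac
done < superfile.csv
```

Truncating each file when its header is seen means a rerun does not append duplicate rows to old output.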

Franklin52 · September 15, 2010, 1:31pm

Something like this?

awk '/^FileHeader/{fn=$2 "-" $3 ".csv"}{print > fn}' file

Leedor · September 15, 2010, 2:11pm

Thanks, guys, for the very quick responses. Unfortunately I can't get either solution to work.

116: I get the error below:

 
split.sh[5]: /home/dlee: 0403-012 A test command parameter is not valid

I've put it into a .sh script; I hope that isn't having any adverse effect:

 
#!/usr/bin/ksh
 
while read line; do
 
  if [ $line ~  /^FileHeader/ ]; then
    filename=`sed s/^FileHeader\(.*\).csv,\([^,]*\),.*/\1_\2.csv/`
  else
    echo $line >>$filename
  fi
 
done < $1

Franklin: I don't get any output from yours.

To clarify, here's a sample file:

<<ICON_NIL_Trans.csv>>

FileHeader,ICON_NIL_Trans.csv,Trades,2,20100818,09:50:00,,
PortfolioCode,SourceSystem,AssetCode
1,2,3
FileHeader,ICON_NIL_Trans.csv,Cash,2,20100818,09:50:00,,
PortfolioCode,SourceSystem,AccountCode
4,5,6
FileHeader,ICON_NIL_Trans.csv,Fx,2,20100818,09:50:00,,
PortfolioCode,SourceSystem,BuyAccSecCode
7,8,9
FileHeader,ICON_NIL_Trans.csv,Inc,2,20100818,09:50:00,,
PortfolioCode,SourceSystem,AssetCode
0,1,2

and I'd want 4 outputs:

<<ICON_NIL_Trans_Trades.csv>>

PortfolioCode,SourceSystem,AssetCode
1,2,3

<<ICON_NIL_Trans_Cash.csv>>

PortfolioCode,SourceSystem,AccountCode
4,5,6

<<ICON_NIL_Trans_Fx.csv>>

PortfolioCode,SourceSystem,BuyAccSecCode
7,8,9

<<ICON_NIL_Trans_Inc.csv>>

PortfolioCode,SourceSystem,AssetCode
0,1,2
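A minimal awk sketch that reproduces exactly the four outputs above from the sample. The header line only supplies the output name and is not copied, ".csv" is stripped from Filename before "_FileType.csv" is appended, and each finished file is closed so a super-file with many sections does not run into awk's limit on simultaneously open files. It assumes plain comma-separated fields (no quoted commas) and that the super-file is saved as ICON_NIL_Trans.csv.

```shell
#!/bin/sh
# Sample super-file from the post, written out so the sketch is self-contained.
cat > ICON_NIL_Trans.csv <<'EOF'
FileHeader,ICON_NIL_Trans.csv,Trades,2,20100818,09:50:00,,
PortfolioCode,SourceSystem,AssetCode
1,2,3
FileHeader,ICON_NIL_Trans.csv,Cash,2,20100818,09:50:00,,
PortfolioCode,SourceSystem,AccountCode
4,5,6
FileHeader,ICON_NIL_Trans.csv,Fx,2,20100818,09:50:00,,
PortfolioCode,SourceSystem,BuyAccSecCode
7,8,9
FileHeader,ICON_NIL_Trans.csv,Inc,2,20100818,09:50:00,,
PortfolioCode,SourceSystem,AssetCode
0,1,2
EOF

# Assumes plain CSV: no quoted fields containing commas.
awk -F, '
/^FileHeader,/ {
  if (fn != "") close(fn)   # done with the previous section
  base = $2
  sub(/\.csv$/, "", base)   # ICON_NIL_Trans.csv -> ICON_NIL_Trans
  fn = base "_" $3 ".csv"   # e.g. ICON_NIL_Trans_Trades.csv
  next                      # the header line itself is not copied
}
fn != "" { print > fn }     # column headers and data rows
' ICON_NIL_Trans.csv
```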

Many thanks again,

Lee

116 · September 15, 2010, 2:32pm

Sorry, I presumed you were using bash; I guess it is ksh. Either way, [ ] has no ~ match operator, so test the line with grep instead and pipe $line into sed:

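#!/usr/bin/ksh

while IFS= read -r line; do

  if echo "$line" | grep '^FileHeader' >/dev/null ; then
    # Filename_FileType.csv from fields 2 and 3 of the header line
    filename=$(echo "$line" | sed 's/^FileHeader,\(.*\)\.csv,\([^,]*\),.*/\1_\2.csv/')
  else
    echo "$line" >> "$filename"
  fi

done < "$1"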

---------- Post updated at 01:32 PM ---------- Previous update was at 01:25 PM ----------

I guess Franklin's code might work with a small addition (setting the field separator to a comma):

awk 'BEGIN{FS=","} /^FileHeader/{fn=$2 "-" $3 ".csv"}{print > fn}' file

Franklin52 · September 15, 2010, 3:49pm

Try this:

awk -F, '/^FileHeader/{s=$2;sub(/\.csv$/,"",s);fn=s "-" $3 ".csv"}{print > fn}' OFS="," file
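awk treats a string passed to sub() as a dynamic regular expression, so in ".csv" the dot matches any character and the first match anywhere in the field is removed. This small demonstration, using a hypothetical Filename that contains "csv" before its extension, compares that plain string pattern with the escaped, anchored /\.csv$/.

```shell
#!/bin/sh
# Hypothetical header whose Filename contains "csv" before the extension.
header='FileHeader,Mycsv_report.csv,Cash,2,20100818,09:50:00,,'

# String pattern: "." matches any character and the leftmost match is removed.
loose=$(printf '%s\n' "$header" | awk -F, '{ s = $2; sub(".csv", "", s); print s }')

# Escaped and anchored: only a trailing ".csv" extension is removed.
strict=$(printf '%s\n' "$header" | awk -F, '{ s = $2; sub(/\.csv$/, "", s); print s }')

echo "loose:  $loose"    # M_report.csv
echo "strict: $strict"   # Mycsv_report
```

With the sample data both forms give the same name; the difference only shows up for filenames like this one.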

danmero · September 15, 2010, 3:54pm

awk -F, '/^File/{f=$3"-"$2}{print>f}' in.file

---------- Post updated at 03:54 PM ---------- Previous update was at 03:49 PM ----------

or

awk -F, '/^File/{_=".";split($2,a,_);f=a[1]_ $3_ a[2]}{print>f}'  in.file

Leedor · September 16, 2010, 5:19am

Thanks to all who replied. I tested all the options with the test data, and Franklin52's came out on top.

Cheers!