need help in Parsing a CSV file and generate a new output file

Hi Scripting Gurus,
I am trying to parse a csv file and generate a new output file.
The input file will be a variable length in turns of rows and columns.
output file will have 8 columns. we have three columns from the header for each set.
just to give little bit more clarification each row will have n number of sets as seen in the input file we have two sets .
we can have two solutions one is split each row into two rows or bring set2 under set 1 and so on if we have more sets.

|-----------set 1----------|----------set 2----------------|
INPUT FILE
Header,4567,USD,BMI,,4568,USD,BMI,
Date,LEVEL,NET,TOTAL,MCAP,LEVEL,NET,TOTAL,MCAP
22-Dec-97,27.06492573,,20.9289574,,26.17032566,,20.60702136,
23-Dec-97,27.11627283,,20.96866346,,26.50090757,,20.86732796,
24-Dec-97,27.17448579,,21.01367878,,26.53347955,,20.89299111,
25-Dec-97,27.17854775,,21.02091981,,26.66484879,,20.99775394,
26-Dec-97,27.13228338,,20.98513719,,26.55463469,,20.91096371,

Desired output

Date, index_id,currency,index_pro, level, net, total, mcap
22-Dec-97, 4567, USD, BMI, 27.06492573, , 20.9289574 ,
22-Dec-97, 4568, USD, BMI, 26.17032566, , 20.60702136,
23-Dec-97, 4567, USD, BMI, 27.11627283, , 20.96866346,
23-Dec-97, 4568, USD, BMI, 26.50090757, , 20.86732796,
24-Dec-97, 4567, USD, BMI, 27.17448579, , 21.01367878,
24-Dec-97, 4568, USD, BMI, 26.53347955, ,20.89299111,
25-Dec-97, 4567, USD, BMI, 27.17854775, ,21.02091981,
25-Dec-97, 4568, USD, BMI, 26.66484879, ,20.99775394,
26-Dec-97, 4567, USD, BMI, 27.13228338, ,20.98513719,
26-Dec-97, 4568, USD, BMI, 26.55463469, ,20.91096371,

here is the script i wrote and iam having issues geting the proper out put.

#!/bin/sh

hdr=`cat inputfile.csv|grep 'Header'|awk -F"," '{ t += NF - 1 } END { print t }'`
hdr_str=`cat inputfile.csv|grep 'Header'`
hdr=`expr $hdr + 1`
while read line
do
  i=2;
  j=1;
  dte=`echo $line | awk -F"," '{ print $1 }'`
  k=2;
  str_tmp="";
  l=1;
  while [ $k -le ${hdr} ]
  do
    if [ $l -le 3 ]; then
      str_id_cur_iprv=`echo $hdr_str | awk -F"," '{ print $'$k' }'`
      if [ $l -eq 1 ]; then
        str_tmp=$str_id_cur_iprv
        l=`expr $l + 1`
        k=`expr $k + 1`
      else
        l=`expr $l + 1`
        k=`expr $k + 1`
        str_tmp=$str_tmp","$str_id_cur_iprv
      fi
    else
      break
    fi
  done
  str=${dte}","$str_tmp
  while [ $i -le ${hdr} ]
  do
    if [ $j -le 4 ]; then
      var=`echo $line | awk -F"," '{ print $'$i' }'`
      str=${str}","${var}
      i=`expr $i + 1`
      j=`expr $j + 1`
    else
      echo $str >> output_file.dat
      str=${dte}","$str_tmp
      j=1;
    fi
  done
  if [ $j -ne 1 ]; then
    echo $str >> output_file.dat
    str=${dte}","$str_tmp
  fi
done < inputfile.csv

here is the out put i get with the above script.

Header,4567,USD,BMI,4567,USD,BMI,
Header,4567,USD,BMI,4568,USD,BMI,
Date,4567,USD,BMI,LEVEL,NET,TOTAL,MCAP
Date,4567,USD,BMI,LEVEL,NET,TOTAL,MCAP
22-Dec-97,4567,USD,BMI,27.06492573,,20.9289574,
22-Dec-97,4567,USD,BMI,26.17032566,,20.60702136,
23-Dec-97,4567,USD,BMI,27.11627283,,20.96866346,
23-Dec-97,4567,USD,BMI,26.50090757,,20.86732796,
24-Dec-97,4567,USD,BMI,27.17448579,,21.01367878,
24-Dec-97,4567,USD,BMI,26.53347955,,20.89299111,
25-Dec-97,4567,USD,BMI,27.17854775,,21.02091981,
25-Dec-97,4567,USD,BMI,26.66484879,,20.99775394,
26-Dec-97,4567,USD,BMI,27.13228338,,20.98513719,
26-Dec-97,4567,USD,BMI,26.55463469,,20.91096371,

Gurus can any one take a look at my script and modify it to get the correct form.
Thanks
vkr

put your script in code tags

You can do the job with 1 awk command:

awk 'BEGIN{FS=OFS=",";print "Date, index_id,currency,index_pro, level, net, total, mcap"}
NR==1{s1=$2 FS $3 FS $4; s2=$6 FS $7 FS $8; getline; next}
{printf("%s,%s,%s,%s,%s,\n", $1,s1,$2,$3,$4)}
{printf("%s,%s,%s,%s,%s,\n", $1,s2,$6,$7,$8)}
' inputfile.csv

Regards

Hi Franklin,
Thank you very much. It works.You are a real guru.
As i am a rookie in shell scripting could you please modify the above script so that i will know the mistake that i am making.
Thanks
vkr

hi Franklin,
This is a static script. I want it to be generic where I get variable sets. it can be 2 or 3 or 4 or for that matter any number of sets side by side. could you please give me a generic script where i can generate an out put file.
I appreciate all your help.
Thanks
vkr

hi Franklin,
This is a static script. I want it to be generic where I get variable sets. it can be 2 or 3 or 4 or for that matter any number of sets side by side. could you please give me a generic script where i can generate an out put file.
I appreciate all your help.
Thanks
vkr

Try and adapt the following script :

awk '

BEGIN {
  OFS = FS = ",";
  Columns_by_set = 4;
  Infos_by_set   = 3; # Output columns count from header for each set
  print "Date","index_id","currency","index_pro",
               "level","net","total","mcap";
}

$1 == "Header" {
   Sets_by_line   = int((NF -1) / Columns_by_set);
   Fields_by_line = NF;

   for (set=1; set<=Sets_by_line; set++) {
      infos = "";
      field = 2 + (set-1)*Columns_by_set
      for (col=1; col<=Infos_by_set; col++) {
         infos = infos $(field+col-1) (col<Infos_by_set ? OFS : "");
      }
      Header_infos[set] = infos;
   }

   next;
}

$1 == "Date" {
   next;
}

Sets_by_line {
   if (NF != Fields_by_line) {
      printf("Invalid datas line %d: Fields count is %d, must be %d\n",
             NR, NF, Fields_by_line);
      next;
   }

   for (set=1; set<=Sets_by_line; set++) {
      datas = $1 OFS Header_infos[set];
      field = 2 + (set-1)*Columns_by_set
      for (col=1; col<=Infos_by_set; col++) {
         datas = datas OFS $(field+col-1);
      }
      print datas
   }
}
' inputfile

Inputfile:

Header,4567,USD,BMI,,4568,USD,BMI,
Date,LEVEL,NET,TOTAL,MCAP,LEVEL,NET,TOTAL,MCAP
22-Dec-97,27.06492573,,20.9289574,,26.17032566,,20.60702136,
23-Dec-97,27.11627283,,20.96866346,,26.50090757,,20.86732796,
24-Dec-97,27.17448579,,21.01367878,,26.53347955,,20.89299111,
25-Dec-97,27.17854775,,21.02091981,,26.66484879,,20.99775394,
26-Dec-97,27.13228338,,20.98513719,,26.55463469,,20.91096371,
Header,9876,EUR,BMI,
Date,LEVEL,NET,TOTAL,MCAP
22-Dec-97,27.06492573,,20.9289574,
23-Dec-97,27.11627283,,20.96866346,
24-Dec-97,27.17448579,,21.01367878,
25-Dec-97,27.17854775,,21.02091981,
26-Dec-97,27.13228338,,20.98513719,

Output:

Date,index_id,currency,index_pro,level,net,total,mcap
22-Dec-97,4567,USD,BMI,27.06492573,,20.9289574
22-Dec-97,4568,USD,BMI,26.17032566,,20.60702136
23-Dec-97,4567,USD,BMI,27.11627283,,20.96866346
23-Dec-97,4568,USD,BMI,26.50090757,,20.86732796
24-Dec-97,4567,USD,BMI,27.17448579,,21.01367878
24-Dec-97,4568,USD,BMI,26.53347955,,20.89299111
25-Dec-97,4567,USD,BMI,27.17854775,,21.02091981
25-Dec-97,4568,USD,BMI,26.66484879,,20.99775394
26-Dec-97,4567,USD,BMI,27.13228338,,20.98513719
26-Dec-97,4568,USD,BMI,26.55463469,,20.91096371
22-Dec-97,9876,EUR,BMI,27.06492573,,20.9289574
23-Dec-97,9876,EUR,BMI,27.11627283,,20.96866346
24-Dec-97,9876,EUR,BMI,27.17448579,,21.01367878
25-Dec-97,9876,EUR,BMI,27.17854775,,21.02091981
26-Dec-97,9876,EUR,BMI,27.13228338,,20.98513719

Jean-Pierre.

This script should works with more sets assuming your file has a fixed structure e.g. if your file has 3 sets the header should looks like:

Header,4567,USD,BMI,,4568,USD,BMI,,4569,USD,BMI,

And the data with 3 sets:

22-Dec-97,27.06492573,,20.9289574,,26.17032566,,20.60702136,,26.17032500,,20.60702199,

Script:

awk 'BEGIN{FS=",";print "Date, index_id,currency,index_pro, level, net, total, mcap"}
NR==1{
  nSets=(NF-1)/4
  j=2
  for(i=1;i<=nSets;i++) {
    s=$j FS $(j+1) FS $(j+2)
    j+=4
  }
  getline; next
}
{
  j=2
  for(i=1;i<=nSets;i++) {
    printf("%s,%s,%s,%s,%s,\n", $1,s,$j,$(j+1),$(j+2))
    j+=4
  }
}' awk.dat

Regards

awk:

awk 'BEGIN{
FS=","
print "Date, index_id,currency,index_pro, level, net, total, mcap"}
{
if(NR==1){
        index1=$2
        cur1=$3
        pro1=$4
        index2=$6
        cur2=$7
        pro2=$8
}
if(NR>=3){
        print $1","index1","cur1","pro1","$2","$3","$4","
        print $1","index2","cur2","pro2","$6","$7","$8","
}
}' file

perl:

print "Date, index_id,currency,index_pro, level, net, total, mcap\n";
open (FH,"<a");
while(<FH>){
	@arr=split(",",$_);
	if($.==1){
		$index1=$arr[1];
		$cur1=$arr[2];
		$pro1=$arr[3];
		$index2=$arr[5];
		$cur2=$arr[6];
		$pro2=$arr[7];
	}
	if($.>=3){
		print $arr[0],",",$index1,",",$cur1,",",$pro1,",",$arr[1],",",$arr[2],",",$arr[3],",\n";
		print $arr[0],",",$index2,",",$cur2,",",$pro2,",",$arr[5],",",$arr[6],",",$arr[7],",\n";
	}
}
close(FH);

Hi Franklin,
I tried to redirect the output to a file by
print "Date, index_id,currency,index_pro, level, net, total, mcap"} >> output_file.dat ( or $2 as an argument so that i can give at the command line.) but I get an an error.
could you please lhelp me how i can redirect the output to a file and at the same time i should be able to give the input file from the command line.
Thanks
vkr

Hi Aigles,
I am trying to test the script that you wrote but aim getting the following errors.
awk: syntax error near line 17 ---- for (col=1; col<=Infos_by_set; col++) {
awk: illegal statement near line 17
awk: syntax error near line 26 ----- }
awk: bailing out near line 26

Could please help me resolve this so that i will try to test and see the results.

Thanks for all your help.
vkr

An example how you can append the value of the 2nd argument of a shell script to a file with awk:

#!/bin/sh

awk -v MyVar=$2 'BEGIN{print MyVar >> "output_file.dat"}' 

The script works fine on my box (awk is gawk).
Try to replace awk by nawk or gawk.

Jean-Pierre.

Hi Gurus,
your help was a big help till now but i am in more deep trouble in solving this issue. this time the problem is more complicated.

I am trying to adapt with the above script that Jean and franklin52 wrote .

I have a problem in parsing the below mentioned type of files and obtain the desired output.
Input could be in any form.below are the few samples

Note: the set columns in the file could be in any order. but once the first set is defined in a certain order the rest of the sets will follow the same order of the columns in Input file.

INPUT File
File type1
Header, 4567, USD, SFR, 4568, EUR, SBFR,
Date, LEVEL, NET, TOTAL, MCAP, LEVEL,NET TOTAL MCAP
22-Dec-97, 27.06, 34, 20.92, 11, 26.17, 1, 20.60, 88
23-Dec-97, 27.11, 44, 20.9, 223, 26.50, 2, 20.86, 97
24-Dec-97, 27.17, 2, 21.01, 44, 26.53, 3, 20.89, 54
25-Dec-97, 27.17, 56, 21.02, 55, 26.6, 4, 20.99, 23
26-Dec-97, 27.13, 10, 20.98, 77, 26.55, 5, 20.91, 45

INPUT file Type2
file2
Header,4567,USD,SBFR,4568,EUR,SBCC,4569,JPY,FACT,4570,USD,FACT
Date, LEVEL,TOTAL,LEVEL,TOTAL,LEVEL,TOTAL,LEVEL,TOTAL
22-Dec-97,27.06,20.92,26.17,20.60,26.06,20.92,27.06,20.92
23-Dec-97,27.11,20.96,26.50,20.86,25.06,20.92,21.06,20.92
24-Dec-97,27.17,21.01,26.53,20.89,24.06,20.92,21.06,20.92
25-Dec-97,27.17,21.02,26.66,20.99,23.06,20.92,21.06,20.92
26-Dec-97,27.13,20.98,26.55,20.91,29.06,20.92,21.064,20.92

Input file type 3
file type3
Header,4567,USD,SBFR,4568,EUR,SBCC,4569,JPY,FACT,4570,USD,FACT
Date, NET, TOTAL,NET, TOTAL, NET,TOTAL,NET,TOTAL
22-Dec-97,27.06 20.92,26.17,20.60,26.06,20.92,27.06,20.9289574
23-Dec-97,27.11 20.96,26.50,20.86,25.06,20.92,21.06,20.9289574
24-Dec-97,27.17 21.01,26.53,20.89,24.06,20.92,21.06,20.9289574
25-Dec-97,27.17 21.02,26.66,20.99,23.06,20.92,21.06,20.9289574

file type 4

Header,4567,USD,SBFR,4568,EUR,SBCC,4569,JPY,FACT,4570,USD,FACT
Date, MCAP, TOTAL,MCAP, TOTAL, MCAP,TOTAL,MCAP,TOTAL
22-Dec-97,27.06 20.92,26.17,20.60,26.06,20.92,27.06,20.928
23-Dec-97,27.11 20.96,26.50,20.86,25.06,20.92,21.06,20.925

it can be in any combination but i need to get the out put in the below format

Desired output format for file 4
Date, index_id, currency, index_pro, level, net, total, mcap
22-Dec-97,4567, USD, SBFR, , , 20.92,27.06
22-Dec-97,4568, EUR, SBCC, , , 20.60,26.06
22-Dec-97,4569, JPY, FACT, , , 20.92,26.06
22-Dec-97,4570, USD, FACT, , , 20.928,27.0
23-Dec-97,4567, USD, SBFR, , , 20.96, 27.11
23-Dec-97,4568, EUR, BCC, , , 20.86,26.50
23-Dec-97,4569, JPY, FACT, , , 20.92,25.06
23-Dec-97,4570, USD, FACT, , , 20.925,21.06

I appreciate if you could please help me solve this problem. Your help will be appreciated.
Thanks
VKR

The problem is more complicated.
Try and adapt the following script :

awk -v Columns="LEVEL,NET,TOTAL,MCAP"                   \
    -v Header_cols="index_id,currency,index_pro"        \
'

#
# Initializations
#

BEGIN {
  OFS = FS = ",";

  Infos_by_set = gsub(",", OFS, Header_cols) + 1;

  Columns_count = split(Columns, Columns_label, ",");
  for (i=1; i<=Columns_count; i++) {
     Columns_pos[Columns_label] = i;
  }

  gsub(/,/, OFS, Columns);
  print "Date", Header_cols, tolower(Columns);

}

#
# Header line
#

$1 == "Header" {
   gsub(/ /,"");
   Sets_by_line   = int((NF -1) / Infos_by_set);

   for (set=1; set<=Sets_by_line; set++) {
      infos = "";
      field = 1 + (set-1)*Infos_by_set
      for (col=1; col<=Infos_by_set; col++) {
         infos = infos $(field+col) (col<Infos_by_set ? OFS : "");
      }
      Header_infos[set] = infos;
   }

   next;
}

#
# Date line
#

$1 == "Date" {
   gsub(/ /,"");

   Fields_by_set  = int((NF-1) / Sets_by_line);
   Fields_by_line = 1 + Fields_by_set * Sets_by_line;

   for (i=1; i<=Columns_count; i++)
      Columns_field = 0;
   for (i=1; i<=Fields_by_set; i++) {
      if ($(i+1) in Columns_pos)
        Columns_field[Columns_pos[$(i+1)]] = i;
   }

   next;
}

#
# Datas line
#

Sets_by_line {
   if (NF != Fields_by_line) {
      printf("Invalid datas line %d: Fields count is %d, must be %d\n",
             NR, NF, Fields_by_line);
      next;
   }

   for (set=1; set<=Sets_by_line; set++) {
      field = 1 + (set-1)*Fields_by_set
      datas = $1 OFS Header_infos[set];
      for (col=1; col<=Columns_count; col++) {
         datas = datas OFS
         if (Columns_field[col])
            datas = datas $(field+Columns_field[col]);
      }
      print datas
   }
}

' inputfile

Inputfile:

Header, 4567, USD, SFR, 4568, EUR, SBFR
Date, LEVEL, NET, TOTAL, MCAP, LEVEL,NET ,TOTAL ,MCAP
22-Dec-97, 27.06, 34, 20.92, 11, 26.17, 1, 20.60, 88
23-Dec-97, 27.11, 44, 20.9, 223, 26.50, 2, 20.86, 97
24-Dec-97, 27.17, 2, 21.01, 44, 26.53, 3, 20.89, 54
25-Dec-97, 27.17, 56, 21.02, 55, 26.6, 4, 20.99, 23
26-Dec-97, 27.13, 10, 20.98, 77, 26.55, 5, 20.91, 45
Header,4567,USD,SBFR,4568,EUR,SBCC,4569,JPY,FACT,4570,USD,FACT
Date, LEVEL,TOTAL,LEVEL,TOTAL,LEVEL,TOTAL,LEVEL,TOTAL
22-Dec-97,27.06,20.92,26.17,20.60,26.06,20.92,27.06,20.92
23-Dec-97,27.11,20.96,26.50,20.86,25.06,20.92,21.06,20.92
24-Dec-97,27.17,21.01,26.53,20.89,24.06,20.92,21.06,20.92
25-Dec-97,27.17,21.02,26.66,20.99,23.06,20.92,21.06,20.92
26-Dec-97,27.13,20.98,26.55,20.91,29.06,20.92,21.064,20.92
Header,4567,USD,SBFR,4568,EUR,SBCC,4569,JPY,FACT,4570,USD,FACT
Date, NET, TOTAL,NET, TOTAL, NET,TOTAL,NET,TOTAL
22-Dec-97,27.06,20.92,26.17,20.60,26.06,20.92,27.06,20.9289574
23-Dec-97,27.11,20.96,26.50,20.86,25.06,20.92,21.06,20.9289574
24-Dec-97,27.17,21.01,26.53,20.89,24.06,20.92,21.06,20.9289574
25-Dec-97,27.17,21.02,26.66,20.99,23.06,20.92,21.06,20.9289574
Header,4567,USD,SBFR,4568,EUR,SBCC,4569,JPY,FACT,4570,USD,FACT
Date, MCAP, TOTAL,MCAP, TOTAL, MCAP,TOTAL,MCAP,TOTAL
22-Dec-97,27.06,20.92,26.17,20.60,26.06,20.92,27.06,20.928
23-Dec-97,27.11,20.96,26.50,20.86,25.06,20.92,21.06,20.925

Output:

Date,index_id,currency,index_pro,level,net,total,mcap
22-Dec-97,4567,USD,SFR, 27.06, 34, 20.92, 11
22-Dec-97,4568,EUR,SBFR, 26.17, 1, 20.60, 88
23-Dec-97,4567,USD,SFR, 27.11, 44, 20.9, 223
23-Dec-97,4568,EUR,SBFR, 26.50, 2, 20.86, 97
24-Dec-97,4567,USD,SFR, 27.17, 2, 21.01, 44
24-Dec-97,4568,EUR,SBFR, 26.53, 3, 20.89, 54
25-Dec-97,4567,USD,SFR, 27.17, 56, 21.02, 55
25-Dec-97,4568,EUR,SBFR, 26.6, 4, 20.99, 23
26-Dec-97,4567,USD,SFR, 27.13, 10, 20.98, 77
26-Dec-97,4568,EUR,SBFR, 26.55, 5, 20.91, 45
22-Dec-97,4567,USD,SBFR,27.06,,20.92,
22-Dec-97,4568,EUR,SBCC,26.17,,20.60,
22-Dec-97,4569,JPY,FACT,26.06,,20.92,
22-Dec-97,4570,USD,FACT,27.06,,20.92,
23-Dec-97,4567,USD,SBFR,27.11,,20.96,
23-Dec-97,4568,EUR,SBCC,26.50,,20.86,
23-Dec-97,4569,JPY,FACT,25.06,,20.92,
23-Dec-97,4570,USD,FACT,21.06,,20.92,
24-Dec-97,4567,USD,SBFR,27.17,,21.01,
24-Dec-97,4568,EUR,SBCC,26.53,,20.89,
24-Dec-97,4569,JPY,FACT,24.06,,20.92,
24-Dec-97,4570,USD,FACT,21.06,,20.92,
25-Dec-97,4567,USD,SBFR,27.17,,21.02,
25-Dec-97,4568,EUR,SBCC,26.66,,20.99,
25-Dec-97,4569,JPY,FACT,23.06,,20.92,
25-Dec-97,4570,USD,FACT,21.06,,20.92,
26-Dec-97,4567,USD,SBFR,27.13,,20.98,
26-Dec-97,4568,EUR,SBCC,26.55,,20.91,
26-Dec-97,4569,JPY,FACT,29.06,,20.92,
26-Dec-97,4570,USD,FACT,21.064,,20.92,
22-Dec-97,4567,USD,SBFR,,27.06,20.92,
22-Dec-97,4568,EUR,SBCC,,26.17,20.60,
22-Dec-97,4569,JPY,FACT,,26.06,20.92,
22-Dec-97,4570,USD,FACT,,27.06,20.9289574,
23-Dec-97,4567,USD,SBFR,,27.11,20.96,
23-Dec-97,4568,EUR,SBCC,,26.50,20.86,
23-Dec-97,4569,JPY,FACT,,25.06,20.92,
23-Dec-97,4570,USD,FACT,,21.06,20.9289574,
24-Dec-97,4567,USD,SBFR,,27.17,21.01,
24-Dec-97,4568,EUR,SBCC,,26.53,20.89,
24-Dec-97,4569,JPY,FACT,,24.06,20.92,
24-Dec-97,4570,USD,FACT,,21.06,20.9289574,
25-Dec-97,4567,USD,SBFR,,27.17,21.02,
25-Dec-97,4568,EUR,SBCC,,26.66,20.99,
25-Dec-97,4569,JPY,FACT,,23.06,20.92,
25-Dec-97,4570,USD,FACT,,21.06,20.9289574,
22-Dec-97,4567,USD,SBFR,,,20.92,27.06
22-Dec-97,4568,EUR,SBCC,,,20.60,26.17
22-Dec-97,4569,JPY,FACT,,,20.92,26.06
22-Dec-97,4570,USD,FACT,,,20.928,27.06
23-Dec-97,4567,USD,SBFR,,,20.96,27.11
23-Dec-97,4568,EUR,SBCC,,,20.86,26.50
23-Dec-97,4569,JPY,FACT,,,20.92,25.06
23-Dec-97,4570,USD,FACT,,,20.925,21.06

Jean-Pierre.

Hi Jean,
Thank you so much. I am learning so much from you.The way you share your knowledge is very much appreciative and motivates me to learn more.Once again thank you so much. YOU ARE A REAL GURU(MASTER).
VKR