Shell script to extract data from csv file

Hi everyone,

I have a CSV file that contains data under different headings and column names, as shown below.

 
Static Data Ingested
,,,,,,,,,,,,Known Explained Rejections
Column_1,column_2,Column_3,Column_4,,Column_6,Column_7,,% Column_8,,Column_9 ,Column_10 ,
viv,Public_456,08/28/2013,1234566,,08/31/2013,5896325,,0.00,,0,0
SDS,Public_ddd,08/28/2013,589652,,08/31/2013,2365896,,0.00,,0,0
 
Static Data extracted
Column_1,Column_2,Column_3,Column_4,,Column_6
rms,k2c,08/28/2013,1234566,08/31/2013,5896325,0.00,0,0
SDS,k3c,08/28/2013,589652,08/31/2013,2365896,0.00,0,0

In the first one there are 10 columns, where columns are separated by either , or ,,
Static Data Ingested is the heading for the first data set.

Similarly, for the second one, Static Data extracted is the heading, and it has 6 columns, all separated by ,.

I want to write a shell script to extract columns 1 and 6-10 from the first one and columns 1 and 4-6 from the second.
Please help.

In the first section you could use ",*" as the field separator and "," in the second section, but the number of columns does not appear to be 10 in the first section, nor 6 in the second section.

Could you give a sample of the output?

Try this (but I'm afraid your input files' structure is somewhat inconsistent):

awk 'FNR==1 {f++; next} f==1 {print $1, $6, $7, $8, $9, $10} f==2 {print $1, $4, $5, $6}' OFS=, FS=",+" file1 file2
,,,,,
Column_1,Column_7,% Column_8,Column_9,Column_10 ,
viv,5896325,0.00,0,0,
SDS,2365896,0.00,0,0,
Column_1,Column_4,Column_6,
rms,1234566,08/31/2013,5896325
SDS,589652,08/31/2013,2365896
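
A side note on FS=",+" in the command above: a run of consecutive commas counts as one separator, so empty columns disappear and every later field number shifts left. A minimal sketch using one data row from post #1 (the variable names are just for illustration):

```shell
# One data row from the "Static Data Ingested" section of post #1.
line='viv,Public_456,08/28/2013,1234566,,08/31/2013,5896325,,0.00,,0,0'

# Single comma as separator: empty fields are kept.
single=$(printf '%s\n' "$line" | awk -F',' '{ print NF, $6 }')

# One or more commas as separator: empty fields are swallowed.
collapsed=$(printf '%s\n' "$line" | awk -F',+' '{ print NF, $6 }')

printf 'FS=,   -> %s\n' "$single"
printf 'FS=,+  -> %s\n' "$collapsed"
```

With a single comma the row has 12 fields and $6 is the date 08/31/2013; with ",+" it has 9 fields and $6 becomes 5896325, which is why the output above starts with viv,5896325.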

Sample output:

Static Data Ingested:
Column_1,Column_6,Column_7,,% Column_8,,Column_9 ,Column_10
viv           5896325,  0.00      ,,    0          ,,0            ,1

Static Data extracted
Column_1,Column_2,Column_3,Column_4,Column_6
rms,k2c,08/28/2013,1234566,08/31/2013,5896325,0.00,0,0
SDS,k3c,08/28/2013,589652,08/31/2013,2365896,0.00,0,0

That output can't be obtained from your sample input files with the specifications given in post #1, e.g. if the field separator is allowed to be ,, . Where's "Known Explained Rejections"? How come $1, $4, $5, and $6 are more than four columns?

Agree with RudiC
I think you are confused about the format of the input file. Also, your sample output doesn't make any sense with your original request.

--ahamed

OK, even if the files are only , separated, how can I extract the output as below:

Static Data Ingested:
Column_1,Column_6,Column_7,% Column_8,Column_9 ,Column_10
viv           5896325,  0.00      ,    0          ,0            ,1

Static Data extracted:

Column_1,Column_2,Column_3,Column_4,Column_6
rms,k2c,08/28/2013,1234566,08/31/2013,5896325,0.00,0,0
SDS,k3c,08/28/2013,589652,08/31/2013,2365896,0.00,0,0

where Static Data Ingested and Static Data extracted are the headings of the respective records.

Again, your output file description does not match your output sample, nor can the output file be obtained from your sample input files.

Please do not leave people guessing. Show a representative sample of input, desired output, attempts at a solution and specify OS and versions being used, or this thread will be closed.

Hi everyone,

I have a CSV file that contains data under different headings and column names, as shown below. It has a few null columns.

Static Data Ingested
,,,,,,,,,,,,Known Explained Rejections
Column_1,column_2,Column_3,Column_4,,Column_6,Column_7,,% Column_8,,Column_9 ,Column_10.
viv,Public_456,08/28/2013,1234566,,08/31/2013,5896325,,0.00,,0,1
SDS,Public_ddd,08/28/2013,589652,,08/31/2013,2365896,,0.00,,0,0
Static Data extracted
Column_1,Column_2,Column_3,Column_4,Column_5
rms,k2c,0.00,0,0
SDS,k3c,0.00,0,1

The above two records are present in the same CSV file.
The first data heading is Static Data Ingested, and it is at line 1.
At line 2 is ,,,,,,,,,,,,Known Explained Rejections
Line 3 has the column names, and lines 4, 5 and so on contain multiple rows of values.

After a few lines/rows, say 20, the second heading, Static Data extracted, starts, followed by its columns and then its values.

Now I want to check whether columns 9 and 10 under the first heading have any value other than 0; the output should be as below:

Sample output:
Static Data Ingested:
Column_1,column_2,Column_3 Column_9 ,Column_10
SDS,k3c,0.00,0,1
Static Data extracted
Column_1,Column_2,Column_3,Column_4,Column_5
viv,Public_456,08/28/2013,0,1

Similarly, for the second heading, columns 4 and 5 should be checked for 0's.
All the output for both the first and second sections should be sent in a mail containing both headings (Static Data Ingested and Static Data extracted) along with their columns and respective values.

I am completely lost by your description of this problem.

Your "Static Data Ingested" input has 12 fields with some columns unlabeled and some with labels that do not match what is in your desired output. The output that you say is expected for "Static Data Ingested" has four fields with headings Column_1 , column_2 , Column_3 Column_9 , and Column_10 , but you show five comma separated output fields. There is no heading column_2 (with a lowercase "c") in your input, there is no heading Column_3 Column_9 in your input, and there is a Column_10. (with a trailing period) in your input, but not a Column_10 heading. The data line in your input starting with SDS :

SDS,Public_ddd,08/28/2013,589652,,08/31/2013,2365896,,0.00,,0,0

has zero (or null) values in input columns 9 and 10 and in the columns labeled Column_9 (with the trailing space) and Column_10. (with the trailing period). But, you said there should only be output for this line if the values are not 0. And, how did you get the output:

SDS,k3c,0.00,0,1

from the input line above.

Then, I don't see any way to get the output:

viv,Public_456,08/28/2013,0,1

from the input:

rms,k2c,0.00,0,0
SDS,k3c,0.00,0,1

in the "Static Data extracted" sections of your input and output files???

What have you tried so far to solve this problem?

How are the field numbers or field names to be evaluated supposed to be entered into whatever script is supposed to perform this task for you?

How are the field numbers or field names that are supposed to be output entered into whatever script is supposed to perform this task for you?

Is an empty field to be treated as having value 0?

Is a field containing 0.00 to be treated as 0? (I.e., is it a numeric comparison or a string comparison?)
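
The last two questions matter because awk compares a field numerically when it looks like a number and as a string otherwise. A small sketch with made-up values (not taken from the input file) showing how the two tests treat an empty field, 0, 0.00 and 1:

```shell
# Hypothetical rows: a label in field 1 and the value under test in field 2.
result=$(printf '%s\n' 'a,' 'b,0' 'c,0.00' 'd,1' |
  awk -F, '{
    # Numeric test: adding 0 forces a number, so "" and 0.00 both count as zero.
    num = ($2 + 0 != 0) ? "nonzero" : "zero"
    # String test: concatenating "" forces a string, so only the literal 0 matches.
    str = (($2 "") != "0") ? "differs" : "same"
    print $1, num, str
  }')

printf '%s\n' "$result"
```

Here the empty field and 0.00 both come out as zero under the numeric test but as different from "0" under the string test, so the answer decides which rows would be reported.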

Threads merged

Can anyone help me get the data from the files along with the respective headers? Sorry for the confusion. Below, again, are the input file and the output file. ,, indicates that a column has a null value.

Static Data Ingested
,,,,,,,,,,,,Known Explained Rejections
Column_1,column_2,Column_3,Column_4,column_5,Column_6,Column_7,% Column_8,Column_9 ,Column_10.
viv,Public_456,08/28/2013,1234566,,08/31/2013,,0.00,,0,1
SDS,Public_ddd,08/28/2013,589652,,08/31/2013,,0.00,,0,0
Static Data extracted
Column_1,Column_2,Column_3,Column_4,Column_5
rms,k2c,0.00,0,0
SDS,k3c,0.00,0,1

The output should display only the records having column 10 as 1 under the first heading (Static Data Ingested) and column 5 as 1 under the second heading (Static Data extracted). The output should be as below.

Static Data Ingested:
Column_1,column_2,Column_3,Column_4,column_5,Column_6,Column_7,% Column_8,Column_9 ,Column_10
viv,Public_456,08/28/2013,1234566,,08/31/2013,,0.00,,0,1

Static Data extracted
Column_1,Column_2,Column_3,Column_4,Column_5
SDS,k3c,0.00,0,1

In the input file, line 1 is a heading, line 2 is another heading, line 3 has the column names, and line 4 onward has the column values.
Then, after a few lines, say 10, there is a new heading, then column names, and then column values.

Your data in the first part has 11 columns, not 10.

So using column 11 in the variable s :

$ awk '/Static Data extracted/{s=t; getline; print} NR==3||$s==1' s=11 t=5 FS=, file
Column_1,column_2,Column_3,Column_4,column_5,Column_6,Column_7,% Column_8,Column_9 ,Column_10.
viv,Public_456,08/28/2013,1234566,,08/31/2013,,0.00,,0,1
Column_1,Column_2,Column_3,Column_4,Column_5
SDS,k3c,0.00,0,1

Please make sure your description and your sample data match...
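
A quick way to check such counts, as a sketch: print the number of comma-separated fields on each line. The temporary file below just recreates the sample input from the previous post; its path and the variable names are assumptions.

```shell
# Recreate the sample input in a temporary file (location chosen by mktemp).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Static Data Ingested
,,,,,,,,,,,,Known Explained Rejections
Column_1,column_2,Column_3,Column_4,column_5,Column_6,Column_7,% Column_8,Column_9 ,Column_10.
viv,Public_456,08/28/2013,1234566,,08/31/2013,,0.00,,0,1
SDS,Public_ddd,08/28/2013,589652,,08/31/2013,,0.00,,0,0
Static Data extracted
Column_1,Column_2,Column_3,Column_4,Column_5
rms,k2c,0.00,0,0
SDS,k3c,0.00,0,1
EOF

# Print "line number: field count" for every line, splitting on single commas.
counts=$(awk -F, '{ print NR ": " NF }' "$sample")
printf '%s\n' "$counts"

rm -f "$sample"
```

On this sample the column-name line (line 3) reports 10 fields while the data rows (lines 4 and 5) report 11, which is the mismatch mentioned above.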

Hi, thanks for the reply.
But how do we get both headings,

Static Data Ingested and Static Data extracted,

each with its own columns and respective values?

Also, consider that there are 11 columns in total under the first heading.

Like so?

awk '/Static Data extracted/{s=t; getline n; print $0 RS n} NR==1||NR==3||$s==1' s=11 t=5 FS=, file
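
If hard-coding line numbers and column positions gets awkward, the sections could also be recognised by their content. This is only a sketch under assumptions not confirmed in the thread: every section heading starts with "Static Data", every column-name line starts with "Column_", and the flag to test is always the last field of a data row. The function name filter_sections and the temporary file are made up for illustration.

```shell
# Hypothetical helper: print each heading, its column names, and the rows
# whose last field equals 1, with a blank line between sections.
filter_sections() {
  awk -F, '
    /^Static Data/ { if (seen++) print ""; print; next }  # section heading
    /^Column_/     { print; next }                        # column-name line
    $NF == 1                                              # data row flagged with 1
  ' "$1"
}

# Recreate the sample input from the thread in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Static Data Ingested
,,,,,,,,,,,,Known Explained Rejections
Column_1,column_2,Column_3,Column_4,column_5,Column_6,Column_7,% Column_8,Column_9 ,Column_10.
viv,Public_456,08/28/2013,1234566,,08/31/2013,,0.00,,0,1
SDS,Public_ddd,08/28/2013,589652,,08/31/2013,,0.00,,0,0
Static Data extracted
Column_1,Column_2,Column_3,Column_4,Column_5
rms,k2c,0.00,0,0
SDS,k3c,0.00,0,1
EOF

report=$(filter_sections "$sample")
printf '%s\n' "$report"
rm -f "$sample"
```

If a mailer is set up on the system, the result could then be piped to something like mailx -s "subject" user@example.com, but that part is untested here.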