CSV to SQL insert: Awk for strings with multiple lines in csv

Hi Fellows,

I have been struggling to fix an issue in csv records to compose sql statements and have been really losing sleep over it. Here is the problem:

I have csv files in the following pipe-delimited format:

Column1|Column2|Column3|Column4|NEWLINE
Address Type|some descriptive text|"Mailing
Physical
Shipping"|some text here|NEWLINE
Street Number/Name|some description for rec 2|None|some text here|NEWLINE
City|description for rec 3|None|some text here too|NEWLINE
Zip|description for rec 4|text here|sdfla sdflsdhlsdfsk|NEWLINE
Record 5|description|"Main - Main record
Administrative - Administrative phone number
HR - HR phone number
Services - Services line
Fax - Facsimile phone number
Other"|some text here|NEWLINE
Grades Offered|description here|"IT - Infant/toddler
PR - Preschool
PK - Prekindergarten
TK - Transitional Kindergarten
KG - Kindergarten
01 - First grade
02 - Second grade
UG - Ungraded
Other"|some text here|NEWLINE

The output format I am looking for is the following:

insert into MyTBl values ( Column1,Column2,Column3,Column4,NEWLINE);
insert into MyTBl values ( 'Address Type','some descriptive text','Mailing \nMailing \n Shipping' ,'some text here','NEWLINE');
insert into MyTBl values ( 'Street Number/Name','some description for rec 2','None','some text here','NEWLINE');
insert into MyTBl values ( 'City','description for rec 3','None','some text here too','NEWLINE');
insert into MyTBl values ( 'Zip','description for rec 4','text here','sdfla sdflsdhlsdfsk','NEWLINE');
insert into MyTBl values ( 'Record 5','description','Main - Main record \nAdministrative - Administrative phone number \nHR - HR phone number \nServices - Services line \nFax - Facsimile phone number \nOther','some text here','NEWLINE');
insert into MyTBl values ( 'Grades Offered','description here','IT - Infant/toddler \nPR - Preschool \nPK - Prekindergarten \nTK - Transitional Kindergarten \nKG - Kindergarten \n01 - First grade \n02 - Second grade \nUG - Ungraded \nOther','some text here','NEWLINE');

I am having difficulty preserving the spaces in the fourth column data which is often a multiple line text in double quotes (strings). I need to preserve these new lines in that column in my database table.

I wonder if someone can help in using awk or sed to do this. I use cygwin on windows platform.

thanks a lot!

Well, for starters, you need to adjust the CSV into the right number of columns...

Does this line

Column1|Column2|Column3|Column4|NEWLINE

actually exist in your input file?

The do not actually are in each of the files but some files have headers.
Thank you for your message.

How should I tell apart the files with headers from the files without?

Put this in a text file:

BEGIN { FS="|"; OFS="," }
# Append 'short' lines into one longer line
NF<5 { if(S){ $0="\\n" $0; } S=S $0; $0=S }
# When it fially ends in "NEWLINE", we know we have an entire record
$NF == "NEWLINE" {
        for(N=1; N<=NF; N++)
        # Add single quotes to things lacking double quotes
        if(substr($N,1,1) != "\"")
                $N = "'" $N "'";

        print "insert into MyTBI values("$1,$2,$3,$4,$5");";
        S="";
}

and use it like this:

$ awk -f myawk.txt < data
insert into MyTBI values('Column1','Column2','Column3','Column4','NEWLINE');
insert into MyTBI values('Address Type','some descriptive text',"Mailing\nPhysical\nShipping",'Unchanged from Version 1','NEWLINE');
insert into MyTBI values('Street Number/Name','some description for rec 2','None','some text here','NEWLINE');
insert into MyTBI values('City','description for rec 3','None','some text here too','NEWLINE');
insert into MyTBI values('Zip','description for rec 4','text here','sdfla sdflsdhlsdfsk','NEWLINE');
insert into MyTBI values('Record 5','description',"Main - Main record\nAdministrative - Administrative phone number\nHR - HR phone number\nServices - Services line\nFax - Facsimile phone number\nOther",'Unchanged from Version 1','NEWLINE');
insert into MyTBI values('Grades Offered','description here',"IT - Infant/toddler\nPR - Preschool\nPK - Prekindergarten\nTK - Transitional Kindergarten\nKG - Kindergarten\n01 - First grade\n02 - Second grade\nUG - Ungraded\nOther",'Unchanged from Version 1','NEWLINE');
$

Does | ever happen inside double-quotes? That won't be handled properly.

Headers can be suppressed in csv generation process.
Pipes never occur in any records. That's why they are used as a delimiter.

Thank you Corona. That works great.
kind regards!