I have a comma-delimited text file in which character fields (as opposed to numeric and date fields) are always enclosed in double quotes. Records are separated by the newline character. In a shell script I would like to split one particular field into two separate fields (each enclosed in double quotes). The field I would like to split always begins with <description> and ends with </description>, and is always the 5th field in a record.
e.g. I would like to convert this:
18,"A",2008-02-11,"Y","<description> some long text </description>","N",1
to this:
18,"A",2008-02-11,"Y","<description> some lo","ng text </description>","N",1
I'm not bothered where in the field the split occurs - somewhere in the middle is optimal.
I should have said in my initial post that the text between the enclosing double quotes may itself contain double-quoted text, and that inner text may contain commas,
e.g. 18,"<description><job_title value="some text, more text" /></description>",2008-02-19,"N"
I think this makes it a lot more complicated?
I'm also having to use nawk (I'm on Solaris), as each record is likely to be more than 3000 characters (the max for awk), but I think the syntax is the same as, or very similar to, awk's.
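One possible approach, as a rough sketch only: rather than splitting on commas (which the embedded quotes and commas would break), locate the field by its markers, i.e. the opening `"<description>` and the closing `</description>"`, and insert `","` roughly halfway between them. The function name `split_description` and the `AWK` override variable are made up for illustration; the sketch assumes each record contains at most one such field and that the closing marker does not occur inside the text itself. It uses only `index`, `substr`, `length` and `int`, which nawk provides.

```shell
#!/bin/sh
# split_description (hypothetical name): reads records on stdin and writes
# them to stdout with the quoted <description> field split into two fields.
# Set AWK=nawk on Solaris; defaults to awk elsewhere (an assumption).
split_description() {
    "${AWK:-awk}" '
    {
        open_tag  = "\"<description>"
        close_tag = "</description>\""

        start = index($0, open_tag)     # position of the opening quote
        stop  = index($0, close_tag)    # position of "<" in the closing tag

        # No description field (or malformed record): pass it through as-is.
        if (start == 0 || stop < start) { print; next }

        # Position of the closing quote of the field.
        last = stop + length(close_tag) - 1

        # Split point: roughly the middle of the quoted field.
        mid = start + int((last - start + 1) / 2)

        # Close the first half and open the second half with quote-comma-quote.
        print substr($0, 1, mid - 1) "\",\"" substr($0, mid)
    }'
}
```

It could then be called as `AWK=nawk split_description < infile > outfile` (file names are placeholders). Because the split point is purely positional, it may land inside an inner quoted value or tag, which is acceptable only because it does not matter where in the field the split occurs.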