Hello All,
I need to convert a CSV file to a pipe-delimited file in UNIX. Apart from the comma separator, the data in the file itself contains commas inside double-quoted qualifiers. Let me know how to do it. I'd appreciate any help, especially if awk can be used to do it.
Will that work in all cases? I mean, this file will be extracted on a monthly basis, and it's not guaranteed that the columns which now have double quotes will have double quotes when I download the file next month. Is that generic code?
My point is that the script should replace commas with pipes, and at the same time it should check, for each record, whether a column has a comma inside a double-quoted qualifier and should not replace those. A column will have double quotes in the data only if it has a comma inside it. Next month I may get a different column that has a comma inside it and is double-quoted.
Please use code tags for code and data sample.
And always provide all details about the input and desired output.
Please try to give your desired output.
Sorry, I didn't try that at the time; I will take care of it in future before posting any thread. Wow, Pamu... Awesome, great work, that really worked.
Just one small additional request: could the script also erase the double quotes after the file has been converted to pipe-delimited?
And if you have time, could you please explain how the script works?
awk -F '"' 'NF>1{ for(i=1;i<=NF;i+=2){gsub(",","|",$i)} }
NF==1{ gsub(",","|") }1' OFS='"' file

# With FS set to '"' (double quote): if NF (the number of fields) is more than 1, a for loop replaces "," with "|" in every odd-numbered field.
# The odd-numbered fields are the parts of the line that lie outside the double quotes. In the sample record below they are "Menstrain,A0022699," and ",A0013195,":
Menstrain,A0022699,"Woodstock,IL,2",A0013195,"Rockford,IL"
# When a line doesn't contain the FS ('"') at all, NF==1; in that case every "," is replaced with "|". OFS='"' is set to put the double quotes back on output.
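To also drop the double quotes, one possible tweak (a sketch along the same lines as Pamu's script, not a confirmed part of the thread's solution): since awk rebuilds the record with OFS whenever a field is assigned, setting OFS to an empty string removes the quote separators when the line is printed:

```shell
# Same logic as before, but OFS is empty, so the double-quote
# separators are dropped when the record is rebuilt from its fields.
printf '%s\n' 'Menstrain,A0022699,"Woodstock,IL,2",A0013195,"Rockford,IL"' |
awk -F '"' -v OFS= 'NF>1{for(i=1;i<=NF;i+=2) gsub(",","|",$i)}
NF==1{gsub(",","|")}1'
# -> Menstrain|A0022699|Woodstock,IL,2|A0013195|Rockford,IL
```

Lines without any double quotes fall into the NF==1 branch and are printed unchanged apart from the comma-to-pipe substitution.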
@Scrutinizer - That's a cool piece of awk with the help of RS and ORS.
Thanks very much, Pamu, for taking your time and explaining. @Scrutinizer... great work; you guys seem to be the Einsteins of awk.
Thanks very much for the help. I have one more script to be created... can I raise a different thread or continue in this one? Sorry, I am pretty new to this very helpful website.
The difference is that Pamu uses a double quote to split fields in a record or line ( FS=\" ), and a for loop to go to every odd field in a record and replace the commas with pipes, whereas my version uses RS=\" to split the input file into small records and only replaces the commas in odd records ( NR%2 ), which are the records that lie outside the double quotes...
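On the sample line from earlier in the thread, the RS=\" version behaves like this (a sketch for illustration; note that the empty ORS also drops the double quotes, since nothing is printed between the output records):

```shell
# Records are split on each double quote, so odd-numbered records (NR%2)
# are the stretches of text outside the quotes; commas are replaced
# only there. With an empty ORS the records are glued back together
# without the quotes.
printf '%s\n' 'Menstrain,A0022699,"Woodstock,IL,2",A0013195,"Rockford,IL"' |
awk -v RS='"' -v ORS= 'NR%2{gsub(/,/,"|")}1'
# -> Menstrain|A0022699|Woodstock,IL,2|A0013195|Rockford,IL
```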
urx02:mmadusr:/development/mma/az/data> awk 'NR%2{gsub(/,/,"|")}1' RS=\" ORS= DCIA_C150_R01_PLN_SEG_DATA.TXT > pp1.txt
awk: Input line
0000030075,IPA,4,AE cannot be longer than 3,000 bytes.
The input line number is 44. The file is DCIA_C150_R01_PLN_SEG_DATA.TXT.
The source line number is 1.
It could be that there are records that do not contain double quotes and that the record length therefore becomes too long for this version of awk on HP-UX.
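If that 3,000-byte record limit is indeed the problem, one possible workaround (a sketch under that assumption, not something tested on HP-UX) is to keep the default newline record separator and split each line on the quotes with split() instead, so no record ever grows beyond one input line:

```shell
# Each line is split on double quotes into the array a[]; pieces with
# an odd index lie outside the quotes, so commas become pipes only
# there. The pieces are then joined back without the quotes.
printf '%s\n' 'Menstrain,A0022699,"Woodstock,IL,2",A0013195,"Rockford,IL"' |
awk '{
    n = split($0, a, "\"")
    line = ""
    for (i = 1; i <= n; i++) {
        if (i % 2) gsub(",", "|", a[i])
        line = line a[i]
    }
    print line
}'
# -> Menstrain|A0022699|Woodstock,IL,2|A0013195|Rockford,IL
```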