How to avoid Delimiter occuring in column values in .csv file

ritesh.bhawsar · June 25, 2012, 7:17am

Hello Gurus,

I need to create a file from a .csv file extracting specific columns only.

File structure is

Column1,Column2,Column3,Column4
abcd,1234,"asdf, tew,123",123456
efgh,234,asdf,654321

My output file should have

abcd,123456
efgh,654321

Can you pls help me with the code.

Thanks

jayan_jay · June 25, 2012, 7:19am

What you have tried so far ..

Sheel · June 25, 2012, 7:24am

For this case particulary :

awk -F"," '{print $1","$NF}' inputFile

. Will not work if the last column has more than one field wrapped inside quotes.

ritesh.bhawsar · June 25, 2012, 7:28am

I tried cut -d but it takes the commas inside "" as well in count

while read line
do
   OUTLINE=`echo $line | cut -d, -f1`","`echo $line | cut -d, -f4`
   echo $OUTLINE
done < file.csv

I get following output :

Column1,Column4
abcd, tew

Thanks

methyl · June 25, 2012, 7:33am

The answer from @Sheel looks good because it finds the first and last comma-separated field (providing that both fields are numbers or strings which contain no comma characters).

ritesh.bhawsar · June 25, 2012, 7:39am

Thanks Sheel... But it's possible that I have more that one fields wrapped inside in the last column as well, I need to somehow escape commas inside "". Really appreciate if you can provide an alternative.

Sheel · June 25, 2012, 8:22am

I dont think we can do it in a single line. I can suggest you a way though: Filter out the rows having double quotes into a new file and write a command particualry for that file.

---------- Post updated at 07:22 AM ---------- Previous update was at 06:49 AM ----------

Here is another way. Replace the commas withing double quotes with a different character (say pipe) and then read the columns using comma as delimiter.

Command to replace the commas within double quotes

 awk -F'"' 'BEGIN{OFS="\""}{gsub(",","|",$2);print}' inputFile

You can revert the pipe to comma after getting the desired output.

ritesh.bhawsar · June 25, 2012, 8:48am

@Sheel, the code you gave is giving me syntax error

Can this be done by writing a script if a one line command is not possible.

Sheel · June 25, 2012, 8:53am

Whats the error ?

ritesh.bhawsar · June 25, 2012, 8:55am

Code:

awk -F'"' 'BEGIN{OFS="\""}{gsub(",","|",$2);print}' file.csv
awk: syntax error near line 1
awk: illegal statement near line 1

Sheel · June 25, 2012, 9:04am

It worked fine for me. See if there are syntactical errors in your file. Here is the script you might be looking for :

Script - read.ksh

# STEP 1 - Replace the commas within double quotes with pipe & write it to a temp file
awk -F'"' 'BEGIN{OFS="\""}{gsub(",","|",$2);print}' inputFile.csv > tempFile.csv
 
# STEP 2 - Get the first and last columns from temp file
while read line
do
        firstColumn=`echo $line|awk -F"," '{print $1}'`
        echo "First Column= $firstColumn"
        lastColumn=`echo $line|awk -F"," '{print $NF}'`
        echo "Last Column= $lastColumn"
 
# STEP 3 - Replace pipe with comma
 finalLastColumn=`echo $lastColumn | sed 's/\|/\,/g'`
 echo "Finally Last Column= $finalLastColumn"
done < tempFile.csv

inputFile.csv

"asdf, tew,123",abcd,123456
abcd,"asdf, tew,123",123456
abcd,1234,"asdf, tew,123"

Test Output

./read.ksh
First Column= "asdf| tew|123"
Last Column= 123456
Finally Last Column= 123456
First Column= abcd
Last Column= 123456
Finally Last Column= 123456
First Column= abcd
Last Column= "asdf| tew|123"
Finally Last Column= "asdf, tew,123"

Missed reverting pipe to comma in first column. You can add these lines after step 3

finalFirstColumn=`echo $firstColumn | sed 's/\|/\,/g'`
 echo "Finally first Column= $finalFirstColumn"