How to avoid Delimiter occuring in column values in .csv file

Hello Gurus,

I need to create a file from a .csv file extracting specific columns only.

File structure is

Column1,Column2,Column3,Column4
abcd,1234,"asdf, tew,123",123456
efgh,234,asdf,654321

My output file should have

abcd,123456
efgh,654321

Can you pls help me with the code.

Thanks

What you have tried so far ..

For this case particulary :

awk -F"," '{print $1","$NF}' inputFile

. Will not work if the last column has more than one field wrapped inside quotes.

I tried cut -d but it takes the commas inside "" as well in count

while read line
do
   OUTLINE=`echo $line | cut -d, -f1`","`echo $line | cut -d, -f4`
   echo $OUTLINE
done < file.csv

I get following output :

Column1,Column4
abcd, tew

Thanks

The answer from @Sheel looks good because it finds the first and last comma-separated field (providing that both fields are numbers or strings which contain no comma characters).

Thanks Sheel... But it's possible that I have more that one fields wrapped inside in the last column as well, I need to somehow escape commas inside "". Really appreciate if you can provide an alternative.

I dont think we can do it in a single line. I can suggest you a way though: Filter out the rows having double quotes into a new file and write a command particualry for that file.

---------- Post updated at 07:22 AM ---------- Previous update was at 06:49 AM ----------

Here is another way. Replace the commas withing double quotes with a different character (say pipe) and then read the columns using comma as delimiter.

Command to replace the commas within double quotes

 awk -F'"' 'BEGIN{OFS="\""}{gsub(",","|",$2);print}' inputFile 

You can revert the pipe to comma after getting the desired output.

@Sheel, the code you gave is giving me syntax error :frowning:

Can this be done by writing a script if a one line command is not possible.

Whats the error ?

Code:

awk -F'"' 'BEGIN{OFS="\""}{gsub(",","|",$2);print}' file.csv
awk: syntax error near line 1
awk: illegal statement near line 1

It worked fine for me. See if there are syntactical errors in your file. Here is the script you might be looking for :

Script - read.ksh

# STEP 1 - Replace the commas within double quotes with pipe & write it to a temp file
awk -F'"' 'BEGIN{OFS="\""}{gsub(",","|",$2);print}' inputFile.csv > tempFile.csv
 
# STEP 2 - Get the first and last columns from temp file
while read line
do
        firstColumn=`echo $line|awk -F"," '{print $1}'`
        echo "First Column= $firstColumn"
        lastColumn=`echo $line|awk -F"," '{print $NF}'`
        echo "Last Column= $lastColumn"
 
# STEP 3 - Replace pipe with comma
 finalLastColumn=`echo $lastColumn | sed 's/\|/\,/g'`
 echo "Finally Last Column= $finalLastColumn"
done < tempFile.csv

inputFile.csv

"asdf, tew,123",abcd,123456
abcd,"asdf, tew,123",123456
abcd,1234,"asdf, tew,123"

Test Output

./read.ksh
First Column= "asdf| tew|123"
Last Column= 123456
Finally Last Column= 123456
First Column= abcd
Last Column= 123456
Finally Last Column= 123456
First Column= abcd
Last Column= "asdf| tew|123"
Finally Last Column= "asdf, tew,123"

Missed reverting pipe to comma in first column. You can add these lines after step 3

finalFirstColumn=`echo $firstColumn | sed 's/\|/\,/g'`
 echo "Finally first Column= $finalFirstColumn"