Inverse of Cut

landossa · February 29, 2012, 10:14pm

Hi List,

I have a CSV file which I am manipulating. Each time I receive the CSV for processing, there will be extra columns of data.

Here's what I want to do: remove specific columns from the CSV data file but keep all the remaining columns, where the number of remaining columns varies (there will be more columns each time the script is run on new data).

Let's say I want to remove columns 2, 3, 4, 5 and 10 and keep the rest.

Normally, with a CSV that has the same number of columns every time it is processed, say 20, I would use the following command, specifying the columns I do want:

cut -d"," -f 1,6,7,8,9,11,12,13,14,15,16,17,18,19,20

But since the number of columns will increase each time I run this process, I need a way of specifying the columns I don't want (an inverse cut) rather than the ones I do want. Perhaps this could be achieved in sed or awk.

Has anyone got any ideas?

Any help much appreciated.

thanks,
land

Chubler_XL · February 29, 2012, 10:34pm

If your cut supports --complement (a GNU extension), then:

cut -d, --complement -f2-5,10 infile

Otherwise, try this:

awk -F, -vR="2,3,4,5,10" 'BEGIN{split(R,A,",");for(v in A) S[A[v]]=1}
{s="";for(i=1;i<=NF;i++)if(!(i in S)){printf "%s%s",s,$i;s=","}
print ""}' infile

Or for something a bit more fancy:

awk -F, -vR="2-5,10" '
BEGIN {
 split(R,A,",");
 for(v in A) if (split(A[v],V,"-") > 1)
 for(i=V[1];i<=V[2];i++)S[i]=1
 else S[V[1]]=1}
{s="";for(i=1;i<=NF;i++)if(!(i in S)){printf "%s%s",s,$i;s=","}
print ""}' infile
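
For reuse, the range-aware script above could be wrapped in a shell function. This is only a sketch: the function name drop_cols and the sample row are made up for illustration and are not part of the original suggestion. It reads CSV on standard input, takes the list of columns to drop as its first argument, and keeps the comma as the output separator.

```shell
# drop_cols is a hypothetical helper name (an assumption, not from the thread).
# Usage: drop_cols '2-5,10' < infile
drop_cols() {
    awk -F, -v R="$1" '
    BEGIN {
        # Build the set S of column numbers to drop from a list like "2-5,10"
        n = split(R, A, ",")
        for (k = 1; k <= n; k++) {
            if (split(A[k], V, "-") > 1)
                for (i = V[1] + 0; i <= V[2] + 0; i++) S[i] = 1
            else
                S[V[1] + 0] = 1
        }
    }
    {
        # Join the surviving fields with commas, no trailing separator
        out = ""; sep = ""
        for (i = 1; i <= NF; i++)
            if (!(i in S)) { out = out sep $i; sep = "," }
        print out
    }'
}

# Made-up 12-column sample row
printf '%s\n' 'c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12' | drop_cols '2-5,10'
# prints: c1,c6,c7,c8,c9,c11,c12
```

Like the scripts above, this splits on every comma, so it assumes no field contains a quoted comma.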

landossa · February 29, 2012, 10:57pm

Thanks for the comprehensive options, Chubler. I'll do some testing with this next time I'm logged in.

alister · March 1, 2012, 9:05am

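You can easily do this with cut itself, using standard functionality. An open-ended range such as 11- selects every field from 11 through the end of the line, however many there are: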

cut -d, -f 1,6-9,11-
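
As a quick check of how that field list behaves when the column count grows, here is a small sketch using made-up rows of 12 and 14 columns (the sample data is an assumption for illustration):

```shell
# The same field list handles rows of different widths,
# because "11-" runs through whatever the last field is.
printf '%s\n' \
    'c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12' \
    'c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14' |
cut -d, -f 1,6-9,11-
# prints:
# c1,c6,c7,c8,c9,c11,c12
# c1,c6,c7,c8,c9,c11,c12,c13,c14
```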

Regards,
Alister