Delete columns with a specific title XXX, whose position changes in each file

echo_manolis · December 19, 2016, 5:46am

Good morning,

I know how to cut a string and a column, and how to find a word.

I have a file with over 100 columns. All columns have a title in the first line. I have to delete all columns with the XXX title.

I can't use cut -f because the position of the XXX columns changes from file to file, and each file has many of them.

That means I have to find the XXX columns first and then delete them... I think.
What is the bridge between these two steps?

I'm not an expert, but in general I can follow along.

Ubuntu, Bash version: 4.3.46
Bash, Perl

The position of the XXX columns changes in each file.

Input file

aaa bbb XXX ddd XXX XXX   <-- Title
123 afa 133 2e2 dqd 24f
134 feg 566 5tf erg fe4
546 rgr 135 g5r hyt grt


Output file

aaa bbb ddd   <-- Title
123 afa 2e2
134 feg 5tf
546 rgr g5r

Thanks a lot.
manolis

RudiC · December 19, 2016, 6:21am

Welcome to the forum.

It is always helpful to complete a request with system info like OS and shell, preferred tools, and adequate sample input and output data to avoid ambiguities and keep people from guessing.
In these forums, similar problems (identifying a column in a file and modify it) have been discussed and solved umpteen times. Did you consider searching for and adapting these? The links at the bottom of this page might be a good start.

echo_manolis · December 19, 2016, 7:26am

Dear moderator, thanks for your tips!

I hope my post is OK now.

Before posting I tried the "search" button and found many other posts, but not what I was looking for.

Please, if you have any URL where I can find my answer, post it in this thread and then you can delete it.

Thanks a lot!
manolis

RudiC · December 19, 2016, 8:18am

This OR this might point you in the right direction to find the column(s). Instead of printing them, remove them.
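As a rough sketch of that "find, then remove" idea: awk can report the positions of the XXX titles, and cut can then drop those positions. This assumes GNU cut (for --complement) and fields separated by exactly one space; the function name drop_named_cols is made up for illustration.

```shell
#!/usr/bin/env bash
# drop_named_cols TITLE FILE   (hypothetical helper name)
# Step 1: awk reads only the header and prints the positions of all
#         fields equal to TITLE as a comma-separated list, e.g. 3,5,6.
# Step 2: cut removes those positions (--complement is a GNU extension).
drop_named_cols() {
    local title=$1 file=$2 cols
    cols=$(awk -v t="$title" 'NR == 1 {
               for (i = 1; i <= NF; i++)
                   if ($i == t) list = list (list == "" ? "" : ",") i
               print list
               exit
           }' "$file")
    if [ -z "$cols" ]; then
        cat "$file"      # no column carries that title: pass the file through
    else
        cut -d ' ' --complement -f "$cols" "$file"
    fi
}
```

Called as drop_named_cols XXX file, it should print the file without the XXX columns, wherever they sit in that particular file.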

drl · December 19, 2016, 11:41am

Hi.

This looks like a CSV-like problem or, more generally and precisely, a delimiter-separated values (DSV) format problem: Delimiter-separated values - Wikipedia. So here is a solution that uses the csvtool command:

#!/usr/bin/env bash

# @(#) s1       Demonstrate elimination of some fields, csvtool.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C csvtool pass-fail

FILE=${1-data1}
E=expected-output.txt

pl " Input data file $FILE:"
cat $FILE

pl " Expected output:"
cat $E

# Extract names, remove XXX, separate remainder with commas.
keeping=$( head -1 $FILE | sed 's/ XXX//g ; s/ /,/g' )
pl " Keeping these columns:" $keeping

pl " Results:"
csvtool -t " " -u " " namedcol $keeping  $FILE |
tee f1

pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C || ( pe; pe " Results cannot be verified." ) >&2

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.6 (jessie) 
bash GNU bash 4.3.30
csvtool - ( /usr/bin/csvtool, 2014-08-06 )
pass-fail (local) 1.9

-----
 Input data file data1:
aaa bbb XXX ddd XXX XXX
123 afa 133 2e2 dqd 24f
134 feg 566 5tf erg fe4
546 rgr 135 g5r hyt grt

-----
 Expected output:
aaa bbb ddd
123 afa 2e2
134 feg 5tf
546 rgr g5r

-----
 Keeping these columns: aaa,bbb,ddd

-----
 Results:
aaa bbb ddd
123 afa 2e2
134 feg 5tf
546 rgr g5r

-----
 Verify results if possible:

-----
 Comparison of 4 created lines with 4 lines of desired results:
 Succeeded -- files (computed) f1 and (standard) expected-output.txt have same content.

We first get the header names, eliminate all XXX, collect into a CSV string, and tell csvtool to produce those columns (fields) that correspond to the names we kept.
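One caveat on the name-extraction step: the sed pattern ' XXX' needs a space in front of the name, so it would miss an XXX in the first column, and it would also clip a longer name that merely starts with XXX. A hedged alternative that compares whole header names is sketched here; keep_list is a hypothetical helper name and whitespace-separated headers are assumed.

```shell
#!/usr/bin/env bash
# keep_list TITLE FILE   (hypothetical helper name)
# Print the header names that are not exactly TITLE, joined with commas,
# in a form suitable for passing to csvtool namedcol.
keep_list() {
    awk -v t="$1" 'NR == 1 {
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i != t) out = out (out == "" ? "" : ",") $i
        print out
        exit
    }' "$2"
}
```

In the script it could stand in for the head/sed pipeline, for example keeping=$( keep_list XXX $FILE ).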

Details for csvtool:

csvtool tool for performing manipulations on CSV files from sh... (man)
Path    : /usr/bin/csvtool
Version : - ( /usr/bin/csvtool, 2014-08-06 )
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help    : probably available with --help
Home    : https://github.com/Chris00/ocaml-csv

Install from repository when you can, otherwise see csvtool home as noted above.

Best wishes ... cheers, drl

echo_manolis · December 20, 2016, 3:28am

Oh my god !!!

I thought it was simple... OK, now I will try!

Thanks a lot guys!

RudiC · December 20, 2016, 5:15am

How about

awk -vRM="XXX" 'NR == 1 {for (i=1; i<=NF; i++) if ($i == RM) DL[i]} {for (i in DL) $i = ""; gsub (FS FS, FS)} 1' file
aaa bbb ddd
123 afa 2e2
134 feg 5tf
546 rgr g5r
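For anyone who finds the one-liner dense, here is the same idea spread over several lines with comments. It is a sketch with one deliberate difference: rather than blanking the unwanted fields and squeezing the separators afterwards, it rebuilds each line from the fields that are kept, so longer runs of adjacent XXX columns should not leave stray separators behind. The function name rm_cols is hypothetical, and whitespace-separated input is assumed.

```shell
#!/usr/bin/env bash
# rm_cols TITLE FILE   (hypothetical helper name)
rm_cols() {
    awk -v RM="$1" '
        NR == 1 {                      # header line: note which columns to drop
            for (i = 1; i <= NF; i++)
                if ($i == RM) DL[i] = 1
        }
        {                              # every line, header included
            out = ""; sep = ""
            for (i = 1; i <= NF; i++)
                if (!(i in DL)) {      # keep only columns not marked for deletion
                    out = out sep $i
                    sep = OFS
                }
            print out
        }
    ' "$2"
}
```

Usage would be rm_cols XXX file, with the result written to standard output.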

echo_manolis · December 20, 2016, 9:07am

It works !!!

I love you guys !!!

echo_manolis · December 23, 2016, 4:30pm

Dear RudiC,

your script works and I use it with tab-separated files.

I would like to use it with CSV files.

I think that I have to add -F or -F or -F"," or -F',' ... but I can't find the right place in your command; it does not work.

Thanks in advance!

Best wishes to all of you

Aia · December 23, 2016, 5:07pm

If your input is separated by commas, then:

awk -F, '{...}' ...

or

awk 'BEGIN{FS=","} {...}' ...

echo_manolis · December 26, 2016, 11:03am

Thanks Aia,

unfortunately I can't get it to work with comma-separated files. I usually work in bash... and I don't know awk.

awk -vRM="xxx" 'NR == 1 {for (i=1; i<=NF; i++) if ($i == RM) DL[i]} {for (i in DL) $i = ""; gsub (FS FS, FS)} 1' file

Inside {} I have to put the actions, and that is OK! ... {for / if} and {for / gsub}

With ' ' I have a block of actions...

The -vRM is a condition on the whole action, while -F, is the file type...

I tried reading on the internet and searching our forum...

I think that my error is that I have two - (commands)

awk -F, -vRM

Am I right? Then I have ' ... { ... } ..'

I tried to include it but it doesn't work...

RudiC · December 26, 2016, 11:38am

What in awk -F, -vRM... did not work?

echo_manolis · December 26, 2016, 11:43am

It works, but my output is a multi-space-delimited file... no longer a comma-separated file.

RudiC · December 26, 2016, 11:48am

Had you described that behaviour upfront, you could have had the solution much earlier: set the output field separator, e.g. -vOFS=","
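The reason, in short: -F only controls how awk splits the input. As soon as a field is assigned to, awk rebuilds the line and joins the fields with OFS, which defaults to a single space. A tiny demonstration on a made-up line (the variable names are only for illustration):

```shell
#!/usr/bin/env bash
line='a,b,c'

# Input split on commas, but OFS left at its default (one space):
without_ofs=$(printf '%s\n' "$line" | awk -F, '{ $2 = "x" } 1')

# Input split on commas and output joined with commas:
with_ofs=$(printf '%s\n' "$line" | awk -F, -v OFS=, '{ $2 = "x" } 1')

printf '%s\n' "$without_ofs" "$with_ofs"
# prints:
# a x c
# a,x,c
```

Note that this only covers plain comma-separated data; quoted fields that contain commas themselves would need a real CSV parser.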

echo_manolis · December 26, 2016, 11:56am

You're right, I had not thought about the output type but only the input file type... thanks again !!!