Hi.
I agree that as soon as one knows that fields are involved, one should consider awk
.
However, if you are going to be dealing with csv files frequently, you might want to consider csv-specific utilities. Here's an example of one, along with a field-specific utility.
#!/usr/bin/env bash
# @(#) s1 Demonstrate field differences, csv-format file.
# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C csvfix dwdiff
pl " Input data files:"
head data1 data2
csvfix order -fn col_1,col_2,col_3 data1 >new1
csvfix order -fn col_1,col_2,col_3 data2 >new2
pl " Results, csvfix to generate named column files in order:"
head new1 new2
pl " Results, delimited-word diff:"
dwdiff -3 -d',' new1 new2
exit 0
producing:
$ ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution : Debian 8.7 (jessie)
bash GNU bash 4.3.30
csvfix - ( local: ~/executable/csvfix, 2014-05-17 )
dwdiff 2.0.9
-----
Input data files:
==> data1 <==
col_1,col_2,col_3
1,2,4
1,3,6
==> data2 <==
col_1,col_3,col_2,col_5,col_6
1,2,3,4,5
1,6,3,,,
-----
Results, csvfix to generate named column files in order:
==> new1 <==
"col_1","col_2","col_3"
"1","2","4"
"1","3","6"
==> new2 <==
"col_1","col_2","col_3"
"1","3","2"
"1","3","6"
-----
Results, delimited-word diff:
======================================================================
[-"2"-]{+"3"+}
======================================================================
[-"4"-]{+"2"+}
======================================================================
The csvfix
selects the named columns, placing them in order, and dwdiff
compares the fields (words) separately.
So "2" is replaced by "3", and "4" is replaced by "2". There are a number of options on dwdiff
, see man page.
Here are some details on those utilities:
csvfix Manipulate csv files (doc)
Path : ~/executable/csvfix
Version : - ( local: ~/executable/csvfix, 2014-05-17 )
Type : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Home : https://neilb.bitbucket.io/csvfix/
dwdiff a delimited word diff program (man)
Path : /usr/bin/dwdiff
Version : 2.0.9
Type : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help : probably available with --help
Repo : Debian 8.7 (jessie)
and here are some other csv-specific tools:
CSV, comma separated variable file (comma typical, it can be almost anything)
1) csvfix
https://neilb.bitbucket.io/csvfix/ (checked 2017.03)
2) csvtool
3) Text::CSV -- perl module, DIY
4) crush, Google collection
http://crush-tools.googlecode.com/files/crush-tools-2013-04.tar.gz
5) csvkit
https://github.com/wireservice/csvkit
Best wishes ... cheers, drl