I am having trouble sorting one file based on the order of names in another file. I tried grep -f, but that did not work. Basically, what I have is two files that look like this:
File 1 (the list)
gh
aba
for
hmm
File 2 (the file that needs to be sorted)
aba 2 4 6 7
for 2 4 7 4
hmm 1 2 7 4
gh 2 5 7 9
So file 1 is a list of names in a particular order, and I want to sort file 2 into that order while keeping the other columns of each line.
So the end output would look like this:
Final file
gh 2 5 7 9
aba 2 4 6 7
for 2 4 7 4
hmm 1 2 7 4
Thanks
Phil
There has to be a one-to-one correspondence between file1 and file2, i.e., if file1 is missing one of the keys that is in file2, that line will not print at all.
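For reference, here is a minimal sketch of one common way to get that behavior with plain awk. It is not necessarily what anyone in this thread used. The function name sort_by_list and the temporary file names are made up for illustration. The idea: read the data file first and remember each line under its first field, then read the list and print the remembered line for each name. grep -f cannot do this because it prints matches in the order of the file being searched, not the order of the pattern list.

```shell
#!/bin/sh
# sort_by_list LIST DATA (hypothetical helper name)
# Prints the lines of DATA in the order their first field appears in LIST.
# Keys that are in DATA but not in LIST are silently dropped.
sort_by_list() {
    awk '
        NR == FNR   { row[$1] = $0; next }  # first file read (DATA): store line by key
        ($1 in row) { print row[$1] }       # second file read (LIST): emit in list order
    ' "$2" "$1"
}

# Demo with the sample data from the question (file names are placeholders).
tmp=$(mktemp -d)
printf '%s\n' gh aba for hmm > "$tmp/file1"
printf '%s\n' 'aba 2 4 6 7' 'for 2 4 7 4' 'hmm 1 2 7 4' 'gh 2 5 7 9' > "$tmp/file2"
sort_by_list "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
```

If the same key appears more than once in the data file, this sketch keeps only the last such line, so it assumes keys are unique.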
Here is a script that uses a non-standard sort utility that admits alternate collating sequences, msort:
#!/usr/bin/env bash
# @(#) s1 Demonstrate alternate collating sequence.
# msort-home http://freshmeat.net/projects/msort
# Section 1, setup, pre-solution.
# Infrastructure details, environment, commands for forum posts.
# Uncomment export command to test script as external user.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
pe() { for i;do printf "%s" "$i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
C=$HOME/bin/context && [ -f $C ] && . $C specimen msort
set -o nounset
pe
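# Data file is the first argument (default data1).
FILE=${1-data1}
# Shift only if an argument was given, so a bare run does not fail here.
[ $# -gt 0 ] && shift
# Collating sequence file is the second argument (default data2).
CS=${1-data2}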
# Section 2, display input file and collating sequence file.
# Display sample of data file, with head & tail as a last resort.
pe " || start [ first:middle:last ]"
specimen "$FILE" "$CS" \
|| { pe "(head/tail)"; head -n 5 "$FILE"; pe " ||"; tail -n 5 "$FILE"; }
pe " || end"
# Section 3, solution.
pl " Results:"
msort -q -n 1,1 -u n -l -c lexicographic -s $CS -1 $FILE
exit 0
producing:
% ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0.7 (lenny)
GNU bash 3.2.39
specimen (local) 1.17
msort - ( /usr/bin/msort Apr 24 2008 )
|| start [ first:middle:last ]
Whole: 5:0:5 of 4 lines in file "data1"
aba 2 4 6 7
for 2 4 7 4
hmm 1 2 7 4
gh 2 5 7 9
Whole: 5:0:5 of 4 lines in file "data2"
gh
aba
for
hmm
|| end
-----
Results:
gh 2 5 7 9
aba 2 4 6 7
for 2 4 7 4
hmm 1 2 7 4
If you are using Debian GNU/Linux, msort is in the repository for lenny and squeeze, but not in wheezy yet. The freshmeat site has links to a number of packages for other OSs.