Comparing lines of two different files

mira · May 16, 2011, 2:12pm

Hello,

Please help me with this problem if you have a solution.

I have two files:
<file1> : In each line, the first word is an Id, followed by the words that belong to that Id

piMN-1 abc pqr xyz py12
niLM y12 FY4 pqs
fiRLym F12 kite red

<file2> : same format as file1, but it can contain extra Ids, and an Id that is in both files can have different words belonging to it

niLM FY4 gt45
piMN-1 pqr ss3 abc
aiPQ44 asf ggy-9

I need an output file that looks up every Id of file2 in file1. If the Id is found, it compares the associated words and prints only the words present in both files (order doesn't matter). If an Id of file2 does not exist in file1, or none of its words match, then only the Id is printed.

<file3>

niLM FY4
piMN-1 pqr abc
aiPQ44

I'll be really thankful for your help.

Shell_Life · May 16, 2011, 3:15pm

Here is one way of doing it (source_file is your file2 and target_file is your file1):

#!/usr/bin/ksh
typeset -i mCnt
while read mLine
do
  mOutLine=""
  for mFld in ${mLine}
  do
    if [[ "${mOutLine}" = "" ]]; then
      mOutLine=${mFld}
      mTag=${mFld}
      continue
    fi
    # anchor the Id at line start and match the field as a whole word
    mCnt=$(grep -Ec "^${mTag}( .*)? ${mFld}( |$)" target_file)
    if [[ ${mCnt} -ne 0 ]]; then
      mOutLine=${mOutLine}' '${mFld}
    fi
  done
  echo "${mOutLine}"
done < source_file
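
When a word is looked up with grep, as in the script above, it matters whether the pattern is anchored. A pattern of the form Id.*word also counts lines where the word is only part of a longer word. The sketch below shows the difference on two lines taken from file1 in the question; the scratch file and the variable names loose and strict are only for illustration.

```sh
#!/bin/sh
# Two lines copied from file1 in the question, written to a scratch file.
tmp=$(mktemp)
printf '%s\n' 'piMN-1 abc pqr xyz py12' 'niLM y12 FY4 pqs' > "$tmp"

# Unanchored: "y12" is found inside "py12", so the piMN-1 line counts as a hit.
loose=$(grep -Ec 'piMN-1.*y12' "$tmp" || true)

# Anchored: Id at the start of the line, word delimited by a blank or line end.
strict=$(grep -Ec '^piMN-1( .*)? y12( |$)' "$tmp" || true)

echo "loose=$loose strict=$strict"
rm -f "$tmp"
```

With these sample lines it prints loose=1 strict=0. Ids or words that contain regex metacharacters would still need escaping, and the anchored form assumes single blanks between words.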

radoulov · May 16, 2011, 3:20pm

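awk 'NR == FNR {
  # file1: remember every (Id, word) pair
  for (i = 1; ++i <= NF;)
    k[$1, $i]
  next
  }
{
  # file2: clear ok, then collect the words also listed under this Id in file1
  split(x, ok)
  for (i = 1; ++i <= NF;) {
    if (($1, $i) in k)
      ok[$1] = ok[$1] ? ok[$1] FS $i : $i
    }
  print $1, ok[$1]
  }' file1 file2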

With GNU awk you can use delete ok instead of split(x, ok).
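
A minimal sketch of the delete ok variant, assuming an awk that accepts delete on a whole array (GNU awk does). It also builds the output line word by word, so an Id without matches is printed without a trailing blank. The wrapper function match_words_awk and the temporary directory are hypothetical and only make the example self-contained.

```sh
#!/bin/sh
# match_words_awk is a hypothetical helper: $1 = file1 (reference), $2 = file2.
match_words_awk() {
  awk 'NR == FNR {
    # first file: remember every (Id, word) pair
    for (i = 2; i <= NF; i++) k[$1, $i]
    next
  }
  {
    delete ok                      # clear the per-line array
    n = 0
    for (i = 2; i <= NF; i++)
      if (($1, $i) in k) ok[++n] = $i
    line = $1
    for (j = 1; j <= n; j++) line = line FS ok[j]
    print line
  }' "$1" "$2"
}

# Sample data from the question, recreated in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'piMN-1 abc pqr xyz py12' 'niLM y12 FY4 pqs' 'fiRLym F12 kite red' > "$dir/file1"
printf '%s\n' 'niLM FY4 gt45' 'piMN-1 pqr ss3 abc' 'aiPQ44 asf ggy-9' > "$dir/file2"

result=$(match_words_awk "$dir/file1" "$dir/file2")
printf '%s\n' "$result"
rm -rf "$dir"
```

On the sample files this prints the three lines of the requested file3.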

ygemici · May 17, 2011, 2:24pm

@Shell_Life, your script contains some mistakes.
Here is a solution without awk, justdoit.sh:

# cat file1
piMN-1 abc pqr xyz py12
niLM y12 FY4 pqs
fiRLym F12 kite red

# cat file2
niLM FY4 gt45
piMN-1 pqr ss3 abc
aiPQ44 asf ggy-9

# ./justdoit.sh
niLM FY4
piMN-1 pqr abc
aiPQ44

#!/bin/bash
while read -r linef; do
  la=()
  # m1 = Id of the current file2 line
  m1=$(echo $linef|sed -n "s/\([^ ]*\).*/\1/p ")
  while read -r lineff; do
    # m2 = Id of the current file1 line
    m2=$(echo $lineff|sed -n 's/\([^ ]*\).*/\1/p ')
    if [[ "$m1" = "$m2" ]] ; then
      # collect the words of both lines in one array
      for i in $(echo "$lineff $linef"|sed 's/ /\n/g')
      do
        la=(${la[@]} $i)
      done
      x=0
      y=$(echo ${#la[@]} )
      while [ $(( y -= 1 )) -gt -1 ]
      do
        # c = how often la[x] occurs in the combined list
        for i in ${la[@]}
        do
          if [ ${la[x]} = $i ] ; then
            ((c++))
          fi
        done
        if [ $c -ge 2 ] ; then
          added=(${added[@]} ${la[x]})
        fi
        # keep a common word only once, on its second occurrence
        for i in ${added[@]}
        do
          if [ $i = ${la[x]} ] ; then
            ((adc++))
          fi
        done
        if [[ $adc -gt 1 ]] ; then
          ar=(${ar[@]} ${la[x]})
        fi
        adc=0 ;((x++)); c=0
      done
      echo ${ar[@]}
      ne=0
    else
      ne=1
    fi
  done<file1
  # Id not found in file1: print the Id alone
  if [ $ne -eq 1 ] ; then
    if [[ ${ar[0]} != $m1 ]] ; then
      echo $m1
    fi
  fi
  # reset per-Id state; added must be cleared too, or words leak into later Ids
  ar=(); added=()
done<file2
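
For comparison, a shorter sketch of the same no-awk idea in plain POSIX sh. It assumes each Id occurs at most once in file1 and that words are separated by blanks. The name match_words_sh is a hypothetical helper, and the sample files are recreated in a scratch directory so the example runs on its own.

```sh
#!/bin/sh
# match_words_sh is a hypothetical helper: $1 = file1 (reference), $2 = file2.
# The body runs in a subshell so set -f and the variables stay local.
match_words_sh() (
  set -f                          # no filename expansion while word-splitting
  f1=$1 f2=$2
  while read -r id words; do
    [ -n "$id" ] || continue      # skip empty lines
    ref=' '
    # find the file1 line with the same Id and pad its words with blanks
    while read -r rid rwords; do
      if [ "$rid" = "$id" ]; then
        set -- $rwords
        ref=" $* "
        break
      fi
    done < "$f1"
    out=$id
    for w in $words; do
      case $ref in
        *" $w "*) out="$out $w" ;;   # whole-word match only
      esac
    done
    printf '%s\n' "$out"
  done < "$f2"
)

# Sample data from the question.
dir=$(mktemp -d)
printf '%s\n' 'piMN-1 abc pqr xyz py12' 'niLM y12 FY4 pqs' 'fiRLym F12 kite red' > "$dir/file1"
printf '%s\n' 'niLM FY4 gt45' 'piMN-1 pqr ss3 abc' 'aiPQ44 asf ggy-9' > "$dir/file2"

result=$(match_words_sh "$dir/file1" "$dir/file2")
printf '%s\n' "$result"
rm -rf "$dir"
```

It rereads file1 for every line of file2, which is fine for small files but slow for large ones.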

regards
ygemici