Help on writing data from 2 different files to one based on a common factor

Hello all,

I have 2 text files.
For example:
File1.txt contains data

A
B
C
D
****NEXT****
X
Y
Z
****NEXT****
L
M
N

and File2.txt contains data

P
Q
X
****NEXT****
E
F
G
B
****NEXT**** 
J
K
L

As you can see, the data is grouped, and each group in one file shares one item with a group in the other file: X, B and L appear in both files. Based on the two files and these common items, I want to write a third file in this format:

B A B C D E F G
X X Y Z P Q
L L M N J K

Is this insane request even possible in a shell script? Please help.

I'm lost. What goes where based on which criteria?

Both files list the data one item per line, as shown in the examples above. Each group has one item that also appears in a group of the other file, and a ****NEXT**** line marks the end of a group.

So I need the data from both files, using the common item as the key for merging the groups, written to a third file.

Example once again: 'B' is common to both files. It appears in the group A B C D in the first file and in E F G B in the second, so these two groups need to be merged on the common item B and written to the third file as B A B C D E F G: the common item first, then the remaining data, all on one line. HTH

It's a little bit lengthy :smiley:

try

$ awk -v a=1 -v b=1 'NR==FNR{if($0 ~ /****NEXT****/){a++}else{A[$0]++;X[a]=X[a]?X[a] FS $0 : $0}next}{
if($0 ~ /****NEXT****/){b++}else{Y[b]=Y[b]?Y[b] FS $0 : $0}}END{
for(i in X){n=split(X[i],P)
for(j in Y){
for(t=1;t<=n;t++){
if(Y[j] ~ P[t]){gsub(P[t],"",Y[j]);print P[t],X[i],Y[j]}}}}}' file1 file2
 
B A B C D E F G
X X Y Z P Q
L L M N J K
 

shortened

Thanks Pamu. But I get this error

couldn't set locale correctly
couldn't set locale correctly
awk: syntax error near line 1
awk: bailing out near line 1

Why then is it B A B C D E F G and not B B A C D E F G ? Why move the second group's B up front, but not the first one's?

Use /usr/xpg4/bin/awk or nawk on Solaris.
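
If the same script has to run on Solaris and on other systems, one option is to pick the awk binary once at the top of the script. This is only a rough sketch: it assumes the usual Solaris locations /usr/xpg4/bin/awk and /usr/bin/nawk (adjust if your system differs), and the AWK variable name is just for illustration.

```shell
# Prefer the POSIX awk or nawk where they exist (Solaris); otherwise use
# whatever "awk" is found on PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif [ -x /usr/bin/nawk ]; then
    AWK=/usr/bin/nawk
else
    AWK=awk
fi

# Quick sanity check that the chosen awk accepts -v assignments.
"$AWK" -v a=1 'BEGIN { print a + 1 }' > awk_check.txt
cat awk_check.txt
```

After that, call "$AWK" wherever the one-liners in this thread say awk or nawk.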

I need the original order (in the first 2 files) as it is


Pamu - I am a bit naive when it comes to shell scripting, so thanks for your patience... I changed to nawk and this is the result

couldn't set locale correctly
couldn't set locale correctly
nawk: illegal primary in regular expression ****NEXT**** at ***NEXT****
 source line number 1
 context is
        NR==FNR{if($0 ~ >>>  /****NEXT****/ <<< ){a++}else{A[$0]++;X[a]=X[a]?X[a] FS $0 : $0}next}{


And when I use /usr/xpg4/bin/awk

couldn't set locale correctly
couldn't set locale correctly
/usr/xpg4/bin/awk: /****NEXT****/: ?, *, +, or { } not preceded by valid regular expression  Context is:
>>>     11NR==FNR{if($0 ~ /****NEXT****/        <<<

Try removing those * characters from the pattern:

 
$ nawk -v a=1 -v b=1 'NR==FNR{if($0 ~ /NEXT/){a++}else{A[$0]++;X[a]=X[a]?X[a] FS $0 : $0}next}{
if($0 ~ /NEXT/){b++}else{Y[b]=Y[b]?Y[b] FS $0 : $0}}END{
for(i in X){n=split(X[i],P)
for(j in Y){
for(t=1;t<=n;t++){
if(Y[j] ~ P[t]){gsub(P[t],"",Y[j]);print P[t],X[i],Y[j]}}}}}' file1 file2
 
B A B C D E F G
X X Y Z P Q
L L M N J K
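
For anyone wondering why the original pattern failed: in a regular expression, * means "repeat the previous item", so a pattern that starts with * has nothing to repeat, which is what both error messages above complain about. Dropping the asterisks works here because no data line contains NEXT. If you would rather match the separator literally, these are a few ways to do it (a small sketch; sample.txt is a throwaway file name used only for this illustration):

```shell
# A throwaway sample containing the separator line from the question.
printf '%s\n' A B '****NEXT****' X Y > sample.txt

# 1) Plain string comparison: no regular expression involved.
#    Note that this misses a separator line with trailing spaces.
awk '$0 == "****NEXT****" { n++ } END { print n + 0 }' sample.txt

# 2) Bracket expressions turn each asterisk into a literal character.
awk '/^[*][*][*][*]NEXT[*][*][*][*]/ { n++ } END { print n + 0 }' sample.txt

# 3) index() does a substring search and never interprets its argument
#    as a regular expression.
awk 'index($0, "****NEXT****") { n++ } END { print n + 0 }' sample.txt
```

Each of the three commands counts one separator line in the sample.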
 

Awesome, it worked... you are a rock star.. thanks a lot

awk '
  /NEXT/||!NF{
    next
  }
  {
    $1=$1
    for(i=1;i<=NF;i++) {
      if(NR==FNR){
        p=$0
        sub($i " *",x,p)
        A[$i]=p
      } 
      else if($i in A) print $i,$0,A[$i]
    }
  }
' RS=\* file2 file1

Try

awk -v a=1 -v b=1 'NR==FNR{if($0 ~ /NEXT/){a++}else{A[$0]++;X[a]=X[a]?X[a] FS $0 : $0}next}{
if($0 ~ /NEXT/){b++}else{Y[b]=Y[b]?Y[b] OFS $0 : $0}}END{
for(i in X){n=split(X[i],P)
for(j in Y){
for(t=1;t<=n;t++){
if(Y[j] ~ P[t]){gsub(P[t],"",Y[j]);gsub(" ",",",X[i]);print P[t],X[i],Y[j]}}}}}' OFS="," file1 file2

Try with /usr/xpg4/bin/awk if on Solaris

awk '
  /NEXT/||!NF{
    next
  }
  {
    $1=$1
    for(i=1;i<=NF;i++) {
      if(NR==FNR){
        p=$0
        sub($i " *",x,p)
        A[$i]=p
      } 
      else if($i in A) print $i,$0,A[$i]
    }
  }
' RS=\* OFS=, file file1