Compare two files and print only the data that appears in file 1

patrick87 · September 27, 2010, 12:52am

Below is the data content of file_1 and file_2:
file_1

>sample_1
FKGJGPOPOPOQA
ASDADWEEWERE
ASDAWEWQWRW
ASDASDASDASDD

file_2

>sample_1
DRTOWPFPOPOQA
ASDADWEEASDF
ASDADRTYWRW
ASDASDASDASDD

I tried the following perl script. Unfortunately, it doesn't give my desired output:

cat file_1
>sample_1
FKGJGPOPOPOQA
ASDADWEEWERE
ASDAWEWQWRW
ASDASDASDASDD

cat file_2
>sample_1
DRTOWPFPOPOQA
ASDADWEEASDF
ASDADRTYWRW
ASDASDASDASDD

perl -e ' ($file1, $file2) = @ARGV; $printed = 0; open F2, $file2; while (<F2>) { $h2{$_}++ }; $count2 = $.; open F1, $file1; while (<F1>) { if (! $h2{$_}) { print $_; $printed++; } } $count1 = $.; warn "\nRead $count1 lines from $file1 and $count2 lines from $file2.\nPrinted $printed lines found in $file1 but not in $file2\n\n" ' file_1 file_2 > file_3

cat file_3
FKGJGPOPOPOQA
ASDADWEEWERE
ASDAWEWQWRW

Desired output file content:

FKGJGPO
WERE
WEWQ

The perl command that I used prints out whole lines instead of just the specific content in file_1 that I am interested in :(
Thanks a lot for any advice.

rdcwayx · September 27, 2010, 2:09am

$ awk -F "" '
NR==FNR{a[NR]=$0;next}
!/>/ {
          split(a[FNR],b,"");
          {     for (i=1;i<=NF;i++) {if ($i!=b[i]) printf "%s", b[i]}
                printf RS
          }
      }
' file_1 file_2

FKGJGO
WERE
WEWQ

patrick87 · September 27, 2010, 3:44am

Thanks again for your help, rdcwayx.
Do you have any idea about my other question at the following link?

Thanks in advance ^^

patrick87 · September 28, 2010, 6:42am

Hi rdcwayx,
Do you have any idea how to achieve the goal below?
file_1

>sample_1
FKGJGPOPOPOQA
ASDADWEEWERE
ASDAWEWQWRW
ASDASDASDASDD

file_2

>sample_1
ASDFRPOPOPPWE
ASDADWEERTTY
ASDAPERTWRW
ASDASDASDASDD

Desired output:

FKGJG     OQA
        WERE
    WEWQ

My goal is just to replace the matching content with either an "empty" (blank) or "tab" delimiter, so that the unique content in every line stays separated.
E.g., I would prefer the result to look like this:

FKGJG     OQA

Instead of

FKGJGOQA

Thanks again for all your advice, rdcwayx.

rdcwayx · September 28, 2010, 6:54am

awk -F "" '
NR==FNR{a[NR]=$0;next}
!/>/ {
          split(a[FNR],b,"");
          {     for (i=1;i<=NF;i++) {printf "%s", ($i!=b[i] ? b[i] : " ")}
                printf RS
          }
      }
' file_1 file_2

FKGJG     OQA
        WERE
    W WQ
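
The question also mentions a "tab" delimiter as an alternative to blanks. One way that might be done is to emit a single tab for each run of matching characters instead of one space per character. This is only a minimal sketch: the function name diff_tabs is made up for illustration, it uses substr() rather than -F "", and it assumes both files have the same number of lines in the same order.

```shell
# diff_tabs FILE1 FILE2
# Hypothetical helper: print the characters of FILE1 that differ from FILE2
# at the same position, replacing each run of matching characters with one tab.
diff_tabs() {
    awk '
    NR == FNR { a[FNR] = $0; next }          # first file: remember every line
    /^>/      { next }                       # second file: skip header lines
    {
        line = a[FNR]; out = ""; run = 0     # run = 1 while inside a matching run
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (c != substr($0, i, 1)) { out = out c; run = 0 }
            else if (!run)             { out = out "\t"; run = 1 }
        }
        print out
    }
    ' "$1" "$2"
}
```

Called as diff_tabs file_1 file_2. A line that is identical in both files comes out as a single tab.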

michaelrozar17 · September 28, 2010, 7:14am

Could you please explain the code..

patrick87 · September 28, 2010, 7:20am

Hi rdcwayx,
Thanks a lot for your help with awk.

rdcwayx · September 28, 2010, 7:54am

awk -F "" '                           # empty field separator: every letter becomes its own field (gawk/mawk behaviour)
NR==FNR{a[NR]=$0;next}                # read the first file (file_1) into array a; the index is the line number
!/>/ {                                # skip the header lines, i.e. lines containing >
split(a[FNR],b,"");                   # split the file_1 line with the same line number into letters, stored in array b
for (i=1;i<=NF;i++) {if ($i!=b[i]) printf "%s", b[i]}   # compare each letter of file_1 (b[i]) with file_2 ($i); print it if it differs
printf RS                             # print a record separator (newline)
}
' file_1 file_2
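
Splitting a line into single letters with an empty field separator works in gawk and mawk, but as far as I know POSIX does not guarantee it. Here is a minimal sketch of the same position-by-position comparison using substr() instead. The function name diff_chars is made up for illustration, and it assumes both files have the same number of lines in the same order.

```shell
# diff_chars FILE1 FILE2
# Hypothetical helper: print the characters of FILE1 that differ from the
# character at the same position in the same line of FILE2.
diff_chars() {
    awk '
    NR == FNR { a[FNR] = $0; next }      # first file: remember every line
    /^>/      { next }                   # second file: skip header lines
    {
        line = a[FNR]                    # line with the same number in the first file
        out = ""
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (c != substr($0, i, 1))   # differs from the second file at position i
                out = out c
        }
        print out
    }
    ' "$1" "$2"
}
```

Called as diff_chars file_1 file_2. Lines that are identical in both files produce an empty output line, just like the script above.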