Extracting information from multiple files

Hello there,

I am trying to extract string information (a list of words) from 4 files and then put the results into 1 file. Currently I am doing this with grep -f list.txt file1 and repeating the process for the other 3 files. The reasons I am doing it this way are: (a) I do not know how to code; (b) each file has a header that must be included with the matching line; (c) I have to print each word and, underneath it, the results from the 4 files, and if there are no results from one or all of them I need to write a specific sentence saying that there are no results from that file. Below I explain what I am trying to achieve.

(1) The list.txt
Gene1
Gene2
Gene3

(2) file1.txt
Chromosome Position Genes Mutation
1 251565465 Gene1 T/G
1 215465511 Gene3 G/A

(3) file2.txt
Chromosome Position Genes Protein
1 251565471 Gene1 Damaged
1 215465614 Gene2 Pass

(4) file3.txt and file4.txt, with different results

I want to get the results for each word in list.txt written to a text file in the format below
---
Gene1
(add a sentence) "From Mutation point of view" or file name
Chromosome Position Genes Mutation
1 251565465 Gene1 T/G
(add a sentence) "From Protein point of view" or file name
Chromosome Position Genes Protein
1 251565471 Gene1 Damaged
(add a sentence) "There were no results from file 3 and 4" or file name
(a gap, then continue with the next word from the list)
Gene2
(add a sentence) "No results from Mutation point of view" or file name
(add a sentence) "From Protein point of view" or file name
Chromosome Position Genes Protein
1 215465614 Gene2 Pass

---

Any suggestions?
I searched the forum for using grep on multiple files, but it seems that Perl will be needed, and I am not an expert in coding.

See if this will help...

awk '
FILENAME != "list"{
        if(filename!=FILENAME){
                filename=FILENAME;
                a[FILENAME"HDR"]=$0;
                next;
        }
        a[FILENAME$3]=$0;
        f[FILENAME]++;
        next;
}
{
        print $0
        for(i=1;i<=length(f);i++){
                print "File : "i
                print a[i"HDR"]
                print a[i$0]?a[i$0]:"No info found"
        }
        print "\n"
}
'  file1.txt file2.txt list

Make sure list is the last file in the argument list.

--ahamed

Thanks a lot, Ahmed. I will give it a go and let you know.

Happy Eid

cheers

---------- Post updated at 04:38 PM ---------- Previous update was at 03:11 AM ----------

Hi again,

I tried the command above and got "No info found" for every word in the list file. However, this is not the case when I take a word from the list file and run grep Gene1 file1.txt: then I get results (the whole lines that I want). I would also like to know whether I can write the results to an output file, since there will be too many to copy from the screen. In addition, when I get "No info found" for a specific file, e.g.
-
Gene000055930
File : 1

No info found
File : 2

No info found
File : 3

No info found
-
Is there a way I can change "File : 1" into "No results from Mutation point of view"?

cheers

There was a bug. I corrected it and made the changes you requested. Try this:

#!/bin/bash
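awk '
# Data files: every input file except the one named "list"
FILENAME != "list"{
        if(filename!=FILENAME){
                # First line of a new file is its header
                filename=FILENAME;
                a[FILENAME"HDR"]=$0;
                next;
        }
        a[FILENAME$3]=$0;       # data line, keyed by file name + gene (column 3)
        f[FILENAME]++;          # remember which files had data lines
        next;
}
# The list file: one word per line
{
        print $0
        for(i in f){
                if(a[i$0]){
                        print "From "a[i"HDR"]" point of view"
                        print a[i"HDR"]
                        print a[i$0]
                }else{
                        print "No results from "a[i"HDR"]" point of view"
                }
        }
        print "\n"
}
'  file*.txt list > output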

--ahamed

Hi Ahmed,

I tried the code and got the output file with the headers from each file. However, it reports no results for all the files, although when I check manually with grep I do find the words I query from the list file. Strange. I have come to believe that only grep is good in these situations. What do you reckon?
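
awk should be able to handle this as well. Since the real files are not shown, the cause of the empty matches can only be guessed at. Possible causes include a trailing carriage return or trailing spaces on the lines of the list file, a list file whose name is not exactly "list", or a gene name that is not exactly the third whitespace-separated column in the real data (grep matches a word anywhere on the line, while the scripts above compare it with column 3 only). The sketch below is a variation on the approach above, not ahamed's script, and it rests on assumptions: fields are separated by whitespace, line 1 of each data file is its header, the "point of view" label is the last header column, and the gene is in column col (3 here). The demo/ directory and output.txt are hypothetical names, and the sample files are recreated from the first post only so that the script runs as is.

```shell
#!/bin/sh
# Recreate the sample files from the first post (hypothetical demo/ directory).
mkdir -p demo
cat > demo/list.txt <<'EOF'
Gene1
Gene2
Gene3
EOF
cat > demo/file1.txt <<'EOF'
Chromosome Position Genes Mutation
1 251565465 Gene1 T/G
1 215465511 Gene3 G/A
EOF
cat > demo/file2.txt <<'EOF'
Chromosome Position Genes Protein
1 251565471 Gene1 Damaged
1 215465614 Gene2 Pass
EOF

# Sketch only. Assumption: the gene name is in column "col" of every data file.
# The list file must be the LAST argument; its name does not matter.
awk -v col=3 '
{ sub(/\r$/, "") }                      # drop a trailing carriage return, if any
FILENAME != ARGV[ARGC-1] {              # every file except the last is a data file
        if (FNR == 1) {                 # line 1 of a data file is its header
                files[++n] = FILENAME   # remember the files in argument order
                hdr[FILENAME] = $0
                view[FILENAME] = $NF    # last header column, e.g. Mutation
                next
        }
        key = FILENAME SUBSEP $col
        if (key in hits)                # keep every matching line, not only the last
                hits[key] = hits[key] "\n" $0
        else
                hits[key] = $0
        next
}
NF {                                    # non-empty lines of the list file
        print $1
        for (i = 1; i <= n; i++) {
                f = files[i]
                key = f SUBSEP $1
                if (key in hits) {
                        print "From " view[f] " point of view"
                        print hdr[f]
                        print hits[key]
                } else {
                        print "No results from " view[f] " point of view"
                }
        }
        print ""                        # gap before the next word
}
' demo/file1.txt demo/file2.txt demo/list.txt > demo/output.txt
```

With the real data, drop the lines that create the sample files and pass your own four files followed by the list file. If the gene sits in another column, change -v col=3 accordingly. If there are still no matches, it may help to look at one line of the list with od -c to check for hidden characters.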