Match words and fetch the data next to them in the second column

Hi all,

I have 2 files

The first file contains data like this, in one column:

AST3
GSTY4
JST3

The second file contains data like this, in 2 columns:

AST3(PAXXX),GSTY4(PAXXY)              it is used in diabetes 
KST4                                                     it is used in blood pressure
JST3                                                    it is never applied in oedema

I have to match the column of the first file against the second file, and if a symbol (the symbols are made up of capital letters) matches, it should copy the sentence next to it.

The expected output is:

AST3(PAXXX)                               it is used in diabetes
GSTY4(PAXXY)                            it is used in diabetes 
JST3                                            it is never applied in oedema

If it is possible to remove the brackets, that would be good. The expected output would then be:

AST3                              it is used in diabetes
GSTY4                         it is used in diabetes 
 JST3                                            it is never applied in oedema

---------- Post updated at 08:25 AM ---------- Previous update was at 02:37 AM ----------

sed 's/([^)]*)//g' file2
or
sed -E 's/\([^)]*\)//g' file2

If you don't mind getting rid of all of the whitespace between the 1st and 2nd fields in your second file when writing the output:

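awk 'FNR==NR {wanted[NR] = $1; next}    # first file: remember each symbol
 {      nf=split($1, f1, ",")           # second file: split the symbol list
        for (i=1; i<=nf; i++) {
                sub("[(][^(]*[)]", "", f1[i])   # drop the parenthetical part
                for (j in wanted) if (wanted[j] == f1[i]) {
                        $1=f1[i]
                        print
                }
        }
}' first_file second_file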

produces

AST3 it is used in diabetes
GSTY4 it is used in diabetes
JST3 it is never applied in oedema

when given your two sample input files.
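
As a variation on the script above, here is a minimal sketch that stores each symbol as an array key, so the inner loop becomes a single "in" lookup. It assumes the first file holds one symbol per line. The temporary directory and the file names in it are placeholders, created only so the example runs on its own.

```shell
# Recreate the sample data from the first message (placeholder file names).
tmp=$(mktemp -d)
printf '%s\n' AST3 GSTY4 JST3 > "$tmp/first_file"
printf '%s\n' \
  'AST3(PAXXX),GSTY4(PAXXY)              it is used in diabetes' \
  'KST4                                  it is used in blood pressure' \
  'JST3                                  it is never applied in oedema' \
  > "$tmp/second_file"

result=$(awk '
FNR == NR { wanted[$1]; next }          # first file: symbol becomes an array key
{
    nf = split($1, f1, ",")             # second file: split the symbol list
    for (i = 1; i <= nf; i++) {
        sub(/[(][^)]*[)]/, "", f1[i])   # drop the parenthetical part
        if (f1[i] in wanted) {          # direct lookup, no inner loop
            $1 = f1[i]                  # rebuilds the line with single spaces
            print
        }
    }
}' "$tmp/first_file" "$tmp/second_file")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The output is the same three lines shown above; only the lookup changes.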


Hi

Thanks for the reply.

I checked the white space, but there is a space between every two words in both files, and the output is just the same as the second input file. There is no difference between the output and the second input file.

Kindly guide

In the second file you showed in the 1st message in this thread, you have somewhere between 10 and 60 spaces between columns. In the expected output, you showed lots of spaces between the 1st and 2nd columns (and a space at the front of the last line). The script I provided never puts a space at the start of an output line and always puts a single space between the drug name and the usage note.

If it is important for you to keep the same spacing between columns in the output that was present in the input, the script will be more complex (and I believe the output would be harder to read).
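
For what it is worth, one way to keep the spacing might look like the sketch below. It assumes that the blanks to keep are the first run of spaces or tabs after the symbol list, and it copies that run unchanged onto every output line. The sample lines, the temporary directory, and the file names are placeholders for illustration.

```shell
# Small sample with different amounts of padding (placeholder file names).
tmp=$(mktemp -d)
printf '%s\n' AST3 GSTY4 JST3 > "$tmp/first_file"
printf '%s\n' \
  'AST3(PAXXX),GSTY4(PAXXY)    it is used in diabetes' \
  'KST4  it is used in blood pressure' \
  'JST3      it is never applied in oedema' \
  > "$tmp/second_file"

kept=$(awk '
FNR == NR { wanted[$1]; next }          # first file: remember each symbol
{
    line = $0
    sub(/^[ \t]+/, "", line)            # ignore blanks before the first field
    rest = ""
    if (match(line, /[ \t]+/))          # first run of blanks after the symbols
        rest = substr(line, RSTART)     # that run plus the sentence, unchanged
    nf = split($1, f1, ",")
    for (i = 1; i <= nf; i++) {
        sub(/[(][^)]*[)]/, "", f1[i])   # drop the parenthetical part
        if (f1[i] in wanted)
            print f1[i] rest            # symbol followed by the original padding
    }
}' "$tmp/first_file" "$tmp/second_file")

printf '%s\n' "$kept"
rm -rf "$tmp"
```

Because the symbols differ in length once the parenthetical parts are gone, the second column no longer lines up, which is why the single-space output may be easier to read.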

Hi

I have checked; actually, I don't think there is a problem related to spacing in the second file.

In the actual data, the second file is just 2 columns of an Excel sheet.

Also, I do not want lots of spacing in the output. I just want the output to look like the second file, without any extra spacing.

Kindly check it and let me know of any solution, if possible.

The output produced by the script I supplied in message #3 in this thread is:

AST3 it is used in diabetes
GSTY4 it is used in diabetes
JST3 it is never applied in oedema

The output you said you want in the updated first message in this thread is:

AST3                              it is used in diabetes
GSTY4                         it is used in diabetes 
 JST3                                            it is never applied in oedema

I assume that you can see that there is a space before JST3 in the third line and that there are lots of spaces between the end of the first column and the word "it" at the start of the second field in each of the three output lines.

Please tell me if the solution I provided earlier is sufficient to solve your problem. If it is not, please explain where the space at the start of line three of your output came from. And, if more than one space (or tab) is needed between columns one and two in the output, explain how we can determine how many spaces should be there. (Especially, explain how the number of spaces between fields is supposed to be determined when input lines are split and when the parenthetical elements are removed.)
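
If the second file really is two columns saved from a spreadsheet, a tab-delimited export may be easier to process than space-padded text. The sketch below assumes the sheet was saved as tab-delimited text, possibly with Windows line endings. That is a guess on my part, not something shown in your samples. The temporary directory and file names are placeholders.

```shell
# Tab-delimited sample with CRLF line endings (assumed export format).
tmp=$(mktemp -d)
printf '%s\n' AST3 GSTY4 JST3 > "$tmp/first_file"
{
    printf 'AST3(PAXXX),GSTY4(PAXXY)\tit is used in diabetes\r\n'
    printf 'KST4\tit is used in blood pressure\r\n'
    printf 'JST3\tit is never applied in oedema\r\n'
} > "$tmp/second_file"

tabbed=$(awk -F '\t' -v OFS='\t' '
{ sub(/\r$/, "") }                      # drop a trailing carriage return, if any
FNR == NR { wanted[$1]; next }          # first file: remember each symbol
{
    nf = split($1, f1, ",")             # second file: split the symbol list
    for (i = 1; i <= nf; i++) {
        sub(/[(][^)]*[)]/, "", f1[i])   # drop the parenthetical part
        if (f1[i] in wanted)
            print f1[i], $2             # symbol, one tab, then the sentence
    }
}' "$tmp/first_file" "$tmp/second_file")

printf '%s\n' "$tabbed"
rm -rf "$tmp"
```

With a tab as the only separator, there is exactly one delimiter between the two columns, so the question of how many spaces to print does not arise.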
