Extract specific lines based on another file

alanmathew84 · November 8, 2015, 2:04am

I have a folder containing text files. I need to extract specific lines from the files in this folder based on another file, input.txt. How can I do this with awk/sed?

file1
ARG	81.9	8	81.9	0
LEU	27.1	9	27.1	0
PHE	.0	10	.0	0
ASP	59.8	11	59.8	0
ASN	27.6	12	27.6	0
ALA	.0	13	.0	0
MET	13.1	14	13.1	0
LEU	66.8	15	66.8	0
ARG	21.0	16	21.0	0

file2
SER	57.9	43	57.9	0
PHE	2.4	44	2.4	0
LEU	39.4	45	1.0	38.4
GLN	83.9	46	40.8	43.1
ASN	46.9	47	46.9	0
PRO	47.1	48	4.8	42.3
GLN	86.1	49	83.2	2.9
THR	33.2	50	33.2	0
SER	10.2	51	.9	9.3

input.txt

*file1
10
16
*file2
43
44
49

Desired output

file1
PHE	.0	10	.0	0
ARG	21.0	16	21.0	0

file2
SER	57.9	43	57.9	0
PHE	2.4	44	2.4	0
GLN	86.1	49	83.2	2.9

RudiC · November 8, 2015, 3:56am

Any attempts from your side?


Anyway, try:

awk '
FNR==NR         {if (/^\*/)     {TFN = substr ($0,2)
                                 SAM[TFN] =  "-"
                                 next
                                }
                 SAM[TFN] = SAM[TFN] $0 "-"
                 next
                }

SAM[FILENAME] ~ "-" $3 "-"      {print > (FILENAME ".res")}

' input.txt file1 file2
for f in *.res; do echo "$f:"; cat "$f"; done
file1.res:
PHE      .0    10    .0      0
ARG    21.0    16    21.0    0
file2.res:
SER    57.9    43    57.9    0
PHE    2.4     44    2.4     0
GLN    86.1    49    83.2    2.9
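To try this end to end, here is a minimal self-contained sketch. Assumptions not in the thread: a POSIX shell with awk and mktemp -d available, a throwaway scratch directory, and the sample files retyped with tab separators via printf. The awk program is RudiC's, with the redirection target parenthesized and the input file named input.txt as in the question.

```shell
#!/bin/sh
# Sketch: reproduce the thread's example in a scratch directory.
# Assumptions: POSIX sh, awk, and mktemp -d; sample data retyped from the post.
set -e
cd "$(mktemp -d)"

# printf reuses its format until all arguments are consumed: one row per 5 args.
printf '%s\t%s\t%s\t%s\t%s\n' \
  ARG 81.9 8 81.9 0   LEU 27.1 9 27.1 0   PHE .0 10 .0 0 \
  ASP 59.8 11 59.8 0  ASN 27.6 12 27.6 0  ALA .0 13 .0 0 \
  MET 13.1 14 13.1 0  LEU 66.8 15 66.8 0  ARG 21.0 16 21.0 0 > file1
printf '%s\t%s\t%s\t%s\t%s\n' \
  SER 57.9 43 57.9 0     PHE 2.4 44 2.4 0      LEU 39.4 45 1.0 38.4 \
  GLN 83.9 46 40.8 43.1  ASN 46.9 47 46.9 0    PRO 47.1 48 4.8 42.3 \
  GLN 86.1 49 83.2 2.9   THR 33.2 50 33.2 0    SER 10.2 51 .9 9.3 > file2
# Quote the * lines so the shell does not glob-expand them.
printf '%s\n' '*file1' 10 16 '*file2' 43 44 49 > input.txt

awk '
FNR == NR {                       # first file (input.txt) only
    if (/^\*/) {                  # a "*name" line starts a new entry
        TFN = substr($0, 2)       # file name without the *
        SAM[TFN] = "-"
        next
    }
    SAM[TFN] = SAM[TFN] $0 "-"    # builds e.g. "-10-16-"
    next
}
SAM[FILENAME] ~ "-" $3 "-" { print > (FILENAME ".res") }
' input.txt file1 file2

# Show each result file with its name, one after the other.
for f in *.res; do echo "$f:"; cat "$f"; done
```

With the sample data, file1.res should hold the PHE and ARG lines and file2.res the SER, PHE and GLN lines, the same rows as the Desired output in the question.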

looney · November 8, 2015, 12:45pm

Hi RudiC,
Could you please explain the code? I am not sure whether what I understood is all correct.
FNR==NR ## just compares the number of records of the two files. While reading the first file the condition is true, so it will just jump to the next block.
{if (/^\*/) ## searches for a pattern starting with an asterisk, which I found nowhere, so what is the purpose?
{TFN = substr ($0,2) ## everything from the 2nd character onward is assigned to the variable TFN
SAM[TFN] = "-" ## an associative array is declared and assigned the value "-".
SAM[TFN] = SAM[TFN] $0 "-" ## if the IF condition is false, then "-" is added after each record.

The rest is going over my head.

Aia · November 8, 2015, 1:47pm

@looney

awk '
# start of the code block that runs only for the first file on the command line
FNR==NR         {if (/^\*/)     {TFN = substr ($0,2) # strip the leading * to get the file name we want to process, save it in TFN
                                 SAM[TFN] =  "-"     # start the entry with a separator marker ("-")
                                 next                # skip to the next line of the first file; ignore the remaining code blocks
                                }
                 # this statement handles the lines that do not hold a file name (the numbers to look for)
                 SAM[TFN] = SAM[TFN] $0 "-" # append the whole record to the current key, followed by a separator marker
                 next # skip to the next line in the same file; ignore the rest
                }
# end of the block for the first file on the command line

# the following block applies only to the remaining files on the command line, not the first
SAM[FILENAME] ~ "-" $3 "-"      {print > (FILENAME ".res")} # look up FILENAME (the file currently being processed) in the data structure built above
# and check whether it matches the pattern built by concatenating "-" $3 "-" (example: "-10-"); if it does, write the current record ($0) to FILENAME.res
' input.txt file1 file2
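Aia's comments describe the lookup table that the first block builds from input.txt. The sketch below runs just that first block on the sample input.txt (fed on stdin, so the FNR==NR guard is dropped because there is only one input) and dumps the table. The helper name dump_table and the END loop are illustrative additions, not part of RudiC's code.

```shell
#!/bin/sh
# Sketch: show the SAM table that the first block builds from input.txt.
# dump_table is a hypothetical helper name; the END dump exists only for display.
dump_table() {
  awk '
  /^\*/ { TFN = substr($0, 2); SAM[TFN] = "-"; next }  # "*file1" -> key "file1"
        { SAM[TFN] = SAM[TFN] $0 "-" }                 # append "10-", "16-", ...
  END   { for (k in SAM) print k " -> " SAM[k] }
  ' | sort   # for (k in SAM) visits keys in an unspecified order
}

printf '%s\n' '*file1' 10 16 '*file2' 43 44 49 | dump_table
# prints:
# file1 -> -10-16-
# file2 -> -43-44-49-
```

Each value is a "-"-separated list with a "-" at both ends, which is what lets the second block search for "-" $3 "-".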

Scrutinizer · November 8, 2015, 1:55pm

Another one:

awk 'NR==FNR{if(/^\*/) f=substr($1,2); else A[f,$1]; next} (FILENAME,$3) in A {print>(FILENAME ".res")}' input.txt file[12]

--
Note:

Use parentheses around the filename concatenation, or most awks will protest with a syntax error.

{print > (FILENAME ".res")}
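The question's Desired output shows both results in one stream, each under its file name, rather than in separate .res files. A sketch of that variant built on Scrutinizer's composite keys: A[f,$1] stores the file name and the number joined by SUBSEP, and (FILENAME,$3) in A tests for that pair. The function name extract and the blank line between sections are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: print the wanted lines to stdout under a header per file,
# using Scrutinizer-style (file, number) keys instead of writing *.res files.
# extract is a hypothetical helper name.
# Usage: extract input.txt file1 file2
extract() {
  awk '
  NR == FNR {                        # first file (input.txt) only
      if (/^\*/) f = substr($1, 2)   # "*file1" sets the current file name
      else A[f, $1]                  # create key f SUBSEP number
      next
  }
  (FILENAME, $3) in A {              # is (this file, column 3) wanted?
      if (FILENAME != last) {        # first match in a new file
          if (last != "") print ""   # blank line between sections
          print FILENAME
          last = FILENAME
      }
      print
  }
  ' "$@"
}
```

With the question's sample files, extract input.txt file1 file2 should print the PHE and ARG lines under a file1 header, a blank line, then the SER, PHE and GLN lines under file2. A header is printed only for files that have at least one matching line.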

RudiC · November 8, 2015, 2:21pm

looney:

Hi RudiC,
Could you please explain the code? I am not sure whether what I understood is all correct.
FNR==NR ## just compares the number of records of the two files. While reading the first file the condition is true, so it will just jump to the next block. <--- the jump to the next block happens for the second and following files
{if (/^\*/) ## searches for a pattern starting with an asterisk, which I found nowhere, so what is the purpose? <-- look at the spec for input.txt: its *file1 and *file2 lines start with an asterisk
{TFN = substr ($0,2) ## everything from the 2nd character onward is assigned to the variable TFN <-- only for the line with the leading *; short for substr ($0, 2, rest of $0), i.e. the name of the file the following comparisons apply to
SAM[TFN] = "-" ## an associative array is declared and assigned the value "-". <-- Yes, as a starting point; more entries are appended to it later
SAM[TFN] = SAM[TFN] $0 "-" ## if the IF condition is false, then "-" is added after each record. <-- concatenates the numbers that follow into SAM[TFN], separated by "-"

The rest is going over my head.
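The "-" separators RudiC mentions are what make the lookup safe: ~ is an unanchored regex match, so testing $3 alone against the table would let 1 match inside 10. A small demo using the table string built for file1; the probe values and the function name demo_fence are made up for illustration.

```shell
#!/bin/sh
# Sketch: why each number is wrapped in "-" before the ~ match.
# demo_fence and the probe values are hypothetical; "-10-16-" is SAM["file1"].
demo_fence() {
  awk 'BEGIN {
    table = "-10-16-"
    n = split("1 10 6 16 101", probe, " ")
    for (i = 1; i <= n; i++) {
      bare   = (table ~ probe[i])              # unanchored: "1" also matches inside "10"
      fenced = (table ~ ("-" probe[i] "-"))    # "-1-" only matches a whole entry
      print probe[i], bare, fenced
    }
  }'
}

demo_fence
# prints:
# 1 1 0
# 10 1 1
# 6 1 0
# 16 1 1
# 101 0 0
```

Only 10 and 16 match once the "-" fences are added, which is why the table starts with "-" and every appended number is followed by one.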