Partial content grepping into a 3rd file

Kanja · February 4, 2014, 2:42pm

Hi,

I have a couple of files in a folder. The file names follow a pattern.

B_A17_A17_1T.txt
B_A17_A17_2T.txt
B_A17_A17_3T.txt
B_A17_A17_7T.txt
.....
.....
B_A17_A17_45T.txt

Each of the above files has the same layout: 4 columns, with a header for the last 3 columns.

vi B_A17_A17_1T.txt

  A1 A2 A3
1 0 0 0
2 0 1 0.571
3 0.010 6 3.333
4 0 0 0
5 0 0 0
6 0 0 0
7 0.00527 1 1.667
8 0 0 0
9 0 0 0
10 0.0036 1 5.556
11 0 1 1.299
12 0 0 NaN

I also have another file called file1.txt, which has a single column with 19 rows in it.

vi file1.txt
A
D
F
G
E
T
G
E
W
T
T
W

I would like to build a third file by pulling contents from each of the above files in the following manner.
For example, for the file B_A17_A17_1T.txt, I would like to take part of the file name, A17_1T, and print it 12 times as the first column, then print the content of file1.txt as the second column, and then print the third column of B_A17_A17_1T.txt as the third column. I don't need the header line of B_A17_A17_1T.txt.

The output should be a tab-delimited file, for example:

A17_1T A 0
A17_1T D 1
A17_1T F 6
A17_1T G 0
A17_1T E 0
A17_1T T 0
A17_1T G 1
A17_1T E 0
A17_1T W 0
A17_1T T 1
A17_1T T 1
A17_1T W 0

It would be great if I could do this merging and grepping of contents for at least one file; I could then repeat it for the rest. Is awk best for this?

bartus11 · February 4, 2014, 2:56pm

Try (in bash):

file=B_A17_A17_1T.txt
id=${file#*_};id=${id#*_};id=${id%.*}
paste -d" " <(echo; cat file1.txt) $file | awk 'NR>1{print id, $1, $4}' id=$id

vgersh99 · February 4, 2014, 3:05pm

awk -f kan.awk file1.txt B_A17_A17_1T.txt
where kan.awk is:

function rindex(str,c)
{
  return match(str,"\\" c "[^\\" c "]*$")? RSTART : 0
}

FNR==NR {
  f2[FNR]=$0
  next
}
FNR==1 {f=substr(FILENAME, 1, rindex(FILENAME, "_")-1)}
NF>3{ print f, f2[FNR-1], $3}

RudiC · February 4, 2014, 3:14pm

Try also:

awk     'FNR==NR        {T[NR]=$1;next}
         FNR==1         {gsub(/^._[^_]*_|\.[^.]*$/,"",FILENAME); next}
                        {print FILENAME, T[FNR-1], $3}
        ' OFS="\t" file B*

Kanja · February 4, 2014, 3:20pm

Thanks vgersh99. It worked fine, but I would like only part of the file name, A17_1T, as the first column. At present, the awk code prints almost the whole file name.

RudiC - where do I specify the file names?

Thanks

Chubler_XL · February 4, 2014, 3:21pm

Another solution using awk:

awk 'FNR==NR{A[NR]=$0;next}
$1 in A{print substr(FILENAME,3,3) substr(FILENAME,10,3),A[$1],$3}' OFS='\t' file1.txt B_A17_A17_1T.txt

Kanja · February 4, 2014, 3:43pm

Thanks Chubler_XL. If the file name B_A17_A17_1T.txt had more characters, where would I change the code?

For example, what if the file name were Bvtr_A17_A17_1T.txt instead of B_A17_A17_1T.txt? Please let me know where to change the code. I believe it is in this part of the code:

substr(FILENAME,3,3)

RudiC · February 4, 2014, 3:44pm

The file names go at the end of the command: "B*" matches your data files, and "file" stands for your one-column file.
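
As a sketch of such a call with the names from the first post (assuming the one-column file is file1.txt and the data files all match B_*.txt; the sample files created below are cut down to three rows and the second one is made up, and keeping the id in a separate variable instead of changing FILENAME is my own variation):

```shell
# Work in a scratch directory with cut-down sample data (illustrative only).
cd "$(mktemp -d)" || exit 1
printf 'A\nD\nF\n' > file1.txt
printf '  A1 A2 A3\n1 0 0 0\n2 0 1 0.571\n3 0.010 6 3.333\n' > B_A17_A17_1T.txt
printf '  A1 A2 A3\n1 0 2 0\n2 0 0 0\n3 0 5 1.5\n' > B_A17_A17_2T.txt

# The one-column file goes first, then the data files (the shell expands B_*.txt).
out=$(awk '
FNR == NR { T[FNR] = $1; next }          # first file: remember the letters by line number
FNR == 1  { id = FILENAME                 # header line of each data file:
            sub(/^[^_]*_[^_]*_/, "", id)  #   drop the first two underscore-separated parts
            sub(/\.[^.]*$/, "", id)       #   drop the extension
            next }                        #   and skip the header itself
          { print id, T[FNR-1], $3 }      # data line: id, letter, third column
' OFS='\t' file1.txt B_*.txt)

printf '%s\n' "$out"
```

Redirecting the awk output to a file instead of capturing it in a variable would give the combined tab-delimited file for all data files in one pass.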

Kanja · February 4, 2014, 3:44pm

Also, the first column should contain only A17_1T.

vgersh99 · February 4, 2014, 4:05pm

FNR==NR {
  f2[FNR]=$0
  next
}
FNR==1 {n=split(FILENAME,t,"[_.]");f=t[n-2] "_" t[n-1];next}
# or
# FNR==1 {match(FILENAME,"_[^_][^_]*_[^_][^_]*[.]"); f=substr(FILENAME,RSTART+1,RLENGTH-2);next}
{ print f, f2[FNR-1], $3}

Chubler_XL · February 4, 2014, 5:01pm

kanja:

Thanks Chubler_XL. If the file name B_A17_A17_1T.txt had more characters, where would I change the code?

For example, what if the file name were Bvtr_A17_A17_1T.txt instead of B_A17_A17_1T.txt? Please let me know where to change the code. I believe it is in this part of the code:
substr(FILENAME,3,3)

Yes, you are correct: the substr statement you quoted takes 3 characters starting at position 3 of the filename, and substr(FILENAME,10,3) takes 3 characters starting at position 10. Because the two substr calls sit next to each other, their results are concatenated.
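
To see the effect of the longer prefix, here is a small sketch that runs the same two substr calls on both names (Bvtr_A17_A17_1T.txt is the hypothetical name from the question; passing the name in with -v f=... is just for illustration, in the real script it would be FILENAME):

```shell
out=$(for name in B_A17_A17_1T.txt Bvtr_A17_A17_1T.txt; do
    awk -v f="$name" 'BEGIN {
        # original offsets: 3 characters from position 3, then 3 from position 10
        print f, "->", substr(f, 3, 3) substr(f, 10, 3)
    }'
done)
printf '%s\n' "$out"
# B_A17_A17_1T.txt -> A17_1T
# Bvtr_A17_A17_1T.txt -> tr_A17
```

With three extra characters in the prefix, both start positions would have to move by 3, that is substr(FILENAME,6,3) substr(FILENAME,13,3), so every change in prefix length means editing the offsets.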

I'm not sure this approach is going to work out for you. The approach taken by the other solutions posted here matches on the underscore (_) character and might be more flexible, depending on your actual filenames.
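
A minimal sketch of the underscore-based idea, along the lines of the split() call vgersh99 posted above (again with the name passed in via -v f=... purely for illustration, and assuming the wanted id is always the last two underscore-separated parts before the extension):

```shell
out=$(for name in B_A17_A17_1T.txt Bvtr_A17_A17_1T.txt; do
    awk -v f="$name" 'BEGIN {
        n = split(f, part, "[_.]")               # split on underscores and dots
        print f, "->", part[n-2] "_" part[n-1]   # last two parts before the extension
    }'
done)
printf '%s\n' "$out"
# B_A17_A17_1T.txt -> A17_1T
# Bvtr_A17_A17_1T.txt -> A17_1T
```

Because the parts are counted from the end of the name, the length of the prefix no longer matters.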