Not sure; it worked fine for me with the code from post #2, so I see two possibilities.
I- Your Input_file may contain carriage return characters. Check with cat -v Input_file ; if you see carriage returns (displayed as ^M), remove them with tr -d '\r' < Input_file > temp_file && mv temp_file Input_file .
II- If you are on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk .
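To illustrate the first possibility, here is a minimal sketch that counts lines ending in a carriage return before and after the tr clean-up. The file content and the variable names (tmpdir, cr_before, cr_after) are made up for the demonstration and are not taken from the original poster's data.

```shell
# Hypothetical sample file with DOS line endings (\r\n)
tmpdir=$(mktemp -d)
printf 'S00739A\tsample1\r\nS00740A\tsample2\r\n' > "$tmpdir/Input_file"

# Count the lines that end with a carriage return
cr_before=$(awk '/\r$/ { n++ } END { print n + 0 }' "$tmpdir/Input_file")

# Strip the carriage returns, as suggested above
tr -d '\r' < "$tmpdir/Input_file" > "$tmpdir/temp_file" &&
  mv "$tmpdir/temp_file" "$tmpdir/Input_file"

# Count again after the clean-up
cr_after=$(awk '/\r$/ { n++ } END { print n + 0 }' "$tmpdir/Input_file")

printf 'before: %s, after: %s\n' "$cr_before" "$cr_after"
rm -rf "$tmpdir"
```

A trailing carriage return becomes part of the last field, so a key read from such a line will not compare equal to the same key read from a clean file.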
Did you give the input files in the correct order?
It is important to first process the "mapping file", then the file that corresponds to the output file.
And FS="_" has to be set after the "mapping file", so that it applies only to the second file.
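A rough sketch of that ordering, assuming a whitespace-separated mapping file with the ID in column 1 and some label in column 2. The file names mapping.txt and files.txt, the labels, and the output layout are hypothetical; the real files and desired output from post #1 may differ.

```shell
tmpdir=$(mktemp -d)

# Hypothetical mapping file: ID, then a label
printf 'S00739A sampleA\nS00740A sampleB\n' > "$tmpdir/mapping.txt"

# Hypothetical list of file names; the ID is the part before the first "_"
printf '%s\n' \
  'S00739A_ACAGTG_L001_R1.fq.gz' \
  'S00739A_ACAGTG_L001_R2.fq.gz' \
  'S00740A_GCCAAT_L001_R1.fq.gz' > "$tmpdir/files.txt"

joined=$(awk '
  NR == FNR { label[$1] = $2; next }    # first file: remember label per ID
  $1 in label { print $0, label[$1] }   # second file: $1 is the ID because FS is "_"
' "$tmpdir/mapping.txt" FS="_" "$tmpdir/files.txt")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

The FS="_" assignment sits between the two file operands, so awk applies it only when it starts reading the second file; the mapping file is still split on whitespace.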
Thanks Ravi!
There was an issue with a stray carriage return in one of the files. It works fine now!
I understand you changed the order of the files and used FS="_" for the second file; I hope this also addresses what MadeinGermany reminded me of.
But I still have a question: why does the part A[1]=$0 in my script not work?
In your code, A[$1] does not work because while Input_file1 is being read, $1 is the whole line (e.g. S00739A_ACAGTG_L001_R1.fq.gz ), whereas while Input_file2 is being read, $1 is S00739A . The keys never match, so A[$1] is always empty and nothing gets printed. Kindly let me know if you have any questions on this.
I meant A[1] = $0 for the mapping part, as I thought A was the array from split(), so that later A[$2] would get what I want, with $2 as the key/subscript of the array. What did I miss?
Here A[1] means the element of array A whose index is 1 (the digit one), and A[1]=$0 stores the current line of Input_file1 in it. When you later print A[$2] or A[$1] , awk takes the value of $2 / $1 from the current line of Input_file2 and looks it up as an index in array A (e.g. A[S00739A] ), which does NOT exist in A . Thus it prints nothing. Kindly let me know if this was not clear, and I will try to explain further.
Yes, A is the receiving array of the split() function. It has the index values 1 .. 4 (which will never match $1 or $2 in file2) and is overwritten for every line read from the input file, so after reading the entire file1 it will hold the last line in A[1] and the remaining fields in A[2] through A[4], never to be matched by the following records from file2.
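A small sketch of that behaviour, reconstructed from the description in this thread rather than from the original script: split() refills A on every line, A[1] is then overwritten with $0, and an ID such as S00739A never becomes an index. The two input lines are invented sample data.

```shell
summary=$(printf '%s\n' \
    'S00739A_ACAGTG_L001_R1.fq.gz' \
    'S00740A_GCCAAT_L001_R2.fq.gz' |
  awk '
    # split() empties A and refills it with indices 1..n on every line
    { n = split($0, A, "_"); A[1] = $0 }
    END {
      # only the last line survives; the ID is a value, never an index
      hit = ("S00739A" in A) ? "found" : "not found"
      print n, A[1], A[4], hit
    }')

printf '%s\n' "$summary"
```

The indices of A are only ever the numbers 1 to 4, so a lookup such as A[$1] with $1 holding an ID string always comes back empty.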
Plus, with file2 being the last file worked upon, the output - should it be generated at all - would have four lines only.
Are you aware that you don't get your desired output from post#1 with your approach in post#12?
It would yield four lines only, and all the R1.fq.gz entries would have disappeared.
Thanks RudiC!
I was too excited to notice the problem, which is a serious bug for sure.
It seems the order of the files must be changed because of the similarity of the _R1/R2.fq.gz file names.
It is possible, but that solution wouldn't lend itself naturally. You'd need to suck in the entire first file - which I presume is larger - into RAM and either store it in two arrays (key and entire record) or have an algorithm search the key later. Then, for every line in file two, you'd need to run through ALL keys to find ALL occurrences of the key for printout. All that can become lengthy for huge files.
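For illustration only, the reversed order could look roughly like the sketch below: the file-name list is read first and kept in two arrays (key and entire record), then every mapping line triggers a scan over all stored keys. The file names, labels, and array names ( key , rec ) are hypothetical and not taken from the scripts in this thread.

```shell
tmpdir=$(mktemp -d)

# Hypothetical inputs, same shape as assumed earlier in the thread
printf '%s\n' \
  'S00739A_ACAGTG_L001_R1.fq.gz' \
  'S00739A_ACAGTG_L001_R2.fq.gz' \
  'S00740A_GCCAAT_L001_R1.fq.gz' > "$tmpdir/files.txt"
printf 'S00739A sampleA\nS00740A sampleB\n' > "$tmpdir/mapping.txt"

reversed=$(awk -F_ '
  # first file: keep the key and the whole record for every line in RAM
  NR == FNR { key[FNR] = $1; rec[FNR] = $0; n = FNR; next }
  # second file: scan ALL stored keys for every mapping line
  {
    for (i = 1; i <= n; i++)
      if (key[i] == $1) print rec[i], $2
  }
' "$tmpdir/files.txt" FS=" " "$tmpdir/mapping.txt")

printf '%s\n' "$reversed"
rm -rf "$tmpdir"
```

The inner loop runs once per mapping line over every stored record, which is where the cost for huge files comes from; reading the mapping file first avoids both the loop and the memory use.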