Merge records based on multiple columns

Hi,

I have a file with 16 columns and out of these 16 columns 14 are key columns, 15 th is order column and 16th column is having information. I need to concate the 16th column based on value of 1-14th column as key in order of 15th column. Here are the example file

Input File (multiple records like this)

"A1"	"A2"	"A3"	A4"	"A5"	"A6"	"A7"	"A8"	"A9"	"A10"	"A11"	"A12"	"A13"	"A14"	"0001"	"Once Upon A Time"
"A1"	"A2"	"A3"	A4"	"A5"	"A6"	"A7"	"A8"	"A9"	"A10"	"A11"	"A12"	"A13"	"A14"	"0003"	"He was very"
"A1"	"A2"	"A3"	A4"	"A5"	"A6"	"A7"	"A8"	"A9"	"A10"	"A11"	"A12"	"A13"	"A14"	"0002"	"There was a crow"
"A1"	"A2"	"A3"	A4"	"A5"	"A6"	"A7"	"A8"	"A9"	"A10"	"A11"	"A12"	"A13"	"A14"	"0004"	"Thirsty"

Required Output

"A1"	"A2"	"A3"	A4"	"A5"	"A6"	"A7"	"A8"	"A9"	"A10"	"A11"	"A12"	"A13"	"A14"	"Once Upon A Time There was a crow He was very Thirsty"

So a few questions to start:-

  • What have you tried so far?
  • What errors/output are out getting?
  • What OS and version are you running on?
  • What are your preferred tools (ksh, bash, awk, perl, etc.)

Most importantly, what have you tried so far?

We all give suggestions freely, but you need to make it easy for us to help you and show that you have made an effort rather than just hoping for a reply with an fully tailored solution. Depending on your set-up and preferences, there may be various ways to approach this.

Robin

Hi,

I am not an expert shell scripting programmer ,but I have tried few options

1) For loop by reading each row and doing concatenated grep to find the matching column

2) I have also gone through the link (on this website itself )given below and tried to change given solution as per requirement

shell-programming-and-scripting
208027-merge-multiple-lines-same-file-common-key-using-awk.html

but ordering on the basis of 15th column is still an issue and also I am getting double quotes in concatenation of last field

I am ok with any solution being in KSH,BASH or AWK. Perl and any other language could be the last option but not preferable for now.

I am not sure about OS version too as I am out of my office and can't check

let me know if these informations are helpful

import re
hash={}
with open("a.txt") as file:
 for line in file:
  arr=re.findall('"[^"]+"',line.replace("\n",""))
  key=" ".join(arr[0:13])
  if key not in hash:
    hash[key]=[]
  hash[key].append(arr[14:])   
  
for i in hash:
    print(i,end=" ")
    print('"'," ".join(j[1].replace('"',"") for j in sorted(hash,key=lambda x: int(x[0].replace('"',"")))),'"',sep="")