There can be any number of matching rows (16 at most at the moment), as there seems to be a lot of duplication, but I only need to join the first 4 rows as they appear in the file. (For now I have deleted the duplicates, but it seems the duplication is actually necessary in rare cases.)
Thanks so much in advance for any help with this!
This hasn't done the trick for me, though. The file I'm working on is fixed-length and contains both text and numerics throughout. The values I am matching on are 7 digits followed by one or two characters; if there is no second character, it is a space.
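Because the file is fixed-length, one option is to match on character positions instead of a digit pattern. The sketch below assumes the key is the first 9 characters of each line (7 digits, one character, then a second character or a padding space). That position is an assumption, not something confirmed for the real file. `join_by_fixed_key` is a hypothetical helper name. It joins consecutive lines that share the key, stops after 4 rows per record, and appends the rest of each following row with no separator, as you would for fixed-length data.

```shell
# join_by_fixed_key FILE  (hypothetical helper)
# Assumption: the match key is the first 9 characters of every line
# (7 digits + 1 character + 1 character or a padding space).
# Consecutive lines with the same key are joined, at most 4 rows per record;
# the key is kept once and the remainder of each later row is appended.
join_by_fixed_key() {
    awk '
    {
        key  = substr($0, 1, 9)     # fixed-width key at the start of the line
        rest = substr($0, 10)       # everything after the key
        if (n > 0 && key == prev && n < 4) {
            out = out rest          # append this row to the current record
            n++
        } else {
            if (n > 0) print out    # flush the previous record
            out  = $0               # start a new record with this row
            prev = key
            n    = 1
        }
    }
    END { if (n > 0) print out }
    ' "$1"
}
```

Usage: `join_by_fixed_key inputfile > outfile`. A fifth row with the same key starts a new output line, so any duplicate groups remain visible for a later clean-up step.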
I guessed that the file you posted is not the real file you are working on. I have modified the command to match only the numerics at the beginning of lines 1 and 2, 3 and 4, and so on. If this does not work, please post sample data from the real file.
sed '/^[0-9]\+/{N;s/\(^[0-9]\+\)\(.*\)\n\(\1.*\)/\1\2\3/}' inputfile > outfile
Did you check the line count after running the sed command? Since the two lines posted above are long, they may only appear not to be joined. Please check again.
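One way to check is to compare `wc -l < inputfile` with `wc -l < outfile`, and to print the length of every output line so that a long joined line is not mistaken for two lines wrapped by the terminal. The helper name `show_line_lengths` below is hypothetical.

```shell
# show_line_lengths FILE  (hypothetical helper)
# Print "line-number: length" for each line, so long joined lines stand out
# instead of looking like two separate lines wrapped on screen.
show_line_lengths() {
    awk '{ print NR ": " length($0) }' "$1"
}
```

Usage: `show_line_lengths outfile`. A joined line should report roughly the combined length of the rows it was built from.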
With my test file I should have 19 lines after the merge, but I have 23. Would I be better off using awk to replace the newlines based on the two conditions?
This file contains records matched on column 1 and column 4.
There can be up to four rows for each record, and these rows appear consecutively in the file. However, duplicate rows can also appear in the file, so I would like to join at most four rows and then delete duplicates after the join (occasionally 2 or 3 of the four rows are legitimately duplicates of each other).
Where column 1 and column 4 are the same, I want to join the lines together (really, I just want to append the differing data, i.e. columns 8 and 10, to the end of the first row).
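An awk approach to these two conditions could look like the sketch below. Several things are assumptions, not details confirmed in the thread. Fields are taken to be whitespace-separated, which may not hold for a truly fixed-length file. A record is a run of consecutive rows with the same column 1 and column 4. "Delete duplicates after the join" is read as dropping identical joined lines. `join_records` is a hypothetical name. Up to 4 rows are joined. The first row is kept whole, and only columns 8 and 10 are appended from each following row. Duplicates inside a record are therefore kept, which matches the rare cases where 2 or 3 of the four rows legitimately repeat.

```shell
# join_records FILE  (hypothetical helper)
# Assumptions: whitespace-separated fields; a record is a run of consecutive
# rows sharing column 1 and column 4; at most 4 rows are joined per record.
# From each extra row only columns 8 and 10 are appended to the first row.
# Identical joined lines are then removed, keeping the first occurrence.
join_records() {
    awk '
    function flush() { if (n > 0) print out }
    {
        key = $1 SUBSEP $4              # the two matching conditions
        if (n > 0 && key == prev && n < 4) {
            out = out " " $8 " " $10    # append the differing data
            n++
        } else {
            flush()                     # emit the previous record
            out  = $0
            prev = key
            n    = 1
        }
    }
    END { flush() }
    ' "$1" | awk '!seen[$0]++'
}
```

Usage: `join_records inputfile > outfile`. Rows with fewer than 10 fields would append empty values, so a fixed-width variant using `substr` may suit the real file better.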