Concatenating column values with unique id into single row

Hi,

I have a table in Db2 with data say

id_1 phase1
id_1 phase2
id_1 phase3
id_2 phase1
id_2 phase2

I need to concatenate the values like
id_1 phase1,phase2,phase3
id_2 phase1,phase2

I tried recursive query but in vain as the length of string to be concatenated in quite long.
How can this be implemented using shell scripts effectively.

Thanks

---------- Post updated at 02:29 AM ---------- Previous update was at 02:11 AM ----------

I also need to insert the output values into new table

$more file9
id_1 phase1
id_1 phase2
id_1 phase3
id_2 phase1
id_2 phase2

$awk '
{
          if (vector[$1])
              vector[$1]=vector[$1]","$2
      else
              vector[$1]=$2
}
END {
        for (palabra in vector)
                print palabra" "vector[palabra]
}       
' file9

Result:
id_1 phase1,phase2,phase3
id_2 phase1,phase2


using awk:

#  nawk '{t[$1]?t[$1]=t[$1]","$2:t[$1]=$2}END{for (i in t){print i,t}}' infile
id_2 phase1,phase2
id_1 phase1,phase2,phase3

HTH

Thanks ,

The below is the sample data.
2U_1_2 1-KCER
2U_1_2 1-KCER
2U_1_2 1-SKBY
2U_1_2 1-SKBY
2U_1_2 1-SKBY
2U_1_2 Post delivery
D_1_2 P05
D_1_2 P06
D_1_2 P07
D_1_2 Post delivery
K_1_3 Generate Programs
K_1_3 Parse Programs
K_1_3 Auctions Java
K_1_3 Registration
K_1_3 End-to-End Integration
K_1_3 Post delivery
D_18_1 P05
D_18_1 P06
D_18_1 P07
D_18_1 Post delivery

When I perform the awk command on this data the result is not as expected,
D_1_2 P05,P06,P07,Post
2U_1_2 1-KCER,1-KCER,1-SKBY,1-SKBY,1-SKBY,Post
D_18_1 P05,P06,P07,Post
K_1_3 Generate,Parse,Auctions,Registration,End-to-End,Post

Could you please give a solution for this, Also, I need to insert these values into another table, how to establish it

i guess it is cos u r having more columns comparing to the original sample u posted so just a little modification to the previous post, and if u have more columns, u have to add them

 
nawk '{t[$1]?t[$1]=t[$1]","$2" "$3:t[$1]=$2" "$3}END{for (i in t){print i,t}}' inputfile

Hi,
Thanks a many.

I was not aware of nawk commands so I gave few samples.Now I understood the concept, atleast at basic level.
There are around 3500 unique phase fields. Each may have 3 words or even 8 words and go on eg: .
Also some has characters like '&,-' in between words like 'Testing & Delivery' In that case I want whole text instead of 'Testing &'

Please let me know how it can be done.

This should work:

# awk '{n=$1;sub($1" ","",$0);t[n]?t[n]=t[n]","$0:t[n]=$0}END{for(i in t)print i,t}' f2
K_1_3 Generate Programs,Parse Programs,Auctions Java,Registration,End-to-End Integration,Post delivery
D_1_2 P05,P06,P07,Post delivery
2U_1_2 1-KCER,1-KCER,1-SKBY,1-SKBY,1-SKBY,Post delivery
D_18_1 P05,P06,P07,Post delivery

Hi

This works fine. But there is more space between the words
K_25_5 Analysis ,
Coding , Post deli
very phase
DU0182_17_42 P05 ,
P06 , P07
, P08
, Post delivery phase

To keep the forums high quality for all users, please take the time to format your posts correctly.

  1. Use Code Tags when you post any code or data samples so others can easily read your code.
    You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags and by hand.)
  2. Avoid adding color or different fonts and font size to your posts.
    Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.
  3. Be careful when you cut-and-paste, edit any odd characters and make sure all links are working property.

Thank You.

The UNIX and Linux Forums
Reply With Quote

Thanks for the advice Danmero.. I will surely follow it in future.

I see that you have edited the code posted. But that again gives me the output with space issues. Where exactly was the code changed. When I compared both codes I was not able to find the change.

---------- Post updated at 03:55 AM ---------- Previous update was at 03:52 AM ----------

KU0506_25_7          Analysis                                                                                            ,
      Coding                                                                                              ,         Post deli
very phase
DU0182_17_44         Post delivery phase                                                                                 ,
     P05                                                                                                 ,        P06

This is how output looks. Used # this time :slight_smile:

Hi

I was not able to solve the space issue could you please help

We need both , input sample and output required to be able to find a solution.

Please find the sample below

ID               PHASE
-------------------------------------------
D1_5_18          IM propagation and review
D1_5_18          System Testing - Process
D1_5_18          System Test - Process
D1_5_18          IPS Module Build/Packaging
D1_5_18          Implementation
D1_5_18          PVT Regression 
D1_5_18          Post delivery phase
D1_5_18          MTP
D1_5_18          HLTD
D1_5_18          Detailed Design
D1_5_18          IPTA
D1_5_18          SIT Test Cases
D1_5_18          Test Environment Support
D1_5_18          Coding
D1_5_18          Unit Test Case Preperation
D1_5_18          Code testing & delivery

The required output is

del_id       phase
D1_5_18   IM propagation and review,System Testing - Process,System Test - Process,IPS Module Build/Packaging,Implementation,PVT Regression,Post delivery phase,MTP,HLTD,Detailed Design,IPTA,SIT Test Cases,Test Environment Support,Coding,Unit Test Case Preperation,Code testing & delivery

Reuse the code of Tytalus with a little modification:

awk -F "          " '{t[$1]?t[$1]=t[$1]","$2:t[$1]=$2}END{for (i in t){print i,t}}'  yourfile.txt 
awk -F "          " 'NR>2{t[$1]?t[$1]=t[$1]","$2:t[$1]=$2;next}1 END{for (i in t){print i FS t}}' file

Thanks thanhdat, but for few cases the space was less between 2 fields.
I used Danmero code and it was resolved

---------- Post updated at 07:17 AM ---------- Previous update was at 06:17 AM ----------

Thanks Danmero!

The output is as desired :b:

Hi Danmero,

I have another need based on this output.
Instead of space between 2 columns I need any delimeted like "|" character.
Because I need to insert these values into another table so it will be useful.
Is that possible?

Thanks in advance!

---------- Post updated at 02:29 AM ---------- Previous update was at 01:49 AM ----------

I have edited the command as below:

awk -F " " 'NR>2{t[$1]?t[$1]=t[$1]","$2:t[$1]=$2;next}1 END{for (i in t){print i"|"FS t}}' prod_data.out > res1.out

I got the output fine.

I just noticed that the first few values of prod_data file have not got concatenated

why do they get skipped?

The code will skip(print only) the first 2 lines base on your data sample from http://www.unix.com/shell-programming-scripting/117028-concatenating-column-values-unique-id-into-single-row-2.html\#post302345773