Concatenating column values with unique id into single row

jsaravana · August 17, 2009, 3:29am

Hi,

I have a table in Db2 with data say

id_1 phase1
id_1 phase2
id_1 phase3
id_2 phase1
id_2 phase2

I need to concatenate the values like
id_1 phase1,phase2,phase3
id_2 phase1,phase2

I tried recursive query but in vain as the length of string to be concatenated in quite long.
How can this be implemented using shell scripts effectively.

Thanks

---------- Post updated at 02:29 AM ---------- Previous update was at 02:11 AM ----------

I also need to insert the output values into new table

chipcmc · August 17, 2009, 5:17am

$more file9
id_1 phase1
id_1 phase2
id_1 phase3
id_2 phase1
id_2 phase2

$awk '
{
          if (vector[$1])
              vector[$1]=vector[$1]","$2
      else
              vector[$1]=$2
}
END {
        for (palabra in vector)
                print palabra" "vector[palabra]
}       
' file9

Result:
id_1 phase1,phase2,phase3
id_2 phase1,phase2

Tytalus · August 17, 2009, 5:17am

using awk:

#  nawk '{t[$1]?t[$1]=t[$1]","$2:t[$1]=$2}END{for (i in t){print i,t}}' infile
id_2 phase1,phase2
id_1 phase1,phase2,phase3

HTH

jsaravana · August 17, 2009, 7:53am

Thanks ,

The below is the sample data.
2U_1_2 1-KCER
2U_1_2 1-KCER
2U_1_2 1-SKBY
2U_1_2 1-SKBY
2U_1_2 1-SKBY
2U_1_2 Post delivery
D_1_2 P05
D_1_2 P06
D_1_2 P07
D_1_2 Post delivery
K_1_3 Generate Programs
K_1_3 Parse Programs
K_1_3 Auctions Java
K_1_3 Registration
K_1_3 End-to-End Integration
K_1_3 Post delivery
D_18_1 P05
D_18_1 P06
D_18_1 P07
D_18_1 Post delivery

When I perform the awk command on this data the result is not as expected,
D_1_2 P05,P06,P07,Post
2U_1_2 1-KCER,1-KCER,1-SKBY,1-SKBY,1-SKBY,Post
D_18_1 P05,P06,P07,Post
K_1_3 Generate,Parse,Auctions,Registration,End-to-End,Post

Could you please give a solution for this, Also, I need to insert these values into another table, how to establish it

malcomex999 · August 17, 2009, 8:31am

i guess it is cos u r having more columns comparing to the original sample u posted so just a little modification to the previous post, and if u have more columns, u have to add them

 
nawk '{t[$1]?t[$1]=t[$1]","$2" "$3:t[$1]=$2" "$3}END{for (i in t){print i,t}}' inputfile

jsaravana · August 18, 2009, 1:35am

Hi,
Thanks a many.

I was not aware of nawk commands so I gave few samples.Now I understood the concept, atleast at basic level.
There are around 3500 unique phase fields. Each may have 3 words or even 8 words and go on eg: .
Also some has characters like '&,-' in between words like 'Testing & Delivery' In that case I want whole text instead of 'Testing &'

Please let me know how it can be done.

danmero · August 18, 2009, 2:52am

This should work:

# awk '{n=$1;sub($1" ","",$0);t[n]?t[n]=t[n]","$0:t[n]=$0}END{for(i in t)print i,t}' f2
K_1_3 Generate Programs,Parse Programs,Auctions Java,Registration,End-to-End Integration,Post delivery
D_1_2 P05,P06,P07,Post delivery
2U_1_2 1-KCER,1-KCER,1-SKBY,1-SKBY,1-SKBY,Post delivery
D_18_1 P05,P06,P07,Post delivery

jsaravana · August 18, 2009, 3:37am

Hi

This works fine. But there is more space between the words
K_25_5 Analysis ,
Coding , Post deli
very phase
DU0182_17_42 P05 ,
P06 , P07
, P08
, Post delivery phase

danmero · August 18, 2009, 4:31am

To keep the forums high quality for all users, please take the time to format your posts correctly.

Use Code Tags when you post any code or data samples so others can easily read your code.
You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags and by hand.)
Avoid adding color or different fonts and font size to your posts.
Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.
Be careful when you cut-and-paste, edit any odd characters and make sure all links are working property.

Thank You.

The UNIX and Linux Forums
Reply With Quote

jsaravana · August 18, 2009, 4:55am

Thanks for the advice Danmero.. I will surely follow it in future.

I see that you have edited the code posted. But that again gives me the output with space issues. Where exactly was the code changed. When I compared both codes I was not able to find the change.

---------- Post updated at 03:55 AM ---------- Previous update was at 03:52 AM ----------

KU0506_25_7          Analysis                                                                                            ,
      Coding                                                                                              ,         Post deli
very phase
DU0182_17_44         Post delivery phase                                                                                 ,
     P05                                                                                                 ,        P06

This is how output looks. Used # this time

jsaravana · August 20, 2009, 12:45am

Hi

I was not able to solve the space issue could you please help

danmero · August 20, 2009, 6:40am

We need both , input sample and output required to be able to find a solution.

jsaravana · August 20, 2009, 7:46am

Please find the sample below

ID               PHASE
-------------------------------------------
D1_5_18          IM propagation and review
D1_5_18          System Testing - Process
D1_5_18          System Test - Process
D1_5_18          IPS Module Build/Packaging
D1_5_18          Implementation
D1_5_18          PVT Regression 
D1_5_18          Post delivery phase
D1_5_18          MTP
D1_5_18          HLTD
D1_5_18          Detailed Design
D1_5_18          IPTA
D1_5_18          SIT Test Cases
D1_5_18          Test Environment Support
D1_5_18          Coding
D1_5_18          Unit Test Case Preperation
D1_5_18          Code testing & delivery

The required output is

del_id       phase
D1_5_18   IM propagation and review,System Testing - Process,System Test - Process,IPS Module Build/Packaging,Implementation,PVT Regression,Post delivery phase,MTP,HLTD,Detailed Design,IPTA,SIT Test Cases,Test Environment Support,Coding,Unit Test Case Preperation,Code testing & delivery

thanhdat · August 20, 2009, 8:37am

Reuse the code of Tytalus with a little modification:

awk -F "          " '{t[$1]?t[$1]=t[$1]","$2:t[$1]=$2}END{for (i in t){print i,t}}'  yourfile.txt

danmero · August 20, 2009, 5:54pm

awk -F "          " 'NR>2{t[$1]?t[$1]=t[$1]","$2:t[$1]=$2;next}1 END{for (i in t){print i FS t}}' file

jsaravana · August 21, 2009, 8:17am

Thanks thanhdat, but for few cases the space was less between 2 fields.
I used Danmero code and it was resolved

---------- Post updated at 07:17 AM ---------- Previous update was at 06:17 AM ----------

Thanks Danmero!

The output is as desired

jsaravana · August 26, 2009, 3:29am

Hi Danmero,

I have another need based on this output.
Instead of space between 2 columns I need any delimeted like "|" character.
Because I need to insert these values into another table so it will be useful.
Is that possible?

Thanks in advance!

---------- Post updated at 02:29 AM ---------- Previous update was at 01:49 AM ----------

I have edited the command as below:

awk -F " " 'NR>2{t[$1]?t[$1]=t[$1]","$2:t[$1]=$2;next}1 END{for (i in t){print i"|"FS t}}' prod_data.out > res1.out

I got the output fine.

I just noticed that the first few values of prod_data file have not got concatenated

why do they get skipped?

danmero · August 26, 2009, 10:00am

The code will skip(print only) the first 2 lines base on your data sample from http://www.unix.com/shell-programming-scripting/117028-concatenating-column-values-unique-id-into-single-row-2.html\#post302345773