DB2 - Performance Issue using MERGE option

Dear Team,
I am using DB2 v10.5 and trying to load a huge amount of data (10-12 million rows) with the MERGE statement below. Basically it loads data from staging to target. The staging table schemaname.Customer_Staging has 12 million records, and the SELECT alone runs for 25 minutes. But during the MERGE (first-time load, so every record has to be inserted), the process below hangs after running 1.5-2 hours, with no inserts.

    MERGE INTO schemaname.Customer_Table A
    USING (SELECT DISTINCT B.Name, B.CUST_ID, B.ORG_ID
           FROM schemaname.Customer_Staging B
           WHERE (B.Name IS NOT NULL AND LENGTH(B.Name) > 0)
             AND (B.CUST_ID IS NOT NULL AND LENGTH(B.CUST_ID) > 0)
             AND (B.ORG_ID IS NOT NULL AND LENGTH(B.ORG_ID) > 0)) B
    ON B.Name = A.Name AND B.CUST_ID = A.CUST_ID AND B.ORG_ID = A.ORG_ID
       AND A.Load_dt IS NULL
    WHEN NOT MATCHED THEN
        INSERT (Name, CUST_ID, Load_dt, ORG_ID)
        VALUES (B.Name, B.CUST_ID, CURRENT TIMESTAMP - CURRENT TIMEZONE, B.ORG_ID);

The DDL (column definitions) for staging and target is the same.
I included a unique index on the three columns Name, CUST_ID, ORG_ID, created using the

ALLOW REVERSE SCANS PAGE SPLIT SYMMETRIC option 

Can someone please help with how to tweak this query, or suggest a better approach?
Any help appreciated.
Thanks

I expect that this is doing what you asked, and although it seems a sensible query, it is not very practical at this scale. I think your request is going to build some huge temporary tables to work out the various clauses before applying any inserts/updates. You may even fill your temporary tablespace before your MERGE completes this step.

You might have some success with indices though. What indices do you have on schemaname.Customer_Staging and schemaname.Customer_Table?

  • If you don't have a single index covering the three columns you mention in the SELECT DISTINCT, then you might get a full table scan of schemaname.Customer_Staging.
  • If you don't have a single index covering all the columns used in the whole query (e.g. A.Name, B.Name, A.Load_dt, etc.), then you might get a full table scan of the appropriate table.

Anything that forces a full table scan at these scales will perform badly. An index might take a while to build and needs space, but then your MERGE should run much better. I once had a case where it took 20 minutes to build an index, after which the process I wanted ran in about an hour; I then dropped the index again because it was a one-off report. The run without the index was abandoned after over 22 hours (someone set it off and went home).

Is there some sort of query profiler you can use to investigate this? MS SQL Server has one and Oracle has one, so I'm sure that DB2 will have one too, but I've never used it. You need to avoid a full table scan when dealing with this volume of data.
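(For reference, DB2's explain facility can be driven roughly like the sketch below. This is untested on my side; the EXPLAIN.DDL script and the db2exfmt tool come from IBM's tooling, and the database name is a placeholder. The MERGE shown is a simplified version of the one in the question.)

```sql
-- Sketch: capture the access plan for the MERGE into the explain tables,
-- then format it from a shell with the db2exfmt tool, e.g.:
--   db2exfmt -d <dbname> -1
-- The explain tables must exist first; DB2 ships an EXPLAIN.DDL script
-- (under sqllib/misc) to create them.
EXPLAIN PLAN FOR
MERGE INTO schemaname.Customer_Table A
USING (SELECT B.Name, B.CUST_ID, B.ORG_ID
       FROM schemaname.Customer_Staging B) B
ON B.Name = A.Name AND B.CUST_ID = A.CUST_ID AND B.ORG_ID = A.ORG_ID
WHEN NOT MATCHED THEN
    INSERT (Name, CUST_ID, Load_dt, ORG_ID)
    VALUES (B.Name, B.CUST_ID, CURRENT TIMESTAMP - CURRENT TIMEZONE, B.ORG_ID);
```

The formatted output will show whether the optimizer chose a table scan or an index access on each table.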

I hope that this helps,
Robin

Hi Robin
First of all: thank you for looking into this and providing valid suggestions.

I made a few changes:
1. I removed DISTINCT from the select query.
2. I already have indexes on both tables for the required columns, as below.

--staging 
CREATE UNIQUE INDEX schemaname.PK_Customer_Staging
    ON schemaname.Customer_Staging(Name ,CUST_ID,ORG_ID)
ALLOW REVERSE SCANS PAGE SPLIT SYMMETRIC

--target
CREATE UNIQUE INDEX schemaname.PK_Customer_Table
    ON schemaname.Customer_Table(Name ,CUST_ID,ORG_ID)
ALLOW REVERSE SCANS PAGE SPLIT SYMMETRIC

3. I observed that my DB2 uses MAX_LOG = 32 and LOGPRIMARY = 12 (no way to extend this, as it is the maximum provided by the tool client), and I get a transaction log space error while loading the first time, since no data exists in the target table yet.

Is there any way to split the logic based on row count and insert the data in two parts, like the first 5 million, then from 5 million to the max count?
Or any other approach to load this huge 8 million row set? Any help appreciated.
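(One way the split load could look is sketched below. This is an untested sketch using the table/column names from the query above; the 5 million chunk size is illustrative. Since the first-time load only inserts, a plain INSERT with a NOT EXISTS guard can replace the MERGE and be re-run until it inserts 0 rows, committing after each chunk so the transaction log never has to hold the whole load. If your DB2 level rejects FETCH FIRST inside a nested table expression, ROW_NUMBER() can be used instead.)

```sql
-- Sketch: first-time load in committed chunks to stay within log space.
-- Re-run until the INSERT reports 0 rows; NOT EXISTS skips rows already
-- loaded, so each run resumes where the previous one stopped.
INSERT INTO schemaname.Customer_Table (Name, CUST_ID, Load_dt, ORG_ID)
SELECT S.Name, S.CUST_ID, CURRENT TIMESTAMP - CURRENT TIMEZONE, S.ORG_ID
FROM (SELECT B.Name, B.CUST_ID, B.ORG_ID
      FROM schemaname.Customer_Staging B
      WHERE B.Name IS NOT NULL AND LENGTH(B.Name) > 0
        AND B.CUST_ID IS NOT NULL AND LENGTH(B.CUST_ID) > 0
        AND B.ORG_ID IS NOT NULL AND LENGTH(B.ORG_ID) > 0
        AND NOT EXISTS (SELECT 1
                        FROM schemaname.Customer_Table A
                        WHERE A.Name = B.Name
                          AND A.CUST_ID = B.CUST_ID
                          AND A.ORG_ID = B.ORG_ID)
      FETCH FIRST 5000000 ROWS ONLY) S;
COMMIT;
```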