Remove duplicate records

Hi,

I am working on a script that removes duplicate records (lines) from a flat file. The only difference between the duplicates is where the words "NOT NULL" appear. Please see the example input file below.

INPUT FILE:

CREATE a
(
TRIAL_CLIENT              NOT NULL VARCHAR2(60),
TRIAL_FUND                NOT NULL VARCHAR2(60),
LOCAL_ACC_NO              NOT NULL VARCHAR2(60),
TRIAL_BROKER              NOT NULL VARCHAR2(12),
CORR_ACC_NO               NOT NULL NUMBER(10),
CURRENCY                  NOT NULL VARCHAR2(3),
AS_OF_DATE                NOT NULL DATE,
COUNT_OUR_TRANSACTIONS             NUMBER,
SUM_OUR_POSITIONS                  NUMBER,
SUM_OUR_CASH_TXNS                  NUMBER,
SUM_OUR_TRANSACTIONS               NUMBER,
COUNT_BROKER_TRANSACTIONS          NUMBER,
SUM_BROKER_POSITIONS               NUMBER,
SUM_BROKER_CASH_TXNS               NUMBER,
SUM_BROKER_TRANSACTIONS            NUMBER,
SUM_OUR_CASH_BALS                  NUMBER,
SUM_OUR_UNREAL_BALS                NUMBER,
SUM_OUR_BALANCES                   NUMBER,
SUM_BROKER_CASH_BALS               NUMBER,
SUM_BROKER_UNREAL_BALS             NUMBER,
SUM_UNSETT_INT                     NUMBER,
SUM_OPEN_FWDS                      NUMBER,
SUM_BROKER_BALANCES                NUMBER,
NET_TRANSACTIONS                   NUMBER,
NET_BALANCES                       NUMBER,
TRIAL                              NUMBER,
CASH_ADJ                           NUMBER,
WO_AMT                             NUMBER,
ITEMS                              NUMBER
);

CREATE b
(
TRIAL_CLIENT               VARCHAR2(60) NOT NULL,
TRIAL_FUND                 VARCHAR2(60) NOT NULL,
LOCAL_ACC_NO               VARCHAR2(60) NOT NULL,
TRIAL_BROKER               VARCHAR2(12) NOT NULL,
CORR_ACC_NO                NUMBER(10)   NOT NULL,
CURRENCY                   VARCHAR2(3)  NOT NULL,
AS_OF_DATE                 DATE         NOT NULL,
COUNT_OUR_TRANSACTIONS     NUMBER,
SUM_OUR_POSITIONS          NUMBER,
SUM_OUR_CASH_TXNS          NUMBER,
SUM_OUR_TRANSACTIONS       NUMBER,
COUNT_BROKER_TRANSACTIONS  NUMBER,
SUM_BROKER_POSITIONS       NUMBER,
SUM_BROKER_CASH_TXNS       NUMBER,
SUM_BROKER_TRANSACTIONS    NUMBER,
SUM_OUR_CASH_BALS          NUMBER,
SUM_OUR_UNREAL_BALS        NUMBER,
SUM_OUR_BALANCES           NUMBER,
SUM_BROKER_CASH_BALS       NUMBER,
SUM_BROKER_UNREAL_BALS     NUMBER,
SUM_UNSETT_INT             NUMBER,
SUM_OPEN_FWDS              NUMBER,
SUM_BROKER_BALANCES        NUMBER,
NET_TRANSACTIONS           NUMBER,
NET_BALANCES               NUMBER,
TRIAL                      NUMBER,
CASH_ADJ                   NUMBER,
WO_AMT                     NUMBER,
ITEMS                      NUMBER
);

CREATE c
(
TRIAL_CLIENT  VARCHAR2(60) NOT NULL,
TRIAL_FUND    VARCHAR2(60) NOT NULL,
LOCAL_ACC_NO  VARCHAR2(60) NOT NULL,
TRIAL_BROKER  VARCHAR2(12) NOT NULL,
CORR_ACC_NO   NUMBER(10)   NOT NULL,
CURRENCY      VARCHAR2(3)  NOT NULL,
VALUE_DATE    DATE         NOT NULL
);

OUTPUT:

CREATE a
(
TRIAL_CLIENT               VARCHAR2(60) NOT NULL,
TRIAL_FUND                 VARCHAR2(60) NOT NULL,
LOCAL_ACC_NO               VARCHAR2(60) NOT NULL,
TRIAL_BROKER               VARCHAR2(12) NOT NULL,
CORR_ACC_NO                NUMBER(10)   NOT NULL,
CURRENCY                   VARCHAR2(3)  NOT NULL,
AS_OF_DATE                 DATE         NOT NULL,
COUNT_OUR_TRANSACTIONS     NUMBER,
SUM_OUR_POSITIONS          NUMBER,
SUM_OUR_CASH_TXNS          NUMBER,
SUM_OUR_TRANSACTIONS       NUMBER,
COUNT_BROKER_TRANSACTIONS  NUMBER,
SUM_BROKER_POSITIONS       NUMBER,
SUM_BROKER_CASH_TXNS       NUMBER,
SUM_BROKER_TRANSACTIONS    NUMBER,
SUM_OUR_CASH_BALS          NUMBER,
SUM_OUR_UNREAL_BALS        NUMBER,
SUM_OUR_BALANCES           NUMBER,
SUM_BROKER_CASH_BALS       NUMBER,
SUM_BROKER_UNREAL_BALS     NUMBER,
SUM_UNSETT_INT             NUMBER,
SUM_OPEN_FWDS              NUMBER,
SUM_BROKER_BALANCES        NUMBER,
NET_TRANSACTIONS           NUMBER,
NET_BALANCES               NUMBER,
TRIAL                      NUMBER,
CASH_ADJ                   NUMBER,
WO_AMT                     NUMBER,
ITEMS                      NUMBER
);

CREATE c
(
TRIAL_CLIENT  VARCHAR2(60) NOT NULL,
TRIAL_FUND    VARCHAR2(60) NOT NULL,
LOCAL_ACC_NO  VARCHAR2(60) NOT NULL,
TRIAL_BROKER  VARCHAR2(12) NOT NULL,
CORR_ACC_NO   NUMBER(10)   NOT NULL,
CURRENCY      VARCHAR2(3)  NOT NULL,
VALUE_DATE    DATE         NOT NULL
);

As you can see from the output file, the records of the form TRIAL_CLIENT NOT NULL VARCHAR2(60), ... etc. were removed.

Thanks,

And what code did you use to transform your sample input into your sample output?

The title of this thread says you want to delete duplicate records. What constitutes a record? (A line, a create x {...} where x is the same in both records, or what???)

You have shown us your input file and you have shown us what seems to be incorrect output that you're getting from your code. What output do you want?
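To make the question concrete, here is a minimal sketch under one possible reading of the request. It assumes (this is not confirmed anywhere in the thread) that a "record" is a whole CREATE block, and that two blocks are duplicates when their column lists are identical once "NOT NULL" is moved behind the data type. The function name dedup_creates is made up for illustration. The first block of each kind is kept, and columns are printed single-spaced rather than aligned as in the sample output.

```shell
#!/bin/sh
# dedup_creates FILE: print FILE with duplicate CREATE blocks removed.
# Assumption: a block runs from a "CREATE ..." line to the next ");" line,
# and blocks are compared by their normalized column lists only.
dedup_creates() {
    awk '
    /^CREATE / { header = $0; n = 0; key = ""; inblock = 1; next }
    inblock && /^\($/ { next }                 # opening parenthesis, re-emitted below
    inblock && /^\);/ {                        # end of block: print it if it is new
        if (!(key in seen)) {
            seen[key] = 1
            print header; print "("
            for (i = 1; i <= n; i++) {
                sep = (i < n) ? "," : ""
                print col[i] sep
            }
            print ");"; print ""
        }
        inblock = 0; next
    }
    inblock && NF {                            # a column line inside a block
        line = $0
        sub(/,[ \t]*$/, "", line)              # drop the trailing comma
        notnull = sub(/[ \t]+NOT NULL/, "", line) ? " NOT NULL" : ""
        $0 = line                              # re-split the cleaned line
        name = $1; $1 = ""; type = $0
        sub(/^ /, "", type)
        col[++n] = name " " type notnull       # normalized: NAME TYPE [NOT NULL]
        key = key col[n] "\n"
    }
    ' "$1"
}
```

It could be called as dedup_creates input.dat > output.dat. If duplicates should instead be decided per line, or by table name, the key built in the last rule is the part to change.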

I am using the bash shell. Below is just the part of the code I am using. The script needs a parameter in order to run.

Infile=$1
NotFinalOutFile=$(echo "$Infile" | awk -F '.' '{print $1}')    # File name up to the first "."

for files in $Infile
do
   sed 's/[ \t]*$//' "$files" | \
   sed 's/^$/)\n/' | egrep -vw 'Name|-' | \
   sed -e 's/desc/CREATE TABLE/;/CREATE / a\(' | \
   awk '{
         sub(/^\)/,"&;");                                       # Replace ")" with ");"
         s=($0~/^[A-Z]/ && a~/^[A-Z]/ && !/^CREATE /)?",":"";   # If this line and the previous line start with "A-Z" and this line is not "CREATE", set "s" to ","; else set it to ""
         printf "%s\n%s", s, $0;                                # Print "s" to end the previous line, then a newline and this line
         a=$0}                                                  # Remember this line as the previous line
   END {
         print ""}'                                             # Print the final newline
done >> "$NotFinalOutFile.dat"


---------- Post updated at 03:01 PM ---------- Previous update was at 02:48 PM ----------

Actually, the script above just creates the CREATE TABLE statements.

---------- Post updated at 03:04 PM ---------- Previous update was at 03:01 PM ----------

Below is the sample input file for the script:

desc a
Name         Null     Type
------------ -------- ------------
TRIAL_CLIENT NOT NULL VARCHAR2(60)
TRIAL_FUND   NOT NULL VARCHAR2(60)
LOCAL_ACC_NO NOT NULL VARCHAR2(60)
TRIAL_BROKER NOT NULL VARCHAR2(12)
CORR_ACC_NO  NOT NULL NUMBER(10)
CURRENCY     NOT NULL VARCHAR2(3)
VALUE_DATE   NOT NULL DATE

desc b
Name                      Null     Type
------------------------- -------- ------------
TRIAL_CLIENT              NOT NULL VARCHAR2(60)
TRIAL_FUND                NOT NULL VARCHAR2(60)
LOCAL_ACC_NO              NOT NULL VARCHAR2(60)
TRIAL_BROKER              NOT NULL VARCHAR2(12)
CORR_ACC_NO               NOT NULL NUMBER(10)
CURRENCY                  NOT NULL VARCHAR2(3)
AS_OF_DATE                NOT NULL DATE
SUM_OUR_CASH_TXNS                  NUMBER
SUM_OUR_POSITIONS                  NUMBER
COUNT_OUR_TRANSACTIONS             NUMBER
SUM_OUR_CASH_BALS                  NUMBER
SUM_OUR_UNREAL_BALS                NUMBER
SUM_BROKER_CASH_TXNS               NUMBER
SUM_BROKER_POSITIONS               NUMBER
COUNT_BROKER_TRANSACTIONS          NUMBER
SUM_BROKER_CASH_BALS               NUMBER
SUM_BROKER_UNREAL_BALS             NUMBER
SUM_UNSETT_INT                     NUMBER
SUM_OPEN_FWDS                      NUMBER
WO_AMT                             NUMBER
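For comparison, the conversion from this kind of input could also be done in a single awk program. The following is only a sketch, not the poster's script: the function name desc_to_create is invented, and it assumes each listing starts with "desc <table>", is followed by a "Name ... Null ... Type" header and a dashed rule, and ends at a blank line or at end of file. Unlike the pipeline above, it writes NOT NULL after the data type, so every generated statement uses the same column layout.

```shell
#!/bin/sh
# desc_to_create FILE: turn "desc" listings into CREATE TABLE statements.
# Assumption: no real column is literally named "Name" and no column line
# starts with "-", because such lines are treated as header lines and skipped.
desc_to_create() {
    awk '
    function close_block() {                   # finish the current statement, if any
        if (inblock) { print ""; print ");"; print ""; inblock = 0 }
    }
    $1 == "desc" {                             # start of a new listing
        close_block()
        print "CREATE TABLE " $2; print "("
        inblock = 1; first = 1
        next
    }
    !inblock               { next }            # ignore anything outside a listing
    NF == 0                { close_block(); next }
    $1 == "Name" || /^-+/  { next }            # column header and dashed rule
    {
        notnull = sub(/[ \t]+NOT NULL/, "") ? " NOT NULL" : ""
        name = $1; $1 = ""; sub(/^ /, "")      # what is left in $0 is the data type
        printf "%s%s %s%s", (first ? "" : ",\n"), name, $0, notnull
        first = 0
    }
    END { close_block() }
    ' "$1"
}
```

Called as desc_to_create input.txt > output.dat, this would print one CREATE TABLE block per listing, with a comma after every column except the last.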

I repeat: