C cedilla Delimiter interpretation issue

eap714.com · August 17, 2018, 5:06am

Hi Folks,

I am trying to generate a file with the C Cedilla delimiter.

In the Dev region, the file I am trying to generate uses the DML below:
decimal("Ç") acct_id; and so on for the other columns

When I cat this file I get the output below:

bankbtch@jackets:/prod/home10/data/serial/rtl_baseldw_us_card/rbuc_cr_bur/temp=> cat ori_test.dat | head -1
10045087403Ç07Ç-2Ç*2Ç*2Ç*2Ç*2Ç5Ç37630Ç40500Ç22000Ç20515Ç19Ç422Ç0Ç0Ç0Ç0Ç0Ç0Ç0Ç2Ç0Ç5Ç*2Ç38501Ç-2Ç-2Ç-2Ç-2ÇÇ-2Ç0Ç1Ç*2Ç*2Ç*2Ç*2Ç*2Ç*2Ç*2Ç*2Ç0Ç*2Ç-2Ç*2Ç0Ç*2Ç*2Ç-2Ç0Ç-2Ç-2Ç-2Ç-2Ç-2Ç-2Ç*2Ç*2Ç*2Ç*2Ç*2Ç*2Ç3Ç-2

But when I open the same file in vi I get the output below:

bankbtch@jackets:/prod/home10/data/serial/rtl_baseldw_us_card/rbuc_cr_bur/temp=> vi ori_test.dat
10045087403~G07~G-2~G*2~G*2~G*2~G*2~G5~G37630~G40500~G22000~G20515~G19~G422~G0~G0~G0~G0~G0~G0~G0~G2~G0~G5~G*2~G38501~G-2~G-2~G-2~G-2~G~G-2~G0~G1~G*2~G*2~G*2~G*2~G*2~G*2~G*2~G*2~G0~G*2~G-2~G*2~G0~G*2~G*2~G-2~G0~G-2~G-2~G-2~G-2~G-2~G-2~G*2~G*2~G*2

The same file in prod uses the DML below:
decimal("Ç") acct_id; and so on for the other columns

When I cat this file I get the output below:

bankbtch@jackets:/prod/home01/data/serial/rtl_baseldw_us_card/rbuc_misc/temp=> cat prod_test.dat_new_efx
11470733975328324500323212770004130003Y10320001NNNNNNNY002A8F588C159E82802199403T4002N0000000000010216931112210450000000000000010322100003120233017101693111227500010001991193105904500532805000000005328050000000061750032Y0000000000000000000413000000780004000000000000000000000000000000000000000010000000000000000000000000000017120010533845906020170700033352222222200000000201805200408319940000041NMORINSH HWYSTONY BROOKNY117905FOREST CREEK HIGHWAYAVESTONY BROOKNY11790HT1405830211014322435445993938661Y000NYYNNN00NNNNNN99999999900000020040999999999999999999991193002135009999999982017071113999999998500999999998201707111399999999800001995062002012002010000000000000001009991399800099999999999999999912079999999999999999200099999999999900005883918131099182322018-07-310297

But when I open the same file in vi I get the output below:

bankbtch@jackets:/prod/home10/data/serial/rtl_baseldw_us_card/rbuc_cr_bur/temp=> vi prod_test.dat_new_efx
11470733975328324500323212770004130003Y10320001NNNNNNNY002A8F588C159E82802199403T4002N0000000000010216931112210450000000000000010322100003120233017101693111227500010001991193105904500532805000000005328050000000061750032Y0000000000000000000413000000780004000000000000000000000000000000000000000010000000000000000000000000000017120010533845906020170700033352222222200000000201805200408319940000041NMORINSH HWYSTONY BROOKNY117905FOREST CREEK HIGHWAYAVESTONY BROOKNY11790HT1405830211014322435445993938661Y000NYYNNN00NNNNNN99999999900000020040999999999999999999991193002135009999999982017071113999999998500999999998201707111399999999800001995062002012002010000000000000001009991399800099999999999999999912079999999999999999200099999999999900005883918131099182322018-07-310297

I need a way to make the Dev file consistent with the prod file, i.e. to eliminate the ~G from the delimiter.

Currently I am passing UC007 as the Unicode delimiter value.

Please let me know if anyone knows a fix for this.

RudiC · August 17, 2018, 6:13am

Welcome to the forum.

Please get used to providing decent context information for your problem.

It is always helpful to phrase a request carefully and in detail, and to support it with system info such as OS and shell, the related environment (variables, directory structures, options), preferred tools, adequate (representative) sample input and desired output data, the logic connecting the two including your own attempts at a solution, and, if any, system (error) messages verbatim, to avoid ambiguities and keep people from guessing.

Some additional info is necessary:
Where do those files come from (HW, OS, DB tool)?
What is the files' character encoding?
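One way to answer the encoding question is to look at the raw bytes of the file rather than at what cat or vi renders. The following is a minimal sketch, assuming a POSIX shell with od available. The file sample.dat and the helper first_line_hex are made up for illustration, and the byte 0x87 (octal 207) used to build the sample is only a guess based on the ~G shown by vi, not something confirmed in this thread:

```shell
# Build a tiny stand-in file; octal \207 is the byte 0x87 (assumed delimiter).
printf '%s\207%s\207%s\n' 10045087403 07 -2 > sample.dat

# Hypothetical helper: dump the first line of a file byte by byte in hex.
first_line_hex() {
    head -n 1 "$1" | od -An -tx1
}

first_line_hex sample.dat
```

Running the same helper against the real Dev file would show whether each delimiter is one byte or several, which is the information needed to decide what to pass to the generating script.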

You could create a temp file with a "normal" or "standard" delimiter like ; or , and then transliterate those with a text tool like tr, sed, or awk.
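The transliteration step could look like the sketch below. It assumes the temp file was written with ; as delimiter, that ; never occurs in the data, and that the consumer wants Ç as the single ISO-8859-1 byte 0xC7 (octal 307). The function name to_cedilla and the file names are hypothetical:

```shell
# Turn each ';' into the single byte 0xC7 (octal 307), i.e. Ç in ISO-8859-1.
# LC_ALL=C keeps tr working on raw bytes regardless of the session locale.
to_cedilla() {
    LC_ALL=C tr ';' '\307' < "$1" > "$2"
}

# Stand-in for the temp file generated with the "normal" delimiter.
printf '%s\n' '10045087403;07;-2' > tmp_semicolon.dat

to_cedilla tmp_semicolon.dat out_cedilla.dat

# Show the result as hex bytes so the delimiter is unambiguous.
od -An -tx1 out_cedilla.dat
```

If the consumer expects UTF-8 instead, Ç is the two-byte sequence 0xC3 0x87, which tr cannot produce from a single character; sed or awk would be the tool for that case.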

eap714.com · August 17, 2018, 7:31am

Hey,

Thanks for the response.

It is a Unix OS with Korn shell. The source is Hadoop EMR, and this file is generated on the S3 lake.

We run a script that takes the delimiter as input and generates the output file accordingly. We tried providing both UC007 and uc007; neither helps.

The request is for the encoding value that we need to pass to the script to generate just Ç (C cedilla) instead of ~G (we don't need the extra ~G).

Please let me know if you have any questions.

RudiC · August 17, 2018, 7:57am

I'm sorry, I can't follow. The Unicode code point for Ç is U+00C7 (0xC7). Don't know if that helps you.
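For reference, the same code point ends up as different bytes depending on the encoding. A small sketch, assuming a POSIX shell with od; the two helper names are made up for illustration:

```shell
# Ç (U+00C7) as raw bytes in two common encodings, written with octal escapes.
latin1_cedilla() { printf '\307'; }       # ISO-8859-1: one byte, 0xC7
utf8_cedilla()   { printf '\303\207'; }   # UTF-8: two bytes, 0xC3 0x87

latin1_cedilla | od -An -tx1
utf8_cedilla   | od -An -tx1
```

Note that the second UTF-8 byte is 0x87, which in Vim's notation for unprintable bytes is displayed as ~G. Whether that is what happens in the Dev file is an assumption that a hex dump of the file would have to confirm.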