Convert variable length record to fixed length

Hi Team,

I have a problem splitting a file that contains special characters (German characters) with the awk command.

The file contains records of different lengths, and I am separating them into files based on record length using awk.

The command works fine when a record has no special (German) characters, but it does not work when a record contains a German character.

Below is an example of the command we use to separate the file based on record length:

awk '{if(length($0) == 67) { print $0 }}' Rec_Length_Test.txt > 67.txt

Please let us know how to handle German characters with awk.

Thanks in advance.

Thanks,
Anthuvan J.

 sed -n -r '/^.{67}$/p' infile

or something like:

perl -ne 'print if /^.{67}$/' < infile

Hi rdrtx,

I am still facing the same issue. It does not work if the file contains special characters (German characters).

Thanks,
Anthuvan J.

Please post a few lines of the file that include the characters.

  • What is your OS and version?
  • Are you using the right locale?
  • If a POSIX awk is using the right locale and length is not working correctly, then it is broken. It should report the number of characters, not the number of bytes. Reasonably recent GNU awk (3.1.8) reports it correctly.
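To make the bytes-versus-characters point concrete, here is a minimal sketch. The test string is made up for illustration and is not taken from your file; it assumes the text is UTF-8 encoded, where Ü occupies two bytes. In the C locale awk counts bytes. In a UTF-8 locale a multibyte-aware awk such as GNU awk should report 5 for the same input, but the locale name and its availability depend on your system, so that part is left for you to try.

```shell
# Show the active locale settings; LC_ALL / LC_CTYPE decide how awk counts characters.
locale 2>/dev/null || echo "locale command not available"

# "DÜREN" encoded as UTF-8: Ü is the two bytes 0xC3 0x9C (octal 303 234).
# In the C locale every byte counts as one character, so this reports 6.
c_len=$(printf 'D\303\234REN\n' | LC_ALL=C awk '{ print length($0) }')
echo "length in C locale: $c_len"
```

If the byte count and the character count differ for your records, the locale (or the awk implementation) is the thing to look at, not the awk program itself.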

Hi,

I have posted some lines

12345678X                                            X01        AB01  01             000000000000            12345678SIERINGJA   04                      0012  0000045433AB                     VZE UXXXXXXX                                                                                                                              ERLANGEN                                                                                                                     91012        D                POSTFACH 2xxx                                                                                                    
12345678X                                            X01        AB01  01             000000000000            12345678SIERINGJA   01                      0012  0000045434AB                     STADT LAATZEN                                                                                                                               LAATZEN                                                                                                                      30880        D                MXXXXPLATZ 13                                                                                                    
12345678X                                            X01        AB01  01    ABC      000000000000            12345678SAUNABRSBR  01                      0012  0000004000AB                     WEEK MXXX CXXXXXXXXX GMBH                                                                                                                     LEIPZIG                                                                                                                      04105        D                Exxxx Wxxxxxxx PLATZ 3 - 4                                                                                        
12345678X                                            X01        AB01  01    ABC      000000000000            12345678SAUNABRSBR  01                      0012  0000004002AB                     PXXXXXXX GMBH                      C/O FXXXX AG U. CO. KG, BETRIEBSRES                                   LF                                 ESSLINGEN                                                                                                                    73734        D                RXXXXXSTR. 82                                                                                                    
12345678X                                            X01        AB01  01    ABC      000000000000            12345678SAUNABRSBR  01                      0012  0000004003AB                     HXXXXXXXX GROUP EG                                                                                                                          MANNHEIM                                                                                                                     68219        D                                                                                                                                 
12345678X                                            X01        AB01  01    ABC      000000000000            12345678SAUNABRSBR  01                      0012  0000004005AB                     HOTELBETRIEBSGESELLSCHAFT          SXXXXXXXXX MBH                                                                                            BAD WOERISHOFEN                                                                                                              86825        D                HXXXXXXXX-AXXX-STRASSE 11                                                                                          
12345678X                                            X01        AB01  01    ABC      000000000000            12345678SAUNABRSBR  01                      0012  0000004006AB                     ALLGEMEINE GEB�UABREINIGUNG                                                                                                                 BERLIN                                                                                                                       12103        D                MXXXXXXXXXXSTRASSE 65                                                                                             
12345678X                                            X01        AB01  01    ABC      000000000000            12345678SAUNABRSBR  03                      0012  0000004007AB                     GXXX / Pxxxxxxxx GRUPPE BSC                                                                                                                 D�REN                                                                                                                        52353        D                Oxxx-Bxxxxxxxx-STRASSE 21                                                                                          
12345678X                           16P16V61 42      X01        AB01  01    ABC      000000000000            12345678SAUNABRSBR  02                      0012  0000004008AB                     AXXX HXXX HAUS GMBH                                                                                                                         STUTTGART                                                                                                                    70374        D                Fxxxxxxxxxxxx STRA�E                                                                                                
12345678X                                            X01        AB01  01    ABC      000000000000            12345678SAUNABRSBR  04                      0012  0000004011AB                     SERVAL GEB�UABMANAGEMENT                                                                                                                    ZWICKAU                                                                                                                      08056        D

Please reconsider publishing clear text customer data in here.

From what I see in your post / attachment, ALL lines have 601 characters, and ALL umlaut characters are represented by one single byte. Mayhap some strange MS Windows character set? So the problem can't be reproduced on this side.
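If the file really is in a single-byte encoding (an assumption based on the sample above; ISO-8859-1 is used here only as a stand-in), one thing worth trying is to force the C locale so that awk counts bytes, which for such a file is the same as counting characters. A minimal sketch with made-up sample data and hypothetical file names:

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# Hypothetical sample: two 5-byte records (one holds the single byte 0xE4,
# octal 344, which is "ä" in ISO-8859-1) and one 3-byte record.
printf 'abcde\nab\344de\nabc\n' > "$workdir/sample.txt"

# LC_ALL=C makes awk treat each byte as one character, so the record with
# the umlaut byte is still measured as 5.
LC_ALL=C awk 'length($0) == 5' "$workdir/sample.txt" > "$workdir/5.txt"

# Count how many records were selected.
matched=$(wc -l < "$workdir/5.txt" | tr -d ' ')
echo "records of length 5: $matched"
```

Another route would be to convert the file to UTF-8 first (for example with iconv -f ISO-8859-1 -t UTF-8, where the source encoding is again a guess) and then run awk in a UTF-8 locale.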

I have modified the original data heavily. This is sample data, not the original data.