To check Blank Lines, Blank Records and Junk Characters in a File

Hi All

Need Help

I have a file (ABC.TXT) in the following format:

���ABCDHEJJSJJ|XCBJSKK01|M|7348974982790
HDFLJDKJSKJ|KJALKSD02|M|7378439274898
KJHSAJKHHJJ|LJDSAJKK03|F|9898982039999
(cont......)

I need to write a script that will check for : blank lines (between rows, before the first line and after the last line) and remove them. Also, if there are any blank records between the "|" delimiter, it should identify the row number and send a mail. Finally, if there are any "junk characters", it should identify and remove them. How should I go about it?

I know what a blank line is. What is a ": blank line"? Is it a blank line that can contain colons in addition to <space>s and <tab>s?

By definition, a blank line only contains characters in the character class blank and the character | is never a member of that character class. So, what do you really mean by "blank records between the "|" delimiter"?

All of the characters in the 1st two fields of the 1st three lines of your sample ABC.TXT look like "junk characters" to me. What is your definition of a junk character? If " KJHSAJKHHJJ" isn't junk, what is it?

Hi Don,

Apologies for not specifying this correctly. :frowning:

1> A blank line means a line containing only spaces and tabs.
2> A blank record between "|" delimiters means the field is NULL (currently empty): the user forgot to enter data there.
Example ABCDEFGH|"NULL"|M|"NULL"
"NULL"|XYZABNH|"NULL"|4567344
3> Junk characters are characters other than letters, digits or ordinary special characters, i.e. something that is not readable.
Example �'�|��

" KJHSAJKHHJJ" and "ABCDEFGH" are placeholders I have used instead of real names.
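Following definition 3 above, here is a minimal sketch for detecting and removing such characters. It assumes that "junk" means any byte that is not printable ASCII or a <TAB> under the C locale. The function names `find_junk_lines` and `strip_junk` are hypothetical. This assumption also flags accented letters, any other non-ASCII text and Windows carriage returns, so it may be too strict for your data.

```shell
#!/bin/sh
# Hypothetical helpers, not from the thread.
# ASSUMPTION: "junk" = any byte outside printable ASCII (0x20-0x7E)
# other than <TAB>. In the C locale this also flags UTF-8 letters
# (e.g. accented characters) and the CR of Windows line endings.

# find_junk_lines FILE: print "line-number:line" for each offending line.
find_junk_lines() {
    tab=$(printf '\t')
    LC_ALL=C grep -n "[^[:print:]${tab}]" "$1"
}

# strip_junk FILE: write FILE to stdout with every junk byte deleted,
# keeping printable ASCII, <TAB> and newline.
strip_junk() {
    LC_ALL=C tr -cd '[:print:]\t\n' < "$1"
}
```

For example, `find_junk_lines ABC.TXT` lists the offending lines with their line numbers, and `strip_junk ABC.TXT > ABC.clean` writes a cleaned copy.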

1) awk with the default field separator sets NF to 0 for those lines. Alternatively, you could use the regex /^[ \t]*$/ (space and <TAB>) to identify them.
2) With FS set to "|", you'll need to loop over all fields and check the length of each.
3) "Something which is not understandable" is very locale-dependent. All of the characters in your example "�'�|��" may be essential in languages other than (US) English, or necessary for, e.g., record/parameter delimiting.
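Points 1 and 2 could be sketched as two small awk-based shell functions. This assumes that "NULL" here means an empty (or blank-only) field and that "|" never occurs inside a field. `remove_blank_lines` and `report_empty_fields` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers, not from the thread.

# remove_blank_lines FILE: print FILE without lines that are empty or
# contain only spaces/<TAB>s, whether before, between or after records.
remove_blank_lines() {
    awk '!/^[ \t]*$/' "$1"
}

# report_empty_fields FILE: for each non-blank line with at least one
# empty "|"-separated field, print "line N: empty field(s) i,j".
# N is the line number in the ORIGINAL file (blank lines included).
# ASSUMPTION: a field holding only spaces/<TAB>s also counts as empty.
report_empty_fields() {
    awk -F'|' '
        /^[ \t]*$/ { next }               # skip blank lines (point 1)
        {
            empty = ""
            for (i = 1; i <= NF; i++) {   # point 2: loop over all fields
                f = $i
                gsub(/[ \t]/, "", f)      # treat blank-only fields as empty
                if (length(f) == 0)
                    empty = empty (empty == "" ? "" : ",") i
            }
            if (empty != "")
                printf "line %d: empty field(s) %s\n", NR, empty
        }' "$1"
}
```

To mail the report only when something was found, one option is `report=$(report_empty_fields ABC.TXT); [ -n "$report" ] && printf '%s\n' "$report" | mailx -s "Empty fields in ABC.TXT" someone@example.com`. The recipient address is a placeholder, and whether `mailx` is installed and configured depends on the system.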

Hi Rudi

Thanks!:slight_smile:

Actually, I have been trying this:

a=$(grep -c '[[:blank:]]' abc.txt)   # count lines containing a space or <TAB>
if [ "$a" -gt 0 ]
then
      : # {perform rest of code}
fi

But I need a better approach: a simple grep/sed command that finds and counts any lines consisting only of spaces/tabs between rows (above, in the middle or below the data), and also any spaces/tabs at the start and end of each line. Can you please help me with that?
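Here is a minimal grep/sed sketch of what is being asked. It assumes "spaces/tabs" means the POSIX `[:blank:]` class (space and <TAB>). The function names are hypothetical. Note that `count_edge_blanks` also counts blank-only lines that contain at least one space or <TAB>.

```shell
#!/bin/sh
# Hypothetical helpers, not from the thread.

# count_blank_lines FILE: number of lines that are empty or hold only
# spaces/<TAB>s, wherever they occur (before, between or after records).
count_blank_lines() {
    grep -c '^[[:blank:]]*$' "$1"
}

# count_edge_blanks FILE: number of lines that start or end with a
# space or <TAB> (leading/trailing whitespace around a record).
count_edge_blanks() {
    grep -cE '^[[:blank:]]|[[:blank:]]$' "$1"
}

# clean_file FILE: drop blank lines, then trim leading and trailing
# spaces/<TAB>s from the remaining lines; result goes to stdout.
clean_file() {
    sed -e '/^[[:blank:]]*$/d' \
        -e 's/^[[:blank:]]*//' \
        -e 's/[[:blank:]]*$//' "$1"
}
```

A count greater than 0 from either counter could drive the `if [ "$a" -gt 0 ]` test in the script above, and `clean_file abc.txt > abc.clean` produces the cleaned copy.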
Example

abc.txt :
Space1 [/BOF]
Space2
ABCHSDJRIR|MSDJAS|M|122121
Space3
ASDSDSADSAS|DASDASD|K|12328137 
Space4 ASDSADASDA|qwueiwuqoei|H|1219827918Space5
[/EOF]

Regards

Your post needs better preparation.
Use code tags as required by the forum rules.

What have you tried so far?
REAL code, not non-working, abbreviated pseudocode.

Doing the above shows respect to the many people here who are trying to help you.
And it's absolutely annoying to have to ask for every single bit.

What do you mean by 'get a better syntax' when you have not provided a valid, (basically) working syntax at all?

Besides, there are NO space characters at all in your text file.
AGAIN, USE CODE TAGS!
And provide samples AS-IS (e.g. not a 2 TB database, but a few lines of actual data to work with).

Thank you, have a nice weekend


Absolutely.
And the requirements in post #5 differ from those in post #1. A concise specification accompanied by an adequate sample is needed!