Automatically correct a File Name

Hi All

I need help understanding how to automatically correct a file name using a shell script, even if it is in a completely different format...

Ex : adv_OK_0215_mem_rules_firing.txt / advex_0215_OK_me_rule_fire.txt (or in any other format as above)

to : advext_OK_2015_mem_rule_firings.txt (Final Name)

You need to define an algorithm for converting all the possible corrupt elements into their final forms. What corrupt elements can occur?

Hi, chatwithsaurav.

A Google search for algorithm for spelling correction produces more than 0.3 million hits.

A Google search for algorithm for language translation produces 26 million hits.

A Google search for algorithm for string substitution produces about 0.6 million hits.

A Google search for algorithm for string substitution table produces about 5 million hits.

Best wishes ... cheers, drl

Hi RudiC

Many corrupt elements can occur, but the script should automatically correct them to the final name. I actually need that algorithm :)

Regards
Saurabh

Note that I will seriously doubt your judgement if you run either of these scripts.

With the information you have provided:

for i in *_*_*_*_*_*.txt
do     mv "$i" advext_OK_2015_mem_rule_firings.txt
done

or, if the number of underscores or the .txt could also be corrupted, you could simplify this to just:

for i in *
do      [ -f "$i" ] && mv "$i" advext_OK_2015_mem_rule_firings.txt
done

Of course either of these might match several files and destroy all but one of them, but if you can't be any more specific about what corruption can occur, that is the best we can do for you.
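For contrast, here is a minimal sketch of a rename that refuses to overwrite anything. The function name safe_rename is made up for illustration, and the check-then-move is not atomic, so it only protects against accidents, not against another process creating the target at the same moment.

```shell
#!/bin/sh
# safe_rename SRC DST (hypothetical helper): rename SRC to DST only when
# DST does not exist yet, so no existing file is ever replaced.
safe_rename() {
    if [ -e "$2" ]; then
        printf 'skipping %s: %s already exists\n' "$1" "$2" >&2
        return 1
    fi
    mv -- "$1" "$2"
}
```

Used in place of the bare mv in the loops above, the first matching file would be renamed and every later one would be skipped with a message instead of clobbering it.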

Hi Don

Thanks for your solution. Normally the user who puts the file in a predefined directory makes spelling mistakes close to the examples I have shared. Assuming that the "file name sequence" remains the same (i.e. advext_OK_0215_mem_rule_firings.txt), he/she makes mistakes like "ad" instead of "advext", "firng" instead of "firings", or "me"/"m" instead of "mem". The script should identify each "word" and correct it if it's wrong. He/she can also type two or more underscores in a row, or no underscores at all.

I'd propose you run a validity check on the user's input, then, and avoid renaming files based on what are guesses at best.

Hi RudiC

You mean every word in the string needs to be validated?

Every item that you deem necessary for the correct syntax of your file names; words, underscore count, string length, whatever...
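As a rough illustration of such a check, the sketch below tests a name against one assumed format, advext_<PLAN>_<MMYY>_mem_rule_firings.txt. The function name is_valid_name is hypothetical, and the rules that the plan name is uppercase letters and that the month is 01-12 are assumptions, not something stated in this thread.

```shell
#!/bin/sh
# is_valid_name NAME (hypothetical helper): succeed only if NAME has the
# assumed form advext_<PLAN>_<MMYY>_mem_rule_firings.txt, where
#   <PLAN> is one or more uppercase letters (assumption)
#   <MMYY> is a month 01-12 followed by a two-digit year (assumption)
is_valid_name() {
    printf '%s\n' "$1" |
        grep -Eq '^advext_[[:upper:]]+_(0[1-9]|1[0-2])[0-9]{2}_mem_rule_firings\.txt$'
}

# Usage: report names in the current directory that fail the check.
# for f in *.txt; do is_valid_name "$f" || echo "invalid: $f"; done
```

A check like this only reports bad names; it makes no attempt to guess what the user meant.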

Thanks Rudi...

Hi Don

Can you please shed some more light on the code you shared? It's working, but it removes all the rest of the files. Can the "*" in the for loop be replaced with anything else?

Chatwithsaurav,

You simply have not provided us with sufficient information about the legal names and formats of your file for us to assist you at present.

The examples you provide are just random examples. Unless you can rigidly formulate your requirements and communicate them, nobody can help you.

Hi Murphy

Well, the thing is that I need to "rename" two files to the correct format.
The correct formats are: "advext_OK_0315_mem_rule_firings.txt" and "advext_OK_0315_rule_thirdparty.txt"
Now the challenge is that when the user places them in a predefined directory, he/she places them with a different name. Example : "ad_OK_0315_me_rul_firin.txt" or "adv_0315_OK_m_rule_fir.txt" or "adv_OK_1503_m_r_firing.txt" or "ad_OK_1214_r_third.txt" or "OK_0315_ad_r_thirdparty.txt" .... etc...

The unix shell script will "validate" and "eliminate" these errors and rename the files to the correct format. The script should also accept input parameters for the plan name and the run qual. Plan name is : OK (there are other plan names as well) and Run Qual is : 0315 (MMYY format - this will change every month)
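With only those two target formats and the plan name and run qual passed in as parameters, one possible sketch is to ignore everything in the bad name except the hint about which of the two reports it is. The function name canonical_name and the rules "a part starting with fir means firings" and "a part starting with th means thirdparty" are guesses made for illustration; names without any underscores are not handled.

```shell
#!/bin/sh
# canonical_name PLAN RUNQUAL NAME (hypothetical helper)
# Prints the guessed correct name, or fails if the report type is unclear.
# The body runs in a subshell ( ... ) so IFS and "set -f" changes stay local.
canonical_name() (
    plan=$1
    runqual=$2
    base=${3%.txt}
    kind=
    set -f              # no globbing while the name is split
    IFS=_
    set -- $base        # split on underscores into "$@"
    for tok in "$@"; do
        # Guessed rules: fir* means firings, th* means thirdparty.
        case $(printf '%s' "$tok" | tr '[:upper:]' '[:lower:]') in
            fir*) new=mem_rule_firings ;;
            th*)  new=rule_thirdparty ;;
            *)    continue ;;
        esac
        # Hints for both report types in one name: refuse to guess.
        [ -z "$kind" ] || [ "$kind" = "$new" ] || exit 1
        kind=$new
    done
    [ -n "$kind" ] || exit 1
    printf 'advext_%s_%s_%s.txt\n' "$plan" "$runqual" "$kind"
)
```

For example, canonical_name OK 0315 ad_OK_0315_me_rul_firin.txt prints advext_OK_0315_mem_rule_firings.txt. Note that the plan name and run qual come only from the parameters, so whatever the user typed in those positions (such as 1503 or 1214) is discarded.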

Assumptions about what the user wanted are always risky, because they have to match the user's intent exactly.
Especially where deleting, moving, or renaming files is concerned, I'd only act on assumptions when creating new files/names.

Then, as RudiC proposed, run a validity check, like

  1. Does the input exist as a file?
  2. Do any of the parts of the input string (without the _'s) match any of the existing files?
  3. Let the user select from the shortest list of existing files that best match the user's input (compare each list against the others; all still valid).
  4. Use the selected file (variable) for the existing process.
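Step 2 of that list could look roughly like the sketch below, which lists the existing files whose names contain at least one underscore-separated part of the input. The function name matching_files is invented for illustration, and a plain substring match is an assumption; very short parts such as "m" will match far too many names, so some minimum length or ranking would be needed in practice.

```shell
#!/bin/sh
# matching_files DIR INPUT (hypothetical helper)
# Prints the name of every regular file in DIR that contains at least one
# underscore-separated part of INPUT as a substring.
# The body runs in a subshell ( ... ) so the IFS change stays local.
matching_files() (
    dir=$1
    input=${2%.txt}
    set -f              # no globbing while the input is split
    IFS=_
    set -- $input       # the parts of the input are now "$@"
    set +f              # globbing back on for the directory listing
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        for part in "$@"; do
            [ -n "$part" ] || continue
            case $name in
                *"$part"*) printf '%s\n' "$name"; break ;;
            esac
        done
    done
)
```

The output could then be shown to the user as a numbered menu to choose from, so that nothing is renamed on the basis of a guess alone.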

One could turn the approach around... (imo the safer approach)

  1. Get a list of valid string parts by scanning/parsing the existing file(name)s.
  2. Ask the user part by part which string he wants to add, generate a filename-string from those selections.
  3. Check if the filename-string exists (or how many files match the current string), if not, loop again.
  4. Use the generated file(name-string).

hth

EDIT - Question:
Why, other than a simple typo or mixed-up characters, would they write a differently formatted string?
You could show an example input (to the user, before they enter the input).

Did you not read my post?

The first line of that post in bold red text warned you not to run those examples. The last line of that post explicitly said that the second script would destroy all but one of the files in that directory. What it does is correct the name of every regular file in your current directory with a name ending in .txt to the name that you said the user should have named the file in the first place.

As many of us have repeatedly stated: Unless you can define what constitutes a valid name and specify how to map an invalid name to the "correct" valid name, that script is the best we can do (and IT IS NOT SUITABLE FOR ANY USE other than to show that we don't have enough information to do what you seem to want).

All that you have really told us is that you have a directory that contains some files. Some of the filenames are bad. There is one good name ( advext_OK_2015_mem_rule_firings.txt ). There may be other good names. Fix the bad filenames.

Our crystal balls aren't that good.