Use portion of file name as new file name

Hello, this is my first post and my first attempt to create a loop using Unix and awk commands. So far this is what I have:

 awk -F, 'BEGIN {months ["JAN"]="AP01"; months ["FEB"]="AP02"; months ["MAR"]="AP03"; 
months ["APR"]="AP04"; months ["MAY"]="AP05"; months ["JUN"]="AP06"; 
months ["JUL"]="AP07"; months ["AUG"]="AP08"; months ["SEP"]="AP09"; 
months ["OCT"]="AP10"; months ["NOV"]="AP11"; months ["DEC"]="AP12";}
 {print > "a_STHCSPHFM_ACTUAL_"months[substr($3,0,3)]"20" 
substr($3,5,2)"_RR.txt"}' STAVOSHFM.txt 

It basically takes the 3 characters representing the month and replaces them with AP01, etc., creating a new file for each month. This works fine if I have only one file as a source.

However, I have multiple files, for example STCOCICANHFM.txt and STHCSPHFM.txt
What I want to do is replace STAVOSHFM.txt with the name of the first file being processed, then process STCOCICANHFM and lastly STHCSPHFM, replacing the source file name each time. The idea is to be able to identify the source file from the name of the newly created file.

So I tried this:

  a "#!/usr/bin/awk -f" 
for EACHFILE in $FILEPATH/${FILENAME}*.txt
Do
awk -F, 'BEGIN {months ["JAN"]="AP01"; months ["FEB"]="AP02"; months ["MAR"]="AP03"; 
months ["APR"]="AP04"; months ["MAY"]="AP05"; months ["JUN"]="AP06";
months ["JUL"]="AP07"; months ["AUG"]="AP08"; months ["SEP"]="AP09";
months ["OCT"]="AP10"; months ["NOV"]="AP11"; months ["DEC"]="AP12";} {print > "a_"FILENAME"months[substr($3,0,3)]"20" 
substr($3,5,2)"_RR.txt"}' FILENAME
Done 

However, this does not work. Please tell me what I need to correct in this code. Thanks in advance.

Why not make use of awk's predefined FILENAME variable and adapt your script like

awk '... >...FILENAME...' *.txt
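To illustrate the suggestion, here is a minimal sketch run against two throwaway files. The scratch directory, the file names one.txt and two.txt, and their contents are made up for the demonstration; only FILENAME and FNR are awk built-ins.

```shell
# Work in a scratch directory so no real files are touched (hypothetical demo data).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'x,y,JUN-15\n' > one.txt
printf 'x,y,FEB-16\n' > two.txt

# FILENAME holds the name of the file awk is currently reading.
# FNR restarts at 1 for every input file, so this prints one line per file.
seen=$(awk -F, 'FNR == 1 { print FILENAME ": first date field is " $3 }' *.txt)
echo "$seen"
```

Because awk walks through every file named on the command line, no shell loop is needed around it.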

I appreciate your answer; however, I have tried multiple iterations of the following changes to no avail.

"#!/usr/bin/awk -f"
For EACHFILE in * 
Do
awk -F, 'BEGIN {months ["JAN"]="AP01"; months ["FEB"]="AP02"; months ["MAR"]="AP03";
months ["APR"]="AP04"; months ["MAY"]="AP05"; months ["JUN"]="AP06";
months ["JUL"]="AP07"; months ["AUG"]="AP08"; months ["SEP"]="AP09";
months ["OCT"]="AP10"; months ["NOV"]="AP11"; months ["DEC"]="AP12";} {print > "a_" File "_ACTUAL_"months[substr($3,0,3)]"20" 
substr($3,5,2)"_RR.txt"}> File'*.txt
Done 

Would you mind being very specific? As I mentioned, I am new to UNIX and don't know where to place the changes. I keep getting the following error:
Syntax Error Line 1.

My first thought would be the quotes in the first line "#!/usr/bin/awk -f"

Are they really there or is it some problem with pasting? Try taking them out to leave the first line being just #!/usr/bin/awk -f

Robin

I have removed the quotes. I get this:

-bash: Done: Command not found

The shell commands are do and done, but you have them as Do and Done.

I suspect that the editor you are using automatically capitalises the first character of what it sees as a sentence. This is usually a bad thing when writing code, especially as most of these helpful writing editors can shove lots of control characters into what you want as code.

I would suggest using a plain text editor, such as vi or ed - or at worst notepad on Windows.
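As a small illustration of the point about case: shell keywords are case-sensitive, so the loop has to be spelled with lowercase for, do and done. The sketch below uses made-up file names in a scratch directory, and the variable names each_file and count are only for the demonstration.

```shell
# Scratch directory with two empty, hypothetical input files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch STAVOSHFM.txt STHCSPHFM.txt

# for / do / done must be lowercase; "Do" and "Done" are looked up as commands.
count=0
for each_file in *.txt
do
    echo "would process: $each_file"
    count=$((count + 1))
done
echo "files seen: $count"
```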

Robin


The "#!/usr/bin/awk -f" line is erroneous and not needed. If you want to use awk in the "shebang", the script needs to be pure awk.

The EACHFILE loop variable is not used in the script and thus obviously not needed.

WHAT is File? And WHY two redirections?
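To make the point about the shebang concrete, here is a hedged sketch of what a "pure awk" script file could look like. The file name preview.awk, the scratch directory and the sample row are assumptions made up for this example.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'x,y,JUN-15\n' > STAVOSHFM.txt

# A file that starts with "#!/usr/bin/awk -f" may contain only awk code:
# no for/do/done, and no surrounding awk '...' wrapper.
cat > preview.awk <<'EOF'
#!/usr/bin/awk -f
BEGIN { FS = "," }
{ print FILENAME " -> " $3 }
EOF

# Running it through "awk -f" avoids depending on where awk is installed.
preview=$(awk -f preview.awk STAVOSHFM.txt)
echo "$preview"
```

When the awk program is instead typed on the command line inside single quotes, no shebang line is involved at all.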

Clean and consistent indenting sometimes helps track down errors. Try running this from the command line (i.e. DON'T put it into a script file). It should show you the target files for the redirections without actually creating and populating those files:

awk -F, '
BEGIN   {months ["JAN"]="AP01"; months ["FEB"]="AP02"; months ["MAR"]="AP03";
         months ["APR"]="AP04"; months ["MAY"]="AP05"; months ["JUN"]="AP06";
         months ["JUL"]="AP07"; months ["AUG"]="AP08"; months ["SEP"]="AP09";
         months ["OCT"]="AP10"; months ["NOV"]="AP11"; months ["DEC"]="AP12";
        }
        {print "a_" FILENAME "_ACTUAL_"months[substr($3,0,3)]"20" substr($3,5,2)"_RR.txt"
        }
' *.txt

The expected result should be 8 files:

a_STAVOSHFM_ACTUAL_AP06-2015_RR.txt 

NOT

a_STHCPSHFM.txt_ACTUAL_AP022016_RR.txt

Again, the idea is to be able to identify the file but the file must comply with a very specific mask.

You haven't given us any context for when you want it to execute. What are your plans/needs for the code? Are you hoping to do any of these?:-

  • Inserting it into an existing application
  • Schedule it at a regular time
  • Call it on user request
  • Have an icon to click
  • Something else?

Robin

I am currently using Cygwin64 Terminal for testing purposes; once it is successfully tested, I will use this code, or a variation of it, in a software product named Automic.
This software is object based and it can execute UNIX commands, but since the infrastructure where these commands will ultimately be executed is very complex, I do testing in an isolated environment with dummy files.

I was able to execute the code and see the results in Cygwin64 Terminal as I mentioned at the beginning of this thread; the issue I had was that the file name was hard coded in the awk script and I need to pick up the source file name (which is being split by the code into multiple files) while reusing a portion of its name in the newly created files.

Basically I want to make sure it runs without issues and review the newly created files for accurate formatting.
Again thank you for your help

Wouldn't it be helpful if you listed the issues and posted the "accurate formatting" so we have something to work upon?

For your FILENAME extension problem in post#8 try to remove the extension like e.g.

        {TMP = FILENAME
         sub (/\.*$/, "", TMP)
         print "a_" TMP "_ACTUAL_"months[substr($3,0,3)]"20" substr($3,5,2)"_RR.txt"
        }

Here is the whole process:

  1. Multiple files are deposited in an inbox folder
  2. File names vary but they are all .txt
  3. Each file has month and year identifier on each line
  4. The awk script splits the original file into multiple files
  5. The original file name must be inserted into the string I have defined in the awk statement (prefix, original file name, month dash year, suffix)
  6. This process has to be executed for each file deposited in the inbox
  7. The source data for each file is now easily identified by its original file name
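The steps above could be sketched roughly as follows. This is only a sketch under stated assumptions, not a finished solution: the date in field 3 is assumed to look like JUN-15 (inferred from the substr calls in the thread), the inputs are assumed to match ST*.txt so that generated a_..._RR.txt files are never picked up as input on a second run, and the scratch "inbox" and its sample rows are made up.

```shell
# Scratch "inbox" with two small, made-up input files.
inbox=$(mktemp -d)
cd "$inbox" || exit 1
printf 'r1,x,JUN-15\nr2,x,FEB-16\nr3,x,JUN-15\n' > STAVOSHFM.txt
printf 'r4,x,MAR-15\n' > STHCPSHFM.txt

awk -F, '
BEGIN {
    # Map month abbreviations to AP01..AP12.
    n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", name, " ")
    for (i = 1; i <= n; i++) months[name[i]] = sprintf("AP%02d", i)
}
FNR == 1 {
    # New input file: close the previous outputs, then strip the extension.
    for (f in opened) close(f)
    split("", opened)              # portable way to empty an array
    base = FILENAME
    sub(/\.[^.]*$/, "", base)
}
{
    # Build the target name first, then redirect to it.
    out = "a_" base "_ACTUAL_" months[substr($3, 1, 3)] "20" substr($3, 5, 2) "_RR.txt"
    print > out
    opened[out] = 1
}
' ST*.txt

created=$(ls a_*_RR.txt)
echo "$created"
```

Putting the target name in a variable before redirecting keeps the print statement unambiguous, and substr starts at 1 because awk strings are indexed from 1.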

The script I posted at the start of this thread successfully split a file, but the source file and the destination files were not dynamically named and it processed only one file.
The modified script below produces the split files but still has issues in the naming convention:

awk -F, 'BEGIN 
{months ["JAN"]="AP01"; months ["FEB"]="AP02"; months ["MAR"]="AP03"; 
months ["APR"]="AP04"; months ["MAY"]="AP05"; months ["JUN"]="AP06"; 
months ["JUL"]="AP07"; months ["AUG"]="AP08"; months ["SEP"]="AP09"; 
months ["OCT"]="AP10"; months ["NOV"]="AP11"; months ["DEC"]="AP12";} 
TMP = FILENAME sub (/\.*$/, "", TMP) 
{print > "a_" TMP "_ACTUAL_"months[substr($3,0,3)]"20" substr($3,5,2)"_RR.txt"}' *.txt

The original files are

  • STAVOSHFM.txt
  • STCHSPHFM.txt

The results are as follows:

  • a_STAVOSHFM.txt1_ACTUAL_AP022016_RR.txt
  • a_STAVOSHFM.txt1_ACTUAL_AP032015_RR.txt
  • a_STAVOSHFM.txt1_ACTUAL_AP052016_RR.txt
  • a_STAVOSHFM.txt1_ACTUAL_AP062015_RR.txt
  • a_STHCPSHFM.txt1_ACTUAL_AP022016_RR.txt
  • a_STHCPSHFM.txt1_ACTUAL_AP032015_RR.txt
  • a_STHCPSHFM.txt1_ACTUAL_AP052016_RR.txt
  • a_STHCPSHFM.txt1_ACTUAL_AP062015_RR.txt

Basically, the only issue left to resolve is removing the .txt1 from the inserted file name. The expected result would be:

  • a_STAVOSHFM_ACTUAL_AP022016_RR.txt
  • a_STAVOSHFM_ACTUAL_AP032015_RR.txt
  • a_STAVOSHFM_ACTUAL_AP052016_RR.txt
  • a_STAVOSHFM_ACTUAL_AP062015_RR.txt
  • a_STHCPSHFM_ACTUAL_AP022016_RR.txt
  • a_STHCPSHFM_ACTUAL_AP032015_RR.txt
  • a_STHCPSHFM_ACTUAL_AP052016_RR.txt
  • a_STHCPSHFM_ACTUAL_AP062015_RR.txt

You did NOT copy the proposal. How do you expect it to work, then?

I did add the proposed changes. If I leave the curly bracket before the declaration of the temporary variable, it does not change the file name. If I place it before the print command, it adds a 1 to the .txt extension, right before the word ACTUAL.

awk -F, 'BEGIN 
{months ["JAN"]="AP01"; months ["FEB"]="AP02"; months ["MAR"]="AP03"; 
months ["APR"]="AP04"; months ["MAY"]="AP05"; months ["JUN"]="AP06"; 
months ["JUL"]="AP07"; months ["AUG"]="AP08"; months ["SEP"]="AP09"; 
months ["OCT"]="AP10"; months ["NOV"]="AP11"; months ["DEC"]="AP12";} 
TMP = FILENAME sub (/\.*$/, "", TMP) 
{print > "a_" TMP "_ACTUAL_"months[substr($3,0,3)]"20" substr($3,5,2)"_RR.txt"}' *.txt

PLEASE copy AS IS - do not interfere without knowing EXACTLY WHAT you are doing. DON'T concatenate the statements from multiple lines into one line without additional measures (e.g. adding a semicolon).
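A hedged illustration of why the missing semicolon matters, and of where the stray "1" in ".txt1" most likely comes from: when the two statements are joined on one line, awk reads them as a single expression and appends the return value of sub() (the number of substitutions made) to the name. The variable file stands in for FILENAME here, because FILENAME is not yet set inside a BEGIN block; it is an assumption of this demo only.

```shell
# Two statements joined on one line without ";" become ONE expression:
# the value returned by sub() is concatenated onto the file name.
joined=$(awk 'BEGIN { file = "STAVOSHFM.txt"; TMP = file sub(/\.*$/, "", TMP); print TMP }')

# The same statements separated by ";" run one after the other.
separate=$(awk 'BEGIN { file = "STAVOSHFM.txt"; TMP = file; sub(/\.*$/, "", TMP); print TMP }')

echo "joined:   $joined"
echo "separate: $separate"
```

A newline between the statements has the same separating effect as the semicolon, which is why the multi-line layout of the proposal has to be kept.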

Apologies, RudiC. There's no intention of argument on my part nor to give offense. However, your comment is a disservice to the people coming here.

It is OK to change things even if you do not know what you are doing, it is OK to experiment and come back asking why it is not working. Please, try things and if it is broken ask why it is broken. Do not accept the suggestions as written in stone rules. They are not! Convince yourself that that's what you want and learn from it.

However, when you ask for help, please motivate the person helping you by explaining what you did, what you tried and what is not working, rather than just saying that it doesn't work (or describing only the effect). Give the people helping you some incentive to continue to do so.

RudiC, I guess I am having problems following directions. If I use the code as is, I get these results:

Raul@Raul-PC ~
$ {TMP = FILENAME
-bash: {TMP: command not found

Raul@Raul-PC ~
$          sub (/\.*$/, "", TMP)
-bash: syntax error near unexpected token `/\.*$/,'

Raul@Raul-PC ~
$          print "a_" TMP "_ACTUAL_"months[substr($3,0,3)]"20" substr($3,5,2)"_RR.txt"
-bash: syntax error near unexpected token `('

Raul@Raul-PC ~
$         }'

If I add it to the existing code, replacing what I have the new code looks like this:

awk -F, 'BEGIN {months ["JAN"]="AP01"; months ["FEB"]="AP02"; months ["MAR"]="AP03"; 
months ["APR"]="AP04"; months ["MAY"]="AP05"; months ["JUN"]="AP06"; 
months ["JUL"]="AP07"; months ["AUG"]="AP08"; months ["SEP"]="AP09"; 
months ["OCT"]="AP10"; months ["NOV"]="AP11"; months ["DEC"]="AP12";}
{TMP = FILENAME
         sub (/\.*$/, "", TMP)
         print "a_" TMP "_ACTUAL_"months[substr($3,0,3)]"20" substr($3,5,2)"_RR.txt"
        }'

The result looks like this:

Raul@Raul-PC ~
$ Raul@Raul-PC ~
> months ["JUL"]="AP07"; months ["AUG"]="AP08"; months ["SEP"]="AP09";
> months ["OCT"]="AP10"; months ["NOV"]="AP11"; months ["DEC"]="AP12";}
> {TMP = FILENAME
>          sub (/\.*$/, "", TMP)
>          print "a_" TMP "_ACTUAL_"months[substr($3,0,3)]"20" substr($3,5,2)"_RR.txt"
>         }'-bash: Raul@Raul-PC: command not found

Raul@Raul-PC ~
$ $ awk -F, 'BEGIN {months ["JAN"]="AP01"; months ["FEB"]="AP02"; months ["MAR"]="AP03";
> > months ["APR"]="AP04"; months ["MAY"]="AP05"; months ["JUN"]="AP06";
> > months ["JUL"]="AP07"; months ["AUG"]="AP08"; months ["SEP"]="AP09";
> > months ["OCT"]="AP10"; months ["NOV"]="AP11"; months ["DEC"]="AP12";}
> > {TMP = FILENAME
> >          sub (/\.*$/, "", TMP)
> >          print "a_" TMP "_ACTUAL_"months[substr($3,0,3)]"20" substr($3,5,2)"_RR.txt"
> >         }'

So I am stuck because, as you said, I don't know what I am doing. That is why I attempted to correct it by removing and adding instructions such as the

*.txt

at the end of the instructions and changing the position of the curly brackets. I do appreciate your efforts. I am just lost

To run an awk script from the shell ( bash ?) command line, you need to enter a command like awk extvars '...script...' inputstream (other syntaxes are possible).
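As a rough illustration of that command shape, the sketch below passes one option and one variable ahead of the quoted program, followed by the input file. The variable name prefix, the scratch directory and the sample row are assumptions made up for this example.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'x,y,JUN-15\n' > STAVOSHFM.txt

# General shape:  awk [options / -v variables]  'program'  input-file(s)
# -F, sets the field separator; -v passes a shell value in as an awk variable.
line=$(awk -F, -v prefix="a_" '{ print prefix FILENAME " has date field " $3 }' STAVOSHFM.txt)
echo "$line"
```

Everything between the single quotes is handed to awk untouched; anything typed outside them, or without the leading awk, is interpreted by the shell instead.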
What I see in your post above is quite mysterious to me:

  • -bash: {TMP: command not found looks like the leading awk is missing
  • -bash: Raul@Raul-PC: command not found looks like you copied the shell prompt for the command

To make it (hopefully) unambiguously clear: please run

awk -F, '
BEGIN   {months ["JAN"]="AP01"; months ["FEB"]="AP02"; months ["MAR"]="AP03";
         months ["APR"]="AP04"; months ["MAY"]="AP05"; months ["JUN"]="AP06";
         months ["JUL"]="AP07"; months ["AUG"]="AP08"; months ["SEP"]="AP09";
         months ["OCT"]="AP10"; months ["NOV"]="AP11"; months ["DEC"]="AP12";
        }
        {TMP = FILENAME
         sub (/\.*$/, "", TMP)
         print "a_" TMP "_ACTUAL_"months[substr($3,0,3)]"20" substr($3,5,2)"_RR.txt"
        }
' *.txt

from the command line and report the results.

RudiC
I have copied exactly what you had in your last reply; here are the results.

Raul@Raul-PC ~
$ awk -F, '
> BEGIN   {months ["JAN"]="AP01"; months ["FEB"]="AP02"; months ["MAR"]="AP03";
>          months ["APR"]="AP04"; months ["MAY"]="AP05"; months ["JUN"]="AP06";
>          months ["JUL"]="AP07"; months ["AUG"]="AP08"; months ["SEP"]="AP09";
>          months ["OCT"]="AP10"; months ["NOV"]="AP11"; months ["DEC"]="AP12";
>         }
>         {TMP = FILENAME
>          sub (/\.*$/, "", TMP)
>          print "a_" TMP "_ACTUAL_"months[substr($3,0,3)]"20" substr($3,5,2)"_RR.txt"
>         }
> ' *.txt
a_STAVOSHFM.txt_ACTUAL_AP062015_RR.txt
a_STAVOSHFM.txt_ACTUAL_AP022016_RR.txt
a_STAVOSHFM.txt_ACTUAL_AP032015_RR.txt
a_STAVOSHFM.txt_ACTUAL_AP052016_RR.txt
a_STHCPSHFM.txt_ACTUAL_AP062015_RR.txt
a_STHCPSHFM.txt_ACTUAL_AP022016_RR.txt
a_STHCPSHFM.txt_ACTUAL_AP032015_RR.txt
a_STHCPSHFM.txt_ACTUAL_AP052016_RR.txt

Raul@Raul-PC ~
$

See depiction for ease
or https://1drv.ms/i/s!AqQovh6lB7bfg0qta9AuWIU87Ajg

Hmmm, negligence from my side. Still, we're getting there! In the sub statement, replace /\.*$/ with /\..*$/ and try again. You should have a list of valid file names now. If happy, replace print "a..." with print > "a..." to create all the new files you needed.
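To show the difference between the two patterns, here is a small comparison run entirely inside a BEGIN block. The sample name my.data.txt and the third pattern /\.[^.]*$/ are my own additions for illustration, not something proposed in the thread.

```shell
# Compare patterns for stripping an extension (sample names are made up).
result=$(awk 'BEGIN {
    a = "STAVOSHFM.txt"; sub(/\.*$/, "", a)       # zero or more literal dots at the end
    b = "STAVOSHFM.txt"; sub(/\..*$/, "", b)      # first dot and everything after it
    c = "my.data.txt";   sub(/\..*$/, "", c)      # also cuts at the FIRST dot
    d = "my.data.txt";   sub(/\.[^.]*$/, "", d)   # last dot and what follows it
    print a; print b; print c; print d
}')
echo "$result"
```

The first pattern only matches a run of dots at the very end of the string, so it leaves STAVOSHFM.txt unchanged. For names with a single dot, as in this thread, the second and fourth patterns give the same result; they differ only when the name contains more than one dot.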
