Grouping files on pattern

I have a requirement to group files.
I have a folder, say "temp", that contains many files. The files are named like this:

010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt
010020004_S-PQR-Sort-DRTON_YYYYMMDD_HHMMSS.txt
010020009_S-JKL-Sort_MNOLO_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt
010020004_S-PQR-Sort-DRTON_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort_DEFAW_YYYYMMDD_HHMMSS.txt
010020009_S-JKL-Sort-MNOLO_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort_DEFAW_YYYYMMDD_HHMMSS.txt

So every filename contains three patterns, e.g.
010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt

These files should be grouped based on the marked words (words are separated by either "-" or "_"), and all 3 patterns always have the lengths described.
After grouping, the files should be moved into their own directory, e.g.

one folder will be created with the name
/temp/010020001_S-ABC-Sort-DEFAW
and inside this folder there will be the corresponding grouped files, i.e.

010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort_DEFAW_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort_DEFAW_YYYYMMDD_HHMMSS.txt
010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt

/temp/010020004_S-PQR-Sort-DRTON

010020004_S-PQR-Sort-DRTON_YYYYMMDD_HHMMSS.txt
010020004_S-PQR-Sort-DRTON_YYYYMMDD_HHMMSS.txt

/temp/010020009_S-JKL-Sort-MNOLO

010020009_S-JKL-Sort-MNOLO_YYYYMMDD_HHMMSS.txt
010020009_S-JKL-Sort_MNOLO_YYYYMMDD_HHMMSS.txt

Please help me with this. Let me know if you need any other information.

TIA

Any attempts / ideas / thoughts from your side?

What OS / shell version do you use?

My bash version:
-bash-4.1$ echo $BASH_VERSION
4.1.2(1)-release

Is this the way to get the version, or is it something else?

I guess I have made my requirement a bit complicated.

So basically my first requirement is to group the files based on 3 patterns, i.e.

010020001_S-ABC_Sort_DEFAW_YYYYMMDD_HHMMSS.txt

First pattern: the last 3 characters of the first word.

010020001_S-ABC_Sort_DEFAW_YYYYMMDD_HHMMSS.txt

Second pattern: after S there can be "-" or "_", it doesn't matter which, followed by 3 characters.

010020001_S-ABC_Sort_DEFAW_YYYYMMDD_HHMMSS.txt

Third pattern: after Sort there can be "-" or "_", it doesn't matter which, followed by 5 characters.

The file naming and the patterns are constant.
After grouping, the files need to be moved into separate directories, and those directories can have any name, like
Dir ABC: first set of grouped files
Dir BDC: second set of grouped files, and so on

That's not really explaining your attempts ... howsoever, try

for FN in *.txt; do TMP="${FN:0:20}-${FN:21:5}"; echo mkdir -p "$TMP";  echo mv "${FN}" "$TMP"; done

Remove the echo commands if the output looks acceptable...
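
To see why those offsets work, here is a minimal sketch of the two substring expansions on a single name. The filename is a made-up sample in the format from post #1 (the date and time are placeholders), and it assumes every part of the name has the fixed length described there.

```shell
#!/bin/bash
# Hypothetical sample name; the date and time stand in for YYYYMMDD_HHMMSS.
FN=010020001_S-ABC-Sort_DEFAW_20170412_121224.txt

prefix=${FN:0:20}   # characters 0-19: 010020001_S-ABC-Sort
suffix=${FN:21:5}   # characters 21-25: DEFAW (offset 20, the "-" or "_", is skipped)
TMP="$prefix-$suffix"

echo "$TMP"         # prints 010020001_S-ABC-Sort-DEFAW
```

Because the separator at offset 20 is dropped and replaced by a fixed "-", names that differ only in that one character end up with the same directory name.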

The question RudiC presented was not about your requirements, but about what operating system you're using (which you have yet to answer) and what shell you're using (which you have answered). He also asked what you have tried to solve this problem on your own. We are here to help you learn how to use the tools available on your operating system to do what you need to do. We are not here to act as your unpaid programming staff.

Please tell us what operating system you're using. The output from:

uname -a

is a great way to tell us what we need to know.

And, please show us what you have tried to solve this problem on your own.

If the various parts of your filenames are not all fixed length, you could also try something like:

#!/bin/bash
for file in 010020*[_-]S[_-]*[_-]*[_-]*[_-][0-9][0-9][0-9][0-9][01][0-9][0-3][0-9][_-][0-2][0-9][0-6][0-9][0-6][0-9].txt
do	IFS='[_-]'
	set -- $file
	unset IFS
	destination=/tmp/${1}_$2-$3-$4-$5
	mkdir -p "$destination"
	mv "$file" "$destination/$file"
done

Thanks a lot Don Cragun,
please find below the output

-bash-4.1$ uname -a
Linux ucsdv181.symprod.com 2.6.32-696.10.3.el6.x86_64 #1 SMP Thu Sep 21 12:12:50 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

I tried your code and it worked perfectly fine. I just want to know a couple of things about that script, like

#!/bin/bash
for file in 010020*[_-]S[_-]*[_-]*[_-]*[_-][0-9][0-9][0-9][0-9][01][0-9][0-3][0-9][_-][0-2][0-9][0-6][0-9][0-6][0-9].txt
do	IFS='[_-]'	# what is the use of IFS here?
	set -- $file	# what is this command doing?
	unset IFS
	destination=/tmp/${1}_$2-$3-$4-$5
	mkdir -p "$destination"
	mv "$file" "$destination/$file"
done

Also, the first word in my filenames, i.e. in
010020001_S-ABC-Sort-DEFAW_20170412_121224.txt, need not start with the same values. What I am trying to say is that

010020001_S-ABC-Sort-DEFAW_20170412_121224.txt
111020001_S-ABC-Sort-DEFAW_20180412_121224.txt
should be in the same group and move to one directory instead of creating two directories. In the first word, only the last 3 characters matter for grouping.
Thanks again

The IFS variable is used by the shell when splitting fields. Each character in the value of the string assigned to IFS will be used as a field delimiter (although there are some special cases for strings of adjacent characters in the space character class when characters in that class are field separators). The value I used happens to work for the filenames in your example, but it should have just been:

IFS='_-'

or:

IFS=_-

to have the shell split fields on just <underscore> and <hyphen> characters. The value I used in the script would also split fields on open and close square brackets. When IFS is unset, the shell behaves as if IFS had been set to a string containing the three characters <space>, <tab>, and <new-line>.
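
The difference between the two IFS values can be checked with a small sketch. The helper name count_fields and the test string are made up for illustration; the string deliberately contains square brackets, which none of the filenames in this thread do.

```shell
#!/bin/bash
# count_fields is a hypothetical helper: split the string in $2 using the
# IFS value in $1 and print how many fields result.
count_fields() {
    local IFS="$1"
    set -f          # no pathname expansion, so the unquoted $2 is only split
    set -- $2
    set +f
    echo "$#"
}

name='x[y]z_1-2'    # made-up string containing brackets

count_fields '[_-]' "$name"   # prints 5: x y z 1 2
count_fields '_-' "$name"     # prints 3: x[y]z 1 2
```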

The command:

set -- $file

first clears all positional parameters and then sets the positional parameters for the current shell execution environment to the values obtained by performing field splitting on the expansion of the file variable. When $file expands to the string:

010020001_S-ABC-Sort-DEFAW_YYYYMMDD_HHMMSS.txt

that sets the positional parameters as follows:

$1 010020001
$2 S
$3 ABC
$4 Sort
$5 DEFAW
$6 YYYYMMDD
$7 HHMMSS.txt

and sets the special parameter # to the number of positional parameters (i.e., $# expands to 7).
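
If you would rather not overwrite the positional parameters, a possible alternative (not what the script above does) is to let read split the name into an array. This is only a sketch: the array name parts and the sample filename are made up, and IFS is changed only for that one read command.

```shell
#!/bin/bash
# Hypothetical sample name in the format used in this thread.
file=010020001_S-ABC-Sort_DEFAW_20170412_121224.txt

# IFS=_- applies only to this read; the shell's own IFS is left untouched.
IFS=_- read -r -a parts <<< "$file"

echo "${#parts[@]}"    # prints 7, the same count as $# above
echo "${parts[0]}_${parts[1]}-${parts[2]}-${parts[3]}-${parts[4]}"
# prints 010020001_S-ABC-Sort-DEFAW
```

Array indexes start at 0, so ${parts[0]} corresponds to $1, ${parts[1]} to $2, and so on.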

In post #1 in this thread you said that all of the filenames you would be processing started with 010020 , and I wrote the filename matching pattern used to identify the files to be processed by the for loop in the script I gave you to match that statement. If those 1st six characters aren't always 010020 , how do we know what name is supposed to be used for the directory into which files are to be moved?

Are you now saying that files to be processed have names starting with 010020 or 111020 , or can there be any string of six digits there? Or, any string of six characters? Or, any string of an arbitrary number of characters?

Making lots of unwarranted assumptions, maybe the following will come closer to what you want:

#!/bin/bash
for file in ??????[0-9][0-9][0-9][_-]S[_-]*[_-]*[_-]*[_-]*[_-]*.txt
do	IFS=_-
	set -- $file
	unset IFS
	destination=/tmp/010020${1#??????}_$2-$3-$4-$5
	mkdir -p "$destination"
	mv "$file" "$destination/$file"
done
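
To check the directory names before moving anything, the destination calculation from the loop above can be run on its own. The function name group_dir is made up for this sketch, and it carries the same assumption as the script: six arbitrary characters followed by three digits in the first field.

```shell
#!/bin/bash
# group_dir is a hypothetical helper: print the destination directory the
# loop above would choose for one filename, without creating or moving anything.
group_dir() {
    local file=$1 IFS=_-
    set -f              # split only, no pathname expansion
    set -- $file
    set +f
    echo "/tmp/010020${1#??????}_$2-$3-$4-$5"
}

group_dir 010020001_S-ABC-Sort-DEFAW_20170412_121224.txt
group_dir 111020001_S-ABC-Sort_DEFAW_20180412_121224.txt
# both calls print /tmp/010020001_S-ABC-Sort-DEFAW
```

The expansion ${1#??????} removes the first six characters of the first field, so only its last three characters take part in the directory name.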

Hi Don Cragun,
My requirement has changed a little bit. Instead of creating directories dynamically, I want to write these files to a file list, but not all files together: first only one group of files, and then the second group.
So I am using this code

#!/bin/bash
cd /apps/sym/arcload/VSORT_X9/test
FILE_LIST=/apps/sym/arcload/VSORT_X9/SrcFiles/Source_input.txt
for file in *[_-]S[_-]*[_-]*[_-]*[_-][0-9][0-9][0-9][0-9][01][0-9][0-3][0-9][_-][0-2][0-9][0-6][0-9][0-6][0-9].txt
do      IFS='[_-]'
        set -- $file
        unset IFS
        echo "$file" >> $FILE_LIST
done

But the problem here is that it writes all files to the file list. I want to send the first 4 grouped files, do an operation on them, then do the same operation for the next 2 grouped files, and so on.

Thanks for your help again

I do not understand what you are trying to do, so I don't see how I can help you.

You have removed all of the code that created groups of files, so I don't know what groups you are now talking about.

The code you have shown above could be replaced by the much simpler code:

#!/bin/bash
cd /apps/sym/arcload/VSORT_X9/test
FILE_LIST=/apps/sym/arcload/VSORT_X9/SrcFiles/Source_input.txt
printf '%s\n' > "$FILE_LIST"  *[_-]S[_-]*[_-]*[_-]*[_-][0-9][0-9][0-9][0-9][01][0-9][0-3][0-9][_-][0-2][0-9][0-6][0-9][0-6][0-9].txt
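
One thing to watch with a bare pattern: with bash's default settings, a pattern that matches nothing is passed to printf unchanged, so the pattern text itself would land in the list. A minimal sketch of one way to guard against that with nullglob follows. It runs in a scratch directory with made-up files, uses a shortened pattern, and writes to a hypothetical list.txt rather than your real FILE_LIST.

```shell
#!/bin/bash
# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 010020001_S-ABC-Sort-DEFAW_20170412_121224.txt notes.log

shopt -s nullglob                  # an unmatched pattern now expands to nothing
matches=( *[_-]S[_-]*[_-]*[_-]*.txt )
shopt -u nullglob

if (( ${#matches[@]} > 0 )); then
    printf '%s\n' "${matches[@]}" > list.txt
else
    : > list.txt                   # no matches: leave an empty list
fi

cat list.txt    # prints 010020001_S-ABC-Sort-DEFAW_20170412_121224.txt
```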

Please show us a brief list of filenames you want to process and then clearly explain how you decide which four files to process first, which two files to process next, and so on. And, clearly explain what operation you want to perform on each group of files.

Note that if the directories you created in your earlier problem are the groups you're talking about now, the list of files in your output file is not sorted into groups in any manner (and files that would have been grouped together in directories will not necessarily even be adjacent to each other in the output file produced by your code above).
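
If those directories are indeed the groups you mean, the following is a rough sketch of how names could be collected per group so that each group can be handled on its own. Everything in it is an assumption: the sample names are made up, the key is built from the last three characters of the first field plus the third and fifth fields, and the "operation" is just printing the names. It relies on associative arrays, which need bash 4.0 or later.

```shell
#!/bin/bash
# Made-up sample names; in real use the list would come from the pattern.
files=(
    010020001_S-ABC-Sort-DEFAW_20170412_121224.txt
    010020004_S-PQR-Sort-DRTON_20170412_121225.txt
    111020001_S-ABC-Sort_DEFAW_20180412_121224.txt
    010020009_S-JKL-Sort_MNOLO_20170412_121226.txt
)

declare -A groups      # group key -> newline-terminated list of names

for file in "${files[@]}"; do
    IFS=_- read -r -a f <<< "$file"
    key="${f[0]: -3}_${f[2]}_${f[4]}"    # e.g. 001_ABC_DEFAW
    groups[$key]+="$file"$'\n'
done

# Handle one group at a time; sorting the keys makes the order predictable.
for key in $(printf '%s\n' "${!groups[@]}" | sort); do
    echo "group $key:"
    printf '%s' "${groups[$key]}"        # replace with the real operation
done
```

With the sample names above, this produces three groups, and the two 001_ABC_DEFAW files are listed together even though they are not adjacent in the input.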

More importantly, if you didn't understand the answers I provided in post #7 in this thread for your questions, please ask more questions. Our goal here is to help you learn how to write your own code. We are not here to act as your unpaid programming staff.