Pattern Match FileNames

I am on AIX.

I need to list the contents of the directory based on a pattern and write an XML output file with file names.

If a filename does NOT match the patterns below, write it to an output XML file in the format shown below.

Pattern

Starts with (.abc) and contains (def)
Starts with (.abc) and contains (pqr)
Ends with (.xml) and contains (xyz)
Starts with (.tvs)
Contains (.hij)

Additionally, irrespective of the pattern match, if the FileName contains a space character, include that FileName in the output XML file.

Output File Structure

<Files>
<FiileName>LMN.txt</FileName>
<FileName>OTS.txt</FileName>
</Files>

Please advise


Some sample data would help as the inverse complex pattern is beyond my imagination. What's your shell / version? Your files have leading dots? I guess <FiileName> is a typo?


Hello techedipro,

Which AIX version are you running? It might make a difference.

Are these conditions to be AND-ed together or OR-ed together? Please be specific else we might go the wrong way about this. As RudiC mentions, it would be better to know what to positively look for rather than try to strip out the unwanted.

Perhaps (depending on the size of the file list being processed) you could use these tests to build a list of files that you want to exclude and then use something like grep -vFf remove_these main_file_list > wanted_file_list to get you close (the f must come last in the option cluster, because the file name that follows is its argument). If the remove list gets large, this can become slow, and because the entries are matched as substrings it may remove more names than intended. It would cost processing and IO to go this long way round, but it's possible if there is no better logic to positively select what you want to report.
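
A minimal sketch of that exclude-list step, wrapped in a function so the options can be commented. The function name filter_out is made up for illustration, and adding -x (whole-line matching) is my assumption about what is wanted here, so that a short name in the remove list does not knock out longer names that merely contain it:

```shell
# filter_out REMOVE_LIST MAIN_LIST   (hypothetical helper name)
# Print every line of MAIN_LIST that does not appear in REMOVE_LIST.
filter_out() {
    # -v  keep the lines that do NOT match
    # -F  treat each line of the list as a fixed string, not a regex
    # -x  match whole lines only, so "a.txt" does not also remove "aa.txt"
    # -f  read the strings from the named file; its argument must follow directly
    grep -vFxf "$1" "$2"
}
```

It would be called as filter_out remove_these main_file_list > wanted_file_list, using the same file names as in the paragraph above.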

The easiest way to get your actual output file when you have a list of names may be something like:

awk 'BEGIN {print "<Files>"} ; {print "<FileName>"$0"</FileName>"} ; END {print "</Files>"} ' wanted_file_list  >  output_file
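
One thing the one-liner above does not cover: a file name containing &, < or > would produce XML that does not parse. A possible variant, assuming those three characters are the only ones that need escaping in element text (the function name names_to_xml is made up for illustration), could look like this:

```shell
# names_to_xml: read file names on stdin, one per line, write the XML to stdout.
names_to_xml() {
    awk '
        BEGIN { print "<Files>" }
        {
            name = $0
            # Escape the ampersand first, otherwise the entities added
            # by the next two substitutions would be escaped again.
            gsub(/&/, "\\&amp;", name)
            gsub(/</, "\\&lt;", name)
            gsub(/>/, "\\&gt;", name)
            print "<FileName>" name "</FileName>"
        }
        END { print "</Files>" }
    '
}
```

Usage would be names_to_xml < wanted_file_list > output_file. In the replacement strings the doubled backslash is needed because a bare & in a gsub replacement stands for the matched text.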

Of course, there may be a better way to blend it into a single operation, saving the processing and IO cost, but you need to help us understand the context. Some examples would be good.

I hope that this helps,
Robin


Should you run a recent shell (bash, ksh) that provides "extended globbing" of "pattern-lists" (in bash it has to be switched on first with shopt -s extglob), you might use this to feed into rbatte1's awk proposal:

ls @(!(@(.abc*@(def|pqr)*|.tvs*|*xyz*.xml|*.hij*|.hij*))|@(*\ *|.*\ *))

RudiC & rbatte1

Thanks for your valuable inputs.

version : Version M-11/16/88f

I have made a minor change to the patterns, corrected the typo in the output file, and included sample filenames and the expected output file.

Write an output XML file, in the format below, listing every FileName that does NOT match the patterns below and, additionally, every FileName that contains a space.

Ends with (.abc) and contains (DEF)
Ends with (.abc) and contains (PQR)
Ends with (.xml) and contains (XYZ)
Starts with (TVS)
Starts with (TVS) and contains (SPR)
Contains (HIJ)

FileNames

cqa_20180405_tom_DEF.abc
uvw_bs_PQR_041118120208.abc
wvu_XYZ_041118120208.xml
TVS_~tosp.sh
TVS_SPR.txt
HIJ_03_15_2018.xml
LMN.txt
OTS.txt
iws_ eti-.oiy .txt

OutputFile

<Files>
<FileName>LMN.txt</FileName>
<FileName>OTS.txt</FileName>
<FileName>iws_ eti-.oiy .txt</FileName>
</Files>
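
For what it's worth, here is a rough sketch of how the revised rules could be done in one pass with a plain case statement, which avoids extended globbing altogether. It assumes the listed patterns are OR-ed together (the thread never settled that, but the sample output is consistent with it), that "contains" refers to the part of the name before the suffix, and that no file name contains a newline. The function names wanted and report are made up for illustration:

```shell
# wanted NAME: succeed if NAME belongs in the report.
wanted() {
    case "$1" in
        *" "*)               return 0 ;;  # contains a space: always reported
        *DEF*.abc|*PQR*.abc) return 1 ;;  # ends with .abc and contains DEF or PQR
        *XYZ*.xml)           return 1 ;;  # ends with .xml and contains XYZ
        TVS*)                return 1 ;;  # starts with TVS (with or without SPR)
        *HIJ*)               return 1 ;;  # contains HIJ
    esac
    return 0                              # matched none of the patterns
}

# report: read names on stdin, one per line, write the XML to stdout.
report() {
    echo "<Files>"
    while IFS= read -r name; do
        if wanted "$name"; then
            printf '<FileName>%s</FileName>\n' "$name"
        fi
    done
    echo "</Files>"
}
```

Something like ls | report > output_file, run in the directory in question, should then reproduce the expected output for the sample names above. The space test comes first so that a name with a space is reported even when it also matches one of the patterns.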