How to extract specific data?

Bash scripting beginner here...

I have many folders, each representing one subject. Not all subjects have all the required files, so I need to cycle through all the data and extract data only from subjects with no files missing. I tried parsing the output of the ls command, but I don't know how to write a script that extracts lines only if they are repeated x times for the same subject... Help greatly appreciated!

How do you tell that a subject has no files missing? Just by the number of files? Or do you have a list of necessary files that must be present? And what do you mean when you say "lines"?

I know not to use a subject if files with a specific part in their names are missing. So yes, there is a list of necessary files.

And by "lines" I meant each row of ls output - the path of each file. My subjects are numbered, so I thought I would list all the necessary files, sort by subject number, and then extract only the subjects that appear at least 4 times. That's the best I could come up with so far; I haven't figured out how to script that, though.
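The counting idea you describe can be done with sort and uniq -c. A minimal sketch with made-up subject and file names (note that parsing ls output breaks if names contain spaces or newlines, so a per-subject file test is generally safer):

```shell
# Sample layout: sub01 has all 4 required files, sub02 is missing one.
mkdir -p demo/sub01 demo/sub02
touch demo/sub01/{a,b,c,d}.txt
touch demo/sub02/{a,b,c}.txt

cd demo
# List each required file that exists, keep only the subject directory,
# count occurrences per subject, and print subjects seen exactly 4 times.
complete=$(for f in a.txt b.txt c.txt d.txt
do
    ls */"$f" 2>/dev/null
done | cut -d/ -f1 | sort | uniq -c | awk '$1 == 4 {print $2}')
echo "$complete"
```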

That's not clear. Please post a representative directory structure, a sample of the list of necessary files, and the desired operation of the script.

Here is an example that checks for:

  • Subject/1.0 Install/start.doc
  • Subject/2.0 Handover
  • Subject/3.0 Operation
  • Subject/4.0 Retire

Change the FILES array to suit what you need.

FILES=( "1.0 Install/start.doc"  "2.0 Handover"  "3.0 Operation"  "4.0 Retire" )

for subject in */
do
    subject=${subject%/}    # strip the trailing slash left by the */ glob
    for file in "${FILES[@]}"
    do
        # -e matches files and directories alike;
        # if anything required is missing, skip this subject entirely
        [ -e "$subject/$file" ] || continue 2
    done
    echo "All OK - process $subject"
    #
    # Put your processing code here
    #
done
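If you also want to know why a subject was skipped, here is a variant that collects the complete subjects and reports the missing files for the rest (anat.nii and func.nii are placeholder names; the demo layout is just for illustration):

```shell
# Required files per subject (placeholder names).
FILES=( "anat.nii" "func.nii" )

# Sample layout: sub01 is complete, sub02 is missing func.nii.
mkdir -p demo2/sub01 demo2/sub02
touch demo2/sub01/anat.nii demo2/sub01/func.nii demo2/sub02/anat.nii

cd demo2
complete=()
for subject in */
do
    subject=${subject%/}
    missing=()
    for file in "${FILES[@]}"
    do
        [ -e "$subject/$file" ] || missing+=( "$file" )
    done
    if [ "${#missing[@]}" -eq 0 ]
    then
        complete+=( "$subject" )
    else
        echo "Skipping $subject - missing: ${missing[*]}" >&2
    fi
done
echo "Complete subjects: ${complete[*]}"
```

The complete array can then be looped over for the actual processing step.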