I've written the script below to merge all the .txt files in a single directory into one huge .txt file, ignoring files with other extensions.
The result is one huge .txt file with the contents of all the other .txt files.
How can I add the file name as a comment before each file's content?
I think I should specify the path in the system variable, because I want to merge each directory's .txt files into a separate file. Meaning:
Directory A: has many files; I want to merge all the .txt files in this directory only.
Directory B: same thing; merge all the .txt files that exist in this directory only.
And so forth.
When I executed the script you posted, I got this error:
ls: cannot access /path/to/my/directories/*.txt: No such file or directory
#!/bin/ksh
system='/path/to/my/directories/DirectoryA'
for txtfile in $(ls $system/*.txt)
do
echo " #FileName : $txtfile"
cat $txtfile >> outputFileA.txt
done
When I removed system='/path/to/my/directory/DirectoryA', I got this error:
ls: cannot access /*.txt: No such file or directory
From those errors, it sounds like the script is having trouble finding either the files or the directories. I would verify that your path is correct. You could add a test for the directory to the script to confirm this:
#!/bin/ksh
system='/path/to/my/directories/DirectoryA'
if [ -d "$system" ]    # If the directory exists...
then
    for txtfile in $(ls $system/*.txt)
    do
        echo " #FileName : $txtfile"
        cat $txtfile >> outputFileA.txt
    done
else
    echo "Sorry, but that directory doesn't exist."
fi
As for wanting to repeat this on more directories, you could wrap a for loop around the above code like this:
for mydir in dirA dirB dirC
do
    echo "Now processing $mydir ..."
    # ... code from above ...
done
This is where I would define a function and pass the directory you want to process to the function as an argument. I hope that helps you.
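For instance, a function-based version might look like the following. This is only a sketch: the directory names (dirA, dirB, dirC) and the per-directory output naming are assumptions, not part of your actual setup.

```shell
#!/bin/ksh
# Sketch: merge_txt takes one directory and appends each .txt file in it,
# preceded by a filename header, to an output file named after the directory.
merge_txt() {
    dir=$1
    out="${dir##*/}.txt"            # e.g. DirectoryA -> DirectoryA.txt
    for txtfile in "$dir"/*.txt; do
        [ -f "$txtfile" ] || continue   # glob matched nothing; skip
        echo "#FileName : $txtfile" >> "$out"
        cat "$txtfile" >> "$out"
    done
}

# Example directory names -- substitute your own.
for mydir in dirA dirB dirC; do
    echo "Now processing $mydir ..."
    merge_txt "$mydir"
done
```

Note the glob with a -f guard instead of $(ls ...); the guard skips the literal, unexpanded pattern that is left behind when a directory contains no .txt files.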
Thank you for your response.
I think I didn't explain what I need clearly.
I have 145 directories, each with many files.
For example, one of the directories is ClassA, which contains grade.txt, subjects.txt, courses.txt, and description.xml.
I want to end up with one file called ClassA.txt that contains all the contents of grade.txt, subjects.txt, and courses.txt,
and each part in ClassA.txt should be preceded by a '//' comment with the file name, i.e.:
ClassA.txt would look like:
//grade.txt
[content of grade.txt]
//subjects.txt
[content of subjects.txt]
and so forth.
What I've managed to do so far is:
#!/bin/sh
system='/home/path/to/first/directory'
for txtfile in `find ${system} | grep "\.txt"'$'` ; do
#echo $txtfile
cat $txtfile | `find ${system} -name '*.txt'` > ClassA.txt
done
I don't want to display the path of $txtfile as shown in the code above; rather, I would like to append the value of $txtfile before each .txt file's contents.
Unfortunately, the code above isn't working; I'm getting a permission denied error!
That $(ls $system/*.txt) construct is rather pointless. It's inefficient (it requires a fork-exec to create a subshell and run ls), prone to breakage if any of the file names contain an IFS character (by default: space, tab, or newline), and is bound by the system's exec() limit (ARG_MAX).
A simpler, safer, more efficient alternative: for txtfile in "$system"/*.txt
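In context, the earlier loop would become something like the sketch below (the path is still a placeholder). The -f test skips the literal, unexpanded pattern that remains when the directory holds no .txt files.

```shell
#!/bin/ksh
# Sketch: glob expansion instead of command substitution around ls.
# Handles filenames containing spaces and forks no subshell.
system='/path/to/my/directories/DirectoryA'   # placeholder path
for txtfile in "$system"/*.txt; do
    [ -f "$txtfile" ] || continue   # no match: the pattern is left as-is
    echo " #FileName : $txtfile"
    cat "$txtfile" >> outputFileA.txt
done
```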
Thank you all for your efforts; unfortunately, none of the suggested solutions has worked for me.
I managed to get the file name; the question is how to insert the file name before the concatenation happens.
Unless I missed it, you never made it clear whether the output filename varies with the directory and, if so, how it is chosen. To collect the desired concatenation of all the .txt files in a directory, with each file's contents preceded by its filename, the solution that follows creates a file named ALL-TEXT-FILES.txt in each directory.
You mention that you have 145 directories, but you haven't explained how the code is expected to visit them. Do you have a list to feed the script, either via a pipe or command-line arguments? Or are they all in a hierarchy that can simply be traversed with find from a single root location? I will assume the latter; the following script takes a single optional argument, the location of the starting directory. If it is absent, the current working directory is assumed.
#!/bin/sh
find "${1:-.}" -type d -exec sh -c '
    for d; do
        out=$d/ALL-TEXT-FILES.txt
        for f in "$d"/*.txt; do
            { [ -f "$f" ] && [ -r "$f" ]; } || continue
            printf "//%s\n" "${f##*/}" >> "$out"
            cat "$f" >> "$out"
        done
    done
' sh {} +
I tested it and it works as intended.
However, there is a bug in this code (one that is also present in some of the other suggestions). It's unlikely to be triggered, but it's lurking ... sleeping ... hoping.
In case anyone would prefer to find it themselves...
***** CAUTION: SPOILERS AHEAD *****
If a directory happens to contain a file whose name is identical to the output file's, cat will enter an infinite loop of reading from and writing to itself until the machine explodes. The non-lazy solution is to use a unique temp file (or at least a filename that is guaranteed to fall outside the traversal).
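A sketch of that fix, applied to the script above: each directory's output is first written to a hidden name that the *.txt glob can never match, then renamed into place. The temp name .ALL-TEXT-FILES.tmp is my own choice, not anything standard.

```shell
#!/bin/sh
# Sketch: same traversal as before, but each directory's output goes to a
# dot-file first, so the "$d"/*.txt glob cannot pick it up mid-write.
find "${1:-.}" -type d -exec sh -c '
    for d; do
        tmp=$d/.ALL-TEXT-FILES.tmp      # hidden: not matched by *.txt
        : > "$tmp"                      # start from an empty file
        for f in "$d"/*.txt; do
            { [ -f "$f" ] && [ -r "$f" ]; } || continue
            printf "//%s\n" "${f##*/}" >> "$tmp"
            cat "$f" >> "$tmp"
        done
        if [ -s "$tmp" ]; then
            mv "$tmp" "$d/ALL-TEXT-FILES.txt"
        else
            rm -f "$tmp"                # no .txt files here; leave no litter
        fi
    done
' sh {} +
```

A leftover ALL-TEXT-FILES.txt from a previous run would still be swept into the new output, since it matches *.txt; deleting or renaming old outputs first avoids that.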