At 10 megabytes per second it sounds like there's some room for improvement. But you can't go too crazy or you'll just slow your disk down to uselessness.
This requires the bash shell, mostly for the ability to do wait "$ONEPARTICULARTHREAD" (wait for one specific job) instead of a bare wait (which waits for everything).
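A quick self-contained illustration of that difference, with hypothetical stub jobs in place of real work (note that wait "$pid" also hands you that particular job's exit status):

```shell
#!/bin/bash
# Start two background jobs: one that sleeps, one that fails immediately.
sleep 1 &
slow=$!
( exit 7 ) &
failing=$!

wait "$failing"          # wait for ONE particular job ...
status_fail=$?           # ... and pick up its exit status (7)
wait "$slow"
status_slow=$?           # 0
wait                     # bare wait: blocks until every child is reaped
echo "failing=$status_fail slow=$status_slow"
```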
#!/bin/bash
maxproc=2 # Max number of background jobs. Suggest 2, or 3 at most
# Count files
set -- /home/cmccabe/Desktop/NGS/API/*.tar.bz2
FILES="$#"
# Blank $1 $2 ...
set --
let i=1
for FILE in /home/cmccabe/Desktop/NGS/API/*.tar.bz2
do
printf "(%2d/%2d)\tProcessing %s\n" "$i" "$FILES" "$FILE"
let i=i+1
tar -xvjf "$FILE" -C /home/cmccabe/Desktop/NGS/API >/dev/null &
# Turn $1=pida $2=pidb $3=pidc $4=pidd, into
# $1=pida $2=pidb $3=pidc $4=pidd $5=pide
set -- "$@" $!
# Shift removes $1 and moves the rest down, so you get
# $1=pidb $2=pidc $3=pidd $4=pide
# $# is the number of arguments.
if [ "$#" -ge "$maxproc" ]; then wait "$1"; shift; fi
done
NAME
parallel - build and execute shell command lines from standard input in parallel
...
Some details on parallel:
parallel build and execute shell command lines from standard in... (man)
Path : /usr/bin/parallel
Version : 20130922
Length : 6224 lines
Type : Perl script, ASCII text executable
Shebang : #!/usr/bin/env perl
Repo : Debian 8.7 (jessie)
Home : https://www.gnu.org/software/parallel/ (pm)
Modules : (for perl codes)
IPC::Open3 1.16
POSIX 1.38_03
Symbol 1.07
CGI::File::Temp 4.09
File::Path 2.09
Getopt::Long 2.42
strict 1.08
FileHandle 2.02
For a quick introduction, you can also watch the intro videos:
http://tinyogg.com/watch/TORaR/ http://tinyogg.com/watch/hfxKj/ and
http://tinyogg.com/watch/YQuXd/ or
http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Can I just check that these files are local and not NFS-mounted? If they are remote, then you are dependent on the network too, along with a dollop of memory to work on the file. You will also lose any caching that could help you.
The command drl suggested was not pbzip but in fact parallel. Instead of extracting multiple partial files from one tar, you can have several tars extracting at once.
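For reference, a minimal sketch of that approach with GNU parallel (assuming it is installed; -j2 caps it at two simultaneous jobs, matching maxproc=2 in the script, and the path is the one from the question):

```shell
# Extract every archive, at most two at a time; {} is replaced in turn
# by each argument listed after the ::: separator.
parallel -j2 tar -xjf {} -C /home/cmccabe/Desktop/NGS/API ::: \
    /home/cmccabe/Desktop/NGS/API/*.tar.bz2
```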
Which is what my code is for, actually.
I neglected one line at the end. It shouldn't have mattered, but if the script did manage to exit while the children were still running, it's possible it killed them instead of waiting for them. So:
#!/bin/bash
maxproc=2 # Max number of background jobs. Suggest 2, or 3 at most
# Count files
set -- /home/cmccabe/Desktop/NGS/API/*.tar.bz2
FILES="$#"
# Blank $1 $2 ...
set --
let i=1
for FILE in /home/cmccabe/Desktop/NGS/API/*.tar.bz2
do
printf "(%2d/%2d)\tProcessing %s\n" "$i" "$FILES" "$FILE"
let i=i+1
tar -xvjf "$FILE" -C /home/cmccabe/Desktop/NGS/API >/dev/null &
# Turn $1=pida $2=pidb $3=pidc $4=pidd, into
# $1=pida $2=pidb $3=pidc $4=pidd $5=pide
set -- "$@" $!
# Shift removes $1 and moves the rest down, so you get
# $1=pidb $2=pidc $3=pidd $4=pide
# $# is the number of arguments.
if [ "$#" -ge "$maxproc" ]; then wait "$1"; shift; fi
done
wait
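The throttle can be exercised without any real archives; here's a sketch with sleep stubs standing in for tar (peak is a hypothetical counter added just to show the PID queue never grows past maxproc):

```shell
#!/bin/bash
maxproc=2
peak=0                       # most PIDs ever queued at once
for n in 1 2 3 4 5
do
    sleep 0.2 &              # stub for the tar job
    set -- "$@" $!           # append its PID to $1 $2 ...
    if [ "$#" -gt "$peak" ]; then peak=$#; fi
    if [ "$#" -ge "$maxproc" ]; then wait "$1"; shift; fi
done
wait                         # reap the stragglers
echo "peak queue length: $peak"
```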