Splitting a file into chunks of 1TB

Hi

I have a file listing different filesystems with their sizes. I need to split them into chunks of 1TB.

The file looks like

vf_MTLHQNASF07_Wkgp2 187428400  10601AW1
vf_MTLHQNASF07_Wkgp2 479504596  10604AW1
vf_MTLHQNASF07_Wkgp2 19940      10605AID
vf_MTLHQNASF07_Wkgp2 1242622044 10605AW1
vf_MTLHQNASF07_Wkgp2 412696     10605AWP
vf_MTLHQNASF07_Wkgp2 87813204   10607AW1
vf_MTLHQNASF07_Wkgp2 31712      10607AW2
vf_MTLHQNASF07_Wkgp2 305420     ADMIN
vf_MTLHQNASF07_Wkgp2 8396       ANNUAL
vf_MTLHQNASF07_Wkgp2 94668      BO_Reports
vf_MTLHQNASF07_Wkgp2 4896484    Board Material
vf_MTLHQNASF07_Wkgp2 4992       CMONGEAU
vf_MTLHQNASF07_Wkgp2 64944      CORR
vf_MTLHQNASF07_Wkgp2 185218932  CS4AW1
vf_MTLHQNASF07_Wkgp2 15423520   CS4AW2
vf_MTLHQNASF07_Wkgp2 63368560   CS4AW3
vf_MTLHQNASF07_Wkgp2 2735968    CS4AW4

What I have is

cat /tmp/file |awk '{print; sum+=$2;if (sum>=1024000000){t=int(sum/1024000000)*1024000000; printf RS}}'

However it does not split them correctly

vf_MTLHQNASF07_Wkgp2 187428400  10601AW1
vf_MTLHQNASF07_Wkgp2 479504596  10604AW1
vf_MTLHQNASF07_Wkgp2 19940      10605AID
vf_MTLHQNASF07_Wkgp2 1242622044 10605AW1

vf_MTLHQNASF07_Wkgp2 412696     10605AWP
vf_MTLHQNASF07_Wkgp2 87813204   10607AW1
vf_MTLHQNASF07_Wkgp2 31712      10607AW2
vf_MTLHQNASF07_Wkgp2 305420     ADMIN
vf_MTLHQNASF07_Wkgp2 8396       ANNUAL
vf_MTLHQNASF07_Wkgp2 94668      BO_Reports
vf_MTLHQNASF07_Wkgp2 4896484    Board Material
vf_MTLHQNASF07_Wkgp2 4992       CMONGEAU
vf_MTLHQNASF07_Wkgp2 64944      CORR
vf_MTLHQNASF07_Wkgp2 185218932  CS4AW1

vf_MTLHQNASF07_Wkgp2 15423520   CS4AW2
vf_MTLHQNASF07_Wkgp2 63368560   CS4AW3
vf_MTLHQNASF07_Wkgp2 2735968    CS4AW4

Any help please

Something like this?

awk '{sum+=$2; if(sum>=1024000000) {sum=$2; print RS}}1' myFile

This works, but why are there two blank lines between the groups? One would suffice.

Drop the RS: print already appends a newline (ORS), so print RS emits two. Use print "" instead.
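For reference, a minimal sketch of the sequential split with a single blank line between groups. The function name split_groups and its limit argument are made up for illustration; it assumes the size is always in column 2, as in the sample file. Unlike the one-liner above, it tests before adding, so a group that lands exactly on the limit is not split and no blank line is printed before the first record.

```shell
# Hypothetical helper: split_groups FILE LIMIT
# Prints FILE unchanged, inserting one blank line whenever adding the
# next record would push the running total of column 2 above LIMIT.
split_groups() {
    awk -v limit="$2" '
        # Start a new group before this record if it would overflow the current one
        NR > 1 && sum + $2 > limit { print ""; sum = 0 }
        { sum += $2; print }
    ' "$1"
}
```

Usage would look like: split_groups /tmp/file 1024000000. A record that is larger than the limit on its own still gets a group to itself.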

Here is a solution to this problem that uses the Metropolis algorithm to maximize the utilization of each group (i.e. get each group as close to the 1TB limit as possible).

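# s is the group size limit, the same value as in the commands above
awk -v s=1024000000 '
# Score an ordering: walk the lines in order, close a group whenever the
# next line would overflow it, and return the mean fill of the closed groups.
function score_solution(show_result) {
    st=ut=gc=0
    for (i=1;i<=n;i++) {
       if(st+F[S[i]] > s) {
           ut+=st; st=0; gc++
           if (show_result) print ""
       }
       if (show_result) print S[i]
       st+=F[S[i]]
    }
    # Guard against division by zero when everything fits in one group
    u = gc ? ut/(gc*s) : 0
    if (show_result) printf "\nGroups: %d Utilisation: %f\n", gc+(st?1:0), u > "/dev/stderr"
    return u;
}
# F: size per line (capped at s), S: lines in their current order, ts: total size
{ F[$0]=$2>s?s:$2; S[++n]=$0; ts+=$2}
END {
    srand()
    if(ts > s) {
       t = 10000;
       sc = oscore = -1;
       while (t > 0.003 && sc < 1.0) {
          while (oscore == sc) {
              a = int(rand() * n) + 1
              b = int(rand() * n) + 1
              # Swap two random lines
              tn=S[a]
              S[a]=S[b]
              S[b]=tn
              sc = score_solution(0)
           }
           t *= 0.9995;
           diff = sc - oscore;

           # Metropolis rule: always accept an improvement, accept a worse
           # ordering with probability exp(diff/t)
           if (oscore == -1 || diff > 0 ||
            rand() < exp(diff/t)) {
                # Yes, have this one
                oscore = sc;
            } else {
                 # No thanks, swap them back
                 sc = oscore
                 S[b]=S[a]
                 S[a]=tn
            }
        }
    }
    score_solution(1)
}' infile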
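The accept-or-reject step is the core of the Metropolis approach, so here it is in isolation as a small sketch. The function name metropolis_accept and the optional seed argument are assumptions made for this illustration only: an improvement (or equal score) is always kept, while a worse score is kept with probability exp(diff/t), which shrinks as the temperature t falls.

```shell
# Hypothetical helper: metropolis_accept DIFF TEMPERATURE [SEED]
# DIFF is (new score - old score). Prints "accept" or "reject".
metropolis_accept() {
    awk -v diff="$1" -v t="$2" -v seed="${3:-1}" 'BEGIN {
        srand(seed)
        # diff >= 0 always passes, because rand() is below exp(0) = 1;
        # a negative diff passes with probability exp(diff / t)
        ok = (diff > 0 || rand() < exp(diff / t))
        print ok ? "accept" : "reject"
    }'
}
```

At a high temperature almost every swap is accepted, which lets the search escape poor orderings; near the end of the cooling schedule only improvements survive.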

One possible output of the script for the given data file (results vary between runs because the search is random):

vf_MTLHQNASF07_Wkgp2 1242622044 10605AW1

vf_MTLHQNASF07_Wkgp2 479504596  10604AW1
vf_MTLHQNASF07_Wkgp2 87813204   10607AW1
vf_MTLHQNASF07_Wkgp2 15423520   CS4AW2
vf_MTLHQNASF07_Wkgp2 8396       ANNUAL
vf_MTLHQNASF07_Wkgp2 305420     ADMIN
vf_MTLHQNASF07_Wkgp2 185218932  CS4AW1
vf_MTLHQNASF07_Wkgp2 4896484    Board Material
vf_MTLHQNASF07_Wkgp2 31712      10607AW2
vf_MTLHQNASF07_Wkgp2 187428400  10601AW1
vf_MTLHQNASF07_Wkgp2 63368560   CS4AW3

vf_MTLHQNASF07_Wkgp2 19940      10605AID
vf_MTLHQNASF07_Wkgp2 2735968    CS4AW4
vf_MTLHQNASF07_Wkgp2 412696     10605AWP
vf_MTLHQNASF07_Wkgp2 4992       CMONGEAU
vf_MTLHQNASF07_Wkgp2 94668      BO_Reports
vf_MTLHQNASF07_Wkgp2 64944      CORR

Groups: 3 Utilisation: 1.000000
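To check a grouping like the one above, the sizes in each blank-line-separated group can be summed with awk's paragraph mode (RS set to the empty string). This is only a sketch: the name group_sums is made up, and it assumes the input is the script's standard output, without the "Groups:" summary line that goes to stderr.

```shell
# Hypothetical helper: group_sums [FILE...]
# Reads blank-line-separated groups and prints the line count and the
# total of column 2 for each group.
group_sums() {
    awk 'BEGIN { RS = ""; FS = "\n" }   # one record per group, one field per line
    {
        total = 0
        for (i = 1; i <= NF; i++) {
            split($i, f, " ")           # f[2] is the size column of this line
            total += f[2]
        }
        printf "group %d: %d lines, total %.0f\n", NR, NF, total
    }' "$@"
}
```

For the sample output above, the second group adds up to 1023999224, just under the 1024000000 limit, and the first group is the single oversized filesystem.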