Merging two files, each containing 16 lakh lines, on an HP-UX 11.11 system

Hello All ,

I am trying to merge two files, each containing 16 lakh lines. My requirement is to merge them in alternating blocks of 14 lines from each file.

That is, 14 lines from file1, then 14 lines from file2, and so on. So I wrote the script below.

It works for small files, but for large files the script does not finish and its PID gets killed automatically after merging about 90000 lines.

Can anyone please help me with this? Is there any way I can merge them quickly?

#!/bin/sh
touch test
size=`ls -lrt file1 | awk '{print $5}'`    # size of file1 in bytes
while [ "$size" -gt 0 ]
do
    # append the first 14 lines of file1, then remove them from file1
    sed -n '1,14p' file1 >> test
    sed '1,14d' file1 >testing1.tmp
    cat testing1.tmp >file1
    rm testing1.tmp
    # same for file2
    sed -n '1,14p' file2 >> test
    sed '1,14d' file2 >testing2.tmp
    cat testing2.tmp >file2
    rm testing2.tmp
    size=`ls -lrt file1 | awk '{print $5}'`
done
echo " Files merged successfully "
echo " Files merged successfully "

How about

awk '1; !(NR%14) {for (i=1; i<=14; i++)  {getline < F2; print}}' F2=file2 file1
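For a quick sanity check, the one-liner can be tried on tiny files first. The file names a.txt and b.txt below are made up for this demo, and the block size is shrunk from 14 to 2 so the interleaving is easy to eyeball:

```shell
# Build two small test files (names are arbitrary for this demo)
printf '%s\n' a1 a2 a3 a4 > a.txt
printf '%s\n' b1 b2 b3 b4 > b.txt

# Same idea as above, with 2-line blocks instead of 14:
# the pattern "1" prints every line of a.txt; after every 2nd line,
# pull 2 lines from b.txt with getline and print them as well.
awk '1; !(NR%2) {for (i=1; i<=2; i++) {getline < F2; print}}' F2=b.txt a.txt
# prints a1 a2 b1 b2 a3 a4 b3 b4, one per line
```

Note that `getline < F2` overwrites $0, and if F2 runs out of lines first, $0 is left unchanged and gets reprinted, which is exactly the repetition problem discussed further down.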
#!/usr/bin/awk -f
BEGIN {
    File2 = ARGV[2]
    --ARGC
}
{
    print
    if ((FNR % 14) == 0) {
        for (n = 1; n <= 14; ++n) {
            getline < File2
            print
        }
    }
}
END {
    while (getline < File2) print
}

The above code works for small files, but for big files it merges in a zigzag manner. Can anyone suggest a script that works here?

What is "zigzag manner"?

Lines are repeated many times and trailing lines are suppressed, like 0 1 2 3 4 5 5 5 5 5 5 10 11 12 12 12 .. — that is the output I am receiving.

I neglected error checking. Maybe file2 is at its end and getline failed repeatedly. This doesn't depend on the files' sizes but stems from an imbalance in file lengths. Try

awk '
function FLOK() {return (1 == getline < F2)}
1
!(NR%14)        {for (; (++i)%15 && FLOK();) print}
END             {while (FLOK()) print}
' F2=file2 file1  
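To see that the error-checked version handles an imbalance, here is a small check with deliberately unequal files (4 vs. 7 lines) and a block size of 2 instead of 14 for readability; the file names f1.txt and f2.txt are made up for this demo. FLOK() returns true only while getline actually delivers a line, so a short second file can no longer cause repeated lines, and the END rule drains whatever is left over:

```shell
printf '%s\n' a1 a2 a3 a4 > f1.txt
printf '%s\n' b1 b2 b3 b4 b5 b6 b7 > f2.txt

# Block-size-2 variant of the script above; the extra parentheses
# around (getline < F2) avoid any ambiguity in how awk parses the
# comparison.
awk '
function FLOK() {return ((getline < F2) == 1)}
1
!(NR%2) {for (; (++i)%3 && FLOK();) print}
END     {while (FLOK()) print}
' F2=f2.txt f1.txt
# prints a1 a2 b1 b2 a3 a4 b3 b4 b5 b6 b7, one per line
```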

I am receiving the same error. It merges correctly up to about 1000 lines, then the merging goes wrong: lines get repeated and some lines get suppressed :confused:

How many lines in file1, and in file2?

---------- Post updated at 21:33 ---------- Previous update was at 21:23 ----------

I did a test with around 3000 lines, with one file 100 lines longer than the other. It worked.

It works fine up to 5000 lines (not sure), but in my case each file contains more than 15000 lines.

File1 = 16739 lines
File2 = 17512 lines

The length of each line is below 100 characters.

Can you suggest something for files of this size? :(

Try this shell script:

#!/bin/ksh
while :
do
  for descriptor in 3 4
  do
    for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14
    do
      IFS= read -r line || break 3
      printf "%s\n" "$line"
    done <&$descriptor
  done
done 3< file1 4< file2

The outer loop opens the two files via descriptors 3 and 4 and never ends on its own.
The 2nd loop toggles between the two descriptors.
The 3rd loop reads 14 lines from the current descriptor.
The break 3 breaks out of all 3 nested loops.
If the output looks good, you can redirect the whole thing to a file:

...
done 3< file1 4< file2 > file3

Hi Phani,

Can we split the source file into 16 files of 1 lakh records each, and then process them one by one with the scripts suggested here?
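The splitting step of that idea could be sketched roughly as below; this is only a demo (the 250000-line sample input and the chunk_ prefix are made up here), and the wc placeholder in the loop stands in for whichever merge script from this thread you actually run on each piece:

```shell
# Demo input: a 250000-line stand-in for the real source file
seq 250000 > file1

# Split into pieces of 100000 (1 lakh) lines each;
# split names the pieces chunk_aa, chunk_ab, chunk_ac, ... by default
split -l 100000 file1 chunk_

# Process each piece in turn; replace the wc placeholder with one of
# the merge scripts suggested above, run against "$piece"
for piece in chunk_*
do
    wc -l < "$piece"
done
```

Whether this actually helps depends on why the original script was killed; if the cause was the repeated rewriting of file1 and file2, the awk or file-descriptor approaches above avoid that without splitting.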