How to make paste -d second file print down while looping?

bigvito19 · August 29, 2019, 6:05pm

I would like the second file (labelled 'b') to print down alongside the first file (labelled 'a'), shifting down one line at a time by adding new lines. I want it to keep going until the first line of the second file reaches the last line of the first file. Would I have to put this into a script file, or could I do it from the terminal?

What it does now:


paste -d ' ' a b

1           a 
  2         b
  3         c
  4         d 
  5         e
  6         
  7         
  8        
  9

Desired output:
                               
1     ↓                           
2    a                      
3    b                       
4    c                       
5    d                       
6    e                      
7                              
8                               
9                              
                             

                                                                  
1                               
2                             
3    ↓                          
4    a                      
5    b                    
6    c                     
7    d                       
8    e                       
9                              
                               
                                 
        Re-looping                   
1     b                       
2   c                       
3   d
4   e                           
5                          
6                              
7             ↓                          
8    List is about to loop        
9    a

Corona688 · August 29, 2019, 6:51pm

One way:

#!/bin/bash

# Create temporary file to hold newlines
touch /tmp/$$

# Open both files to read line-by-line
exec 5<file1
exec 6<file2

# Read lines from both until file2 runs out
while read LINE <&6 && read LINE <&5
do
        :
done

# Run loop same number of times as lines left in file1.
while read LINE <&5
do
        cat /tmp/$$ file2 | paste file1 /dev/stdin
        echo >> /tmp/$$
        echo
done

cat /tmp/$$ file2 | paste file1 /dev/stdin


rm -f /tmp/$$ # Delete temporary file
# Close file descriptors
exec 5>&-
exec 6>&-

Re-looping I'm less certain of. Perhaps the files themselves should be generated at need. How large are they in a non-trivial example?

bigvito19 · August 29, 2019, 7:27pm

Thank you for the reply,

Re-looping just means looping through the list again. The files are over 20 GB.

And how do I take the space out between the files so the lines merge together?

rdrtx1 · August 29, 2019, 8:47pm

Just following the example posted, here is a loop example that uses files a and b to create c:

awk '
NR==FNR {a[c++]=$0; next;}
{b[++d]=$0;}
END { for (i=0; i<c; i++) {
   for (j=0; j<c; j++) print a[j], b[(j-i)<0?(c+j-i):(j-i)];
   print _;
   }
}' a b > c

bigvito19 · August 29, 2019, 9:25pm

Thank you for the reply,

I just want the whole second file to print through the first file. I didn't want the output to have breaks. I was just showing an example of one file printing completely through another.

And how do I merge the files together instead of having a space between them?

rdrtx1 · August 29, 2019, 9:39pm

Try:

awk '
NR==FNR {a[c++]=$0; next;}
{b[++d]=$0;}
END { for (i=0; i<c; i++) {
   for (j=0; j<c; j++) {
   v=j-i;
   if (v < 0) v=v+c;
   print a[j], b[v];
   }
   print _;
   }
}' a b > c

Or use gawk instead of awk.

bigvito19 · August 29, 2019, 9:58pm

Thank you for the reply,

I had got it to work; I had to go back and redo it.

rdrtx1 · August 29, 2019, 10:09pm

Try removing the print _; statement.

Does the example posted accomplish what you are looking for, or set you on the right track? What is it that you had to go back and redo to make it work? Can you post the solution you have?
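
For the earlier question about merging lines with no space between them, here is a minimal sketch of a variant of the awk loop posted above. It drops the print _; statement and concatenates the two fields instead of separating them with a comma. The function name merge_rotations is made up for illustration, and indexing both arrays from 0 is an assumption about the intent (the posted code fills b starting at index 1). Both files are still held in memory.

```shell
#!/bin/sh
# merge_rotations A B (hypothetical helper name):
# print A joined with every wrapped-around shift of B, with no separator
# and no blank line between shifts. Assumes both files have equal length.
merge_rotations() {
    awk '
    NR == FNR { a[c++] = $0; next }     # first file: lines 0..c-1
              { b[d++] = $0 }           # second file: lines 0..d-1 (0-based: an assumption)
    END {
        for (i = 0; i < c; i++)         # one pass per shift
            for (j = 0; j < c; j++) {
                v = j - i
                if (v < 0) v += c       # wrap around to the end of b
                print a[j] b[v]         # plain concatenation: no separator
            }
    }' "$1" "$2"
}

# Small demo with throwaway files.
demo=$(mktemp -d)
printf '%s\n' 1 2 3 > "$demo/a"
printf '%s\n' a b c > "$demo/b"
merge_rotations "$demo/a" "$demo/b"
rm -rf "$demo"
```

In awk, print x, y inserts the output field separator (a space by default), while print x y concatenates the two strings directly.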

bigvito19 · August 29, 2019, 10:17pm

The first example was right, I had removed one of the letters on line 5 by mistake. I'm still using the first example you gave.

I had just wanted the whole second file to print through the first file. I didn't want the line to have breaks. I was just showing an example of a file completely printing through another.

RudiC · August 30, 2019, 4:40am

Try also

for ((CNT=0; CNT<=$(wc -l < file1)-$(wc -l < file2); CNT++)); do paste -d'\0' -- file1 <(printf "%*s" $CNT '' | tr " " "\n"; cat file2); done
1a
2b
3c
4d
5e
6
7
8
9
1
2a
3b
4c
5d
6e
7
8
9
1
2
3a
4b
5c
6d
7e
8
9
1
2
3
4a
5b
6c
7d
8e
9
1
2
3
4
5a
6b
7c
8d
9e

paste -d seems not to like empty separators but ignores strange control character constructs such as '\0'...
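
As a small illustration of the separator behaviour mentioned above: POSIX defines '\0' in the paste delimiter list as an empty string, so it joins lines with nothing in between. The sketch below uses made-up file contents, and join_lines is a hypothetical helper name.

```shell
#!/bin/sh
# join_lines A B (hypothetical helper name):
# join corresponding lines of A and B with no separator at all.
# paste uses a tab by default; -d ' ' gives a space; -d '\0' gives nothing.
join_lines() {
    paste -d '\0' "$1" "$2"
}

# Small demo with throwaway files.
demo=$(mktemp -d)
printf '%s\n' 1 2 3 > "$demo/a"
printf '%s\n' a b c > "$demo/b"
join_lines "$demo/a" "$demo/b"
rm -rf "$demo"
```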

bigvito19 · August 30, 2019, 8:34am

Thank you for the reply.

bigvito19 · August 30, 2019, 10:01am

Can awk open very big files? If it can't, what else can I use?

Don_Cragun · August 30, 2019, 2:27pm

In general, awk can open large files.

If you are having problems trying to open a large file with awk, please give us details on what you are trying to do, how big the file is, what operating system (including version) you're using, and what version of awk you're using.

bigvito19 · August 30, 2019, 3:22pm

I'm trying to get the file on the right to print (or be generated) down the file on the left without the left file printing as well. I don't want the left file to print at all. I also want to merge both lines as it prints down the file. Alternatively, is there a way to print the files in opposite directions, creating new lines? I don't want repeated lines.

The files I am now testing are 2.5 GB each; I will try to test 40+ GB later.

I'm using xfce xubuntu 4.12

My awk version is 1.3.3

Don_Cragun · August 30, 2019, 5:05pm

I guess I wasn't clear enough in my request. Please explain what diagnostics or erroneous output you have received from awk that make you believe awk is unable to process large files on your system. I don't know of any reason (other than lack of memory) that would keep awk on xfce xubuntu from processing large files.
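
If memory does turn out to be the limit, one possible workaround is to avoid holding either file in awk arrays. The sketch below is an assumption-laden alternative, not the method used elsewhere in this thread: it rebuilds each wrapped-around shift of the second file with tail and head and streams it into paste. It assumes both files have the same number of lines, and rotate_paste is a made-up name. It re-reads the second file once per shift, so it trades memory for I/O.

```shell
#!/bin/sh
# rotate_paste A B (hypothetical helper name):
# for every shift i, print A joined line-by-line (no separator) with B
# rotated down by i lines, wrapping around. Assumes A and B have the same
# number of lines. Nothing is stored in memory; B is re-read for each shift.
rotate_paste() {
    n=$(( $(wc -l < "$2") ))    # arithmetic expansion strips any padding from wc
    i=0
    while [ "$i" -lt "$n" ]; do
        # Last i lines of B first, then its first n-i lines.
        { tail -n "$i" "$2"; head -n "$((n - i))" "$2"; } | paste -d '\0' "$1" -
        i=$((i + 1))
    done
}

# Small demo with throwaway files.
demo=$(mktemp -d)
printf '%s\n' 1 2 3 > "$demo/a"
printf '%s\n' a b c > "$demo/b"
rotate_paste "$demo/a" "$demo/b"
rm -rf "$demo"
```

Note that the output still grows with the square of the line count, whichever tool produces it.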

bigvito19 · August 30, 2019, 5:21pm

It just said: line 11: 7605 Killed

I can open a file with a Python script or with the paste command, but with bigger files it seems I can't use awk.

Don_Cragun · August 30, 2019, 6:47pm

What awk script were you running when you got the diagnostic "line 11: 7605 Killed"?

How big were the two files you were having awk read into memory when you got that diagnostic?

How much memory is installed on your system?

If you were redirecting the output from your awk script to a file, what was the size of that output file and how much free space was there on the filesystem to which you were writing that file when awk was killed?
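
To put a rough number on the output-size question: the awk loop posted earlier in this thread prints every line of the first file once per shift, plus one blank line per shift, so n input lines give n * (n + 1) output lines. A minimal sketch for estimating that, assuming both files have the same number of lines (estimate_output_lines is a hypothetical helper name):

```shell
#!/bin/sh
# estimate_output_lines FILE (hypothetical helper name):
# number of lines the posted awk loop would print for two files that each
# have as many lines as FILE: n data lines plus 1 blank line, for n shifts.
estimate_output_lines() {
    n=$(( $(wc -l < "$1") ))
    echo "$(( n * (n + 1) ))"
}

# Small demo with a throwaway 3-line file: 3 * (3 + 1) lines.
demo=$(mktemp -d)
printf '%s\n' 1 2 3 > "$demo/a"
estimate_output_lines "$demo/a"
rm -rf "$demo"
```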

bigvito19 · August 30, 2019, 7:08pm

 awk '
NR==FNR {a[c++]=$0; next;}
{b[++d]=$0;}
END { for (i=0; i<c; i++) {
         for (j=0; j<c; j++) print a[j], b[(j-i)<0?(c+j-i):(j-i)];
         print _;
      }
}' a b

I was using the above code, which I got from the first page of this thread.

The files I tried to test were 172 MB each.

I have enough memory to run big files.

I was directing the output to the terminal, not to a file.

Don_Cragun · August 30, 2019, 7:30pm

OK. My mistake. The diagnostic was not from awk; it was from your shell script.

What are the contents of the file containing your shell script?

Was there any output from awk before you got that diagnostic, and, if there was, how much?

bigvito19 · August 30, 2019, 8:01pm

I only put 2 files in the shell script. I first used 2 very small files to test it, and I was getting output from that. Then, when I tried 2 bigger files, there was no output, and then it said Killed.