The first awk program I posted loads every record in your file into an indexed array (A[++c] = $1) and performs the required operation only at the end, in the END block. That is why it failed with a "Cannot allocate memory" error on large files.
The second awk program, by contrast, does not store the records in any variable or array; it checks each record as it is read. So it is not memory intensive and works for large files.
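As a rough illustration of the record-by-record idea (not necessarily the second program itself), awk only needs to remember the two previous values instead of the whole file. This sketch assumes, like the code below, that the value of interest is in field 1; the helper name neighbours and the file sample.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each occurrence of value $1 in file $2
# with its neighbours, keeping only two earlier values in memory.
neighbours() {
    awk -v S="$1" '
    {
        # p1 is the previous record, p2 the one before it.
        # If p1 matches S, the current record ($1) follows it and p2 precedes it.
        if ( NR > 1 && p1 == S ) {
            print "the letter is " p1
            print "followed by " $1
            print "comes after " p2
        }
        p2 = p1
        p1 = $1
    }
    END {
        # A match on the last line has no successor, so "followed by" is empty.
        if ( NR > 0 && p1 == S ) {
            print "the letter is " p1
            print "followed by "
            print "comes after " p2
        }
    }
    ' "$2"
}

# Hypothetical sample data, one value per line.
printf '%s\n' 211 200 202 212 200 223 > sample.txt
neighbours 200 sample.txt
```

Memory use stays constant no matter how many lines the file has, because only p1 and p2 are kept between records.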
I have used this code to print the numbers that come before and after a specific value:
SEQ=200
awk -v S="$SEQ" '
{
    A[++c] = $1                  # store field 1 of every record
}
END {
    for ( i = 1; i <= c; i++ )
    {
        if ( A[i] == S )         # current value matches the one we look for
        {
            print " the letter is " A[i]
            print " followed by " A[i+1]
            print " comes after " A[i-1]
        }
    }
}
' file
The result was:
the letter is 200
followed by 202
comes after 211
the letter is 200
followed by 223
comes after 212
the letter is 200
followed by 202
comes after 211
If the same combination repeats, I need it to print how many times it occurred instead of printing it again each time, so the result should be something like this:
2 times
the letter is 200
followed by 202
comes after 211
1 times
the letter is 200
followed by 223
comes after 212
SEQ=200
awk -v S="$SEQ" '
{
    A[++c] = $1
}
END {
    for ( i = 1; i <= c; i++ )
    {
        if ( A[i] == S )
        {
            # Count each (value, next, previous) combination and keep its text.
            C[A[i],A[i+1],A[i-1]]++
            V[A[i],A[i+1],A[i-1]] = "the letter is " A[i] RS "followed by " A[i+1] RS "comes after " A[i-1]
        }
    }
    for ( k in V )
    {
        print C[k] " times"
        print V[k]
    }
}
' file
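Each C[A[i],A[i+1],A[i-1]] subscript is stored as one string, with the parts joined by awk's built-in SUBSEP, so split(k, P, SUBSEP) can recover the three values. A sketch of that variation, which drops the separate V array; count_neighbours and sample.txt are hypothetical names, and the groups still come out in the unspecified for-in order.

```shell
#!/bin/sh
# Hypothetical helper: count (value, next, previous) triples for value $1 in file $2.
count_neighbours() {
    awk -v S="$1" '
    { A[++c] = $1 }
    END {
        for ( i = 1; i <= c; i++ )
            if ( A[i] == S )
                C[A[i], A[i+1], A[i-1]]++   # subscripts joined with SUBSEP
        for ( k in C ) {                     # iteration order is unspecified
            split(k, P, SUBSEP)              # P[1]=value, P[2]=next, P[3]=previous
            print C[k] " times"
            print "the letter is " P[1]
            print "followed by " P[2]
            print "comes after " P[3]
        }
    }
    ' "$2"
}

# Hypothetical sample data, one value per line.
printf '%s\n' 211 200 202 212 200 223 211 200 202 > sample.txt
count_neighbours 200 sample.txt
```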
By default, the order in which a for (k in array) loop scans an array is unspecified; it generally depends on how awk implements arrays internally.
To print the combinations in the order they first appear in the file, record each new key in an indexed array and loop over that instead. Try this modified code:
SEQ=200
awk -v S="$SEQ" '
{
    A[++c] = $1
}
END {
    for ( i = 1; i <= c; i++ )
    {
        if ( A[i] == S )
        {
            # SUBSEP keeps e.g. 200,20,2 and 200,2,02 from producing the same key.
            key = A[i] SUBSEP A[i+1] SUBSEP A[i-1]
            if ( !(key in V) )
                T[++j] = key        # remember first-seen order
            C[key]++
            V[key] = "the letter is " A[i] RS "followed by " A[i+1] RS "comes after " A[i-1]
        }
    }
    for ( i = 1; i <= j; i++ )
    {
        print C[T[i]] " times"
        print V[T[i]]
    }
}
' file
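Joining the three values with no separator (A[i] A[i+1] A[i-1]) can make two different combinations produce the same key, which would merge their counts. A quick check with hypothetical values chosen to collide; key_demo is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: compare plain concatenation with SUBSEP-joined keys.
key_demo() {
    awk 'BEGIN {
        # Two different (value, next, previous) triples joined with no separator.
        a = "200" "20" "2"
        b = "200" "2" "02"
        if ( a == b ) print "plain: same key"; else print "plain: different keys"

        # The same triples joined with SUBSEP, the separator awk uses for A[x, y, z].
        a = "200" SUBSEP "20" SUBSEP "2"
        b = "200" SUBSEP "2" SUBSEP "02"
        if ( a == b ) print "SUBSEP: same key"; else print "SUBSEP: different keys"
    }'
}

key_demo
```

Both plain keys become "200202", while the SUBSEP-joined keys stay distinct.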
Sorry, I misread your requirement: I thought you wanted the records printed in the order they appear in your input file.
If you have gawk, you can use the code below to print the combinations in descending order of count:
SEQ=200
gawk -v S="$SEQ" '
{
    A[++c] = $1
}
END {
    for ( i = 1; i <= c; i++ )
    {
        if ( A[i] == S )
        {
            C[A[i],A[i+1],A[i-1]]++
            V[A[i],A[i+1],A[i-1]] = "the letter is " A[i] RS "followed by " A[i+1] RS "comes after " A[i-1]
        }
    }
    for ( k in V )
    {
        T[++j] = C[k]               # collect all counts
    }
    n = asort(T)                    # gawk only: sort counts ascending
    for ( i = n; i >= 1; i-- )      # walk them from highest to lowest
    {
        if ( i < n && T[i] == T[i+1] )
            continue                # this count was already printed
        for ( k in V )
        {
            if ( C[k] == T[i] )
            {
                print C[k] " times"
                print V[k]
            }
        }
    }
}
' file
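If gawk is not available, one portable alternative (swapping asort() for the external sort utility) is to have awk print one tab-separated line per combination with the count first, sort those lines with sort -k1,1nr, and expand each line in a second awk. This is a sketch; count_sorted and sample.txt are hypothetical names, it assumes field 1 never contains a tab (true with awk's default field splitting), and lines with equal counts come out in whatever order sort's fallback comparison gives.

```shell
#!/bin/sh
# Hypothetical helper: portable version of the descending-count output.
# Stage 1 prints "count<TAB>value<TAB>next<TAB>previous" per combination,
# sort puts the highest count first, stage 2 expands each line.
count_sorted() {
    awk -v S="$1" '
    { A[++c] = $1 }
    END {
        for ( i = 1; i <= c; i++ )
            if ( A[i] == S )
                C[A[i] "\t" A[i+1] "\t" A[i-1]]++
        for ( k in C )
            print C[k] "\t" k
    }
    ' "$2" | sort -k1,1nr | awk 'BEGIN { FS = "\t" }
    {
        print $1 " times"
        print "the letter is " $2
        print "followed by " $3
        print "comes after " $4
    }'
}

# Hypothetical sample data, one value per line.
printf '%s\n' 211 200 202 212 200 223 211 200 202 > sample.txt
count_sorted 200 sample.txt
```

With this sample the output matches the layout asked for in the question: the "2 times" group first, then the "1 times" group.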