Linux Gods,
I am trying to parse SQL statements out of a PDF document so that I can build a base SQL script later, but for the life of me I am having a tough time extracting the data. This exact command worked perfectly a couple of months ago and now it doesn't. Below is an example of the data structure.
show parameter os_authent_prefix
SHOW PARAMETER log_archive_dest;
Audit:
SELECT AUD.POLICY_NAME, AUD.AUDIT_OPTION, AUD.AUDIT_OPTION_TYPE
FROM AUDIT_UNIFIED_POLICIES AUD, AUDIT_UNIFIED_ENABLED_POLICIES ENABLED
WHERE AUD.POLICY_NAME = ENABLED.POLICY_NAME
AND AUD.AUDIT_OPTION = 'CREATE TRIGGER'
AND AUD.AUDIT_OPTION_TYPE = 'STANDARD ACTION'
AND ENABLED.SUCCESS = 'YES'
AND ENABLED.FAILURE = 'YES'
AND ENABLED.ENABLED_OPT = 'BY'
AND ENABLED.USER_NAME = 'ALL USERS';
Other variations I have tried:
pdfgrep -i -PB 20 -A 20 "audit\:" ./Oracle-12.pdf | gawk '{IGNORECASE=1;} /show.*\;/ || /select.*\;/ {print "Here is the data \n\n",$0, "\n"}'
gawk: cmd. line:1: warning: regexp escape sequence `\;' is not a known regexp operator
Here is the data
REVOKE SELECT_ANY_DICTIONARY FROM <grantee>;
Here is the data
REVOKE SELECT ANY TABLE FROM <grantee>;
Here is the data
REVOKE SELECT_CATALOG_ROLE FROM <grantee>;
Here is the data
AUDIT SELECT ANY DICTIONARY;
I suspect something changed in a binary or two. To get past this, I have tried several regex variations:
pdfgrep -i -PB 20 -A 20 "audit\:" ./Oracle-12.pdf | gawk '{IGNORECASE=1;} /show.*|;/ || /select.*|;/ {print "Here is the data \n\n",$0, "\n"}'
pdftotext ./Oracle-12.pdf - | grep -i "select.*\; | show.*\;"
gawk '{IGNORECASE=1;} /show.*|;/ || /select.*|;/ {print "The Goodies \n\n",$0, "\n"}' ./Oracle-12.pdf.txt
Can someone shed some light? I am using Kali 2020.1, which I upgraded from 2019.4, and now the original command doesn't work. Thanks.
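A minimal sketch of the kind of extraction described above, assuming the PDF text has already been dumped to plain text (for example with pdftotext). Two things about the patterns are worth noting: `;` is not a regex metacharacter, so it needs no backslash, and `|` is alternation, so /show.*|;/ matches any line that contains a semicolon. Because the SELECT statements span several lines, a line-by-line match can only catch one-line statements, so this sketch splits the input on ";" instead. The function name extract_sql and the shortened sample input are hypothetical, and it sticks to plain awk features (tolower, match) rather than gawk's IGNORECASE.

```shell
# Hypothetical helper: print every SHOW/SELECT statement, including multi-line ones.
# Assumption: input is plain text and each statement ends with ";".
extract_sql() {
  awk '
    BEGIN { RS = ";" }                 # one record per ";"-terminated chunk
    {
      lower = tolower($0)              # case-insensitive search on a lowercased copy
      # look for a line inside the chunk that starts with "select" or "show"
      if (match(lower, /(^|\n)[ \t]*(select|show)[ \t\n]/)) {
        start = RSTART
        if (substr(lower, start, 1) == "\n")
          start++                      # skip the newline the pattern anchored on
        print substr($0, start) ";"    # put back the ";" that RS consumed
        print ""                       # blank line between statements
      }
    }
  ' "$@"
}

# Demo on a shortened sample shaped like the data in this post.
extract_sql <<'EOF'
SHOW PARAMETER log_archive_dest;
Audit:
SELECT AUD.POLICY_NAME, AUD.AUDIT_OPTION, AUD.AUDIT_OPTION_TYPE
FROM AUDIT_UNIFIED_POLICIES AUD, AUDIT_UNIFIED_ENABLED_POLICIES ENABLED
WHERE AUD.POLICY_NAME = ENABLED.POLICY_NAME
AND ENABLED.USER_NAME = 'ALL USERS';
REVOKE SELECT ANY TABLE FROM <grantee>;
EOF
```

Against the real document this would be run as `pdftotext ./Oracle-12.pdf - | extract_sql`. Statements where SELECT appears mid-line (such as the REVOKE lines) are skipped on purpose, and a SHOW line without a trailing ";" is merged with whatever follows up to the next ";".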