How to remove the answer and explanation parts from a text file with lots of questions?

Hi,
I have a text file with thousands of questions in it. Each question spans multiple lines and is followed by multiple-choice options, an Answer, and an optional Explanation. I need to delete the Answer and Explanation parts for all questions and insert a blank line before the next question. Each question starts with NO.

I tried to do that with egrep as shown below. It mostly works, except that it extracts only the first line of each question (egrep matches only the line that starts with NO.). Please advise, thanks!

root@TESTBOX:~/# cat test.txt
NO.13 Darius is analysing IDS logs. During the investigation, he noticed that there was nothing
suspicious found and an alert was triggered on normal web application traffic. He can mark this alert
as:
A. False-Negative
B. False-Positive
C. True-Positive
D. False-Signature
Answer: A
NO.14 What is the proper response for a NULL scan if the port is closed?
A. SYN
B. ACK
C. FIN
D. PSH
E. RST
F. No response
Answer: E
NO.15 The Open Web Application Security Project (OWASP) is the worldwide not-for-profit
charitable organization focused on improving the security of software. What item is the primary
concern on OWASP's Top Ten Project Most Critical Web Application Security Risks?
A. Injection
B. Cross Site Scripting
C. Cross Site Request Forgery
D. Path disclosure
Answer: A
Explanation
IT Certification Guaranteed, The Easy Way!
4
The top item of the OWASP 2013 OWASP's Top Ten Project Most Critical Web Application Security
Risks is injection.
Injection flaws, such as SQL, OS, and LDAP injection occur when untrusted data is sent to an
interpreter as part of a command or query. The attacker's hostile data can trick the interpreter into
executing unintended commands or accessing data without proper authorization.
References: https://www.owasp.org/index.php/Top_10_2013-Top_10
NO.16 A recent security audit revealed that there were indeed several occasions that the company's
network was breached. After investigating, you discover that your IDS is not configured properly and
therefore is unable to trigger alarms when needed. What type of alert is the IDS giving?
A. True Positive
B. False Negative
C. False Positive
D. False Positive
Answer: B
Explanation
New questions
NO.17 A Network Administrator was recently promoted to Chief Security Officer at a local university.
One of employee's new responsibilities is to manage the implementation of an RFID card access
system to a new server room on campus. The server room will house student enrollment information
that is securely backed up to an off-site location.
During a meeting with an outside consultant, the Chief Security Officer explains that he is concerned
that the existing security controls have not been designed properly. Currently, the Network
Administrator is responsible for approving and issuing RFID card access to the server room, as well as
reviewing the electronic access logs on a weekly basis.
Which of the following is an issue with the situation?
A. Segregation of duties
B. Undue influence
C. Lack of experience
D. Inadequate disaster recovery plan
Answer: A

What I tried...

root@TESTBOX:~/# egrep -E '^A\.|^B\.|^C\.|^D\.|^E\.|^F\.|^G\.|^NO\.' test.txt
NO.13 Darius is analysing IDS logs. During the investigation, he noticed that there was nothing
A. False-Negative
B. False-Positive
C. True-Positive
D. False-Signature
NO.14 What is the proper response for a NULL scan if the port is closed?
A. SYN
B. ACK
C. FIN
D. PSH
E. RST
F. No response
NO.15 The Open Web Application Security Project (OWASP) is the worldwide not-for-profit
A. Injection
B. Cross Site Scripting
C. Cross Site Request Forgery
D. Path disclosure
NO.16 A recent security audit revealed that there were indeed several occasions that the company's
A. True Positive
B. False Negative
C. False Positive
D. False Positive
NO.17 A Network Administrator was recently promoted to Chief Security Officer at a local university.
A. Segregation of duties
B. Undue influence
C. Lack of experience
D. Inadequate disaster recovery plan
root@TESTBOX:~/# 

The problem I have is trying to match multiple lines using sed or egrep, so I did not quickly find a way to filter out the Explanation text. The rest was easy (plus I removed the stray number and the "New questions" phrase):

sed 's/^NO\./\nNO./' test.txt | egrep -v '^Answer|^Explanation|^New questions|^[0-9]'

That leaves an incomplete solution, because removing the Explanation text requires a multiline regex match:

ubuntu# sed 's/^NO\./\nNO./' test.txt | egrep -v '^Answer|^Explanation|^New questions|^[0-9]'

NO.13 Darius is analysing IDS logs. During the investigation, he noticed that there was nothing
suspicious found and an alert was triggered on normal web application traffic. He can mark this alert
as:
A. False-Negative
B. False-Positive
C. True-Positive
D. False-Signature

NO.14 What is the proper response for a NULL scan if the port is closed?
A. SYN
B. ACK
C. FIN
D. PSH
E. RST
F. No response

NO.15 The Open Web Application Security Project (OWASP) is the worldwide not-for-profit
charitable organization focused on improving the security of software. What item is the primary
concern on OWASP's Top Ten Project Most Critical Web Application Security Risks?
A. Injection
B. Cross Site Scripting
C. Cross Site Request Forgery
D. Path disclosure
IT Certification Guaranteed, The Easy Way!
The top item of the OWASP 2013 OWASP's Top Ten Project Most Critical Web Application Security
Risks is injection.
Injection flaws, such as SQL, OS, and LDAP injection occur when untrusted data is sent to an
interpreter as part of a command or query. The attacker's hostile data can trick the interpreter into
executing unintended commands or accessing data without proper authorization.
References: https://www.owasp.org/index.php/Top_10_2013-Top_10

NO.16 A recent security audit revealed that there were indeed several occasions that the company's
network was breached. After investigating, you discover that your IDS is not configured properly and
therefore is unable to trigger alarms when needed. What type of alert is the IDS giving?
A. True Positive
B. False Negative
C. False Positive
D. False Positive

NO.17 A Network Administrator was recently promoted to Chief Security Officer at a local university.
One of employee's new responsibilities is to manage the implementation of an RFID card access
system to a new server room on campus. The server room will house student enrollment information
that is securely backed up to an off-site location.
During a meeting with an outside consultant, the Chief Security Officer explains that he is concerned
that the existing security controls have not been designed properly. Currently, the Network
Administrator is responsible for approving and issuing RFID card access to the server room, as well as
reviewing the electronic access logs on a weekly basis.
Which of the following is an issue with the situation?
A. Segregation of duties
B. Undue influence
C. Lack of experience
D. Inadequate disaster recovery plan

In practice, I would do this in Perl or PHP because of the required multiline matches, but I'm sure someone else can come up with a much better command line than I can.

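The task seems easy: delete the section between "Answer:" and "NO."
This is doable with sed by means of the N command and a loop, but sed has a portability issue regarding N on the last line: GNU sed prints the pattern space before exiting, whereas POSIX specifies exiting without printing it.
So awk is the first choice here.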

awk '/^Answer:/{del=1} /^NO\./{del=0; print ""} del==0' test.txt

del==0 is also true while del is uninitialized. A pattern that is true and has no action defaults to {print}.
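To see those two rules in isolation, here is a minimal sketch. The sample lines and the function names (sample, print_all, drop_after_answer) are invented for illustration and are not part of the original file.

```shell
# Invented five-line sample, standing in for test.txt.
sample() {
  printf '%s\n' 'NO.1 question' 'A. option' 'Answer: A' 'Explanation' 'NO.2 question'
}

# del is never assigned, so del==0 holds for every line;
# with no action block, awk falls back to {print}.
print_all() { sample | awk 'del==0'; }

# Once a line sets del=1, the bare pattern stops matching,
# and every later line is dropped (nothing resets del here).
drop_after_answer() { sample | awk '/^Answer:/{del=1} del==0'; }

print_all
drop_after_answer
```

The full command above adds the missing piece: resetting del to 0 on each "NO." line so that printing resumes.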

To avoid the initial blank line, one can additionally test for NR>1 or del==1:

awk '/^Answer:/{del=1} (del==1 && /^NO\./){del=0; print ""} del==0' test.txt
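The NR>1 alternative is only mentioned above, so here is one way it might look. This is a sketch of my reading of that suggestion, run against an invented two-question sample; the function names sample and strip_answers are hypothetical.

```shell
# Invented two-question sample; the real input would be test.txt.
sample() {
  printf '%s\n' 'NO.1 First question' 'A. yes' 'B. no' 'Answer: A' \
    'Explanation' 'Some text' 'NO.2 Second question' 'A. yes' 'Answer: A'
}

# NR>1 variant: print the separator before every "NO." line
# except when that line is the very first line of the input.
strip_answers() {
  awk '/^Answer:/{del=1} /^NO\./{del=0; if (NR>1) print ""} del==0' "$@"
}

sample | strip_answers
```

The two tests are not quite equivalent: NR>1 suppresses the separator only when the first question sits on line 1, while del==1 prints it only when an Answer section has just been skipped.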

Here comes a portable sed solution:

sed '/^Answer:/{
  :Loop
  $d; N; /\nNO\./!bLoop
  s/.*\(\n\)/\1/
}' test.txt

GNU sed needs the $d so that it does not default-print the last "Answer:" section.
Spreading the script over multiple lines makes it easy to support a traditional Unix sed.
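For anyone who wants to trace the loop, the same script can be wrapped in a function and run on a small sample. The sample data and the name strip_answers_sed are made up for this sketch; the last question deliberately ends with an Answer section that has no following "NO." line, which is the case $d handles.

```shell
# Invented sample; the final Answer section is not followed by a "NO." line.
sample() {
  printf '%s\n' 'NO.1 First question' 'A. yes' 'Answer: A' 'Explanation' \
    'NO.2 Second question' 'A. yes' 'Answer: A' 'Explanation' 'trailing text'
}

# Step by step, once a line matches /^Answer:/:
#   $d               on the last input line, delete the pattern space and stop
#   N                append the next input line to the pattern space
#   /\nNO\./!bLoop   loop again unless the pattern space now contains "\nNO."
#   s/.*\(\n\)/\1/   keep only the final newline and the "NO." line after it
strip_answers_sed() {
  sed '/^Answer:/{
  :Loop
  $d; N; /\nNO\./!bLoop
  s/.*\(\n\)/\1/
}' "$@"
}

sample | strip_answers_sed
```

The newline kept by the substitution is what produces the blank line before each following question.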

Awesome, thanks :)