Hello
I have a big Excel file for ticket data analysis. The idea is to extract meaningful insight from the Resolution field. Because people write whatever they like when resolving a ticket, this is quite a task.
- They may or may not tag the text with something like the following within the resolution field:
Problem:
Analysis:
Resolution:
So I am supposed to pick the write-up after the Resolution: tag and put it in a new Excel file with the ticket number.
The problem starts because people use their own tagging schemes. Sometimes they skip Problem and Analysis and just write Steps, etc. It is very random, with no common factors.
After going through a lot of data, we identified the following tags as the most common.
So the logic is:
if we find any of the first set of tags:
    pick the text after it
elif we find any of the second set of tags:
    pick the text after it
The elif condition can sometimes be true together with the if condition. In that case the if statement takes precedence.
else:
    pick the whole field
The if statement checks for the common tags below:
regularexp = r"Action Performed\:(.*)|" \
    r"ACTION PERFORMED\:(.*)|" \
    r"Steps taken to resolve the issue\:(.*)|" \
    r"steps taken to resolve issue\:(.*)|" \
    r"Steps Taken to Resolved the issue\:(.*)|" \
    r"Steps taken to resolve the issue \:(.*)|" \
    r"Steps taken to resolve\:(.*)|" \
    r"Steps\-(.*)|" \
    r"steps \-(.*)|" \
    r"steps\:(.*)|" \
    r"Steps taken\:(.*)|" \
    r"Action taken\:(.*)|" \
    r"Actions Taken\:(.*)|" \
    r"Action\:(.*)|" \
    r"Action \-(.*)|" \
    r"Action taken \-(.*)|" \
    r"Actions Taken\;(.*)|" \
    r"Action Taken \:(.*)|" \
    r"Action \:(.*)|" \
    r"Action taken to resolve\:(.*)|" \
    r"Resolution\-(.*)|" \
    r"Resolution\:(.*)|" \
    r"Resolution \-(.*)|" \
    r"Action taken for resolution\:(.*)|" \
    r"Solution \:(.*)|" \
    r"analysis\:(.*)|" \
    r"analysis \:(.*)|" \
    r"Investigation\:(.*)|" \
    r"observed\/investigated \:(.*)"
ELIF:
If none of the above is found, we need to check for the following. The tags above take precedence if both are found.
r"Update\:(.*)|" \
r"Update \:(.*)|" \
r"UPDATE\-(.*)|" \
r"updates\:(.*)"
Else:
Just write out the whole field as found.
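The if / elif / else precedence described above can be sketched as two compiled patterns tried in order. This is only an illustration under stated assumptions: the names build_pattern and extract_resolution, the abbreviated tag lists, the uniform ":" / ";" / "-" separator, and the use of re.DOTALL (so that text on following lines is kept) are not part of the original script.

```python
import re

# Abbreviated, illustrative tag lists (assumption); the full lists would go here.
PRIMARY_TAGS = ["Action Performed", "Steps taken to resolve the issue", "Steps",
                "Action taken", "Action", "Resolution", "Solution", "Analysis"]
FALLBACK_TAGS = ["Update", "Updates"]


def build_pattern(tags):
    """Match any tag, optional spaces, a ':', ';' or '-' separator, then capture the rest."""
    alternation = "|".join(re.escape(tag) for tag in tags)
    return re.compile(r"\b(?:" + alternation + r")\s*[:;-]\s*(.*)",
                      re.IGNORECASE | re.DOTALL)


PRIMARY = build_pattern(PRIMARY_TAGS)
FALLBACK = build_pattern(FALLBACK_TAGS)


def extract_resolution(text):
    """Return the text after a primary tag, else after a fallback tag, else the whole field."""
    for pattern in (PRIMARY, FALLBACK):  # the order encodes the precedence
        match = pattern.search(text)
        if match:
            return match.group(1).strip()
    return text  # no tag found: keep the whole field


print(extract_resolution("Update: rebooted\nResolution: cleared logs"))
```

Because the primary pattern is tried first, a field that contains both an Update: tag and a Resolution: tag is handled by the primary tier, which is the precedence the post asks for.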
I have written the following code, without the elif block for now:
# -*- coding: utf-8 -*-
"""
Created on Mon May 29 19:34:54 2017
@author: anirbaba
"""
from openpyxl import Workbook, load_workbook
import re
import xlsxwriter
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write('A1', 'IncidentID')
worksheet.write('B1', 'Resolution')
# Raw string so the backslashes in the Windows path are taken literally.
wb=load_workbook(r"D:\Backup\Drive_D\W0rk\Script\Python\RegularX\Output_file_high_effort_no_pks.xlsx", read_only=True)
sheet_ranges=wb['High_effort_without_burst']
# The r prefix belongs outside the quotes, and every alternative except the last ends with "|".
regularexp = r"Action Performed\:(.*)|" \
    r"ACTION PERFORMED\:(.*)|" \
    r"Steps taken to resolve the issue\:(.*)|" \
    r"steps taken to resolve issue\:(.*)|" \
    r"Steps Taken to Resolved the issue\:(.*)|" \
    r"Steps taken to resolve the issue \:(.*)|" \
    r"Steps taken to resolve\:(.*)|" \
    r"Steps\-(.*)|" \
    r"steps \-(.*)|" \
    r"steps\:(.*)|" \
    r"Steps taken\:(.*)|" \
    r"Action taken\:(.*)|" \
    r"Actions Taken\:(.*)|" \
    r"Action\:(.*)|" \
    r"Action \-(.*)|" \
    r"Action taken \-(.*)|" \
    r"Actions Taken\;(.*)|" \
    r"Action Taken \:(.*)|" \
    r"Action \:(.*)|" \
    r"Action taken to resolve\:(.*)|" \
    r"Resolution\-(.*)|" \
    r"Resolution\:(.*)|" \
    r"Resolution \-(.*)|" \
    r"Action taken for resolution\:(.*)|" \
    r"Solution \:(.*)|" \
    r"analysis\:(.*)|" \
    r"analysis \:(.*)|" \
    r"Investigation\:(.*)|" \
    r"observed\/investigated \:(.*)"
# r"Update\:(.*)|" \
# r"Update \:(.*)|" \
# r"UPDATE\-(.*)|" \
# r"updates\:(.*)"
# Compile the pattern once, before the loop.
act_resolution = re.compile(regularexp, re.IGNORECASE)
i = 0
# min_row=2 skips the header row (newer openpyxl versions no longer accept row_offset).
for row in sheet_ranges.iter_rows(min_row=2):
    # for i in range(0,50001):
    act_resolutiongroup = act_resolution.search(str(row[16].value))
    if act_resolutiongroup is not None:
        # A tag was found: write the matched text.
        print(row[12].value, act_resolutiongroup.group())
        worksheet.write(i+1, 0, row[12].value)
        worksheet.write(i+1, 1, act_resolutiongroup.group())
        i += 1
    else:
        # No tag was found: write the whole field.
        print(row[12].value, row[16].value)
        worksheet.write(i+1, 0, row[12].value)
        worksheet.write(i+1, 1, row[16].value)
        i += 1
workbook.close()
# if act_resolutiongroup is None:
#     print(row[12].value)
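One detail worth noting about the script: calling group() with no argument returns the entire match, tag included. Since every alternative in regularexp carries its own (.*) group, only one group is filled per match, and the match object's lastindex attribute points to it. A minimal sketch, using a two-alternative stand-in pattern and a hypothetical helper named text_after_tag (both are assumptions for illustration):

```python
import re

# Stand-in for the long pattern: every alternative has its own capture group.
pattern = re.compile(r"Action taken\:(.*)|Resolution\:(.*)", re.IGNORECASE)


def text_after_tag(field):
    """Return only the captured text after the tag, or None if no tag matched."""
    match = pattern.search(field)
    if match is None:
        return None
    # lastindex is the index of the last group that took part in the match;
    # with one group per alternative, that is the group of the tag that matched.
    return match.group(match.lastindex).strip()


print(text_after_tag("Resolution: restarted the service"))
```

In the script, the same idea would replace the group() call in the branch that handles a successful match.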
- I need help shortening the regular expression held in the variable regularexp.
- I have seen cases where the keyword is present, yet the code still goes into the else branch and writes the whole field instead of picking the text after the tag.
- Once the above is done, I need to run 2-gram and 3-gram TF-IDF algorithms, non-English and non-numeric (this is far-fetched and not my immediate requirement).
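One possible way to shorten the pattern is to factor out the shared words and treat the separator uniformly. The sketch below is a hedged condensation, not a drop-in equivalent: it assumes that any of ":", ";" or "-" may follow any tag, with optional spaces around it, which is slightly broader than the listed tags. The names PRIMARY_RE and FALLBACK_RE are illustrative. Like the listed patterns, it captures only up to the end of the line; adding re.DOTALL would also capture the following lines.

```python
import re

# Condensed form of the "if" tags (assumption: uniform separator handling).
PRIMARY_RE = re.compile(r"""
    \b(?:
        actions?(?:\s+performed|\s+taken(?:\s+to\s+resolve|\s+for\s+resolution)?)?
      | steps(?:\s+taken(?:\s+to\s+resolved?(?:\s+(?:the\s+)?issue)?)?)?
      | resolution
      | solution
      | analysis
      | investigation
      | observed/investigated
    )
    \s*[:;-]\s*   # separator with optional surrounding spaces
    (.*)          # text after the tag, up to the end of the line
""", re.IGNORECASE | re.VERBOSE)

# Condensed form of the "elif" tags.
FALLBACK_RE = re.compile(r"\bupdates?\s*[:;-]\s*(.*)", re.IGNORECASE)

match = PRIMARY_RE.search("Steps Taken to Resolved the issue: cleared the queue")
print(match.group(1))
```

With a single capture group per pattern, group(1) always holds the text after the tag, whichever tag matched.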
---------- Post updated 05-31-17 at 10:29 PM ---------- Previous update was 05-30-17 at 10:54 PM ----------
Hello
Can someone shorten the regular expression above (the regularexp variable)?