merging branches
fit-alessandro-berti committed Sep 21, 2021
2 parents d58d34f + a5a2409 commit 63bb8d2
Showing 19 changed files with 140 additions and 60 deletions.
33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,38 @@
# PM4Py Changelog

## PM4Py 2.2.13.1

### Fixed

* 816fb4ad
  * Fixed a bug in the Pandas case size filter (the constraints were not applied correctly).
* 40f142c4
  * Fixed a bug in the format_dataframe function (columns were duplicated if a column with the same name already existed).
* 00d1a7de
  * Reverted the stream converter to the old variant (slightly slower, but safer).

### Removed

### Deprecated

### Changed
* 991a09d4
  * Introduced a time limit in the DFG playout.
* ae5d2a07
  * The process tree alignments now return the state of the process tree along with the alignment.
* 8b77384f
  * Refactored the (previously scattered) calculation of the fitness for Petri net alignments.

### Added

### Other
* d58d34fd
  * Upgraded the Dockerfile to Python 3.9.
* 50114175
  * Resolved an issue with the upcoming Python 3.10 release.
* 89314905
  * Fixed a security issue in the requirements.

## PM4Py 2.2.13 (2021.09.03)

### Fixed
2 changes: 1 addition & 1 deletion Dockerfile
@@ -13,7 +13,7 @@ RUN apt-get -y install libtool flex bison pkg-config g++ libssl-dev automake
RUN apt-get -y install libjemalloc-dev libboost-dev libboost-filesystem-dev libboost-system-dev libboost-regex-dev python3-dev autoconf flex bison cmake
RUN apt-get -y install libxml2-dev libxslt-dev libfreetype6-dev libsuitesparse-dev
RUN pip install -U wheel six pytest
RUN pip install backcall==0.2.0 colorama==0.4.4 cycler==0.10.0 decorator==5.0.9 deprecation==2.1.0 graphviz==0.17 intervaltree==3.1.0 ipython==7.27.0 jedi==0.18.0 jinja2==3.0.1 joblib==1.0.1 jsonpickle==2.0.0 kiwisolver==1.3.2 lxml==4.6.3 MarkupSafe==2.0.1 matplotlib==3.5.0b1 matplotlib-inline==0.1.3 mpmath==1.2.1 networkx==2.6.2 numpy==1.21.2 packaging==21.0 pandas==1.3.2 parso==0.8.2 pickleshare==0.7.5 pillow==8.3.2 prompt-toolkit==3.0.20 pulp==2.1 pydotplus==2.0.2 pygments==2.10.0 pyparsing==3.0.0b3 python-dateutil==2.8.2 pytz==2021.1 pyvis==0.1.9 scikit-learn==1.0rc1 scipy==1.7.1 setuptools==58.0.3 six==1.16.0 sortedcontainers==2.4.0 stringdist==1.0.9 sympy==1.8 threadpoolctl==2.2.0 tqdm==4.62.2 traitlets==5.1.0 wcwidth==0.2.5
RUN pip install backcall==0.2.0 colorama==0.4.4 cycler==0.10.0 decorator==5.1.0 deprecation==2.1.0 graphviz==0.17 intervaltree==3.1.0 ipython==7.27.0 jedi==0.18.0 jinja2==3.0.1 joblib==1.0.1 jsonpickle==2.0.0 kiwisolver==1.3.2 lxml==4.6.3 MarkupSafe==2.0.1 matplotlib==3.5.0b1 matplotlib-inline==0.1.3 mpmath==1.2.1 networkx==2.6.3 numpy==1.21.2 packaging==21.0 pandas==1.3.3 parso==0.8.2 pickleshare==0.7.5 pillow==8.3.2 prompt-toolkit==3.0.20 pulp==2.1 pydotplus==2.0.2 pygments==2.10.0 pyparsing==3.0.0rc1 python-dateutil==2.8.2 pytz==2021.1 pyvis==0.1.9 scikit-learn==1.0rc2 scipy==1.7.1 setuptools==58.0.4 six==1.16.0 sortedcontainers==2.4.0 stringdist==1.0.9 sympy==1.8 threadpoolctl==2.2.0 tqdm==4.62.2 traitlets==5.1.0 wcwidth==0.2.5
RUN pip install cvxopt

COPY . /app
12 changes: 6 additions & 6 deletions README.THIRD_PARTY.md
@@ -10,7 +10,7 @@ libraries are added/removed.
| colorama | https://pypi.org/project/colorama | BSD License (BSD) | 0.4.4 |
| cvxopt | https://pypi.org/project/cvxopt | GNU General Public License v3 (GPLv3) (GNU GPL version 3) | 1.2.6 |
| cycler | https://pypi.org/project/cycler | BSD | 0.10.0 |
| decorator | https://pypi.org/project/decorator | BSD License (new BSD License) | 5.0.9 |
| decorator | https://pypi.org/project/decorator | BSD License (new BSD License) | 5.1.0 |
| deprecation | https://pypi.org/project/deprecation | Apache Software License (Apache 2) | 2.1.0 |
| graphviz | https://pypi.org/project/graphviz | MIT License (MIT) | 0.17 |
| intervaltree | https://pypi.org/project/intervaltree | Apache Software License (Apache License, Version 2.0) | 3.1.0 |
@@ -25,24 +25,24 @@ libraries are added/removed.
| matplotlib | https://pypi.org/project/matplotlib | Python Software Foundation License (PSF) | 3.5.0b1 |
| matplotlib-inline | https://pypi.org/project/matplotlib-inline | BSD 3-Clause | 0.1.3 |
| mpmath | https://pypi.org/project/mpmath | BSD License (BSD) | 1.2.1 |
| networkx | https://pypi.org/project/networkx | BSD License | 2.6.2 |
| networkx | https://pypi.org/project/networkx | BSD License | 2.6.3 |
| numpy | https://pypi.org/project/numpy | BSD License (BSD) | 1.21.2 |
| packaging | https://pypi.org/project/packaging | Apache Software License, BSD License (BSD-2-Clause or Apache-2.0) | 21.0 |
| pandas | https://pypi.org/project/pandas | BSD License (BSD-3-Clause) | 1.3.2 |
| pandas | https://pypi.org/project/pandas | BSD License (BSD-3-Clause) | 1.3.3 |
| parso | https://pypi.org/project/parso | MIT License (MIT) | 0.8.2 |
| pickleshare | https://pypi.org/project/pickleshare | MIT License (MIT) | 0.7.5 |
| pillow | https://pypi.org/project/pillow | Historical Permission Notice and Disclaimer (HPND) (HPND) | 8.3.2 |
| prompt-toolkit | https://pypi.org/project/prompt-toolkit | BSD License | 3.0.20 |
| pulp | https://pypi.org/project/pulp | BSD License | 2.1 |
| pydotplus | https://pypi.org/project/pydotplus | MIT License (UNKNOWN) | 2.0.2 |
| pygments | https://pypi.org/project/pygments | BSD License (BSD License) | 2.10.0 |
| pyparsing | https://pypi.org/project/pyparsing | MIT License (MIT License) | 3.0.0b3 |
| pyparsing | https://pypi.org/project/pyparsing | MIT License (MIT License) | 3.0.0rc1 |
| python-dateutil | https://pypi.org/project/python-dateutil | Apache Software License, BSD License (Dual License) | 2.8.2 |
| pytz | https://pypi.org/project/pytz | MIT License (MIT) | 2021.1 |
| pyvis | https://pypi.org/project/pyvis | BSD | 0.1.9 |
| scikit-learn | https://pypi.org/project/scikit-learn | OSI Approved (new BSD) | 1.0rc1 |
| scikit-learn | https://pypi.org/project/scikit-learn | OSI Approved (new BSD) | 1.0rc2 |
| scipy | https://pypi.org/project/scipy | BSD License (BSD) | 1.7.1 |
| setuptools | https://pypi.org/project/setuptools | MIT License | 58.0.3 |
| setuptools | https://pypi.org/project/setuptools | MIT License | 58.0.4 |
| six | https://pypi.org/project/six | MIT License (MIT) | 1.16.0 |
| sortedcontainers | https://pypi.org/project/sortedcontainers | Apache Software License (Apache 2.0) | 2.4.0 |
| stringdist | https://pypi.org/project/stringdist | MIT License (MIT) | 1.0.9 |
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -26,7 +26,7 @@
# The short X.Y version
version = '2.2'
# The full version, including alpha/beta/rc tags
release = '2.2.13'
release = '2.2.13.1'

# -- General configuration ---------------------------------------------------

6 changes: 4 additions & 2 deletions examples/alignment_test.py
@@ -39,8 +39,10 @@ def execute_script():
else:
model_cost_function[t] = 1

alignments = ali.petri_net.algorithm.apply(log, net, marking, fmarking)
print(alignments)
alignments = []
for trace in log:
alignments.append(align(trace, net, marking, fmarking, model_cost_function, sync_cost_function))

pretty_print_alignments(alignments)


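The example script now builds the alignments trace by trace with explicit cost functions instead of a single bulk call. The sketch below is an illustration of a possible equivalent using the public per-trace API, not the script's actual `align` helper; it assumes `log`, `net`, `marking`, `fmarking` and the cost dictionaries are prepared earlier in the script, as in the original example.

```python
# Minimal sketch, assuming the objects below were built as in the example script:
# align each trace separately with the public Petri net alignment API, passing
# the custom model and synchronous cost functions.
from pm4py.algo.conformance.alignments.petri_net import algorithm as alignments_algo

parameters = {
    alignments_algo.Parameters.PARAM_MODEL_COST_FUNCTION: model_cost_function,
    alignments_algo.Parameters.PARAM_SYNC_COST_FUNCTION: sync_cost_function,
}

alignments = []
for trace in log:
    # each call returns a dict with (at least) the 'alignment', 'cost' and 'fitness' keys
    alignments.append(alignments_algo.apply_trace(trace, net, marking, fmarking,
                                                  parameters=parameters))
```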
6 changes: 5 additions & 1 deletion pm4py/algo/__init__.py
@@ -14,5 +14,9 @@
You should have received a copy of the GNU General Public License
along with PM4Py. If not, see <https://www.gnu.org/licenses/>.
'''
from pm4py.algo import discovery, conformance, reduction, analysis, evaluation, simulation, comparison, decision_mining, organizational_mining, transformation
from pm4py.algo import discovery, conformance, reduction, analysis, evaluation, simulation, comparison, organizational_mining, transformation

import pkgutil

if pkgutil.find_loader("sklearn"):
from pm4py.algo import decision_mining
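
With this change, `pm4py.algo.decision_mining` is imported only when scikit-learn is available, so importing `pm4py.algo` no longer hard-requires sklearn. A hedged sketch of how downstream code might use the same optional-dependency guard (the fallback branch is an assumption for this sketch; pm4py itself simply skips the import):

```python
import pkgutil

# Optional-dependency guard, mirroring the pattern in pm4py/algo/__init__.py:
# only touch decision mining features if scikit-learn can be found.
if pkgutil.find_loader("sklearn"):
    from pm4py.algo import decision_mining
    # ... decision_mining can be used here ...
else:
    # assumed fallback for this sketch; the library just skips the import
    print("scikit-learn is not installed; decision mining is unavailable")
```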
51 changes: 39 additions & 12 deletions pm4py/algo/conformance/alignments/petri_net/algorithm.py
@@ -60,6 +60,8 @@ class Parameters(Enum):
VARIANTS_IDX = "variants_idx"
SHOW_PROGRESS_BAR = "show_progress_bar"
CORES = 'cores'
BEST_WORST_COST_INTERNAL = "best_worst_cost_internal"
FITNESS_ROUND_DIGITS = "fitness_round_digits"


DEFAULT_VARIANT = Variants.VERSION_STATE_EQUATION_A_STAR
@@ -116,9 +118,43 @@ def apply_trace(trace, petri_net, initial_marking, final_marking, parameters=Non
"""
if parameters is None:
parameters = copy({PARAMETER_CONSTANT_ACTIVITY_KEY: DEFAULT_NAME_KEY})
return exec_utils.get_variant(variant).apply(trace, petri_net, initial_marking, final_marking,

parameters = copy(parameters)
best_worst_cost = exec_utils.get_param_value(Parameters.BEST_WORST_COST_INTERNAL, parameters,
__get_best_worst_cost(petri_net, initial_marking, final_marking, variant, parameters))

ali = exec_utils.get_variant(variant).apply(trace, petri_net, initial_marking, final_marking,
parameters=parameters)

trace_cost_function = exec_utils.get_param_value(Parameters.PARAM_TRACE_COST_FUNCTION, parameters, [])
# Instead of using the length of the trace, use the sum of the trace cost function
trace_cost_function_sum = sum(trace_cost_function)

ltrace_bwc = trace_cost_function_sum + best_worst_cost

fitness = 1 - (ali['cost'] // align_utils.STD_MODEL_LOG_MOVE_COST) / (
ltrace_bwc // align_utils.STD_MODEL_LOG_MOVE_COST) if ltrace_bwc > 0 else 0

# Other possibility: avoid the integer division and round the result instead.
# This could lead to small differences with respect to the fitness values computed so far
# (since that value would be rounded).

"""
initial_trace_cost_function = exec_utils.get_param_value(Parameters.PARAM_TRACE_COST_FUNCTION, parameters, None)
initial_model_cost_function = exec_utils.get_param_value(Parameters.PARAM_MODEL_COST_FUNCTION, parameters, None)
initial_sync_cost_function = exec_utils.get_param_value(Parameters.PARAM_SYNC_COST_FUNCTION, parameters, None)
uses_standard_cost_function = initial_trace_cost_function is None and initial_model_cost_function is None and \
initial_sync_cost_function is None
fitness = 1 - ali['cost'] / ltrace_bwc if ltrace_bwc > 0 else 0
fitness_round_digits = exec_utils.get_param_value(Parameters.FITNESS_ROUND_DIGITS, parameters, 3)
fitness = round(fitness, fitness_round_digits)
"""

ali["fitness"] = fitness

return ali


def apply_log(log, petri_net, initial_marking, final_marking, parameters=None, variant=DEFAULT_VARIANT):
"""
@@ -161,6 +197,7 @@ def apply_log(log, petri_net, initial_marking, final_marking, parameters=None, v
best_worst_cost = __get_best_worst_cost(petri_net, initial_marking, final_marking, variant, parameters)
variants_idxs, one_tr_per_var = __get_variants_structure(log, parameters)
progress = __get_progress_bar(len(one_tr_per_var), parameters)
parameters[Parameters.BEST_WORST_COST_INTERNAL] = best_worst_cost

all_alignments = []
for trace in one_tr_per_var:
@@ -172,7 +209,6 @@ def apply_log(log, petri_net, initial_marking, final_marking, parameters=None, v
progress.update()

alignments = __form_alignments(log, variants_idxs, all_alignments)
__assign_fitness(log, alignments, best_worst_cost)
__close_progress_bar(progress)

return alignments
@@ -207,6 +243,7 @@ def apply_multiprocessing(log, petri_net, initial_marking, final_marking, parame

best_worst_cost = __get_best_worst_cost(petri_net, initial_marking, final_marking, variant, parameters)
variants_idxs, one_tr_per_var = __get_variants_structure(log, parameters)
parameters[Parameters.BEST_WORST_COST_INTERNAL] = best_worst_cost

all_alignments = []
with ProcessPoolExecutor(max_workers=num_cores) as executor:
@@ -229,7 +266,6 @@ def apply_multiprocessing(log, petri_net, initial_marking, final_marking, parame
__close_progress_bar(progress)

alignments = __form_alignments(log, variants_idxs, all_alignments)
__assign_fitness(log, alignments, best_worst_cost)

return alignments
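
Both `apply_log` and `apply_multiprocessing` now compute the best-worst cost once up front and hand it to the per-trace calls through `Parameters.BEST_WORST_COST_INTERNAL`, so the fitness no longer needs to be assigned in a separate pass. A hedged sketch of the multiprocessing entry point; the event log and Petri net objects are assumed to exist already:

```python
# Hedged sketch: call the multiprocessing variant as before; the best-worst
# cost is shared with the workers internally via BEST_WORST_COST_INTERNAL.
# `log`, `net`, `initial_marking`, `final_marking` are assumed to be available.
from pm4py.algo.conformance.alignments.petri_net import algorithm as alignments_algo

aligned_traces = alignments_algo.apply_multiprocessing(
    log, net, initial_marking, final_marking,
    parameters={alignments_algo.Parameters.CORES: 4},  # number of worker processes
)
```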

@@ -281,15 +317,6 @@ def __form_alignments(log, variants_idxs, all_alignments):
return alignments


def __assign_fitness(log, alignments, best_worst_cost):
for index, align in enumerate(alignments):
ltrace_bwc = len(log[index]) + best_worst_cost
if ltrace_bwc > 0:
align['fitness'] = 1 - ((align['cost'] // align_utils.STD_MODEL_LOG_MOVE_COST) / ltrace_bwc)
else:
align['fitness'] = 0


def __close_progress_bar(progress):
if progress is not None:
progress.close()
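
With the removal of `__assign_fitness`, the fitness is now computed directly in `apply_trace` from the alignment cost, the sum of the trace cost function, and the best-worst cost. A small numeric sketch of that formula with assumed values (the standard move cost is 10000 in pm4py's `align_utils`):

```python
# Numeric sketch of the refactored fitness computation (all values are assumed).
STD_MODEL_LOG_MOVE_COST = 10000   # standard log/model move cost in pm4py's align_utils

alignment_cost = 30000            # cost of the computed alignment (assumed)
trace_cost_function_sum = 50000   # e.g. 5 events, each with the standard log-move cost
best_worst_cost = 40000           # cost of aligning the empty trace against the model (assumed)

ltrace_bwc = trace_cost_function_sum + best_worst_cost

fitness = 1 - (alignment_cost // STD_MODEL_LOG_MOVE_COST) / (
    ltrace_bwc // STD_MODEL_LOG_MOVE_COST) if ltrace_bwc > 0 else 0

print(fitness)  # 1 - 3 / 9 = 0.666...
```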
@@ -96,9 +96,7 @@ def get_best_worst_cost(petri_net, initial_marking, final_marking, parameters=No

best_worst = apply(trace, petri_net, initial_marking, final_marking, parameters=parameters)

if best_worst['cost'] > 0:
return best_worst['cost'] // align_utils.STD_MODEL_LOG_MOVE_COST
return 0
return best_worst['cost']


def apply_from_variants_list_petri_string(var_list, petri_net_string, parameters=None):
@@ -75,9 +75,7 @@ def get_best_worst_cost(petri_net, initial_marking, final_marking, parameters=No

best_worst = apply(trace, petri_net, initial_marking, final_marking, parameters=parameters)

if best_worst['cost'] > 0:
return best_worst['cost'] // utils.STD_MODEL_LOG_MOVE_COST
return 0
return best_worst['cost']


def apply(trace: Trace, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) -> typing.AlignmentResult:
@@ -106,7 +104,6 @@ def apply(trace: Trace, petri_net: PetriNet, initial_marking: Marking, final_mar
if parameters is None:
parameters = {}

parameters = copy(parameters)
activity_key = exec_utils.get_param_value(Parameters.ACTIVITY_KEY, parameters, DEFAULT_NAME_KEY)
trace_cost_function = exec_utils.get_param_value(Parameters.PARAM_TRACE_COST_FUNCTION, parameters, None)
model_cost_function = exec_utils.get_param_value(Parameters.PARAM_MODEL_COST_FUNCTION, parameters, None)
@@ -99,9 +99,7 @@ def get_best_worst_cost(petri_net, initial_marking, final_marking, parameters=No

best_worst = apply(trace, petri_net, initial_marking, final_marking, parameters=parameters)

if best_worst['cost'] > 0:
return best_worst['cost'] // utils.STD_MODEL_LOG_MOVE_COST
return 0
return best_worst['cost']


def apply(trace: Trace, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) -> typing.AlignmentResult:
@@ -130,7 +128,6 @@ def apply(trace: Trace, petri_net: PetriNet, initial_marking: Marking, final_mar
if parameters is None:
parameters = {}

parameters = copy(parameters)
activity_key = exec_utils.get_param_value(Parameters.ACTIVITY_KEY, parameters, DEFAULT_NAME_KEY)
trace_cost_function = exec_utils.get_param_value(Parameters.PARAM_TRACE_COST_FUNCTION, parameters, None)
model_cost_function = exec_utils.get_param_value(Parameters.PARAM_MODEL_COST_FUNCTION, parameters, None)
@@ -86,9 +86,7 @@ def get_best_worst_cost(petri_net, initial_marking, final_marking, parameters=No

best_worst = apply(trace, petri_net, initial_marking, final_marking, parameters=parameters)

if best_worst['cost'] > 0:
return best_worst['cost'] // utils.STD_MODEL_LOG_MOVE_COST
return 0
return best_worst['cost']


def apply(trace: Trace, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) -> typing.AlignmentResult:
@@ -117,7 +115,6 @@ def apply(trace: Trace, petri_net: PetriNet, initial_marking: Marking, final_mar
if parameters is None:
parameters = {}

parameters = copy(parameters)
activity_key = exec_utils.get_param_value(Parameters.ACTIVITY_KEY, parameters, DEFAULT_NAME_KEY)
trace_cost_function = exec_utils.get_param_value(Parameters.PARAM_TRACE_COST_FUNCTION, parameters, None)
model_cost_function = exec_utils.get_param_value(Parameters.PARAM_MODEL_COST_FUNCTION, parameters, None)
@@ -98,6 +98,7 @@ def _construct_result_dictionary(state: SGASearchState, variant: List[str]):
current_state = current_state.parent
result['alignment'] = list(reversed(alignment))
result['optimal'] = True
result['state'] = state
return result


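The process tree alignment result dictionary now also carries the final search state under the `'state'` key, matching the changelog entry "Return the state of the process tree along with the alignment". A hedged sketch of reading that key; the module path and the apply signature are assumptions based on pm4py's usual layout, while the `'alignment'`, `'optimal'` and `'state'` keys come from the diff above:

```python
# Hedged sketch: compute process tree alignments and read the newly exposed
# search state. `log` (an EventLog) and `tree` (a ProcessTree) are assumed.
from pm4py.algo.conformance.alignments.process_tree import algorithm as pt_alignments

results = pt_alignments.apply(log, tree)
first = results[0]
print(first["alignment"])  # alignment moves, as before
print(first["optimal"])    # True for optimal alignments
print(first["state"])      # new: the final state of the process tree alignment search
```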
12 changes: 5 additions & 7 deletions pm4py/algo/filtering/pandas/cases/case_filter.py
@@ -57,7 +57,7 @@ def filter_on_ncases(df: pd.DataFrame, case_id_glue: str = constants.CASE_CONCEP
return ret


def filter_on_case_size(df: pd.DataFrame, case_id_glue: str = "case:concept:name", min_case_size: int = 2, max_case_size=None):
def filter_on_case_size(df0: pd.DataFrame, case_id_glue: str = "case:concept:name", min_case_size: int = 2, max_case_size=None):
"""
Filter a dataframe keeping only traces with at least the specified number of events
@@ -77,14 +77,12 @@ def filter_on_case_size(df: pd.DataFrame, case_id_glue: str = "case:concept:name
df
Filtered dataframe
"""
df = df0.copy()
element_group_size = df[case_id_glue].groupby(df[case_id_glue]).transform('size')
df = df[element_group_size >= min_case_size]
if max_case_size:
element_group_size = df[case_id_glue].groupby(df[case_id_glue]).transform('size')
ret = df[element_group_size <= max_case_size]
else:
ret = df
ret.attrs = copy(df.attrs) if hasattr(df, 'attrs') else {}
if max_case_size is not None:
df = df[element_group_size <= max_case_size]
df.attrs = copy(df0.attrs) if hasattr(df0, 'attrs') else {}
return df


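The rewritten `filter_on_case_size` works on a copy of the input dataframe and applies both the minimum and the maximum constraint against the same grouped size column, which fixes the "constraints were not applied correctly" bug from the changelog. A hedged usage sketch on a toy dataframe:

```python
import pandas as pd

from pm4py.algo.filtering.pandas.cases import case_filter

# Toy dataframe with three cases of 1, 3 and 5 events respectively.
df = pd.DataFrame({
    "case:concept:name": ["c1"] + ["c2"] * 3 + ["c3"] * 5,
    "concept:name": ["a", "a", "b", "c", "a", "b", "c", "d", "e"],
})

# Keep only cases with between 2 and 4 events: c1 (too short) and c3 (too long) are dropped.
filtered = case_filter.filter_on_case_size(df, "case:concept:name",
                                           min_case_size=2, max_case_size=4)
print(filtered["case:concept:name"].unique())  # ['c2']
```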
18 changes: 14 additions & 4 deletions pm4py/algo/simulation/playout/dfg/variants/classic.py
@@ -17,14 +17,16 @@
import datetime
import heapq
import math
import sys
import time
from collections import Counter
from copy import deepcopy
from enum import Enum
from typing import Optional, Dict, Any, Union, Tuple

from pm4py.objects.log.obj import EventLog, Trace, Event
from pm4py.objects.log.obj import EventLog
from pm4py.objects.log.obj import Trace, Event
from pm4py.util import exec_utils, constants, xes_constants
from typing import Optional, Dict, Any, Union, Tuple
from pm4py.objects.log.obj import EventLog, EventStream


class Parameters(Enum):
Expand All @@ -36,6 +38,7 @@ class Parameters(Enum):
INTERRUPT_SIMULATION_WHEN_DFG_COMPLETE = "interrupt_simulation_when_dfg_complete"
ADD_TRACE_IF_TAKES_NEW_ELS_TO_DFG = "add_trace_if_takes_new_els_to_dfg"
RETURN_VARIANTS = "return_variants"
MAX_EXECUTION_TIME = "max_execution_time"


def get_node_tr_probabilities(dfg, start_activities, end_activities):
@@ -139,7 +142,9 @@ def get_traces(dfg, start_activities, end_activities, parameters=None):
heapq.heappush(partial_traces, (prob - prob_new_act, tuple(trace[1] + [new_act])))


def apply(dfg: Dict[Tuple[str, str], int], start_activities: Dict[str, int], end_activities: Dict[str, int], parameters: Optional[Dict[Union[str, Parameters], Any]] = None) -> Union[EventLog, Dict[Tuple[str, str], int]]:
def apply(dfg: Dict[Tuple[str, str], int], start_activities: Dict[str, int], end_activities: Dict[str, int],
parameters: Optional[Dict[Union[str, Parameters], Any]] = None) -> Union[
EventLog, Dict[Tuple[str, str], int]]:
"""
Applies the playout algorithm on a DFG, extracting the most likely traces according to the DFG
@@ -186,6 +191,7 @@ def apply(dfg: Dict[Tuple[str, str], int], start_activities: Dict[str, int], end
add_trace_if_takes_new_els_to_dfg = exec_utils.get_param_value(Parameters.ADD_TRACE_IF_TAKES_NEW_ELS_TO_DFG,
parameters, False)
return_variants = exec_utils.get_param_value(Parameters.RETURN_VARIANTS, parameters, False)
max_execution_time = exec_utils.get_param_value(Parameters.MAX_EXECUTION_TIME, parameters, sys.maxsize)

# keep track of the DFG, start activities and end activities of the (ongoing) simulation
simulated_traces_dfg = set()
@@ -196,10 +202,14 @@ def apply(dfg: Dict[Tuple[str, str], int], start_activities: Dict[str, int], end

final_traces = []

start_time = time.time()
for tr, p in get_traces(dfg, start_activities, end_activities, parameters=parameters):
if (interrupt_simulation_when_dfg_complete and interrupt_break_condition) or not (
len(final_traces) < max_no_variants and overall_probability <= min_weighted_probability):
break
current_time = time.time()
if (current_time - start_time) > max_execution_time:
break
overall_probability += p
diff_sa = {tr[0]}.difference(simulated_traces_sa)
diff_ea = {tr[-1]}.difference(simulated_traces_ea)
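The DFG playout now honours a `MAX_EXECUTION_TIME` parameter (default `sys.maxsize`) and checks the elapsed wall-clock time on every candidate trace, so long simulations can be bounded. A hedged sketch of passing the new parameter; the DFG, start activities and end activities are assumed to come from an earlier discovery step:

```python
# Hedged sketch: bound the DFG playout to roughly five seconds of wall-clock time.
# `dfg`, `start_activities` and `end_activities` are assumed to have been
# discovered earlier (e.g. with pm4py's DFG discovery on an event log).
from pm4py.algo.simulation.playout.dfg.variants import classic as dfg_playout

simulated_log = dfg_playout.apply(
    dfg, start_activities, end_activities,
    parameters={dfg_playout.Parameters.MAX_EXECUTION_TIME: 5},
)
```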