
LR with priors initial implementation #66

Merged (18 commits, Apr 3, 2024)
Conversation

@bmramor (Collaborator) commented Mar 12, 2024

No description provided.

@wiz-inc-6f7a9d0588 (bot) commented Mar 15, 2024

Wiz Scan Summary

IaC Misconfigurations 0C 0H 0M 0L 0I
Vulnerabilities 0C 0H 0M 0L 0I
Sensitive Data 0C 0H 0M 0L 0I
Total 0C 0H 0M 0L 0I
Secrets 0🔑

@SkBlaz (Collaborator) left a comment

🥳

outrank/algorithms/importance_estimator.py (resolved)
@@ -57,17 +59,22 @@ def sklearn_surrogate(
unique_values, counts = np.unique(vector_second, return_counts=True)

# Establish min support for this type of ranking.
if counts[0] < len(unique_values) * (2**5):
estimate_feature_importance = 0
# if counts[0] < len(unique_values) * (2**5):

Let's remove such comments

estimate_feature_importance = 1 + \
np.median(estimate_feature_importance_list)
else:
X = np.concatenate((X,vector_first.reshape(-1, 1)), axis=1)

There is a space missing after the comma following X; I wonder why lint didn't catch that. @miha-jenko maybe some idea?

X = np.concatenate((X,vector_first.reshape(-1, 1)), axis=1)
X = transf.fit_transform(X)
estimate_feature_importance_list = cross_val_score(
clf, X, vector_second, scoring='neg_log_loss', cv=4,

Let's put the number of folds at the top of the file as a constant for now
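
The suggestion above could look like the following minimal sketch; NUM_CV_FOLDS is a hypothetical constant name, and the toy data stands in for the real feature vectors:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical module-level constant, as the review suggests;
# the name actually chosen in the PR may differ.
NUM_CV_FOLDS = 4

# Toy classification data standing in for X / vector_second.
X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000)

# The cv= argument now references the constant instead of a magic number.
scores = cross_val_score(clf, X, y, scoring='neg_log_loss', cv=NUM_CV_FOLDS)
```

One score is returned per fold, so changing the constant changes the length of the result in one place.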

@@ -130,9 +130,13 @@ def mixed_rank_graph(
# Map the scoring calls to the worker pool
pbar.set_description('Allocating thread pool')

reference_model_features = {}
if 'prior' in args.heuristic:

You check for -prior at some point, but prior at some other point. Consider creating a helper function, is_prior_heuristic or similar, that unifies this behavior (and centralizes it)
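
One possible shape for the helper this comment asks for, shown as a hedged sketch; the real condition in the PR may also consult other fields of args:

```python
from types import SimpleNamespace
from typing import Any


def is_prior_heuristic(args: Any) -> bool:
    """Single, centralized check for prior-based heuristics.

    Hypothetical sketch: the final helper in the PR may use a
    different or stricter condition.
    """
    return 'prior' in args.heuristic


# Usage with a stand-in args object:
args = SimpleNamespace(heuristic='surrogate-LR-prior')
uses_prior = is_prior_heuristic(args)
```

Routing every '-prior' / 'prior' check through one function means the spelling of the heuristic suffix only has to be correct in one place.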

@bmramor bmramor requested a review from SkBlaz March 19, 2024 10:42
outrank/core_ranking.py (outdated; resolved)
@SkBlaz SkBlaz requested a review from miha-jenko March 22, 2024 09:56
)
if args.reference_model_JSON != '':
model_combinations = extract_features_from_reference_JSON(args.reference_model_JSON, combined_features_only=True)
model_combinations = [tuple(sorted(combination.split(','))) for combination in model_combinations]

The combination delimiter could be a constant, as it repeats
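
A small sketch of the suggested constant; COMBINATION_DELIMITER is a hypothetical name, and the sample strings are illustrative:

```python
# Hypothetical constant centralizing the ',' literal that currently
# repeats wherever combinations are split or joined.
COMBINATION_DELIMITER = ','

# Illustrative reference-model combinations.
model_combinations = ['fB,fA', 'fC']
model_combinations = [
    tuple(sorted(combination.split(COMBINATION_DELIMITER)))
    for combination in model_combinations
]
```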

random.shuffle(full_combination_space)
full_combination_space = full_combination_space[
: args.combination_number_upper_bound
]
if is_prior_heuristic(args):
full_combination_space = full_combination_space + [tuple for tuple in model_combinations if tuple not in full_combination_space]

Isn't this second part equivalent to list(set(model_combinations).difference(set(full_combination_space)))?
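
Nearly: both forms keep the same elements on toy data, but the list comprehension preserves the order of model_combinations (and any duplicates), while the set-based form does not guarantee order. A quick check with illustrative tuples:

```python
# Illustrative stand-ins for the real combination lists.
full_combination_space = [('a', 'b'), ('c',)]
model_combinations = [('a', 'b'), ('d', 'e')]

# Form used in the diff: preserves the order of model_combinations.
kept = [t for t in model_combinations if t not in full_combination_space]

# Set-based form from the review: same elements, unordered and deduplicated.
kept_set = list(set(model_combinations).difference(set(full_combination_space)))
```

If ordering of the appended combinations does not matter downstream, the set-based form is both shorter and asymptotically faster.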

@@ -225,7 +244,7 @@ def compute_combined_features(
pbar.set_description('Concatenating into final frame ..')
input_dataframe = pd.concat([input_dataframe, tmp_df], axis=1)
del tmp_df


No need for this blank line

"""Given a model's JSON, extract unique features"""

with open(json_path) as jp:
content = json.load(jp)

unique_features = set()
feature_space = content['desc'].get('features', [])
if full_feature_space:

full_feature_space sounds somewhat odd as a name for a flag that computes a set
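
The extraction the diff sketches can be reproduced in miniature as follows; the content['desc'].get('features', []) access comes from the snippet above, while the exact record format (comma-joined feature names) is an assumption for illustration:

```python
import io
import json

# Stand-in for open(json_path): a tiny in-memory reference-model JSON.
raw = '{"desc": {"features": ["f1,f2", "f3"]}}'
content = json.load(io.StringIO(raw))

# Collect unique individual features across all combinations.
unique_features = set()
feature_space = content['desc'].get('features', [])
for combined in feature_space:
    unique_features.update(combined.split(','))
```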

@@ -641,3 +644,10 @@ def summarize_rare_counts(
final_df.to_csv(
f'{args.output_folder}/feature_sparsity_summary.tsv', index=False, sep='\t',
)


def is_prior_heuristic(args: Any):

Missing return type annotation

outrank/algorithms/importance_estimator.py (outdated; resolved)
outrank/algorithms/importance_estimator.py (outdated; resolved)
outrank/core_ranking.py (outdated; resolved)
outrank/core_ranking.py (resolved)
outrank/core_utils.py (outdated; resolved)
outrank/task_selftest.py (resolved)
@SkBlaz SkBlaz merged commit d6dc5d3 into main Apr 3, 2024
9 checks passed

3 participants