Creation of a Classifier that would have both WEKA and regex-based classifier in consideration #77
Comments
Are we talking about a new link classifier? I assumed it was a target page classifier.
My bad, we're talking about a new target page classifier. I was kinda sleepy -.- Forget about the link classification example.
I was thinking of having a more generic classifier that can combine a list of any other existing classifiers. For example, it would be configured like this:

type: combiner
parameters:
  boolean_operator: OR
  classifiers:
  - type: url_regex
    parameters:
      regular_expressions: [
        "https?://www\\.somedomain\\.com/forum/.*",
        ".*/thread/.*",
        ".*/archive/index.php/t.*",
      ]
  - type: weka
    parameters:
      features_file: pageclassifier.features
      model_file: pageclassifier.model

The key
I like the idea. This would be very useful.
What do you mean by complex boolean expressions? Could you give an example?
E.g., (A AND B) OR (C AND D). I think we can start with just simple queries: A AND B AND C ..., or A OR B OR C ...
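The simple case described above (one operator applied across all sub-classifiers) could be sketched roughly as follows. This is only an illustration in Python, not the project's code: `url_regex` and `combine` are hypothetical names, and a "classifier" is assumed to be any function from a page URL to a boolean.

```python
import re
from typing import Callable, Iterable, Sequence

# Assumption: a classifier is any function mapping a page URL to relevant / not relevant.
Classifier = Callable[[str], bool]

def url_regex(patterns: Iterable[str]) -> Classifier:
    """Hypothetical URL classifier: relevant if any pattern matches the whole URL."""
    compiled = [re.compile(p) for p in patterns]
    return lambda url: any(c.fullmatch(url) for c in compiled)

def combine(operator: str, classifiers: Sequence[Classifier]) -> Classifier:
    """Combine classifiers with a single boolean operator (AND or OR)."""
    if operator == "AND":
        return lambda url: all(c(url) for c in classifiers)
    if operator == "OR":
        return lambda url: any(c(url) for c in classifiers)
    raise ValueError(f"unsupported boolean_operator: {operator}")

# A OR B and A AND B over two regex-based classifiers.
is_thread = url_regex([r".*/thread/.*"])
is_forum = url_regex([r"https?://www\.somedomain\.com/forum/.*"])
a_or_b = combine("OR", [is_thread, is_forum])
a_and_b = combine("AND", [is_thread, is_forum])
```

Because `combine` itself returns a classifier, a more complex expression such as (A AND B) OR (C AND D) would fall out of passing combined classifiers back into `combine`.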
Yup, this would do the trick, @aecio! Adding this format means that the one you made a month ago wouldn't be necessary anymore, since this one would take everything into consideration. Regarding @julianafreire's situation, I think @aecio already implemented it, if you check the regex he made a while back. I'm not sure, but I think you can apply it in this scenario as well. For example: url_regex (parameter_1 OR parameter_2) AND body_regex (parameter_1 AND parameter_2 AND parameter_3) OR weka (...)
Got it. I think nesting

type: combiner
parameters:
  boolean_operator: OR
  classifiers:
  - type: combiner
    parameters:
      boolean_operator: AND
      classifiers:
      - type: url_regex
        parameters:
          regular_expressions: ["https?://www\\.somedomain\\.com/forum/.*"]
      - type: weka
        parameters:
          features_file: model_01/pageclassifier.features
          model_file: model_01/pageclassifier.model
  - type: weka
    parameters:
      features_file: model_02/pageclassifier.features
      model_file: model_02/pageclassifier.model
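To make the nesting concrete, here is a rough Python sketch of how such a config could be evaluated recursively. It assumes the example reads as (url_regex AND weka model_01) OR weka model_02. Everything here is hypothetical: `build`, the factory functions, and especially `stub_weka_factory`, which returns a fixed verdict per model file and does not call WEKA at all.

```python
import re

def build(config, leaf_factories):
    """Recursively turn a nested config dict into a predicate over a page URL."""
    params = config.get("parameters", {})
    if config["type"] == "combiner":
        # A combiner builds its children first, then joins them with one operator.
        children = [build(child, leaf_factories) for child in params["classifiers"]]
        operator = params["boolean_operator"]
        if operator == "AND":
            return lambda url: all(child(url) for child in children)
        if operator == "OR":
            return lambda url: any(child(url) for child in children)
        raise ValueError(f"unsupported boolean_operator: {operator}")
    # Any other type is a leaf classifier looked up by name.
    return leaf_factories[config["type"]](params)

def url_regex_factory(params):
    patterns = [re.compile(p) for p in params["regular_expressions"]]
    return lambda url: any(p.fullmatch(url) for p in patterns)

# Stand-in for trained models: fixed verdicts, NOT real WEKA predictions.
STUB_VERDICTS = {
    "model_01/pageclassifier.model": True,
    "model_02/pageclassifier.model": False,
}

def stub_weka_factory(params):
    verdict = STUB_VERDICTS[params["model_file"]]
    return lambda url: verdict

config = {
    "type": "combiner",
    "parameters": {"boolean_operator": "OR", "classifiers": [
        {"type": "combiner",
         "parameters": {"boolean_operator": "AND", "classifiers": [
             {"type": "url_regex", "parameters": {"regular_expressions": [
                 r"https?://www\.somedomain\.com/forum/.*"]}},
             {"type": "weka", "parameters": {
                 "features_file": "model_01/pageclassifier.features",
                 "model_file": "model_01/pageclassifier.model"}},
         ]}},
        {"type": "weka", "parameters": {
            "features_file": "model_02/pageclassifier.features",
            "model_file": "model_02/pageclassifier.model"}},
    ]},
}

is_relevant = build(config, {"url_regex": url_regex_factory, "weka": stub_weka_factory})
```

With these stub verdicts, a page is relevant only when the forum URL pattern matches, since the second model always says no. A real implementation would replace the stub with actual model predictions on the page content.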
@aecio, the boolean_operator inside "classifiers" means a page can pass url_regex OR weka, right? Could we still have the boolean_operator for the parameters/expressions inside the url/body/etc. regex classifiers, like you did before? Not sure if you had this in mind with that structure... pointing it out anyway 📦
@1130695 boolean_operator in this case is how you combine the nested classifiers. You could nest ANY classifier like
As the title says. It would be pretty sweet if we could build a classifier that classifies a page using both regex and WEKA. This would help us get more precise output.
We could start off by building one with the "baseline" behaviour and adapt it from there.
@aecio, you said this was somewhat easy to do, but I don't know exactly how the WEKA classification options you have available work. Which ones would work best together? Authority with BS?