PA learning #1

Open
pberko opened this issue Jul 23, 2021 · 8 comments

@pberko

pberko commented Jul 23, 2021

Hello

Do you have a working example for pa_learning?

Thanks

@vhavlena
Owner

Hello,
you can find an example of the input file at https://github.com/matousp/datasets/tree/master/scada-iec104/iec104-traffic

@pberko
Author

pberko commented Jul 23, 2021

Thanks

The file looks different from what I had in mind.
I thought it should contain paths over a DFA alphabet.
How is this file extracted from a DFA?

@vhavlena
Owner

It is quite tailored to our application. The input file is basically a long sequence of symbols. In the first step, based on expert knowledge, these symbols are grouped into strings. In the second step, the resulting multiset of strings is the input for DPA learning.
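The two steps described above can be sketched roughly as follows. This is only an illustration, not the code used in the repository: the symbol values and the `is_start` rule are hypothetical stand-ins for the expert knowledge, which is application-specific.

```python
from collections import Counter

def split_into_strings(symbols, is_start):
    """Group a flat symbol sequence into strings; a new string begins
    at every symbol for which is_start(symbol) is true."""
    strings, current = [], []
    for sym in symbols:
        if is_start(sym) and current:
            strings.append(tuple(current))
            current = []
        current.append(sym)
    if current:
        strings.append(tuple(current))
    return strings

# Hypothetical symbol stream; each symbol is a pair of strings.
stream = [("start", "x"), ("mid", "y"), ("end", "y"),
          ("start", "x"), ("mid", "y"), ("end", "y"),
          ("start", "x"), ("mid", "y")]

# Step 1: cut the stream into strings (assumed rule: "start" opens a string).
strings = split_into_strings(stream, is_start=lambda s: s[0] == "start")

# Step 2: the multiset of strings that a learner would consume.
multiset = Counter(strings)
for string, count in multiset.items():
    print(count, string)
```

Each string is stored as a tuple so that it can serve as a dictionary key in the multiset.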

@pberko
Author

pberko commented Jul 23, 2021

So on line 66 I get a list of strings, which are actually automaton paths?
[Screenshot: Capture]

@pberko
Author

pberko commented Jul 23, 2021

@vhavlena
Another question: the function

    """ Add string to frequency prefix tree """
    def add_string(self, string, label=0):
        act = self._root
        self._ini[act] = self._ini[act] + 1
        for i in range(len(string)):
            try:
                self.flanguages[act][tuple(string[i:])] += 1
            ....

Does it accept only tuples as input (as in your example), or can it also take a simple string, e.g. "a a a"?

@vhavlena
Owner

  1. Yes. The list of strings is the input for learning. In your output, each line (list) is a string, where a single pair of strings is a symbol.
  2. The variable string should be a list of arbitrary symbols; you can pass an arbitrary string there (in your case "aaa").
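As a small illustration of the second point, the snippet below mirrors only the `tuple(string[i:])` expression from the quoted loop; the rest of `add_string` is elided in the thread, so `suffix_keys` is a hypothetical helper and not part of the library. It suggests that a plain Python string and a list of one-character symbols yield the same keys, while a space-separated string would treat the spaces as symbols unless it is split first.

```python
def suffix_keys(string):
    """Return the suffix tuples that the quoted loop would use as keys."""
    return [tuple(string[i:]) for i in range(len(string))]

# A plain string and a list of symbols give identical keys.
print(suffix_keys("aaa"))
print(suffix_keys(["a", "a", "a"]))

# With "a a a", every space becomes a symbol of its own ...
print(suffix_keys("a a a")[0])

# ... so split on whitespace first if the spaces are only separators.
print(suffix_keys("a a a".split()))
```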

@pberko
Author

pberko commented Jul 25, 2021

Hello @vhavlena

I tried the algorithm with other samples, but I'm afraid I made a mistake, since the results differ from what I expected.

The alphabet is "a, b".
The input file contains paths generated from a "black-box automaton".
Input file: https://github.com/pberko/detano/blob/master/train1clean.csv
https://github.com/pberko/detano/blob/master/blackbox111.pdf

I used the file as input to pa_learning, but the output I got:
https://github.com/pberko/detano/blob/master/graphviz%20(2).pdf
looks different.

@vhavlena
Owner

Hello @pberko,

I'm not sure you are doing anything wrong. I think there are several issues to consider. Your black-box automaton is a Markov chain, right? In that case you will likely get something different, because the DPA on the output has accepting probabilities, which may introduce some bias in your setting (maybe you can try specialised algorithms for learning MCs). The second issue is the learning parameters: different parameters may lead to different automata.
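To make the remark about accepting probabilities concrete, here is a rough sketch, not the repository's implementation. It builds a frequency prefix tree from a sample and estimates, for each state, both the next-symbol probabilities and the probability of stopping there. The function name and the tiny sample are assumptions for illustration. In a model like this, part of each state's probability mass goes to stopping, which a plain Markov chain over symbols does not have.

```python
from collections import Counter, defaultdict

def prefix_tree_probabilities(sample):
    """Estimate, for every prefix (state), the probability of each next
    symbol and the probability of stopping at that prefix."""
    visits = Counter()            # how often a prefix was reached
    stops = Counter()             # how often a string ended at a prefix
    moves = defaultdict(Counter)  # prefix -> next symbol -> count
    for string in sample:
        prefix = ()
        visits[prefix] += 1
        for sym in string:
            moves[prefix][sym] += 1
            prefix = prefix + (sym,)
            visits[prefix] += 1
        stops[prefix] += 1
    probs = {}
    for prefix, n in visits.items():
        probs[prefix] = {
            "stop": stops[prefix] / n,
            "next": {s: c / n for s, c in moves[prefix].items()},
        }
    return probs

# Hypothetical sample over the alphabet {a, b}.
sample = ["ab", "ab", "a", "b"]
probs = prefix_tree_probabilities(sample)
for prefix, p in sorted(probs.items()):
    print(prefix, p)
```

In every state the stopping probability and the next-symbol probabilities sum to one, so strings generated by such a model always terminate with some probability at each step.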

vhavlena added a commit that referenced this issue Jan 4, 2022