Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draft of a possible solution to the #14 issue and changes in public interface #24

Merged
merged 6 commits into from
Jun 14, 2020

Conversation

zhiburt
Copy link
Contributor

@zhiburt zhiburt commented May 26, 2020

Hi @philippkeller, In the #22 I put an example in what way may be changed the interface of ReadUntil.
And today I went through the use cases of exp function and noticed that in all cases there's no need for all the data find function gives so I went slightly further and check what can be done.

The main idea behind this PR is to provide a cleaner calls of read_until function.
The example when we match string and return the buffer before and the string itself but we know the string already since we did a match by that so the second part is always needless.

A bunch of examples

let mut p = spawn("cat", Some(1000)).expect("cannot run cat");
p.send_line("lorem ipsum dolor sit amet")?;
assert_eq!("lorem ipsum dolor sit ", p.exp_string("amet")?);
let f = io::Cursor::new("2014-03-15");
let mut r = NBReader::new(f, None);
let re = Regex::new(r"-\d{2}-").unwrap();
assert_eq!(
    ("2014".to_string(), "-03-".to_string()),
    r.read_until(&re).expect("regex doesn't match")
);
let f = io::Cursor::new("abcdef");
let mut r = NBReader::new(f, None);
assert_eq!(
    "ab".to_string(),
     r.read_until(&NBytes(2)).expect("2 bytes")
);

The resolution of #14 could looks like

we have an index of the ReadUntil element by which we get successful match, and
the copy of the buffer on which it be called so if the user want he can repeat the match on the buffer.

let until = vec![ReadUntil::NBytes(30), ReadUntil::String("Hi".to_string())];
if let Ok(buffer, index)= = p.exp(until.as_slice()) {
   let res = &until[index]
      .clone()
      .string_needle()
      .unwrap()
      .find(&buffer, true)
      .unwrap();
}

This is a draft and it's affected by a linter which I am sorry about.

All tests are passed.

Signed-off-by: Maxim Zhiburt <[email protected]>
@zhiburt zhiburt changed the title draft of a possible solution to the #14 issue draft of a possible solution to the #14 issue and changes in API May 26, 2020
@zhiburt zhiburt changed the title draft of a possible solution to the #14 issue and changes in API draft of a possible solution to the #14 issue and changes in public interface May 26, 2020
@philippkeller
Copy link
Collaborator

@zhiburt I'm currently swamped with work from my workplace and can only look at this at the weekend. Very happy that you took my PR and added your adaptions yourselves, looking forward to review this!

@zhiburt
Copy link
Contributor Author

zhiburt commented May 30, 2020

Perhaps, as a draft, it's not as clean as It should be.

The idea is basically that read_util function now only responsible for cleaning the internal buffer. The return value is determined by the Needle parameter. And ultimately it splits the ReadUntil to different types which implements Needle so each of them can have different return type.

/// read until it finds a match by needle
/// returns the needle's interest
pub fn read_until<N: Needle + ?Sized>(&mut self, needle: &N) -> Result<N::Interest>;

/// `Match` is a structure which contains a start and end of
/// a distance which `read_util` should delete from buffer.
pub struct Match<T> {
    begin: usize,
    end: usize,
    pub interest: T,
}

/// `Needle` trait has only one method `find` which return `Some`
/// if something was found `None` otherwise. 
pub trait Needle {
    type Interest;

    fn find(&self, buffer: &str, eof: bool) -> Option<Match<Self::Interest>>;
}

By the way the ReadUntil is still in need by us as we have to implement ReadUntil::Any with new interface.

It allows us to have different return values for each Needle.

impl Needle for String {
    /// return a content before the match
    type Interest = String;
    ...
}

impl Needle for NBytes {
    /// first N bytes
    type Interest = String;
    ...
}

impl Needle for Regex {
    /// content before the match and the match itself
    type Interest = (String, String);
    ...
}

impl Needle for [ReadUntil] {
    /// The copy of the buffer and an index of
    /// a first needle by which we get successful match
    type Interest = (String, usize);
    ...
}

src/reader.rs Outdated
@@ -19,12 +19,42 @@ enum PipedChar {
EOF,
}

#[derive(Clone)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about vec! probably I got mistaken but there's a real cause here. Currently string_needle method consumes/owns a ReadUntil instance. I hope the convert methods could be changed to have a &self parameter rather than self in which case the Clone could be deleted.

https://github.com/philippkeller/rexpect/blob/0212cd6ff069e5633ea8e6d4dad8212f88ef6464/src/session.rs#L503-L508

pub enum ReadUntil {
String(String),
Regex(Regex),
EOF,
NBytes(usize),
Any(Vec<ReadUntil>),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice that you got rid of this, I was always under the impression that the Any doesn't belong into this enum because it is "one level above"

src/reader.rs Outdated
}
}

impl Needle for String {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it need a impl both for str and for String?

Copy link
Contributor Author

@zhiburt zhiburt May 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question, actually there's no reason in String implementation, as I remember I've experienced some string cast issue and just added it to not have the cast. (don't remember what it was)

And actually I was wondering at the time, might it's better not to implement the interface for existed types and create a wrapper structure e.g as I did with NBytes(usize) even though it could be implemented just to usize. I am not sure which way is better for _string.

impl Needle for Regex {
type Interest = (String, String);

fn find(&self, buffer: &str, eof: bool) -> Option<Match<Self::Interest>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there a Needle::find function for regex here and also a function find which also handles regex?

Copy link
Contributor Author

@zhiburt zhiburt May 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you mean this one It just hasn't been deleted, but it can be :). I didn't clean the code as much as you see
https://github.com/philippkeller/rexpect/blob/0212cd6ff069e5633ea8e6d4dad8212f88ef6464/src/reader.rs#L86

but if you mean Regex::find(&self, buffer), here we just call original find method regex::Regex::find.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I meant the pub fn find function. Nice that it can be deleted. All clear then!

@philippkeller
Copy link
Collaborator

@zhiburt I like the approach very much, your proposal regarding the API changes makes sense (e.g. matching a string only needs to return match before string) and then allows to return the index for Any. See my comments to your code above.

I also had a look at the API for pexpect and they only return the index of the match (which makes only sense if you provide a list of needles) or -1 if it didn't match. If you provide a regex then the matching string is stored in an attribute match within the "process object", so bottom line I think your proposal is better as it utilizes rusts features with "conditional return types".

Some questions:

  1. there's now two find functions which looks like redundant code
  2. you mention that this draft is still very drafty, what makes you think that?
  3. did you look into the merge conflict?

IMO this should first merged into a branch in order to more easily play around and be able add/change documentation and then I could release this together with the api changes for windows #11 in version 0.5

@zhiburt
Copy link
Contributor Author

zhiburt commented May 31, 2020

  1. I hope this one can be removed. https://github.com/philippkeller/rexpect/blob/0212cd6ff069e5633ea8e6d4dad8212f88ef6464/src/reader.rs#L86

  2. I consider it is drafty as the documentation is unchanged and all of a sudden a linter effected it.
    Also I dislike a bit the fact of existence 'util' methods in ReadUntil and the fact of coping the buffer in impl Needle for [ReadUntil] I consider that not a crucial performance issue but ...

  3. No, I didn't, I've considered to get a review first :)

I am not versed in git, but I bet have a different branch first not a bad idea

@philippkeller
Copy link
Collaborator

I created a new branch exp-and-windows where I'd like to merge this into. Can you resolve the conflict with master? I think this is easier for you than for me. Then I'll merge it into the new branch. The idea would be that also the windows changes are merged in this branch.

@zhiburt
Copy link
Contributor Author

zhiburt commented May 31, 2020

Good evening @philippkeller,

I've investigated the former mentioned issue with the buffer and not very expandable ReadUntil interface. And as I can state we could solve both issues at once by this proposal.

use reader::{ABInterest, NBytes, Regex, Until};

// currently it equals [ReadUntil::Nbytes(30), ReadUntil::String("Hi"), ReadUntil::Regex(Regex::("").unwrap())]
let until = Until(
    NBytes(30),
    Until("Hi".to_string(), Regex::new("").unwrap())
);
// we could wrap it by a function `or `
// NBytes(30).or("Hi".to_string())
//           .or(Regex::new("").unwrap())

match p.exp(&until) {
   Ok(ABInterest::A(s)) => {}
   Ok(ABInterest::B(ABInterest::A(s))) => {}
   Ok(ABInterest::B(ABInterest::B((s, regex)))) => {}
   Err(e) => assert!(false, format!("got error: {}", e)),
}

So user is building the any match type on its own. Until(a, b) is style of a OR b (This thoughts landed me to the question if there's any reason in a AND b?).

Why it's much more expandable?
Because this PR provides an opportunity for user implement its own Needle, but the array implementation [ReadUntil] doesn't work with users implementations. With such implementation default Needles could be used with user defined ones in 'arrays'.

I don't think that user defined Needles is what is going to be used broadly but this is one option :)

Not sure if the names of type is expansionary enough :)
And am certainly not sure if such interface is 'OK' but it reduce copying operations and force user to check all possible outcomes.

re implementation of [ReadUntil]

pub struct Until<A, B>(pub A, pub B);

pub enum ABInterest<A, B> {
    A(A),
    B(B),
}

impl<A: Needle, B: Needle> Needle for Until<A, B> {
    type Interest = ABInterest<A::Interest, B::Interest>;

    fn find(&self, buffer: &str, eof: bool) -> Option<Match<Self::Interest>> {
        self.0
            .find(buffer, eof)
            .and_then(
                |Match {
                     begin,
                     end,
                     interest,
                 }| Some(Match::new(begin, end, ABInterest::A(interest))),
            )
            .or_else(|| {
                self.1.find(buffer, eof).and_then(
                    |Match {
                         begin,
                         end,
                         interest,
                     }| {
                        Some(Match::new(begin, end, ABInterest::B(interest)))
                    },
                )
            })
    }
}

@philippkeller
Copy link
Collaborator

Aha, so a user could implement ABInterest his own, but we would of course implement the or type for him to provide an exp_any?

I was first fearing that this is bringing more complexity/abstractions into the code base.
I lot sight a little bit the overview here :-) Is what you propose already possible with this PR or would it mean additional changes?

@zhiburt
Copy link
Contributor Author

zhiburt commented Jun 3, 2020

Aha, so a user could implement ABInterest his own, but we would of course implement the or type for him to provide an exp_any?

ABInterest is a return type for any matching (currently in the PR the type is (usize, String) index and copy of the buffer).

Is what you propose already possible with this PR or would it mean additional changes?

The read_until takes any type which implements Needle. If we make the trait pub the user will be able to pass its own types. (One more time I am not sure if it is a legit for the any user to have such option)

I was first fearing that this is bringing more complexity/abstractions into the code base.

I couldn't disagree, but from my perspective return a original Needle result is much more flexible rather than index + buffer. Here is my arguments.

The proposal is only linked to old ReadUntil::Any and current Needle for [ReadUntil] interfaces.

As I've mentioned it returns index + copy of the buffer. (I decided to return copy since I was not sure that buffer will be unchanged until the next call of a indexed object).
I naturally think that the coping is only shows a smelt code here, but what is more importantly to get around rust type system there should be a util function to cast a ReadUntil obj to Needle. So the code is looks not really well.

let until = vec![ReadUntil::NBytes(30), ReadUntil::String("Hi".to_string()), Regex::new("").unwrap()];
if let Ok(buffer, index)= = p.exp(until.as_slice()) {
   let res = &until[index]
      .string_needle()
      .unwrap()
      .find(&buffer, true)
      .unwrap();
}

But there are a bit more here. As you can see I cast the indexed ReadUntil object to specific one. If the order of the array will be changed the type system will not be able to recognize the error. It will cause an unwap error. And more importantly we call find the second time. (first time in exp)

I propose to change the construction from [ReadUntil] to new type Until (the name is not descriptive ... ). Until is a tuple of Needles (Until(Needle, Needle)). The implementation will be changed to try to find the match by first element of the tuple if it's found than that is a return value, otherwise it check the second one. It will require to change the return type to ABInterest (the same expansionary name here). The ABInterest is present the result of the first element of the tuple or the result of the second element.

That literally a play around of rust type system to get able to remove ReadUntil's util functions. And as a step forward give the any logic be used with user defined Needles.

The code with the same logic as previous one.

// Until and ABInterest is renamed to UntilOR and ORInterest correspondingly (I guess if it is planer)
use reader::{ORInterest, NBytes, Regex, UntilOR};

let until = UntilOR(
    NBytes(30),
    UntilOR("Hi".to_string(), Regex::new("").unwrap())
);

match p.exp(&until) {
   Ok(ORInterest::Left(s)) => {}
   Ok(ORInterest::Right(ORInterest::Left(s))) => {}
   Ok(ORInterest::Right(ORInterest::Right((s, regex)))) => {}
   Err(e) => assert!(false, format!("got error: {}", e)),
}

If the order of parameters in UntilOR will be changed we will get an compiler error. And there's none of copying. And there's no second find call. That's why I consider it is a win.

But yep I am certainly not confident to say that it is a good way to follow as might it is slightly too complicated interface. Just wanted to have some comments according to it :)

@philippkeller
Copy link
Collaborator

Frankly I'm a bit lost, maybe because I'm not into rust that much as you are :-)
But your arguments sound compelling. Less copying is certainly good and your ideas sound like the underlying code is very flexible.

What I'm not sure of is the changes it means to the api. So easiest for me would be if you could solve the conflicts and then I could merge it into the branch to play around, would that be possible?

@zhiburt
Copy link
Contributor Author

zhiburt commented Jun 7, 2020

@philippkeller I hope I resolved the conflicts.
And I slightly clean the PR by removing a linter affected code. Also I did a couple of minor changes there.


The idea came to my mind today acording to my former proposal. We might are able to create a macros to hide the complexity of underling types in order to clean API. I've created a scratch of it but I am truly not versed in macros so I succeded only in creating a macros with constant number of parameters. But I guess the expandable macro can be implemented as well.

Example: the same any logic without macros and with.

match find {
    OrInterest::Lhs(OrInterest::Lhs(first)) => assert!(false),
    OrInterest::Lhs(OrInterest::Rhs(second)) => assert_eq!("Hello ".to_string(), second),
    OrInterest::Rhs(third)) => assert!(false),
}
until!(find, {
     first => { assert!(false) },
     second => { assert_eq!("Hello ".to_string(), second) },
     third => { assert!(false) },
});

@philippkeller philippkeller changed the base branch from master to exp-and-windows June 14, 2020 09:38
@philippkeller philippkeller merged commit f07478a into rust-cli:exp-and-windows Jun 14, 2020
@philippkeller
Copy link
Collaborator

@zhiburt so I merged your PR into the new branch and started to play around.
This PR is already closed, so not sure where to put this discussion best, so I just put it here :-)

Early feedback: I find the exp_any interface too complicated right now. I was hoping that the Until and the OrInterest would be in the "inner workings" of the module, and would not be exposed in the api. What about exp_any would only accept "or" and has a vector of needles again as input parameter and gives back the index of which needle matched plus maybe string before and string matching?

Sorry for no more thorough feedback, I wanted to give an early feedback as soon as I had a time window :-) hope you don't take it personal

Additionally: did you see the windows branch? Maybe we can have some "general chat" on where this repo should be going, the way we work together, etc. Not sure what would be the best way to chat about this, maybe Telegram or Slack would serve this? But I'm not using in any code related communities so not sure what is the best option.

@zhiburt
Copy link
Contributor Author

zhiburt commented Jun 16, 2020

Early feedback: I find the exp_any interface too complicated right now.

As I said I totally agree with a such position :).

I was hoping that the Until and the OrInterest would be in the "inner workings" of the module, and would not be exposed in the api.

I too. As I currently understand that it can be hidden only by macros.

What about exp_any would only accept "or" and has a vector of needles again as input parameter and gives back the index of which needle matched plus maybe string before and string matching?

There's an issue with rust type system I guess here, so it is kind of problematic bu I'll check it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants