Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coroutines: send events flow #92

Merged
merged 40 commits into from
Nov 28, 2023
Merged

Coroutines: send events flow #92

merged 40 commits into from
Nov 28, 2023

Conversation

wzieba
Copy link
Collaborator

@wzieba wzieba commented Nov 11, 2023

Description

This PR updates how this SDK prepares and sends events to the API. Most notably, this PR:

  • Migrates ParselyAPIConnection from AsyncTask to coroutines
  • Removes FlushQueue and replaces its responsibility with coroutines-based SendEvents
  • Makes LocalStorageRepository thread-safe (mutex) and off main thread (suspend functions run in scope of Dispatchers.IO

How to test

The SDK has a set of a few functional tests, but they all use a mocked web server. To validate that the payload is sent, plase

  1. In MainActivity comment line 31 (make app not be in debug)
  2. Tap "TRACK URL" several times (you can count the number)
  3. Wait 30 seconds or move app to the background
  4. In logs observe:
  • Sending request with N events, where N is the number of taps from point 2
  • Requested https://p1.parsely.com/
  • Pixel request success
  • Event queue empty, flush timer cleared.

wzieba added 18 commits November 9, 2023 09:23
This behavior was intentionally removed in 4ae78b2. It's not present on iOS.
Now, as `ROOT_URL` is used inside the constructor, we have to change this field **before** the `ParselyTracker` is initialized
The responsibility of `ParselyAPIConnection` should be "sending http requests" only. Orchestrating flow of events will be handled by `SendEvents` use case. This reduces coupling.
Not using `ParselyTracker` as middleman, reduces complexity
Not using `ParselyTracker` as middleman, reduces complexity
If an event would be added to local repository between getting stored queue from local repository and sending it in `SendEvents`, it'd be removed and never sent. This change fixes this risk by removing only events that were sent.
`SendEvents` shouldn't know details about structure of JSON payload.
Copy link

codecov bot commented Nov 11, 2023

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (845947b) 55.85% compared to head (784a021) 66.31%.

Files Patch % Lines
.../java/com/parsely/parselyandroid/JsonSerializer.kt 75.00% 2 Missing and 1 partial ⚠️
...ava/com/parsely/parselyandroid/ParselyTracker.java 75.00% 1 Missing and 1 partial ⚠️
...in/java/com/parsely/parselyandroid/FlushManager.kt 80.00% 0 Missing and 1 partial ⚠️
...com/parsely/parselyandroid/ParselyAPIConnection.kt 90.90% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@               Coverage Diff               @@
##           coroutines      #92       +/-   ##
===============================================
+ Coverage       55.85%   66.31%   +10.46%     
===============================================
  Files              12       14        +2     
  Lines             376      377        +1     
  Branches           57       54        -3     
===============================================
+ Hits              210      250       +40     
+ Misses            148      109       -39     
  Partials           18       18               

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

This fixes a possible unwanted stop of flush manager in case that, between querying and successfully sending a event queue, a new event was added. In such case, we should not stop the queue.
Make the `SendEvents` run once at a time, to prevent a scenario, when we send multiple requests with the same events. In current state of SDK, it could happen when `FlushManager` counts to next flush interval **and** user moves app to the background at the same time.
The operations covered by this AsyncTask are no longer needed to run on background thread. The responsibility of checking network state is moved to `flushEvents` and stopping flush timer in case of empty queue - to `SendEvents`
BREAKING CHANGE: this commit breaks API contract by removing `stopFlushTimer` but it was never an intention to allow clients to stop flush timer under any conditions. The lifecycle of timer is handled by the SDK internally.
… main thread

BREAKING CHANGE: this commit removes `queueSize` and `storedEventsCount` methods. They were never a part of documented public API and added probably only for the need of example app. They were also not safe, as they were triggering I/O operation on the main thread.
As `getStoredQueue` is now using `mutex`, we cannot run it in the scope of the same `Mutex#withLock` as it'll conflict and lock both operations.
`ParselyAPIConnection` will always run in scope of `sdkScope` which has `Dispatchers.IO` as it constant context.
@wzieba wzieba marked this pull request as ready for review November 13, 2023 21:31
@wzieba wzieba requested a review from oguzkocer November 14, 2023 15:44
Copy link
Collaborator

@oguzkocer oguzkocer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wzieba I've completed reviewing the implementation part of the PR. I haven't reviewed the tests yet or tested the implementation because if you decide the address any of the comments, it'd be best to review the tests with those included.

I am hoping that I am not completely off with my questions and suggestions, but if I am, please excuse me as I am getting back to product development PR reviews 😰

fun purgeStoredQueue() {
persistObject(ArrayList<Map<String, Any>>())
}
open suspend fun remove(toRemove: List<Map<String, Any?>?>) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not about this specific line, but I think this is the best place available in this PR review.

There is a slight inconsistency about where the Mutex lock is applied within this class. It seems every function has the lock applied, except for persistObject. However, persistObject is also always called from within a lock.

Also, it seems the mutex lock is sometimes called from an already locked state as such:

mutex.withLock {
  mutex.withLock {
  }
}

I am not entirely sure how this is handled internally, but at the very least, it seems a bit inefficient. This inefficiency might be necessary in order to keep things safe, I am not sure.

I think it'd be good to look into how Mutex lock is used as a whole to make it more consistent and possibly more efficient. But, maybe it could be done in its own PR?

Copy link
Collaborator Author

@wzieba wzieba Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this comment. I'm totally fine with changing this logic, but I'd like to explain my rationale on those decisions.

It seems every function has the lock applied, except for persistObject.

True, and it's intended. As you rightfully notice, mutex of LocalStorageRepository locks execution of all public methods. As we control all possible executions of private persistObject, it's safe to leave it without mutex. Also, adding the same lock to this method would create a thread lock (sample).

Also, it seems the mutex lock is sometimes called from an already locked state as such:

It's crucial to emphasize, that in the SDK those two mutexes are different objects and have different purpose. If it was the same object, we would have a thread lock (sample above). Adding more context to the example you provided:

mutexA.withLock {
  mutexB.withLock {
        persistObject
  }
  //continue with execution locked by `mutexA` only
}

mutexA is from SendEvents/FlushQueue use case. It doesn't allow running this use case twice at the same time. Otherwise, if we run it twice, we're at risk of sending duplicated events (first execution wouldn't call remove method before second execution called getStoredQueue). I could address it in implementation of the use case by e.g. introducing LocalStorageRepository#getAndRemoveStoredQueue but I thought Mutex is also a fine solution.

Also, you might want to take a look at InMemoryBuffer - in case of this class, the mutexA is there, but the reasoning is the same (although I don't have an idea for a different, no-mutex approach there)

mutexB is from LocalStorageRepository. It doesn't allow manipulating the local file at the same time by two different methods. It's crucial to not risk corrupting the local file and safely performing operations.

I think it'd be good to look into how Mutex lock is used as a whole to make it more consistent and possibly more efficient. But, maybe it could be done in its own PR?

Sure - we could probably remove mutexA (I mean: the one in SendEvents/FlushQueue class) but I'm not sure if it's crucial for performance. If you see other improvements, I'm eager to discuss them! If you'd like to experiment with Mutex usage, I can recommend running FunctionalTests#stressTest for verification if coroutines are not locking and/or are efficient.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's crucial to emphasize, that in the SDK those two mutexes are different objects and have different purpose. If it was the same object, we would have a thread lock (sample above). Adding more context to the example you provided:

Yeap, I was aware of that, and I should have specified that in my pseudocode as I can see how it could be interpreted as the same locks. Sorry about that!

True, and it's intended. As you rightfully notice, mutex of LocalStorageRepository locks execution of all public methods. As we control all possible executions of private persistObject, it's safe to leave it without mutex.

I think in the context of "some of these functions are meant to be publicly consumed", trade-off makes sense to me. I was thinking in the lines of having an internal only storage that is exposed to the public through an encapsulating type. However, I didn't mention that at all and I am not even sure there is enough benefit to the indirection.


I think the case I should have brought up as concrete example is the mutex usage in insertEvents & getStoredQueue. In this case, we are locking the mutex, reading the disk, releasing the lock, then locking it immediately after to persist objects and then finally releasing it again. This type of relationship may be necessary in some cases when part of the relationship is outside of our control. However, in this case, we are able to control this, so I think it'd be good to lock the mutex once, read the events, persist the objects, and then release the lock. Here is what I am thinking:

    private fun getInternalStoredQueue(): ArrayList<Map<String, Any?>?> {
        var storedQueue: ArrayList<Map<String, Any?>?> = ArrayList()
        try {
            val fis = context.applicationContext.openFileInput(STORAGE_KEY)
            val ois = ObjectInputStream(fis)
            @Suppress("UNCHECKED_CAST")
            storedQueue = ois.readObject() as ArrayList<Map<String, Any?>?>
            ois.close()
            fis.close()
        } catch (ex: EOFException) {
            // Nothing to do here.
        } catch (ex: FileNotFoundException) {
            // Nothing to do here. Means there was no saved queue.
        } catch (ex: Exception) {
            ParselyTracker.PLog(
                "Exception thrown during queue deserialization: %s",
                ex.toString()
            )
        }
        return storedQueue
    }

    override suspend fun getStoredQueue(): ArrayList<Map<String, Any?>?> = mutex.withLock {
        return getInternalStoredQueue()
    }

To connect the dots, this setup is actually pretty similar to what I mentioned related to "internal only storage" above. Instead of having a second type to separate the internal functions, we are utilizing private functions. Although, If we did have an internal only type that doesn't have a mutex, we'd have been able to use that one in other parts of our internal implementation and avoid the mutex inside another mutex type of situation. It's also generally a design that I like, but this is definitely outside of this PR and I am just sharing my general thoughts on the subject.

Looking forward to hear your thoughts!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be good to lock the mutex once, read the events, persist the objects, and then release the lock

True - I see your point. I've updated the code in c32302f to lock the mutex once - it seems even safer this way, as e.g. in insertEvent, between getStoredEvents and mutex.withLock a new event might be added to the queue in a different lock, making the final list provided to persistObject incorrect. Thanks!

If we did have an internal only type that doesn't have a mutex, we'd have been able to use that one in other parts of our internal implementation and avoid the mutex inside another mutex type of situation.

LocalStorageRepository is internal (in a sense that it's impossible for SDK consumer to trigger any methods on it). Do you think about a different definition of "internal" here?

Thinking about removing the necessity of mutex nesting, I don't see a clear solution with file-based storage. At a time of querying or saving data, we can't allow for other operations to be performed on the file and there's a constant possibility that e.g. InMemoryBuffer will try to save new events.

But I can also be tunnel-vissioned about the proposed approach. I think it'd be more clear for me if we discussed it with some code snippets, maybe in context of #94 ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have made some good improvements to the Mutex usage and I am fine with leaving it as is to keep the more important parts of the project moving.

Thinking about removing the necessity of mutex nesting, I don't see a clear solution with file-based storage. At a time of querying or saving data, we can't allow for other operations to be performed on the file and there's a constant possibility that e.g. InMemoryBuffer will try to save new events.

I think in the ideal scenario, all writes for our storage would go through a single mutex. So, no 2 things could write to it at once. We'd also only let someone read the storage if there is no write access happening at the same time. This would be much closer to how a real DB works. Having said that, it's quite an involved process to get this right and I am not sure if there is enough benefit in trying to optimize it to the fullest.

If the current solution works out fine and doesn't have any major performance concerns, keeping it simpler seems like the way to go for us.

Makes creating fake objects less problematic by removing a need to call real object constructor.
Makes creating fake objects less problematic by removing a need to call real object constructor.
Makes creating fake objects less problematic by removing a need to call real object constructor.
Makes creating fake objects less problematic by removing a need to call real object constructor.
The `FlushManager` eventually invokes `SendEvents` which checks for the size of stored queue anyways. This change reduces unnecessary complexity and overhead. More context: #92 (comment)
This change decouples `ParselyFlushManager` and `ParselyTracker`. It also makes `FlushManagerTest` resistant to `ParselyTracker` implementation changes.
To improve readability and align with `eventsToSend.isEmpty()` check
@wzieba
Copy link
Collaborator Author

wzieba commented Nov 17, 2023

Thank you @oguzkocer for the review! I've applied some of the comments and replied to other comments so we can continue discussion. Ready for re-review.

@wzieba wzieba requested a review from oguzkocer November 17, 2023 15:18
Comment on lines +28 to +31
if (skipSendingEvents) {
ParselyTracker.PLog("Debug mode on. Not sending to Parse.ly. Otherwise, would sent ${eventsToSend.size} events")
repository.remove(eventsToSend)
return@launch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to stop the flushManager as well? I guess it'll get stopped the next time it's flushed. 🤔

I wonder if we should skip starting the FlushQueue all together in a case where we don't want the events to be sent. This is probably beyond the scope of this PR though, so I am just making a note of it here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed stopping flushManager from this part of code in 784a021.

I wonder if we should skip starting the FlushQueue all together in a case where we don't want the events to be sent.

I see your point but the skipSendingEvents option is used only for tests (debug mode). As it's like --dry-run, I wouldn't opt to remove starting flushManager then, as this would cause a significant discrepancy between debug and non-debug modes of SDK. In other words: in debug mode, I believe we want to be as close as possible to non-debug. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point but the skipSendingEvents option is used only for tests (debug mode). As it's like --dry-run, I wouldn't opt to remove starting flushManager then, as this would cause a significant discrepancy between debug and non-debug modes of SDK. In other words: in debug mode, I believe we want to be as close as possible to non-debug. WDYT?

That makes sense and sounds like the more practical option 👍

Copy link
Collaborator

@oguzkocer oguzkocer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wzieba I've reviewed the tests and the example app changes and left a minor comment.

I initially had a few more nitpicks about using named parameters for coroutine scope in tests, adding an extra validation for events are sent in FlushQueueTest etc, but I don't think those are adding enough value for us to spend extra time in the PR review process, so I decided to remove them.

We have a couple open discussions, so I haven't tested the changes yet. Once everything is finalized, I'll test it and approve the PR. I hope the review so far has been helpful.

Comment on lines 14 to 15
private lateinit var sut: FlushManager
private val tracker = FakeTracker()
private var flushEventsCounter = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these properties, or would it be possible to use local variables instead? I skimmed over the tests and I can't see the need for the properties, but I might be missing something.

Comment on lines 14 to 15

private lateinit var sut: FlushQueue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my previous comment, do we need to use a property or could we use a local variable instead?

Now, instead of multiple locks in case of removing or
inserting data, we do the same with a single lock. See: #92 (comment)
If stored queue will be empty, on next execution of `FlushQueue`, the manager will be stopped anyway.

#92 (comment)
@wzieba
Copy link
Collaborator Author

wzieba commented Nov 24, 2023

Thank you @oguzkocer for second round! I think I answered all your comments, ready for the next review.

@wzieba wzieba requested a review from oguzkocer November 24, 2023 14:00
Copy link
Collaborator

@oguzkocer oguzkocer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wzieba Everything looks great. I've tested the example app using the instructions from PR description and I've observed all the logs except for Event queue empty, flush timer cleared. which was removed in 784a021.

:shipit:

@wzieba wzieba merged commit 62cfa12 into coroutines Nov 28, 2023
@wzieba wzieba deleted the coroutines-send-event branch November 28, 2023 19:43
MartinAkram pushed a commit that referenced this pull request Jan 17, 2024
The `FlushManager` eventually invokes `SendEvents` which checks for the size of stored queue anyways. This change reduces unnecessary complexity and overhead. More context: #92 (comment)
MartinAkram pushed a commit that referenced this pull request Jan 17, 2024
Now, instead of multiple locks in case of removing or
inserting data, we do the same with a single lock. See: #92 (comment)
MartinAkram pushed a commit that referenced this pull request Jan 17, 2024
If stored queue will be empty, on next execution of `FlushQueue`, the manager will be stopped anyway.

#92 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants