
Add Prediction Output #131

Open · wants to merge 5 commits into master
Conversation

chadlagore (Collaborator) commented May 29, 2018

👷 Changes

  • Added prediction output capability.
  • Refactored classes a bit.

🔦 Testing Instructions

pytest -vvv

Follow the README instructions.

minutes/audio.py Outdated
@@ -8,6 +8,9 @@


class Audio:
"""Internal audio maninpulation class. I reserve the right to change this
Member:
Nit: s/maninpulation/manipulation/g

minutes/base.py Outdated
return {
    i: getattr(self, i) for i in self.intialization_params
    if i in {
        'ms_per_observation',
Member:
I think it would be good to declare this set as a constant somewhere and refer to it by name.
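For illustration only, one way that refactor could look; the class name, the constant name PREPROCESSING_PARAMS, and any members beyond 'ms_per_observation' are assumptions, not code from the PR:

class BaseModel:
    # Sketch: declare the set once, refer to it by name.
    PREPROCESSING_PARAMS = frozenset({
        'ms_per_observation',
        # ... the other parameter names from the original inline set
    })

    def preprocessing_params(self):
        # Only expose the attributes named in the constant.
        return {
            i: getattr(self, i) for i in self.intialization_params
            if i in self.PREPROCESSING_PARAMS
        }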

chadlagore (Collaborator, Author):
Of course!

iKevinY (Member) commented Jun 11, 2018

Hmm, for some reason the PR build is passing but the actual commit build is failing? 🤔

chadlagore (Collaborator, Author):
Yeah. The failure means that the model build is not deterministic, which may not be a problem in itself, but it affects the way the tests can be written. I can run that test 100 times and view the output to be sure.
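As a rough sketch of that repetition (illustrative only; the test node id test/test_minutes.py::test_phrases is taken from the failure output later in the thread, and each run is launched in a subprocess so the runs stay independent):

import subprocess

failures = 0
for _ in range(100):
    # Re-run the single suspect test in a fresh process each time.
    result = subprocess.run(['pytest', '-q', 'test/test_minutes.py::test_phrases'])
    failures += result.returncode != 0

print(f'{failures}/100 runs failed')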

chadlagore (Collaborator, Author):
Posting a fix for this today!

chadlagore (Collaborator, Author) commented Jun 16, 2018

~/git/minutes chad/126-prediction-output*
minutes-c2ZPuskd ❯ pytest test/test_minutes.py
=============================================================================================== test session starts ===============================================================================================
platform darwin -- Python 3.6.4, pytest-3.5.1, py-1.5.3, pluggy-0.6.0
rootdir: /Users/chadlagore/git/minutes, inifile:
plugins: cov-2.5.1
collected 3 items

test/test_minutes.py ..F                                                                                                                                                                                    [100%]

==================================================================================================== FAILURES =====================================================================================================
__________________________________________________________________________________________________ test_phrases ___________________________________________________________________________________________________

    def test_phrases():
        for model_name in Minutes.parents:
            minutes = Minutes(parent=model_name)
            minutes.add_speaker(c.SPEAKER1)
            minutes.add_speaker(c.SPEAKER2)
            minutes.fit()

            # Predict new phrases (make sure we ony predict once per obs)
            conversation = Conversation(c.CONVERSATION_AUDIO, minutes)
            raw, _ = conversation.get_observations(**minutes.preprocessing_params)
            assert len(conversation.phrases) == len(raw)
            print(conversation.phrases)

            # Make sure we ony predicted on speaker 1 and 2.
            names = [p.speaker.name for p in conversation.phrases]
>           assert sorted(list(np.unique(names))) == ['speaker1', 'speaker2']
E           AssertionError: assert ['speaker2'] == ['speaker1', 'speaker2']
E             At index 0 diff: 'speaker2' != 'speaker1'
E             Right contains more items, first extra item: 'speaker2'
E             Use -v to get the full diff

test/test_minutes.py:37: AssertionError
---------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------
[<minutes.conversation.Phrase object at 0x1201b7278>, <minutes.conversation.Phrase object at 0x1201b71d0>, <minutes.conversation.Phrase object at 0x1201b7978>, <minutes.conversation.Phrase object at 0x1201b7a58>, <minutes.conversation.Phrase object at 0x1201b7240>, <minutes.conversation.Phrase object at 0x1201b7940>, <minutes.conversation.Phrase object at 0x1201b7780>, <minutes.conversation.Phrase object at 0x1201b79e8>, <minutes.conversation.Phrase object at 0x1201b7a20>, <minutes.conversation.Phrase object at 0x1201b7898>]
======================================================================================= 1 failed, 2 passed in 7.35 seconds ========================================================================================

~/git/minutes chad/126-prediction-output* 10s
minutes-c2ZPuskd ❯ pytest test/test_minutes.py
=============================================================================================== test session starts ===============================================================================================
platform darwin -- Python 3.6.4, pytest-3.5.1, py-1.5.3, pluggy-0.6.0
rootdir: /Users/chadlagore/git/minutes, inifile:
plugins: cov-2.5.1
collected 3 items

test/test_minutes.py ...                                                                                                                                                                                    [100%]

============================================================================================ 3 passed in 5.44 seconds =============================================================================================

I've replicated this locally. It is as we expected: the test is non-deterministic because we train the model each time!

Problem

We provide our model with a random_state parameter; this gets used by _generate_training_data to do a train/test split. Ideally it would lock down the Keras model as well, but it's not that simple: Keras relies on the NumPy seed and the TensorFlow seed, both of which are set globally.

from numpy.random import seed
from tensorflow import set_random_seed
seed(random_state)
set_random_seed(random_state)

Options

  1. Admit that minutes cannot guarantee reproducibility: use random_state to generate the training data, and let the user set the global NumPy state themselves if they want. This will cause confusion if someone sets the random state of the Minutes model and expects some sort of stability in the answer.
  2. Save the NumPy state, call the Keras functions, then restore the NumPy state with a decorator (see the sketch below). This makes fitting Minutes models non-thread-safe 🤔.
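For illustration, option 2 could look something like this sketch; the decorator name and the method it wraps are assumptions, not code from the PR:

import functools
import numpy as np

def preserves_numpy_state(fn):
    """Snapshot the global NumPy RNG state and restore it afterwards."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        state = np.random.get_state()
        try:
            return fn(*args, **kwargs)
        finally:
            np.random.set_state(state)
    return wrapper

# Hypothetical usage: wrap whatever method seeds the RNGs and trains Keras.
# The global state is shared across threads, hence the thread-safety caveat.
@preserves_numpy_state
def fit(self):
    ...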

chadlagore (Collaborator, Author) commented Jun 16, 2018

  1. Inform the user that they should set np.random.seed and tf.set_random_seed if they want reproducibility.

This seems like the simplest solution for now!
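Concretely, the documented advice would boil down to something like this sketch (TF 1.x API, matching the snippet above; the seed value is arbitrary):

import numpy as np
import tensorflow as tf

# Seed both global RNGs before constructing and fitting the model
# if reproducible training is needed.
np.random.seed(42)
tf.set_random_seed(42)

# ... then build and fit the Minutes model as usual; its own random_state
# still only controls the train/test split.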

iKevinY (Member) left a comment:
Looks great to me! Agreed that seeding the state is probably the easiest way to achieve reproducibility (and we mostly just care about it for consistent CI testing anyways). :shipit:

Labels: Ready For Review 👋 (PR is looking for a +1)
Projects: None yet
3 participants