Replies: 15 comments
-
Hi Jeremy. Thanks for that thoughtful post. I fully agree that consistency would be useful, but I don't think unique names can work well from a sociological perspective. The challenge is that any given individual will have hundreds to perhaps a few thousand sortings. Having them be responsible for coming up with unique names will (and I say this from experience) result in incomprehensible labels that even they can't keep track of. From my perspective one of the main goals of this system is to make things more reproducible and less error prone, and thus even though complex primary keys are a pain, they prevent errors in a way that names do not. If we had a way to generate unique, meaningful names, I'd be fine with that, or if the names were never used themselves (as would, I think, be the case with the Sortings table) but instead would always be retrieved based on the key, that could also work. In any case, we're having a group spikesorting meeting today, and I'll make sure we talk about this and get users' perspectives. |
Beta Was this translation helpful? Give feedback.
-
@lfrank I can see what you are saying regarding the challenge of creating and managing unique names. I will point out that sorting names do not need to be globally unique - just unique up to a given SpikeSortingRecording. I think I must be missing some information about how a user would interact with these tables. Generally, I don't see the purpose of a manual table if you can't refer to a row by a proper subset of its attributes. In other words, if you need to specify all the data in the row in order to retrieve the row, why do you need the row to be stored in a table. :) If you do decide to stick with the composite keys as they are now, I would suggest eliminating the The |
Beta Was this translation helpful? Give feedback.
-
@magland. Kyu and I will be meeting shortly and we’ll discuss this, but one quick response:
As far as I understand things, we can’t use the restrictions because we need to be able to flexibly take different combinations of the recordings, sorter parameters, and artifact lists, and the restriction operation assumes that you can code up that combination at the time of the make.
And I’m okay with the idea of a name as an additional primary key, but in practice I think this would be used infrequently. That said, if you think it would be useful for you, then adding it seems like a good idea to me unless we come up with some other problem it would introduce.
On Mar 10, 2022, at 9:40 AM, Jeremy Magland ***@***.******@***.***>> wrote:
This Message Is From an External Sender
This message came from outside your organization.
@lfrank<https://urldefense.com/v3/__https://github.com/lfrank__;!!LQC6Cpwp!_LEiZcThY42g3XruMPYMlvczXQ21n4ziE3mLyiBDM5NFbuuI71RZliZE5htKb-5Wew$> I can see what you are saying regarding the challenge of creating and managing unique names. I will point out that sorting names do not need to be globally unique - just unique up to a given SpikeSortingRecording. I think I must be missing some information about how a user would interact with these tables.
Generally, I don't see the purpose of a manual table if you can't refer to a row by a subset of its attributes. In other words, if you need to specify all the data in the row in order to retrieve the row, why do you need the row. :)
If you do decide to stick with the composite keys as they are now, I would suggest eliminating the SpikeSortingRecordingSelection and SpikeSortingSelection Manual tables and instead always use the restrictions argument when calling populate() on SpikeSortingRecording and SpikeSorting. This would eliminate the Manual tables and you would only have the Computed ones.
The DerivedSpikeSorting (or whatever it's called) table would need to be Manual and I still think you should still consider having that be jointly keyed off of SpikeSorting and a user-specified name. Note that the name only needs to be unique up to the SpikeSorting.
—
Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/LorenFrankLab/nwb_datajoint/issues/158*issuecomment-1064325795__;Iw!!LQC6Cpwp!_LEiZcThY42g3XruMPYMlvczXQ21n4ziE3mLyiBDM5NFbuuI71RZliZE5ht1LQd45w$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABV4PSLM3QMJRXCLTWNDGE3U7IXZLANCNFSM5QNASV3Q__;!!LQC6Cpwp!_LEiZcThY42g3XruMPYMlvczXQ21n4ziE3mLyiBDM5NFbuuI71RZliZE5hv9JNyUiw$>.
Triage notifications on the go with GitHub Mobile for iOS<https://urldefense.com/v3/__https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675__;!!LQC6Cpwp!_LEiZcThY42g3XruMPYMlvczXQ21n4ziE3mLyiBDM5NFbuuI71RZliZE5huKKl84og$> or Android<https://urldefense.com/v3/__https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign*3Dnotification-email*26utm_medium*3Demail*26utm_source*3Dgithub__;JSUlJSU!!LQC6Cpwp!_LEiZcThY42g3XruMPYMlvczXQ21n4ziE3mLyiBDM5NFbuuI71RZliZE5huZSUqDZw$>.
You are receiving this because you were mentioned.
|
Beta Was this translation helpful? Give feedback.
-
Would the following not work?
|
Beta Was this translation helpful? Give feedback.
-
It would, but if (as is almost guaranteed to happen) an inexperienced user calls SpikeSortingRecording.populate(), then it will automatically generate all possible combinations, which would be hard to clean up. I really want us to avoid those sorts of failure modes if at all possible.
On Mar 10, 2022, at 9:52 AM, Jeremy Magland ***@***.******@***.***>> wrote:
This Message Is From an External Sender
This message came from outside your organization.
@magland<https://urldefense.com/v3/__https://github.com/magland__;!!LQC6Cpwp!7cZYg9nDdou_L7D-SQkp0buQx3ngdZfAWWBg3vRA7x_Bh55zBfJ23Uaw1J9sOL1iMg$>. Kyu and I will be meeting shortly and we’ll discuss this, but one quick response: As far as I understand things, we can’t use the restrictions because we need to be able to flexibly take different combinations of the recordings, sorter parameters, and artifact lists, and the restriction operation assumes that you can code up that combination at the time of the make.
Would the following not work?
SpikeSortingRecording.populate({
'nwb_file_name': ...,
'sort_interval_name': ...,
'sort_group_id': ...,
'preproc_params_name': ...
})
—
Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/LorenFrankLab/nwb_datajoint/issues/158*issuecomment-1064335623__;Iw!!LQC6Cpwp!7cZYg9nDdou_L7D-SQkp0buQx3ngdZfAWWBg3vRA7x_Bh55zBfJ23Uaw1J_oFdeYmg$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABV4PSI3T4DLIHQE6UJ5WSDU7IZFHANCNFSM5QNASV3Q__;!!LQC6Cpwp!7cZYg9nDdou_L7D-SQkp0buQx3ngdZfAWWBg3vRA7x_Bh55zBfJ23Uaw1J9KEDigww$>.
Triage notifications on the go with GitHub Mobile for iOS<https://urldefense.com/v3/__https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675__;!!LQC6Cpwp!7cZYg9nDdou_L7D-SQkp0buQx3ngdZfAWWBg3vRA7x_Bh55zBfJ23Uaw1J_Vwpfblg$> or Android<https://urldefense.com/v3/__https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign*3Dnotification-email*26utm_medium*3Demail*26utm_source*3Dgithub__;JSUlJSU!!LQC6Cpwp!7cZYg9nDdou_L7D-SQkp0buQx3ngdZfAWWBg3vRA7x_Bh55zBfJ23Uaw1J_6VUUygQ$>.
You are receiving this because you were mentioned.
|
Beta Was this translation helpful? Give feedback.
-
Yeah, that makes sense. |
Beta Was this translation helpful? Give feedback.
-
I have only three thoughts I would add to this discussion.
|
Beta Was this translation helpful? Give feedback.
-
But if you are going to write out all the attributes, then it's not that much more efficient than first inserting the dict in the Selection table and then calling I didn't get your point about using |
Beta Was this translation helpful? Give feedback.
-
I see. I'm not too concerned about the joins being difficult yet because I'm not sure how often we will use that operation on
Yes, but the thing is, this table is populated directly from an NWB file when it is first ingested into the database (i.e. even though this is a
That used to be the case (i.e. @lfrank originally made two separate tables: |
Beta Was this translation helpful? Give feedback.
-
@jsoules Good points. A few responses
|
Beta Was this translation helpful? Give feedback.
-
I'm not sure that is the main purpose. As such, I don't think we have to impose such discipline upon ourselves to always have the primary keys of I think you prefer meaningful names because the user has to refer to them downstream. But I actually think that a name is more confusing to a user than a set of composite keys. As @lfrank points out, I'm most definitely going to forget what I meant by That means the user will basically have to carry around a dict containing all the attributes that she needs to interact with the database (chances are, the user will first make a general query about a particular NWB file, and then use intersection to gradually narrow it down). That's not as easy as just using a name. But I think clarity and completeness should trump convenience for referencing in this case (or at least that was the impression that I got from @lfrank). As for what attributes to include in the |
Beta Was this translation helpful? Give feedback.
-
@khl02007 okay that sounds good. Just one clarification -- I was proposing that the sorting name is only unique up to the SpikeSortingRecording. So if there was a sorting with name 'good_sorting1' you wouldn't have to wonder what nwb file it came from because the sorting name is not the entire key. It is just part of the key that distinguishes it from other sortings performed on the same SpikeSortingRecording. Without the name, you would need to just query to get a list of all the sortings performed and then inspect all the parameters by eye to see which is the one you wanted. |
Beta Was this translation helpful? Give feedback.
-
|
Beta Was this translation helpful? Give feedback.
-
Oh I see, sorry I missed this important point (which I now see that you have made several times in this discussion - clearly I should read more carefully!). In that case I have absolutely no problem with it. I think in some cases (like your example of I spent some time reading the Datajoint documentation about what they recommend for primary keys, and found this (quoting relevant parts below, full link here):
Then it goes on to discuss using hashes and Given this, I'm inclined to use as the PK for |
Beta Was this translation helpful? Give feedback.
-
@khl02007 One snag; I believe you can only use Otherwise I think this is a great solution. |
Beta Was this translation helpful? Give feedback.
-
I've been thinking about the
dj.Manual
anddj.Computed
tables in nwb_datajoint, especially in the context of our CuratedSpikeSorting discussion.For the Manual entry tables: I believe the main purpose is to be able to refer to the manually-inserted rows from other tables, especially Computed tables. I think that the primary keys for the Manual tables should therefore always be meaningful names that are specified by the user at the time of insertion. That way they can be referred to by those names in downstream tables.
Let's look at a bunch of examples of Manual tables in nwb_datajoint:
Yes, that conforms. Similarly,
LabTeam
,Institution
, andLab
all conform.This also confirms (subject_id serves as the name)
This conforms.
This pretty much conforms, noting that Session is referenced via
nwb_file_name
. Similar forSortInterval
Yep
This pretty much conforms. We can refer to a row by
{'sorter': 'mountainsort4', 'sorter_params_name': 'default'}
.Yes, this pretty much conforms.
Now we come to the exceptions
This is different... and it makes the downstream tables cumbersome.
Also different and difficult to work with.
I understand the decision to have separate tables for
SpikeSortingRecordingSelection
andSpikeSortingRecording
, one being Manual and the other being Computed. Similar forSpikeSortingSelection
andSpikeSorting
. I think that you wantSpikeSortingRecording
andSpikeSorting
to bedj.Computed
tables, and you want to be able to call.populate()
on them. Thepopulate()
function works well when you want to automatically compute all possible rows (parings or tuples between multiple dependent tables). What you are doing here is populating only a sparse subset of the table of all possible rows. One possibility is to use therestrictions
argument ofpopulate()
, and do away with the*Selection
tables altogether (never callpopulate()
without a narrow restriction. That could simplify things. But you would still have the problem of easily referring to these rows for further downstream processing... curation, etc.What I propose is that you rework these tables as follows:
I would then recommend having a Manual table for
DerivedSpikeSorting
(or something), which could hold derived spike sortings obtained by automatic or manual curations. This table would also contain one row for each SpikeSorting row (uncurated). Would look something like this:Rather than memorizing all the names - the user can query the database to see what SpikeSortingRecording's, SpikeSorting's and DerivedSpikeSorting's are available.
Beta Was this translation helpful? Give feedback.
All reactions