
Adds extraction links for moonmining observers #319

Open · wants to merge 10 commits into master

Conversation

@vo1 vo1 commented Dec 22, 2021

  • creates links between ObserverData and Extraction (see the sketch below)
  • creates links between ObserverData and character / user
  • creates links between ObserverData and type
  • adds observerData::creating to properly link to the extracted chunk
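For reference, a minimal sketch of what the relationships listed above could look like on the observer-data model, assuming Eloquent conventions (class and column names below are illustrative, not necessarily this PR's exact code):

public function extraction()
{
    // Hypothetical link to the extraction this row belongs to; assumes the
    // extractions table exposes an extraction_id key, as this PR implies.
    return $this->belongsTo(CorporationIndustryMiningExtraction::class, 'extraction_id', 'extraction_id');
}

public function type()
{
    // Link from the mined type_id to the SDE type row (InvType with its
    // typeID primary key is assumed here).
    return $this->belongsTo(InvType::class, 'type_id', 'typeID');
}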

/**
 * Character who performed the recorded mining.
 */
public function character()
{
    return $this->belongsTo(CharacterInfo::class, 'character_id', 'character_id');
}
Member

Since corporation mining data can also track non-corporate characters, it would be better to make this relationship to UniverseName, whose primary key is entity_id.
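A minimal sketch of that suggestion, matching the model's character_id column against UniverseName's entity_id primary key:

public function entity()
{
    // Resolve the miner through UniverseName, which also covers
    // non-corporate characters.
    return $this->belongsTo(UniverseName::class, 'character_id', 'entity_id');
}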

Author

Added a separate link to universe_name since, in my opinion, both should be present. What do you think?

Member @Crypta-Eve commented Dec 24, 2021

The problem you run into is that by splitting into two relationships, you now need to decide which one to call at runtime. What is the application you are using the relationship for? For name resolution, you are much better off sourcing from universe_name, as there will be more entries. For linking back to users, as I suspect you are doing, this raises an interesting question... As we cannot link back to the character from universe_name without doing individual DB lookups, this would be quite expensive. It makes me think about polymorphic relationships from UniverseName back to a relevant class (but that is out of scope for this PR). Will defer to @warlof on this one.

Member @warlof commented Dec 26, 2021

I've been thinking about a polymorphic relationship for a while, but there are a few issues with it:

  • first, we would no longer be able to distinguish whether an entity is registered within an instance or is simply a third party
  • second, despite my attempts, the DataTable builder currently isn't compatible with polymorphic relationships - they're quite recent and the API prevents a natural join (perhaps I'll look into it when Laravel 9 is released, since they've redesigned the builders)

You mostly got the primary question in your concern: what will be the purpose of that relation? And here the answer is: he's looking for a way to group entries per user - which is not possible through UniverseName.

I'm against having both UniverseName and CharacterInfo inside the same model; however, we might add CharacterInfo, AllianceInfo and CorporationInfo relationships to UniverseName, along the lines of the sketch below.
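A hypothetical sketch of those reverse relationships on UniverseName (class names taken from the comment above, column names assumed from SeAT conventions; not part of this PR):

public function character()
{
    // Match entity_id against CharacterInfo's character_id.
    return $this->hasOne(CharacterInfo::class, 'character_id', 'entity_id');
}

public function corporation()
{
    // Match entity_id against CorporationInfo's corporation_id.
    return $this->hasOne(CorporationInfo::class, 'corporation_id', 'entity_id');
}

public function alliance()
{
    // Match entity_id against AllianceInfo's alliance_id.
    return $this->hasOne(AllianceInfo::class, 'alliance_id', 'entity_id');
}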

Author

Yes, the idea is to group entries per user.

public function up()
{
    Schema::table('corporation_industry_mining_observer_data', function (Blueprint $table) {
        $table->integer('extraction_id')->default(0);
    });
}
Member

I think this is a problematic foreign key, though I don't know yet how to handle it.
Spawning a dummy extraction sounds wrong to me - but preventing data from being recorded isn't good either.

Member @Crypta-Eve commented Dec 24, 2021

We basically have two options here. The first is to create a key to the extraction_id, as has been done. The second is to base the relationship in the extractions model, grabbing all mining data with that structure_id between the known datetimes of those extractions. In my dev env I had done the latter. It hurt performance slightly, but kept the DB free of the extra key. The benefit of the relationship, though, is that you can be much more confident you have the correct extraction, as it is based on the dates reported by EVE.

The problem with the key that needs to be addressed is the interval used to find the extraction. It should not be plus or minus 5 days, because you may get overlapping extractions within that range. The minimum drill time is 6 days, and a new extraction can be started as soon as the current chunk is fractured. An unrigged structure will have an asteroid field that lasts 48 hours. Combined with last_updated only being accurate to a day, this gives you a separation of only 3 days (6 days minus the 48-hour field minus 1 day of overflow). However, in a T2 rigged structure these fields can be mined for an extra 100% of the time, totalling 96 hours. When you add the day of overflow, your separation is only 1 day.

The other issue is that if the extraction is not populated in the DB before the ObserverData, the extraction_id resolution occurring in the boot of the ObserverData model will fail to find an extraction_id and will never try again, losing the link in the data. The only way to resolve that would be a correction job that runs periodically and attaches orphaned data to extraction_ids where possible. That, in my opinion, would be clunky.

My proposal instead would be to add an index to last_updated, then create the relationship in the Extraction to the ObserverData based on matching structure_id, using the days of the extraction key dates (plus 1 on the end) as bounds, as sketched below.
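A minimal sketch of that proposal as a method on the extraction model (assumes chunk_arrival_time and natural_decay_time attributes per ESI, and an imported Carbon\Carbon for the date math):

public function observedMining()
{
    // Date-bounded lookup: observer data for this structure, from the
    // chunk arrival day through the decay day plus one day of
    // last_updated overflow.
    return CorporationIndustryMiningObserverData::where('observer_id', $this->structure_id)
        ->whereBetween('last_updated', [
            Carbon::parse($this->chunk_arrival_time)->startOfDay(),
            Carbon::parse($this->natural_decay_time)->addDay()->endOfDay(),
        ]);
}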

Author

It tries to find the extraction by chunk_arrival_time BETWEEN (last_updated - 6 days) AND (last_updated + 6 days).
Since 6 days is the minimum cycle, 5 would be safest, I suppose.
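As a rough sketch, that lookup could live in the observer-data model's boot method (the extraction_id key on the extractions model is an assumption of this PR; Carbon\Carbon is assumed imported):

protected static function boot()
{
    parent::boot();

    static::creating(function ($model) {
        // Find the extraction for the same structure whose chunk arrival
        // falls within +/- 5 days of this row's last_updated timestamp.
        $extraction = CorporationIndustryMiningExtraction::where('structure_id', $model->observer_id)
            ->whereBetween('chunk_arrival_time', [
                Carbon::parse($model->last_updated)->subDays(5),
                Carbon::parse($model->last_updated)->addDays(5),
            ])
            ->first();

        $model->extraction_id = $extraction ? $extraction->extraction_id : 0;
    });
}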

Member

The problem is that last_updated will continue to update right up until the mining stops. This means it will slowly travel from chunk_arrival_time all the way to natural_decay_time + 24h, and you cannot guarantee at what point in that cycle we will first receive the data.

Member

Well, what we're sure about is:

  • extractions are attached to a moon structure
  • an observer is nothing more than a drilling structure event
  • an observer is therefore tied to a structure
  • observer data are tied to an observer event (at least by their naming)

Earlier I said corporation mining had not been implemented because it's not robust yet - but it appears none of you took this into consideration and mostly assumed our modeling is right.

What if the observer is falsely recorded in our current architecture?
Let me rephrase: an observer is an event - as per ESI, it has the following mandatory fields:

  • observer_id
  • observer_type
  • last_updated

We already know that observer_id is the structure to which the observer is attached.
Now, what if we recorded data as soon as we don't have a matching pair of observer_id and last_updated?
This would imply adding a new auto-increment primary field called id and refactoring the observer job to spawn a new entry whenever the observer_id / last_updated pair is unknown.

Based on this, we can notice that the last_updated fields in both corporation_industry_mining_observers and corporation_industry_mining_observer_data are always formatted like YYYY-mm-dd 00:00:00 - is that an error on our side? At least, since both match, we can properly link observer_data to its observer event (based on the observer_id and last_updated fields), for instance as sketched below.
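Eloquent has no native composite-key relations, so one way to express that (observer_id, last_updated) link is a plain query method on the observer-data model; a minimal sketch:

public function observerEvent()
{
    // Match on both halves of the natural key described above.
    return CorporationIndustryMiningObserver::where('observer_id', $this->observer_id)
        ->where('last_updated', $this->last_updated)
        ->first();
}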

So now, the last question: what is the relation between observer and extraction? Is there anything more than the structure ID? For example, do we get an event every time an extraction cycle passes a milestone (start, chunk arrival, explosion/natural decay time)?

Since I'm no longer in a corporation, I won't be able to verify the theories above. But I think it could be interesting to apply these few refactors in a dev env and collect data. Being able to validate all of the above could help a lot to achieve what you're trying to do without all this complexity.

Author

My corporation runs one moon per day with max cycle. So far this code works well.
The big problem, as you said, is that observerID is just structureID, so to identify the actual extraction you'd need to match observerID + lastUpdated. Since we now also store extractionID, it is easier to match a single moon extraction and work out what actually happened in the last extraction cycle - not by lastUpdated, but across the whole three days the extraction was available.

Member

@warlof You are right. We have not discussed whether the structure we have in the DB is the best one. What we have at the moment mirrors the results we get back from ESI. This structure is more flexible for possible future expansion, as hinted at by the /corporation/{corporation_id}/mining/observers/ endpoint. This results in the corporation_industry_mining_observers table with the structure below:

+----------------+-------------------+------+-----+---------+-------+
| Field          | Type              | Null | Key | Default | Extra |
+----------------+-------------------+------+-----+---------+-------+
| corporation_id | bigint(20)        | NO   | MUL | NULL    |       |
| observer_id    | bigint(20)        | NO   | PRI | NULL    |       |
| last_updated   | datetime          | NO   |     | NULL    |       |
| observer_type  | enum('structure') | NO   |     | NULL    |       |
| created_at     | timestamp         | YES  |     | NULL    |       |
| updated_at     | timestamp         | YES  |     | NULL    |       |
+----------------+-------------------+------+-----+---------+-------+

This table is the one @vo1 wanted to deprecate, as it serves little purpose at the moment. It is of little use for this PR because the only observer_type currently is structure. That said, I would not rule out more mining observer types being added in the future. Also, I have not confirmed this, but based on the description of the endpoint I have a theory that a mining structure placed on a moon in such a way that it could start an extraction, but has not yet fired the moon drill, will be present in this return. If correct, this is a good data point to have, as it tells you which structures are situated correctly on moon beacons versus just in space (yes, I have seen structures not on moons still fitted with moon drills).

Paginated list of all entities capable of observing and recording mining for a corporation

@vo1 To address your last post see below.

My corporation runs one moon per day with max cycle. So far this code works well.

Congrats on running so many moons, but this is not the issue I presented above. The issue with the 6-day interval you have chosen arises when the moon drills are set to the minimum cycle with the rig for maximum asteroid field life. Then a 6-day interval overlaps multiple extraction_ids. You can easily differentiate between different moons with the observer_id, so overlapping cycles across multiple moons are no problem. The issue is the close spacing of cycles on the same moon.

The big problem, as you said, is that observerID is just structureID

It holds true, for now, that observer_id is just the structure_id. However, based on CCP's API, we should be ready for that not always to be the case.

Since we now also store extractionID, it is easier to match a single moon extraction and work out what actually happened in the last extraction cycle - not by lastUpdated, but across the whole three days the extraction was available.

Yes, with the extraction_id stored it is much easier to match the details, but this PR does not guarantee storage of the extraction_id. If the CorporationIndustryMiningExtraction model has no record of the extraction when the ObserverDetails job runs, the extraction_id will not be populated and the row will be forever orphaned. Either a periodic job needs to be created to re-attach the orphaned data, or the relationship should be based on the last_updated field, as that does not require data hydration to function.

Author

I lowered it to 5 days, since 3 days I think is kinda unsafe.
Grouping by last_updated with some virtual column that turns it into a single extraction is simply the same as grouping by extraction_id with this 5-day range, except slower.
Post-processing with a cron job, I think, might have a similar problem.

src/Models/Industry/CorporationIndustryMiningObserver.php (outdated review thread, resolved)
