Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repo structure and packaging/distribution #15

Open
bertsky opened this issue Mar 27, 2020 · 2 comments
Open

repo structure and packaging/distribution #15

bertsky opened this issue Mar 27, 2020 · 2 comments

Comments

@bertsky
Copy link

bertsky commented Mar 27, 2020

I just stumbled over this project. Excellent work!

IMO you should publish this in an academic paper, since many scholars (in DH and beyond) will be interested...

I have a few suggestions on making this more readily available to users. (And I'd be happy to at least partially volunteer, if you are interested.)

requirements.txt

To maximize utilty for other users, it's necessary to minimize the assumptions/constraints of your explicit dependencies. Currently, that list requires the exact version you happened to have installed when you did pip freeze – but your actual requirements will almost always be broader. Also, it's not necessary to list dependent packages there. E.g. instead of ...

h5py==2.8.0
numpy==1.15.1
keras==2.2.4
scipy==1.1.0

...you could just write keras<2.3.

But most of these are already required by Mask-RCNN anyway (see next point), so perhaps this list could be drastically reduced to just:

mask-rcnn>=2.1
pandas
requests
image

Mask-RCNN redistribution

It's not good practice to dump an open source repo's code into your own, for several reasons:

  • you loose its history and source annotation (you have to dig it up externally)
  • you loose upstream changes/fixes (you have to sync them in manually)
  • you loose the possibility to (conveniently) contribute your own changes to upstream

Instead, you should make mask-rcnn an external dependency, like the others. If you do need to make changes of your own, then do that in a Github fork, which you then can either integrate here as a git submodule or reference in requirements.txt via the GH URL of your fork.

Currently, IIUC, the only change you made is your custom.py and Jupyter notebooks, right?

separate code and data

You (thankfully) provided all your raw and intermediate data files along with the converter scripts and tools, so one can re-produce your work. But for re-use it's usually better to strictly separate data and code files. At least in the directory structure, but preferably also with distinct repos (pointing to the data repo in a submodule only).

distribute on PyPI

The most straightforward and visible distribution option for Python projects is the Python Packaging Index. To publish there, all you have to do is register for an account, bring your repo into shape (following the Python packaging guidelines – most notably, create a setup.py with entry_points and linking to requirements.txt), then use twine to build and upload your package.

@kirschbombe
Copy link
Member

@bertsky Wow, thank you for all the feedback! I've been putting off cleaning up this repo and merging in some updates, so this is just the motivation I needed -- and changes I hadn't considered.

I'll begin working on some of these updates, but of course would not say no to any volunteer work :)

@bertsky
Copy link
Author

bertsky commented Mar 30, 2020

@kirschbombe Splendid!

but of course would not say no to any volunteer work :)

unfortunately I just ran out of GPU computing quota, so I can only commence testing your project in April.

AFAICT the base revision you used is matterport/Mask_RCNN@a49160a. There have been a couple of important fixes on master since then. But strictly, one should compare the impact of these on your training+evaluation. I could do that myself (first reproducing the current state of affairs, then moving to mask-rcnn master and repeating everything), but for you it's probably a lot less effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants