Skip to content

Conversation

@emdupre
Copy link
Contributor

@emdupre emdupre commented May 13, 2024

Addresses #31, #62

  • To be discussed : Do we want to add recommendations on using LLMs in coding ?

@emdupre emdupre force-pushed the doc/coding-standards branch from 99db30b to 87fb4b8 Compare May 13, 2024 22:16
@emdupre
Copy link
Contributor Author

emdupre commented May 13, 2024

cc @sjshim, please let me know if you have any feedback on this ! 👩‍💻

@sjshim
Copy link
Contributor

sjshim commented May 13, 2024

This is definitely much better! Hoping that materials re:packaging will finalize more following next week's meeting?

@emdupre
Copy link
Contributor Author

emdupre commented May 13, 2024

Definitely will be good to get more ideas down following that discussion 😸

@poldrack, let us know if this looks OK (enough, for now) to merge on your end !

Comment on lines +81 to +82
def confirm_data_frame_index_alignment(df1, df2):
assert all(df1.index == df2.index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a unit test helper. Something like assert_matching_indices(dataframe1, dataframe2). A unit test would then be something like:

def test_dataframe_transformation():
    df = make_default_df()
    transformed_df = my_transformation(df)
    # Whatever else we do, do not break the index
    assert_matching_indices(df, transformed_df)

## Python packaging

For projects that aim to develop pip-installable packages should follow current best-practices in Python Packaging.
As of May 2024, this is outlined in [this blog post](https://effigies.gitlab.io/posts/python-packaging-2023/) by lab member Chris Markiewicz.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +63 to +71
h=read_csv('https://raw.githubusercontent.com/poldrack/clean_coding/master/data/health.csv',index_col=0)[hc].dropna().mean(1)
```

Compare this with a modular, portable refactoring:

```python
# load health data
def load_health_data(datadir, filename='health.csv'):
return pd.read_csv(os.path.join(datadir, filename), index_col=0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't quite do the same things. If you're going to say, don't do A, do B, it would be good if A and B produced the same result.

Do you want to add something like:

data = load_health_data(datadir)
demeaned = data[columns].dropna().mean(1)

Do you want to go into getting datadir from os.environ or sys.argv? Given the bullet points above, that might help make clear what the alternatives look like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants