Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: correct docstring examples (#3439) #16432

Merged
merged 14 commits into from
May 31, 2017

Conversation

ProsperousHeart
Copy link
Contributor

@ProsperousHeart ProsperousHeart commented May 22, 2017

xref #3439

@TomAugspurger TomAugspurger added this to the 0.21.0 milestone May 22, 2017
0 0.548814 0.544883 0.437587 0.383442
1 0.715189 0.423655 0.891773 0.791725
2 0.602763 0.645894 0.963663 0.528895

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have to remove this line (even though it seems like it should be there, based on the output).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aw man ... WTF?^^ lol

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

@@ -1129,8 +1134,7 @@ def get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False,
1 0 1 0
2 0 0 1

>>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
'C': [1, 2, 3]})
>>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will give a linting error (we check for PEP8)

The way you can solve this is by adding ... (then it should pass the doctests):

>>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
...                    'C': [1, 2, 3]})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, the lines should be less than 80 characters wide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aw man .... I did that to several ...any suggested way to fix it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like in mass? lol

@@ -940,6 +941,7 @@ def wide_to_long(df, stubnames, i, j, sep="", suffix='\d+'):
8 3 3 2.1 2.9
>>> l = pd.wide_to_long(df, stubnames='ht', i=['famid', 'birth'], j='age')
>>> l
... # doctest: +NORMALIZE_WHITESPACE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give some explanation why this is needed in this case ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche I think the issue was when there's a df.index.name the output has whitespace for the rest of that line (which we don't want to include in the source)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, ...
Is that actually something we might want to solve in pandas, in the repr? (it's not a bug, but I also don't think it is a feature someone relies upon? So could change that (if it is easy) to not have to do this here) But that is certainly for another PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could set this flag globally if this occurs a lot? (if that is possible)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what @TomAugspurger was thinking, but I don't think he knows either?

@@ -689,7 +689,7 @@ def _convert_level_number(level_num, columns):
new_labels = [np.arange(N).repeat(levsize)]
new_names = [this.index.name] # something better?

new_levels.append(level_vals)
new_levels.append(frame.columns.levels[level_num])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an actual change in the code. Is this to fix a bug? If so, could you do this as a separate PR (and add a test for it)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't make that change so I don't know what that is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is in your commit, you somehow made that change :-)
But if it was not the intent, you can just revert it (change it back to how it was based on the diff you see here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't - I swear I didn't. I don't even know what that means. I'll change it back, but I promise it wasn't me.


>>> s.unstack(level=0)
one two
a 1 2
b 3 4
a 1.0 3.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this was actually an error in the example!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no! So ...... how do I fix it? Cause when I try to change it, I get an error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, no, now it is correct! So your change is perfect. I just noticed that by running the doctests, we actually corrected some errors in the docs, which is its purpose, so that is good :-)

two a 3
b 4
one a 1.0
b 2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another option would be to change the series construction, use np.arange(1, 5) instead of np.arange(1.0, 5.0).

I think using integers makes the example slightly simpler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - working on this now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done - will be committing soon

@jorisvandenbossche jorisvandenbossche changed the title 1st update for issue 3439 DOC: correct docstring examples (#3439) May 22, 2017
X id
0 0 0
1 1 1
2 2 2
2 1 2

>>> pd.wide_to_long(df, ['A(quarterly)', 'B(quarterly)'],
i='id', j='year', sep='-')
X A(quarterly) B(quarterly)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the X column below is not correct? (it has no 1's)

Copy link
Contributor Author

@ProsperousHeart ProsperousHeart May 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X is random ... which didn't make sense to me.

    >>> df = pd.DataFrame({'A(quarterly)-2010': np.random.rand(3),
    ...                    'A(quarterly)-2011': np.random.rand(3),
    ...                    'B(quarterly)-2010': np.random.rand(3),
    ...                    'B(quarterly)-2011': np.random.rand(3),
    ...                    'X' : np.random.randint(3, size=3)})
    >>> df['id'] = df.index
    >>> df
    ... # doctest: +NORMALIZE_WHITESPACE
       A(quarterly)-2010  A(quarterly)-2011  B(quarterly)-2010  B(quarterly)-2011  \
    0           0.548814           0.544883           0.437587           0.383442
    1           0.715189           0.423655           0.891773           0.791725
    2           0.602763           0.645894           0.963663           0.528895
       X  id
    0  0   0
    1  1   1
    2  1   2```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can either change X to something not random, or leave it as is(with the random seed, it is also consistent), but the output below should just match the input (which is now not the case I think)

1 0.634401 0.611024 0.361789 0.630976
2 0.849432 0.722443 0.228263 0.092105
\
... # doctest: +NORMALIZE_WHITESPACE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also put this on the previous line (like the first example in https://docs.python.org/2/library/doctest.html#directives)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was too long then :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant

>>> df # doctest: +NORMALIZE_WHITESPACE

which would not be too long in this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - I'll add in next commit

b 2
two a 3
b 4
dtype: int32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one will depend on the platform you are using, so to be more robust, we should probably best add the dtype to np.arange. I would use 'int64', as this is the default in pandas (in numpy the default is platform dependent).
Or as an alternative use range(1, 5), then pandas Series will always convert this to int64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per our discussion, left at int32 - np.arrange and range() did not seem to make a difference so I kept it at np.arrange

@ProsperousHeart
Copy link
Contributor Author

What does it mean when the continuous-integration/travis-ci/py failed?

@jorisvandenbossche
Copy link
Member

In this case it is due to some linting errors (some style issues, based on PEP8 and flake are checked in the third build on travis, which is the one that is failing. If you go to travis and click on the failing build, and scroll down, you can see why).
The output from travis:

$ ci/lint.sh

inside ci/lint.sh
Linting  *.py
pandas/core/reshape/reshape.py:992:80: E501 line too long (84 > 79 characters)
pandas/core/reshape/reshape.py:993:80: E501 line too long (81 > 79 characters)
pandas/core/reshape/reshape.py:994:80: E501 line too long (81 > 79 characters)
pandas/core/reshape/reshape.py:995:80: E501 line too long (81 > 79 characters)
pandas/core/reshape/reshape.py:1000:1: W293 blank line contains whitespace
pandas/core/reshape/reshape.py:1001:80: E501 line too long (88 > 79 characters)
pandas/core/reshape/reshape.py:1015:80: E501 line too long (117 > 79 characters)
pandas/core/reshape/reshape.py:1133:71: W291 trailing whitespace
Linting *.py DONE

@jorisvandenbossche
Copy link
Member

See http://pandas.pydata.org/pandas-docs/stable/contributing.html#python-pep8 (I would recommend you to set up the editor you are using to check for this as well, that should be possible with most code editors)

@ProsperousHeart
Copy link
Contributor Author

I don't see any way (may not have search correct) for Notepad++ and PEP8

@codecov
Copy link

codecov bot commented May 23, 2017

Codecov Report

Merging #16432 into master will decrease coverage by 0.37%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16432      +/-   ##
==========================================
- Coverage   90.79%   90.42%   -0.38%     
==========================================
  Files         161      161              
  Lines       51063    51025      -38     
==========================================
- Hits        46363    46139     -224     
- Misses       4700     4886     +186
Flag Coverage Δ
#multiple 88.26% <ø> (-0.38%) ⬇️
#single 40.17% <ø> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/reshape/pivot.py 95.08% <ø> (ø) ⬆️
pandas/core/reshape/reshape.py 99.28% <ø> (-0.01%) ⬇️
pandas/core/reshape/concat.py 97.62% <ø> (ø) ⬆️
pandas/core/reshape/tile.py 90.25% <ø> (ø) ⬆️
pandas/io/formats/excel.py 74.24% <0%> (-22.41%) ⬇️
pandas/io/excel.py 62.31% <0%> (-18.33%) ⬇️
pandas/conftest.py 95.83% <0%> (-0.6%) ⬇️
pandas/io/parsers.py 95.32% <0%> (-0.35%) ⬇️
pandas/core/window.py 96.24% <0%> (-0.24%) ⬇️
pandas/util/testing.py 80.79% <0%> (-0.2%) ⬇️
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b0d9ee0...ffd3e3c. Read the comment docs.

... "one", "two", "two", "two", "one"], dtype=object)
>>> c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny",
... "shiny", "dull", "shiny", "shiny", "shiny"],
... dtype=object)

>>> crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this pd.crosstab instead of crosstab ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done - will be in next commit

>>> pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]), 3,
labels=["good","medium","bad"])
>>> result, bins = pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]),
... 3, labels=["good","medium","bad"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, you still would need

>>> result
... [output]
>>> bins
... [output]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what that would look like :(

@ProsperousHeart
Copy link
Contributor Author

Once confirmed, pivot completed.

@TomAugspurger
Copy link
Contributor

Just pushed some fixes for the pep8 failures, and enabled the doctests on the doc build. We'll check the output of https://travis-ci.org/pandas-dev/pandas/jobs/236410485 when it finishes.

@jorisvandenbossche
Copy link
Member

@ProsperousHeart Thanks a lot for working on this!
Merging this now (follow-up PRs to do further improvements are always welcome!)

The appveyor failure is the matplotlib one, so ignoring that one.

There is one failing doctest, but that seems a python version issue (get different results when testing that locally as well on different python version). Will do another PR to check if changing doc build to py 3.6 solves things.

@jorisvandenbossche jorisvandenbossche merged commit d4f80b0 into pandas-dev:master May 31, 2017
Kiv pushed a commit to Kiv/pandas that referenced this pull request Jun 11, 2017
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants