This Python project generates probability density functions (PDFs) and cumulative distribution functions (CDFs) for the future prices of stocks, as implied by (call) option prices.
These probability distributions reflect market expectations but are not necessarily accurate predictions of future stock prices.
If you believe in the efficient market hypothesis, then the probability distributions generated by this package represent the best available, risk-neutral estimates of future stock price movements, as they are derived from the collective information and expectations of market participants. They can serve as a useful tool for understanding market-implied uncertainty, skewness, and tail risks.
pip install oipd
Please note that this project requires Python 3.10 or later.
The file example.ipynb is supplied as a demo.
The user will need to specify 4 mandatory arguments:
- input_data: Either a str (a file path to a CSV) or a pd.DataFrame (a pandas DataFrame containing the options data)
  - The input data must contain at least the following columns: 'strike', 'last_price', 'bid', 'ask'
- current_price: A number representing the underlying asset's current price
- days_forward: A number representing the days between the current date and the strike date
- risk_free_rate: A number indicating the annual risk-free rate in nominal terms
- fit_kernel_pdf: (optional) A boolean (True or False). Default: False
  - If True, fits a kernel-density estimator (KDE) on the raw probability distribution. KDE may improve edge-behavior of the PDF
- save_to_csv: (optional) A boolean (True or False). Default: False
  - If True, saves the output to a CSV file
- output_csv_path: (optional) A string specifying the file path where the user wishes to save the results
  - Required if save_to_csv=True
- solver_method: (optional) A string indicating which solver to use
  - Options: 'newton' or 'brent'. Default: 'brent'
- column_mapping: (optional) A dictionary mapping user-provided column names to the expected format, like so: {"user_column_name": "expected_column_name"}
  - Example:
    column_mapping = { "strike_price": "strike", "last_price": "last_price", "bid_price": "bid", "ask_price": "ask" }
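As an illustration of the DataFrame route, the sketch below builds a small table with non-standard column names and a matching column_mapping. The rows are made-up placeholder numbers, and check_columns is a hypothetical helper (not part of oipd). It assumes the mapping behaves like a pandas column rename, which the package may implement differently.

```python
import pandas as pd

REQUIRED_COLUMNS = {"strike", "last_price", "bid", "ask"}

# Placeholder option-chain rows (made-up numbers, for illustration only)
raw = pd.DataFrame(
    {
        "strike_price": [580.0, 590.0, 600.0],
        "last_price": [24.10, 17.35, 11.60],
        "bid_price": [23.90, 17.20, 11.45],
        "ask_price": [24.30, 17.50, 11.75],
    }
)

# Keys are the user's column names, values are the names the package expects
column_mapping = {
    "strike_price": "strike",
    "last_price": "last_price",
    "bid_price": "bid",
    "ask_price": "ask",
}


def check_columns(df: pd.DataFrame, mapping: dict[str, str]) -> list[str]:
    """Hypothetical helper: list required columns still missing after renaming."""
    renamed = df.rename(columns=mapping)
    return sorted(REQUIRED_COLUMNS - set(renamed.columns))


missing = check_columns(raw, column_mapping)
print("missing columns:", missing)
```

If nothing is reported missing, the same raw and column_mapping objects could then be passed as input_data and column_mapping alongside the mandatory arguments.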
Three examples of options data, downloaded from Yahoo Finance, are provided in the data/ folder.
Note that oipd only uses call options data for now.
from datetime import datetime

import oipd.generate_pdf as op
# Example - SPY
input_csv_path = "path_to_your_options_data_csv"
current_price = 593.83
current_date = "2025-03-03"
strike_date = "2025-05-16"
# Convert the strings to datetime objects
current_date_dt = datetime.strptime(current_date, "%Y-%m-%d")
strike_date_dt = datetime.strptime(strike_date, "%Y-%m-%d")
# Calculate the difference in days
days_difference = (strike_date_dt - current_date_dt).days
spy_pdf = op.run(
    input_data=input_csv_path,
    current_price=float(current_price),
    days_forward=int(days_difference),
    risk_free_rate=0.03,
    fit_kernel_pdf=True,
)
Another interesting example is US Steel:
The distribution is bimodal, and given Nippon Steel’s proposed $55 per share acquisition, it can be thought of as two overlapping scenarios:
- Acquisition is approved:
  - In this scenario, the share price would likely move above the $55 per share offer. This creates a “second peak” in the distribution
  - By inspecting the cumulative probability table (see example.ipynb), we see there's a 33% probability that the deal is approved and the price rises above $55
- Acquisition falls apart:
  - Without the approval, the share price may drop back toward a level driven by “business as usual” fundamentals; here, that appears lower than the current $39.39. This is the “first peak” in the distribution
Note that the domain (x-axis) is limited in this graph, because (1) few strike prices exist for US Steel, and (2) some extreme ITM/OTM options did not have solvable IVs.
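Reading a probability such as "price ends above $55" off a CDF amounts to computing 1 - CDF(55). The sketch below shows that lookup on a synthetic two-peak density. The mixture weights, means, and widths are invented for illustration and are not the US Steel results, and prob_above is a hypothetical helper rather than an oipd function.

```python
import math
from bisect import bisect_left


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


# Synthetic bimodal density on a price grid (illustrative numbers only)
prices = [20 + 0.05 * i for i in range(1001)]  # 20.00 .. 70.00
pdf = [0.65 * normal_pdf(p, 36.0, 4.0) + 0.35 * normal_pdf(p, 57.0, 1.5) for p in prices]

# Cumulative trapezoid rule turns the PDF into a CDF
cdf = [0.0]
for i in range(1, len(prices)):
    step = prices[i] - prices[i - 1]
    cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i - 1]) * step)

# Normalise so the CDF ends at exactly 1 on this truncated grid
total = cdf[-1]
cdf = [c / total for c in cdf]


def prob_above(level: float) -> float:
    """Hypothetical helper: P(price > level), linearly interpolating the CDF."""
    i = bisect_left(prices, level)
    if i == 0:
        return 1.0
    if i >= len(prices):
        return 0.0
    x0, x1 = prices[i - 1], prices[i]
    weight = (level - x0) / (x1 - x0)
    return 1.0 - (cdf[i - 1] + weight * (cdf[i] - cdf[i - 1]))


print(f"P(price > 55) = {prob_above(55.0):.3f}")
```

The same lookup applies to any CDF table: the probability of finishing above a level is one minus the cumulative probability at that level.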
An option is a financial derivative that gives the holder the right, but not the obligation, to buy or sell an asset at a specified price (strike price) on a certain date in the future. Intuitively, the value of an option depends on the probability that it will be profitable or "in-the-money" at expiration. If the probability of ending "in-the-money" (ITM) is high, the option is more valuable. If the probability is low, the option is worth less.
As an example, imagine Apple stock (AAPL) is currently $150, and you buy a call option with a strike price of $160 (meaning you can buy Apple at $160 at expiration).
- If Apple is likely to rise to $170, the option has a high probability of being ITM → more valuable
- If Apple is unlikely to go above $160, the option has little chance of being ITM → less valuable
This illustrates how option prices contain information about the probabilities of the future price of the stock (as determined by market expectations). By knowing the prices of options, we can reverse-engineer them and extract the probability information they contain.
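To make the link between probabilities and option value concrete, the sketch below values the $160-strike call as a discounted expected payoff under two made-up scenario distributions. The scenario prices, probabilities, rate, and horizon are all assumptions chosen for illustration, and call_value is a hypothetical helper.

```python
import math


def call_value(scenarios: dict[float, float], strike: float,
               rate: float = 0.03, years: float = 0.25) -> float:
    """Discounted expected payoff of a call under a discrete price distribution.

    `scenarios` maps a possible price at expiration to its probability.
    """
    assert math.isclose(sum(scenarios.values()), 1.0), "probabilities must sum to 1"
    expected_payoff = sum(prob * max(price - strike, 0.0)
                          for price, prob in scenarios.items())
    return math.exp(-rate * years) * expected_payoff


strike = 160.0
# The market thinks a rise to $170 is likely
bullish = {140.0: 0.2, 150.0: 0.2, 170.0: 0.6}
# The market thinks finishing above $160 is unlikely
bearish = {140.0: 0.5, 150.0: 0.4, 170.0: 0.1}

print(f"bullish call value: {call_value(bullish, strike):.2f}")
print(f"bearish call value: {call_value(bearish, strike):.2f}")
```

The package runs this logic in reverse: it starts from observed option prices and recovers the (risk-neutral) probabilities that would produce them.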
For a simplified worked example, see this excellent blog post. For a complete reading of the financial theory, see this paper.
The process of generating the PDFs and CDFs is as follows:
1. For an underlying asset, options data along the full range of strike prices are read from a CSV file to create a DataFrame. This gives us a table of strike prices along with the last price [1] each option sold for
2. Using the Black-Scholes formula, we convert option prices into implied volatilities (IV) [2]. IVs are solved using either Newton's method or Brent's root-finding algorithm, as specified by the solver_method argument
3. Using a B-spline, we fit a curve of best fit to the resulting IVs over the full range of strike prices [3]. Thus, we have extracted a continuous model from discrete IV observations; this is called the volatility smile
4. From the volatility smile, we use Black-Scholes to convert IVs back to prices. Thus, we arrive at a continuous curve of options prices along the full range of strike prices
5. From the continuous price curve, we use numerical differentiation to get the first derivative of prices with respect to strike. Then we numerically differentiate again to get the second derivative. The second derivative of prices, multiplied by the factor $e^{r\tau}$ (where $r$ is the risk-free rate and $\tau$ is the time to expiration in years), gives the probability density function [4]
6. We can fit a KDE onto the resulting PDF, which in some cases will improve edge-behavior at very high or very low prices. This is specified by the argument fit_kernel_pdf
7. Once we have the PDF, we can calculate the CDF by integrating it
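The steps above can be sketched end to end with the standard library alone. This is a minimal illustration rather than the package's implementation: it uses bisection in place of Newton or Brent, a made-up quadratic smile in place of a fitted B-spline, skips the optional KDE step, and invents all inputs (a spot of 100, a quarter-year horizon, a 3% rate). The density step relies on $f(K) = e^{r\tau}\,\partial^2 C / \partial K^2$.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, years, rate, vol):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * years) / (vol * math.sqrt(years))
    d2 = d1 - vol * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


def implied_vol(price, spot, strike, years, rate, lo=1e-4, hi=5.0, tol=1e-10):
    """Solve bs_call(vol) = price by bisection (a simple stand-in solver)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, years, rate, mid) > price:
            hi = mid  # call price rises with vol, so the root is below mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def smile(strike, spot):
    """Made-up quadratic smile, standing in for the fitted B-spline."""
    m = math.log(strike / spot)
    return 0.20 - 0.05 * m + 0.10 * m * m


spot, years, rate = 100.0, 0.25, 0.03

# Price -> IV: price one option, then recover its IV from that price
price_110 = bs_call(spot, 110.0, years, rate, smile(110.0, spot))
iv_back = implied_vol(price_110, spot, 110.0, years, rate)

# IV -> price: continuous price curve on a fine, evenly spaced strike grid
h = 0.5
strikes = [40.0 + h * i for i in range(281)]  # 40 .. 180
calls = [bs_call(spot, k, years, rate, smile(k, spot)) for k in strikes]

# Central second difference in strike; exp(r * tau) undoes the discounting
growth = math.exp(rate * years)
pdf_strikes = strikes[1:-1]
pdf = [growth * (calls[i - 1] - 2.0 * calls[i] + calls[i + 1]) / h**2
       for i in range(1, len(calls) - 1)]

# Running Riemann sum of the PDF approximates the CDF
cdf, running = [], 0.0
for density in pdf:
    running += density * h
    cdf.append(running)

print(f"recovered IV at K=110: {iv_back:.4f}")
print(f"total probability on grid: {cdf[-1]:.4f}")
```

Because the smile here is smooth by construction, the second difference is well behaved; with real quotes the smoothing step is what makes the differentiation usable.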
Feedback is most welcome! I'd love to hear how you use this information, what your use cases are, and what further features you would like to see. Email me at [email protected] to chat.
Contributions are also welcome. Please fork the repository, make your changes, and submit a pull request.
The methodology in this package is a basic implementation, and more rigour is necessary for professional use. I will work on these features when I have free time, and contributions to the following would be welcome:
- Dividend handling in the current Black-Scholes model
- Improved support for American options
  - Currently, all options are assumed to be European for simplicity. To better support American options:
    - Implement a method to "de-Americanize" American options in order to be compatible with the existing Black-Scholes implementation, or
    - Add an American options pricing model. However, with an American model, the current method (using the second derivative to derive the probability density function) is no longer valid, so a new approach would need to be discovered
- Integration of alternative pricing models
  - Offer the choice among different models, including the following suggestions from users:
    - Heston
    - Jump-Diffusion
    - Binomial Tree
DISCLAIMER: This software is provided for informational and educational purposes only and does not constitute financial advice. Use it at your own risk. The author is not responsible for any financial losses or damages resulting from its use. Always consult with a qualified financial professional before making any financial decisions.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Footnotes
1. We chose to use last price instead of calculating the mid-price given the bid-ask spread. This is because Yahoo Finance, a common source for options chain data, often lacks bid-ask data. See, for example, Apple options
2. We convert from price-space to IV-space, and then back to price-space as described in step 4. See this blog post for a breakdown of why we do this double conversion
3. See this paper for more details. In summary, options markets contain noise. Therefore, generating a volatility smile through simple interpolation will result in a noisy smile function. Converting back to price-space will then result in a noisy price curve. Finally, when we numerically differentiate the price curve twice, the noise will be amplified and the resulting PDF will be meaningless. Thus, we need either a parametric or non-parametric model to try to extract the true relationship between IV and strike price from the noisy observations. The paper suggests a 3rd-order B-spline as a possible model choice
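The noise-amplification argument in the last footnote can be checked numerically. The sketch below perturbs a smooth call-price curve with small random errors and compares the second difference of the clean and noisy curves. The flat 20% volatility, the one-cent noise size, and all other inputs are assumptions chosen for illustration.

```python
import math
import random


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, years, rate, vol):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * years) / (vol * math.sqrt(years))
    d2 = d1 - vol * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


def second_difference(values, h):
    """Central second difference on an evenly spaced grid with spacing h."""
    return [(values[i - 1] - 2.0 * values[i] + values[i + 1]) / h**2
            for i in range(1, len(values) - 1)]


random.seed(0)  # fixed seed so the run is repeatable
spot, years, rate, vol = 100.0, 0.25, 0.03, 0.20
h = 0.5
strikes = [70.0 + h * i for i in range(121)]  # 70 .. 130

clean = [bs_call(spot, k, years, rate, vol) for k in strikes]
# Add independent pricing errors with a standard deviation of one cent
noisy = [c + random.gauss(0.0, 0.01) for c in clean]

clean_d2 = second_difference(clean, h)
noisy_d2 = second_difference(noisy, h)

peak_curvature = max(clean_d2)
worst_error = max(abs(a - b) for a, b in zip(clean_d2, noisy_d2))

print(f"largest clean second derivative: {peak_curvature:.4f}")
print(f"largest error caused by noise:   {worst_error:.4f}")
```

A one-cent error in prices produces errors in the second derivative that exceed the curvature being measured, which is why the smile is smoothed before differentiating.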

