05 Feb 05:48

chris-greening

68bdd4b

v2.1.2 Latest


Fix

Fixed incorrect profile_pic_url and profile_pic_url_hd values being scraped when a sessionid was passed to Profile.scrape


20 Jan 01:28

chris-greening

v2.1.0

fe7b6ce


New feature

`instascrape.scrape_tools.scrape_posts`

Takes a list of unscraped instascrape.Post objects and scrapes them, with a variety of configuration options. Returns two lists: the posts that were scraped successfully and the posts that were not.

Sample Usage

```python
import datetime

from instascrape import Post, scrape_posts

# Some code creating a list of posts and valid header info, etc...

# Scrape the first 100 posts
scraped_posts, unscraped_posts = scrape_posts(posts_list, headers=headers, limit=100)

# Scrape all posts since January 1st, 2020 (limit is a datetime, per the argument list below)
scraped_posts, unscraped_posts = scrape_posts(posts_list, headers=headers, limit=datetime.datetime(2020, 1, 1))
```

etc.

Available arguments

posts : List[instascrape.Post]
Required, list of unscraped Post objects
session : requests.Session
Optional, custom requests.Session object
webdriver : selenium.webdriver.chrome.webdriver.WebDriver
Optional, custom Selenium webdriver (overrides session if passed)
limit : Union[int, datetime.datetime]
Optional, integer or datetime value to stop scraping at. Defaults to scraping all posts
headers : dict
Optional, dictionary of request headers
pause : int
Optional, pause between scrapes
on_exception : str
Optional, what to do when an exception occurs: "raise", "pass", or "return". Defaults to "raise".
silent : bool
Optional, suppress printed output while scraping. Defaults to True (no output)
inplace : bool
Optional, if True, directly modifies the Post objects that are passed. Otherwise, creates copies and returns lists of the copies
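The interplay of limit, pause and on_exception can be sketched in plain Python. This is a minimal illustration under stated assumptions, not instascrape's implementation: FakePost and scrape_posts_sketch are hypothetical stand-ins, and the sketch assumes posts are ordered newest first, that a datetime limit stops at the first older post, and that "return" stops at the first failure. The inplace, headers, session and webdriver options are left out.

```python
import datetime
import time
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class FakePost:
    """Hypothetical stand-in for instascrape.Post; not the real class."""

    upload_date: datetime.datetime
    fail: bool = False
    scraped: bool = False

    def scrape(self) -> None:
        # Simulate a network scrape that may fail.
        if self.fail:
            raise ValueError("simulated scrape failure")
        self.scraped = True


def scrape_posts_sketch(
    posts: List[FakePost],
    limit: Union[int, datetime.datetime, None] = None,
    pause: float = 0,
    on_exception: str = "raise",
) -> Tuple[List[FakePost], List[FakePost]]:
    """Return (scraped, unscraped); assumes posts are ordered newest first."""
    if on_exception not in ("raise", "pass", "return"):
        raise ValueError(f"unknown on_exception option: {on_exception!r}")
    scraped: List[FakePost] = []
    unscraped: List[FakePost] = []
    for i, post in enumerate(posts):
        # An int limit caps how many posts are attempted.
        if isinstance(limit, int) and i >= limit:
            break
        # A datetime limit stops at the first post older than it.
        if isinstance(limit, datetime.datetime) and post.upload_date < limit:
            break
        try:
            post.scrape()
            scraped.append(post)
        except Exception:
            unscraped.append(post)
            if on_exception == "raise":
                raise
            if on_exception == "return":
                # Assumption: stop at the first failure and return what we have.
                return scraped, unscraped
            # "pass": record the failure and move on to the next post.
        if pause:
            time.sleep(pause)
    return scraped, unscraped


# Usage: the 2019 post is never attempted because it predates the limit.
posts = [
    FakePost(datetime.datetime(2020, 3, 1)),
    FakePost(datetime.datetime(2020, 2, 1), fail=True),
    FakePost(datetime.datetime(2019, 12, 1)),
]
ok, failed = scrape_posts_sketch(posts, limit=datetime.datetime(2020, 1, 1), on_exception="pass")
```

Returning the failed posts alongside the successful ones lets a caller retry just the failures later, which is the motivation the release note gives for the two return values.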


17 Jan 17:34

chris-greening

v2.0.2

354becd


Fixes

Fixed the default None argument for instascrape.scrapers.Profile.get_posts. Passing a specific amount worked, but passing nothing resulted in a comparison between NoneType and int, which raises a TypeError.
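Comparing None with an int raises TypeError in Python 3, which is what the unpassed default triggered. A minimal sketch of the usual fix, resolving None to an explicit bound before any comparison, follows; get_posts_sketch is a hypothetical helper working on a plain list, not the library's get_posts.

```python
from typing import List, Optional

# Python 3 refuses to order None against an int:
try:
    0 < None
except TypeError as exc:
    print(exc)  # '<' not supported between instances of 'int' and 'NoneType'


def get_posts_sketch(all_posts: List[str], amount: Optional[int] = None) -> List[str]:
    """Hypothetical stand-in for Profile.get_posts; amount=None means 'all posts'."""
    # Resolve None to an explicit bound *before* any comparison happens.
    limit = len(all_posts) if amount is None else amount
    collected: List[str] = []
    for post in all_posts:
        if len(collected) >= limit:
            break
        collected.append(post)
    return collected
```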


17 Jan 04:34

chris-greening

v2.0.0

78255b5


New features

Below is a list of new features

scrape tools

json_from_soup

Returns Instagram JSON data from a BeautifulSoup object

flatten_dict

Returns a flattened dictionary of all leaf nodes in a tree of JSON data

New flatten argument for the json_from_* functions, which makes them return flattened dictionaries
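Flattening "all leaf nodes in a tree of JSON data" can be sketched as a short recursive walk. This is an illustration, not instascrape's flatten_dict: flatten_dict_sketch is hypothetical, and its key scheme (the path to each leaf joined with "_", with list items keyed by index) is an assumption. The sample data is made up.

```python
from typing import Any, Dict


def flatten_dict_sketch(tree: Any, parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Collect every leaf of a nested dict/list structure into one flat dict.

    Keys are the path to each leaf joined by `sep`; list items use their index.
    This key scheme is an assumption, not necessarily the library's own.
    """
    if isinstance(tree, dict):
        items = tree.items()
    elif isinstance(tree, list):
        items = enumerate(tree)
    else:
        # A leaf: anything that is neither a dict nor a list.
        return {parent_key: tree}
    flat: Dict[str, Any] = {}
    for key, value in items:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten_dict_sketch(value, new_key, sep))
    return flat


# Made-up sample shaped like nested profile JSON.
sample = {"graphql": {"user": {"username": "google", "edge_followed_by": {"count": 1}}}}
print(flatten_dict_sketch(sample))
# {'graphql_user_username': 'google', 'graphql_user_edge_followed_by_count': 1}
```

Keeping the full path in each key avoids collisions between leaves that share a name at different depths, at the cost of longer keys.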

scrapers

New inplace argument for the scrape method

Similar to the pandas inplace parameter, except that the default is True instead of pandas's False. By default, scrape modifies an instance in place, setting its attributes to the scraped data. If False, the current instance is left untouched and scrape instead returns a new instance with the scraped data, which is useful for chaining methods.

New session parameter for the scrape method

Allows passing of a custom session object

New webdriver parameter for the scrape method

Uses a webdriver for scraping the data instead of a session
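The inplace behaviour described above boils down to "mutate self, or mutate a deep copy and return it". A minimal sketch, assuming a hypothetical ScraperSketch class with a fake _fetch step in place of the real request; returning None when modifying in place mirrors pandas and is an assumption, since the release note only specifies the inplace=False return value.

```python
import copy
from typing import Any, Dict, Optional


class ScraperSketch:
    """Hypothetical scraper illustrating the inplace flag (not instascrape's class)."""

    def __init__(self, source: str) -> None:
        self.source = source

    def _fetch(self) -> Dict[str, Any]:
        # Stand-in for the real network request and JSON parsing.
        return {"username": self.source, "followers": 42}

    def scrape(self, inplace: bool = True) -> Optional["ScraperSketch"]:
        # Write the scraped attributes onto self, or onto an independent copy.
        target = self if inplace else copy.deepcopy(self)
        for name, value in self._fetch().items():
            setattr(target, name, value)
        # Assumption: return None in place (as pandas does), else the new instance.
        return None if inplace else target


original = ScraperSketch("google")
scraped_copy = original.scrape(inplace=False)
print(hasattr(original, "followers"), scraped_copy.followers)  # False 42
```

Because inplace=False returns the populated copy, calls can be chained, e.g. `ScraperSketch("google").scrape(inplace=False).followers`.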

Fixes

Fixed a KeyError in the Post scraper that was occurring on every scrape

Breaking changes

Below is a list of breaking changes to the library

Renamed instascrape.scrapers.json_tools to instascrape.scrapers.scrape_tools
Renamed parse_json_from_mapping function to parse_data_from_json
Removed FlatJSONDict, replaced with the flatten_dict function in scrape_tools that will flatten any dictionary
json_from_* functions now return a list of all JSON dictionaries on the page instead of just the first dictionary.
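Returning every embedded JSON object rather than the first one can be sketched with the standard library alone. This is an assumption-laden illustration, not instascrape's json_from_* code: it assumes pages embed JSON in inline scripts as `window.<name> = {...};` assignments (as Instagram's `window._sharedData` did), json_from_html_sketch is hypothetical, and the sample page, including `window.__additionalData`, is made up.

```python
import json
import re
from typing import Any, Dict, List

# Matches the left-hand side of `window.<name> = `; the JSON literal starts right after it.
_ASSIGNMENT = re.compile(r"window\.\w+\s*=\s*")


def json_from_html_sketch(html: str) -> List[Dict[str, Any]]:
    """Return every JSON object assigned to a window.* variable, in page order."""
    decoder = json.JSONDecoder()
    found: List[Dict[str, Any]] = []
    for match in _ASSIGNMENT.finditer(html):
        try:
            # raw_decode parses one JSON value starting at the given index.
            obj, _ = decoder.raw_decode(html, match.end())
        except json.JSONDecodeError:
            continue  # not a JSON literal, e.g. a function assignment
        if isinstance(obj, dict):
            found.append(obj)
    return found


page = (
    '<script>window._sharedData = {"config": {"viewer": null}};</script>'
    '<script>window.__additionalData = {"feed": [1, 2]};</script>'
)
print(json_from_html_sketch(page))
# [{'config': {'viewer': None}}, {'feed': [1, 2]}]
```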

Non-breaking changes behind the scenes

Below is a list of everything that changed behind the scenes that has no bearing on the API

Refactored out a lot of complexity from instascrape.core._static_scraper._StaticHtmlScraper's implementation, greatly improving code readability
Changed imports to reflect file moves
Reimplemented to rely more on reusable functions as opposed to static methods unnecessarily bound to classes
Changed how data is loaded into the namespace when using the scrape method, to make room for the inplace argument. inplace defaults to True, so this doesn't break any existing code; it only provides a new alternative.
Updated documentation with docstrings


26 Dec 16:37

chris-greening

v1.7.1

cb8fac6


Deprecated data point

Removed business_email as an available data point from the instascrape.scrapers.Profile scraper. Instagram seems to have removed the ability to view business emails on the web version of the platform, and all values were being returned as nan. This will be explored further in the future, but for now the data point has been removed.


22 Dec 19:17

chris-greening

v1.7.0

58dcd7f


Deprecations

Officially removed deprecated methods from all scrapers as listed below

All scrapers

load instance method

instascrape.scrapers.Hashtag

from_profile class method

instascrape.scrapers.Post

from_shortcode class method

instascrape.scrapers.Profile

from_username class method

The functionality of all of these methods is covered by the scrape instance method, which made them redundant and less powerful.

Documentation

Removed misleading documentation for outdated scrapers and improved the documentation for existing scrapers
Added and improved type hints


14 Dec 06:02

chris-greening

v1.6.1

e7cb048


Docs

Added type hints for better documentation


14 Dec 03:16

chris-greening

v1.6.0

9c8d610


New feature

Added instascrape.scrapers.IGTV for scraping IGTV posts. instascrape.scrapers.IGTV is a subclass of instascrape.scrapers.Post and thus inherits all of its methods and behaviors.

Sample usage:

```python
from instascrape import IGTV

google_igtv = IGTV('https://www.instagram.com/tv/CIrIIMYl8VQ/')
google_igtv.scrape()
```


14 Dec 00:03

chris-greening

v1.5.0

f4466ba


New feature

Introduced the Reel scraper for scraping Instagram reels. Reel is a subclass of Post, so nearly everything you expect from Post is available in Reel as well.

Sample usage:

```python
from instascrape import Reel

sample_reel = Reel("https://www.instagram.com/reel/CIrJSrFFHM_/")
sample_reel.scrape()
```

Bug fixes

json_from_url

Added an optional request headers argument, with a default value, to instascrape.scrapers.json_from_url

unit tests

Fixed some broken unit tests. The library itself was fine, but some of the tests were outdated and now appear to require browser headers to run properly.
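An optional headers argument with a fallback default is a common pattern; a minimal sketch is below. It is not instascrape's code: the library sends requests with the requests package, while this sketch uses the standard library's urllib.request only to stay self-contained, and DEFAULT_HEADERS and build_request are hypothetical names with a placeholder User-Agent.

```python
import urllib.request
from typing import Dict, Optional

# Hypothetical placeholder; the library ships its own default header (see v1.4.0).
DEFAULT_HEADERS: Dict[str, str] = {"User-Agent": "Mozilla/5.0"}


def build_request(url: str, headers: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request, using DEFAULT_HEADERS when none are given.

    None is used as the default instead of a dict literal so the shared
    default can never be mutated through a caller's reference.
    """
    chosen = dict(DEFAULT_HEADERS if headers is None else headers)
    return urllib.request.Request(url, headers=chosen, method="GET")


req = build_request("https://www.instagram.com/google/")
# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_method(), req.get_header("User-agent"))  # GET Mozilla/5.0
```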


10 Dec 22:50

chris-greening

v1.4.0

2c4bbb2


New features

Location scraper

Ability to scrape Instagram Location pages.

Sample usage

```python
from instascrape import Location

url = "https://www.instagram.com/explore/locations/212988663/new-york-new-york/"
new_york = Location(url)
new_york.scrape()
print(f"{new_york.amount_of_posts:,} people have been to New York")
# 61,202,403 people have been to New York
```

Optional header for requests

Now supports passing optional browser headers to the scrape method of all scraper objects. The syntax is exactly the same as the headers dict you would pass to requests.get.

The default header is

```python
headers = {"User-Agent": "user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36 Edg/87.0.664.57"}
```

Sample usage is

```python
from instascrape import Profile

headers = {"User-Agent": "user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36 Edg/87.0.664.57"}
google = Profile("google")
google.scrape(headers=headers)
```

Fixes

It appears Instagram tightened restrictions overnight: all GET requests from the library were returning HTTP 429 (Too Many Requests) responses. Until now, instascrape neither sent browser headers nor supported passing them. The new default header and the option to pass custom headers appear to have restored the library's functionality for now. Keep an eye out for more robust session handling and better cookie support in later updates.


Releases: chris-greening/instascrape
