This package provides both a CLI application and a Python library for querying video information from the TikTok Research API.

This library requires TikTok Research API access; it does not provide any access by itself.

Python 3.11+ is required. The code directly uses newer language features (e.g. the walrus operator, `X | Y` union type hints, `StrEnum`), so earlier versions will not work.
You need to put your API credentials in a YAML file, which the client code will use for authentication. Expected fields (no quotes):

```yaml
client_id: 123
client_secret: abc
client_key: abc
```
A query is a combination of a type (`and`, `or`, `not`) with multiple conditions (`Cond`). Each condition is a combination of a field (`Fields`, `F`), a value, and an operation (`Operations`, `Op`).
```python
from tiktok_research_api_helper.query import Query, Cond, Fields, Op

query = Query(
    and_=[
        Cond(Fields.hashtag_name, "garfield", Op.EQ),
        Cond(Fields.region_code, "US", Op.EQ),
        # Alternative version with multiple countries. The operation changes to
        # "IN" instead of "EQ" (equals) because the value is a list; the
        # library handles list vs str natively:
        # Cond(Fields.region_code, ["US", "UK"], Op.IN),
    ],
)
```
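For reference, a query like the one above corresponds to JSON in the shape the TikTok Research API expects (a hand-written illustration based on the TikTok API docs; the library builds this for you):

```json
{
  "and": [
    {"operation": "EQ", "field_name": "hashtag_name", "field_values": ["garfield"]},
    {"operation": "EQ", "field_name": "region_code", "field_values": ["US"]}
  ]
}
```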
`TikTokApiClient` provides a high-level interface to fetch all API results, and optionally store them in a database:
```python
from pathlib import Path
from datetime import datetime

from tiktok_research_api_helper.query import Query, Cond, Fields, Op
from tiktok_research_api_helper.api_client import ApiClientConfig, TikTokApiClient

config = ApiClientConfig(
    query=query,
    start_date=datetime.fromisoformat("2024-03-01"),
    end_date=datetime.fromisoformat("2024-03-02"),
    engine=None,
    api_credentials_file=Path("./secrets.yaml"),
)
api_client = TikTokApiClient.from_config(config)

# api_results_iter yields each API response as a parsed TikTokApiClientFetchResult.
# Iteration stops when the API indicates the query results have been fully delivered.
for result in api_client.api_results_iter():
    # do something with the result
    print(result.videos)

# Alternatively, fetch_all fetches all API results and returns a single
# TikTokApiClientFetchResult containing all of them. NOTE: this blocks until
# all results are fetched, which could take multiple days if the query results
# exceed the daily quota limit.
api_client.fetch_all()

# If you provide a SQLAlchemy engine in the ApiClientConfig, you can use
# TikTokApiClient to store results as they are received:
api_client.fetch_and_store_all()  # or equivalently: fetch_all(store_results_after_each_response=True)
```
```python
from pathlib import Path

from tiktok_research_api_helper.api_client import TikTokApiRequestClient, TikTokRequest
from tiktok_research_api_helper.query import Query, Cond, Fields, Op

# Reads credentials from secrets.yaml in the current directory.
request_client = TikTokApiRequestClient.from_credentials_file(Path("./secrets.yaml"))

# Sample query.
query = Query(or_=Cond(Fields.video_id, ["7345557461438385450", "123456"], Op.IN))

req = TikTokRequest(
    query=query,
    start_date="20240301",
    end_date="20240329",
)

# Fetch the first page of results for the query.
# NOTE: this does not automatically fetch subsequent pages.
result = request_client.fetch(req)

# To request the next page of results, create a new request with the cursor and
# search_id values from the previous result.
# NOTE: make sure to check result.data['has_more'] == True first.
new_req = TikTokRequest(
    query=query,
    cursor=result.data['cursor'],
    search_id=result.data['search_id'],
)
result = request_client.fetch(new_req)
```
- This library requires TikTok Research API access. It does not provide any access by itself.
- Create a new file `secrets.yaml` in the root folder you are running code from (you can specify a different file with `--api-credentials-file`). View the `sample_secrets.yaml` file for formatting. The `client_id`, `client_secret`, and `client_key` are required. The library automatically manages the access token and refreshes it when needed.
- View the `ExampleInterface.ipynb` notebook for a quick example of interfacing with the library for small queries.
You can query the API for videos that include and/or exclude hashtags and/or keywords with the following flags:

- `--include-any-hashtags`
- `--include-all-hashtags`
- `--exclude-any-hashtags`
- `--exclude-all-hashtags`
- `--include-any-keywords`
- `--include-all-keywords`
- `--exclude-any-keywords`
- `--exclude-all-keywords`

These flags take a comma-separated list of values (e.g. `--include-all-hashtags butter,cheese`, `--only-from-usernames amuro,roux`).
Flags with `any` in the name will query the API for videos that have one or more of the provided values. For example, `--include-any-hashtags butter,cheese` would match videos with hashtags `#butter`, `#cheese`, and/or `#butter #cheese` (i.e. both). The same applies to the keyword variants of these flags.

Flags with `all` in the name will query the API for videos that have all the provided values, and will not match videos with only a subset of the provided values. For example, `--include-all-hashtags butter,cheese` would match videos with hashtags `#butter #cheese`, but would not match videos with only `#butter` but not `#cheese`, and vice versa. The same applies to the keyword variants of these flags.
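The `any`/`all` matching semantics above can be expressed as simple set logic over a video's hashtags (an illustrative sketch, not library code; the actual matching happens server-side in the API):

```python
# Illustrative sketch of the any/all matching semantics described above.

def matches_any(video_hashtags: set[str], wanted: set[str]) -> bool:
    """--include-any-hashtags: at least one wanted hashtag is present."""
    return bool(video_hashtags & wanted)

def matches_all(video_hashtags: set[str], wanted: set[str]) -> bool:
    """--include-all-hashtags: every wanted hashtag is present."""
    return wanted <= video_hashtags

wanted = {"butter", "cheese"}
print(matches_any({"butter"}, wanted))            # True
print(matches_all({"butter"}, wanted))            # False
print(matches_all({"butter", "cheese"}, wanted))  # True
```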
You can also limit results by username, either querying for videos only from specific usernames or excluding videos from specific usernames. NOTE: these flags are mutually exclusive:

- `--only-from-usernames`
- `--exclude-from-usernames`
You can also limit videos by the region in which the user registered their account with `--region` (this flag can be provided multiple times to include multiple regions). See the TikTok API documentation for more info about this field: https://developers.tiktok.com/doc/research-api-specs-query-videos/
If you would like to preview the query that would be sent to the API (without actually sending a request) you can use the `print-query` command like so:

```shell
$ tiktok-lib print-query --include-all-keywords cheese,butter --exclude-any-hashtags pasta,tomato --region US --region FR --exclude-from-usernames carb_hater,only-vegetables
```

This prints the JSON query to stdout.
If the provided querying functionality does not meet your needs, or you want to provide your own query, use the `--query-file-json` flag. This takes a path to a JSON file that will be used as the query for requests to the API. NOTE: the provided file is NOT checked for validity. See the TikTok documentation for more info about crafting queries: https://developers.tiktok.com/doc/research-api-get-started/
You can use the `print-query` command to create a starting point. For example, if you wanted to match videos about shoes with more specific search criteria, you could create a base onto which to build with something like:

```shell
$ tiktok-lib print-query --include-any-keywords shoe,shoes,sneakers,pumps,heels,boots > shoes-query.json
```

Then edit `shoes-query.json` as desired, and use it with:

```shell
$ tiktok-lib run --query-json-file shoes-query.json ...
```
- For larger queries, first install SQLite.
- Run a test query with `tiktok-lib test`.
- Edit the `query.yaml` file to include the query you want to run.
- View the available commands in the run command with `tiktok-lib run --help`.
You can also store results in a PostgreSQL database. Use the `--db-url` flag to specify the connection string.
- Currently only video data ("Query Videos") is supported directly.
- Long-running queries are automatically split into smaller 28-day chunks to avoid the 30-day limit on the TikTok API.
- The library automatically manages the access token and refreshes it when needed.
- The TikTok Research API quota is 1000 requests per day (https://developers.tiktok.com/doc/research-api-faq). When the API indicates that limit has been reached, this library will retry (see the `--rate-limit-wait-strategy` flag for available strategies) until the quota limit resets, then continue collection.

`TikTokApiClient` provides a high-level interface for querying the TikTok Research API:

- Handles API pagination (i.e. requesting results from the API until it indicates the query results have been completely delivered), access token fetch/refresh, and retry on request failures.
- Provides an iterator (`api_results_iter`) which yields each parsed API response, or `fetch_all` which returns all parsed results in one object. `store_fetch_result` stores crawl and video data to the database. `fetch_and_store_all` does all of the above (fetching all results from the API and storing them in the database as responses are received).
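The 28-day chunking mentioned above can be sketched as follows (an illustrative simplification, not library code; `split_date_range` is a hypothetical helper name):

```python
from datetime import date, timedelta

# Split [start, end] into chunks of at most max_days days, to stay
# under the API's 30-day window limit as described above.
def split_date_range(start: date, end: date, max_days: int = 28):
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks

for s, e in split_date_range(date(2024, 1, 1), date(2024, 3, 1)):
    print(s, e)
```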
- Database
  - All "crawls" (really each request to the API) are stored in a separate table `crawl`, and the data itself in `video`.
  - The mapping of video <-> crawl is stored in `videos_to_crawls`.
  - Hashtags are stored in a separate table `hashtag` (with an internal ID, NOT from the API), and the mapping of hashtags <-> videos is stored in `videos_to_hashtags`.
  - Effect IDs are stored similarly to hashtags, with an association table `videos_to_effect_ids`. The naming in the `effect` table can be a little confusing because `id` is the internal database ID, and `effect_id` is the value from the API (which is a string, though this author has only ever seen ints (as strings) from the API).
  - Data is written to the DB after every `TikTokRequest`, by default containing up to 100 instances.
  - If a query tag is provided (via the `--query-tag` flag), crawls and videos are associated to the query tag in the `crawls_to_query_tags` and `videos_to_query_tags` tables respectively.
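The hashtag association described above can be sketched with plain SQL (an illustrative simplification using sqlite3; the real schema is defined by the library and managed via alembic):

```python
import sqlite3

# Simplified version of the many-to-many mapping described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video (id INTEGER PRIMARY KEY);
CREATE TABLE hashtag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE videos_to_hashtags (
    video_id INTEGER REFERENCES video(id),
    hashtag_id INTEGER REFERENCES hashtag(id),
    PRIMARY KEY (video_id, hashtag_id)
);
""")
conn.execute("INSERT INTO video (id) VALUES (1)")
# hashtag.id is an internal ID, not a value from the API.
conn.execute("INSERT INTO hashtag (id, name) VALUES (10, 'butter'), (11, 'cheese')")
conn.executemany("INSERT INTO videos_to_hashtags VALUES (?, ?)", [(1, 10), (1, 11)])

# Look up all hashtags for video 1 through the association table.
rows = conn.execute("""
    SELECT h.name FROM hashtag h
    JOIN videos_to_hashtags vh ON vh.hashtag_id = h.id
    WHERE vh.video_id = 1 ORDER BY h.name
""").fetchall()
print([r[0] for r in rows])  # ['butter', 'cheese']
```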
- Fix warning when retrying: only show it if the retry is unsuccessful.
- Add code docs.
- Allow continuing a query directly from the last run.
- Support other data types (e.g. "Query Users").
- Allow a query to be defined as Python code inside a file? This would facilitate not having to write extensive JSON queries in the CLI.
- Why not `tiktok-research-client`? At the time of creation, that name was not available.
Install with pip:

```shell
git clone <this repo>
cd tiktok-library
pip install .
```

OR you can install `hatch` (see https://hatch.pypa.io/latest/install/) and run code/tests from that:

```shell
git clone <this repo>
cd tiktok-library
hatch --env test run run               # run unit tests
hatch run tiktok-lib run --db-url ...  # query API
```
To run unit tests locally (requires pytest installed):

```shell
python3 -m pytest
```

OR use hatch:

```shell
hatch run test:run
```

To run the PostgreSQL integration test (requires Docker installed; may have to run as sudo):

```shell
docker compose build && docker compose run postgres-integration-test && docker compose down
```

OR run with hatch (this runs the above docker commands as sudo):

```shell
hatch run test:postgres-integration-test-docker-as-sudo
```
To check if ruff would change code (without actually making changes):

```shell
hatch fmt --check
```

To apply changes from ruff:

```shell
hatch fmt
```

NOTE: formatting fixes will not be applied if the linter finds errors. If this happens, you can run the formatter only with `hatch fmt --formatter` (useful when there is a linter issue the formatter can fix).
```shell
hatch run jupyter:notebook
```
Alembic is a tool/framework for database schema migrations. For instructions on how to use it and create revisions, see https://alembic.sqlalchemy.org/en/latest/tutorial.html

There is a hatch env for this, which can be invoked like so:

```shell
$ hatch --env alembic run alembic ...
```
The alembic config in this repo, `alembic.ini`, is a basic config barely modified from the generic default. To use it you will need to define `sqlalchemy.url`. If you want to operate on different database URLs, you can use a technique documented in https://alembic.sqlalchemy.org/en/latest/cookbook.html#run-multiple-alembic-environments-from-one-ini-file; briefly, you would add something like the following to `alembic.ini`:

```ini
[test]
sqlalchemy.url = driver://user:pass@localhost/test_database_name

[prod]
sqlalchemy.url = driver://user:pass@localhost/prod_database_name
```
Then you can specify which database to use via the config section name:

```shell
$ hatch --env alembic run alembic --name test <alembic command>
```

NOTE: if a database has not had alembic run against it, but nonetheless has an up-to-date schema, alembic commands will fail, saying it is not on the latest version. You can "stamp" the database as being at HEAD (referring here to alembic versions, NOT git commits) with the following:

```shell
$ hatch --env alembic run alembic --name test stamp head
```