Releases: mendableai/firecrawl
v1.1.1
What's Changed
- feat(python-sdk): Make API key optional for self-hosted instances by @RutamBhagat in #990
- Sitemap fixes by @mogery in #1010
- fixed optional+default bug on llm schema by @rafaelsideguide in #955
- [FIR-37] feat: extract and return favicon URL during scraping by @ftonato in #1018
- fix: merge mock success data by @yujunhui in #1013
- feat(rust-sdk): Make API key optional for self-hosted instances by @RutamBhagat in #991
- feat(scrapeURL/pdf): switch to MU (FIR-356) by @mogery in #1016
Full Changelog: v1.1.0...v1.1.1
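As a rough illustration of the optional API key change (#990, #991), the sketch below builds a scrape request and attaches an Authorization header only when a key is supplied. It uses the Python standard library rather than the SDK. The helper name `build_scrape_request`, the `http://localhost:3002` address, and the `/v1/scrape` path with a JSON `url` field are assumptions for illustration, so check them against your own deployment.

```python
import json
import urllib.request
from typing import Optional


def build_scrape_request(
    target_url: str,
    api_url: str = "http://localhost:3002",  # assumed self-hosted address
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a scrape endpoint."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only attach a bearer token when a key was actually provided.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/v1/scrape",  # assumed endpoint path
        data=body,
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_scrape_request("https://example.com")
    print(req.full_url, req.has_header("Authorization"))
```

Keeping the key optional at the request-building step means the same client code can target a hosted instance (with a key) or a self-hosted one (without) by changing only the arguments.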
v1.1.0
Starting today, we will post weekly releases here and on firecrawl.dev/changelog. This release summarizes the improvements and fixes we have pushed since the v1 release. Thank you all for the contributions!
Changelog Highlights
Feature Enhancements
- New Features:
- Geolocation, mobile scraping, 4x faster parsing, and better webhooks.
- Credit packs, auto-recharges, and batch scraping support.
- Iframe support and query parameter differentiation for URLs.
- Similar URL deduplication.
- Enhanced map ranking and sitemap fetching.
Performance Improvements
- Faster crawl status filtering and improved map ranking algorithm.
- Optimized Kubernetes setup and simplified build processes.
- Improved sitemap discoverability and performance.
Bug Fixes
- Resolved issues:
- Badly formatted JSON, scrolling actions, and encoding errors.
- Crawl limits, relative URLs, and missing error handlers.
- Fixed self-hosted crawling inconsistencies and schema errors.
SDK Updates
- Added dynamic WebSocket imports with fallback support.
- Optional API keys for self-hosted instances.
- Improved error handling across SDKs.
Documentation Updates
- Improved API docs and examples.
- Updated self-hosting URLs and added Kubernetes optimizations.
- Added articles: mastering /scrape and /crawl.
Miscellaneous
- Added new Firecrawl examples.
- Enhanced metadata handling for webhooks and improved sitemap fetching.
- Updated blocklist and streamlined error messages.
What's Changed
- Add docs to api spec example by @ericciarla in #637
- [Docs] upgraded the path of the self-hosted documentation URL to /v1. by @shige in #635
- Removal of generic classnames/ids from onlyMainContent cleaning by @nickscamara in #638
- Improved team credits check and billing notifications by @nickscamara in #640
- Fixed 500 errors when JSON is badly formatted by @nickscamara in #648
- Better engine for wait + other params by @nickscamara in #649
- fix(py-sdk): removed asyncio package by @rafaelsideguide in #654
- perf(js-sdk): move dotenv and uuid to devDependencies, fix zod import by @MonsterDeveloper in #614
- build(js-sdk): simplify build process by @MonsterDeveloper in #611
- fix(v0/crawl-status): don't crash on big crawls when requesting jobs from supa by @mogery in #653
- Manual Rate Limiter for select team ids by @nickscamara in #664
- O1 crawler example by @ericciarla in #676
- [Bug] Fixed screenshot typo and added test for fullpage screenshot by @rafaelsideguide in #677
- v1/map improvements + higher limits by @nickscamara in #674
- Remove print statement in map by @anjor in #612
- fix wrong link to self host documentation by @itasli in #623
- feat: kubernetes example optimization by @yekkhan in #639
- Rust SDK 1.0.0 by @mogery in #689
- feat: Actions by @mogery in #682
- Fix the error message when trying search in v0 by @nickscamara in #690
- remove space in the examples/o1_web_crawler folder name by @h4r5h4 in #679
- o1 job recommender example by @ericciarla in #707
- Move auth and check credits operations into an RPC by @mogery in #704
- bugfix: using onlyIncludeTags and removeTags together by @skeptrunedev in #685
- Concurrency limits by @mogery in #721
- Docs: Remove wait_until_done from python-sdk example by @bytrangle in #728
- Improves error handler in Node SDK to return the status code by @nickscamara in #727
- Fixes crawl failed and webhooks not working properly by @nickscamara in #731
- [BUG] Fixed URLs with params by @rafaelsideguide in #732
- Fixed the self host issues where methods don't work by @nickscamara in #733
- Make sure the entrypoint script has the correct line endings by @busaud in #753
- Rm cluster mode + rm fly deployments by @nickscamara in #754
- Fixed Issue #734 by @Harsh0707005 in #747
- bugfix: self-host crawling doesnt respect limit by @busaud in #755
- [BUG] Fixed missing error handling in JS-SDK by @rafaelsideguide in #759
- [SKD] Cancel Crawl by @rafaelsideguide in #760
- fixed developer.notion special case by @rafaelsideguide in #762
- Spelling Corrections in README by @fadkeabhi in #763
- [RPC] Improvements to credit_usage rpc by @nickscamara in #767
- [BUG] filters failed and unknown jobs now by @rafaelsideguide in #761
- [Doc] Better explained how includePaths and excludePaths work by @rafaelsideguide in #766
- Update README.md by @busaud in #757
- ADDED : Contributors and Back to top by @Ruhi14 in #768
- Retries for ACUC RPC + Price credits fallback by @nickscamara in #773
- [BUG] added check files on crawl by @rafaelsideguide in #779
- [Feat] Performance improvements crawl status filters by @rafaelsideguide in #780
- Admin alerts for high usage by @nickscamara in #783
- Geolocation support for Firecrawl by @nickscamara in #784
- Return all the website metadata by @nickscamara in #785
- Extractor options logging v1 fix by @nickscamara in #788
- Update requirements.txt by @rishi-raj-jain in #790
- Improved /map ranking algorithm for search queries by @nickscamara in #798
- Fix Typos and Grammar in SELF_HOST.md by @Mefisto04 in #799
- [Bug] encoding error for special token by @rafaelsideguide in #793
- [BUG-SDK] missing error in response by @rafaelsideguide in #796
- examples: sales web crawler by @rishi-raj-jain in #797
- feat: clear ACUC cache endpoint based on team ID by @mogery in #807
- feat: skipTlsVerification by @tomkosm in #808
- feat: Batch Scrape by @mogery in #789
- feat: Auto Recharge Credits + Credit Packs by @nickscamara in #809
- Remove ph logs for single_urls by @nickscamara in #829
- Bump to gemini-1.5-pro-002 website_qa_with_gemini_caching.ipynb and add flash example by @s-smits in #739
- Add SearchApi as a Web Search Tool by @SebastjanPrachovskij in #628
- RM wait before interacting by @nickscamara in #838
- chore(README.md): use satisfies instead of as for ts example by @twlite in #831
- Geo-location rename to location by @nickscamara in #830
- concurrency limit fix by @mogery in #824
- [feat] Iframe support by @tomkosm in #855
- Fix go parser by @tomkosm in #856
- Support for the 2 new actions by @nickscamara in #858
- Adds support for mobile web scraping + mobile screenshot by @nickscamara in #847
- [Feat] Added remove base64 images options (true by default) by @rafaelsideguide in #867
- [Fix] Prevent Python Firecrawl logger from interfering with loggers in client applications by @reasonmethis in #613
- [BUG] Added trycatch and removed redundancy by @rafaelsideguide in #869
- Update CONTRIBUTING.md by @swyxio in https://github.com/mendableai/firecrawl/p...
Welcome to v1 - A more reliable and developer friendly API
Firecrawl V1 is here! With that we introduce a more reliable and developer friendly API.
August 29th, 2024
Here is what’s new:
- Output Formats for /scrape. Choose what formats you want your output in.
- New /map endpoint for getting most of the URLs of a website.
- Developer friendly API for /crawl/{id} status.
- 2x Rate Limits for all plans.
- Go SDK and Rust SDK
- Teams support
- API Key Management in the dashboard.
- onlyMainContent now defaults to true.
- /crawl webhooks and websocket support.
Learn more about it here
Start using v1 right away at https://firecrawl.dev
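To make the output formats item concrete, here is a minimal sketch of assembling a /scrape request body that lists the desired formats. The `formats` and `onlyMainContent` field names, the format names in `KNOWN_FORMATS`, and the helper `scrape_payload` are assumptions for illustration, not a confirmed list. Consult the API reference for the authoritative set.

```python
import json
from typing import Iterable

# Assumed format names for illustration; the real API may accept others.
KNOWN_FORMATS = {"markdown", "html", "rawHtml", "links", "screenshot"}


def scrape_payload(
    url: str,
    formats: Iterable[str] = ("markdown",),
    only_main_content: bool = True,
) -> str:
    """Return a JSON body asking for specific output formats."""
    requested = list(dict.fromkeys(formats))  # drop repeats, keep order
    unknown = [f for f in requested if f not in KNOWN_FORMATS]
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")
    return json.dumps({
        "url": url,
        "formats": requested,
        "onlyMainContent": only_main_content,  # stated above as the v1 default
    })


if __name__ == "__main__":
    print(scrape_payload("https://example.com", ["markdown", "html", "markdown"]))
```

Validating the format names before sending keeps a typo from turning into a failed request. Swap the assumed set for whatever your API version documents.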
What's Changed (including v0 + v1)
- Delete .DS_Store by @szepeviktor in #8
- [Bugfix] added normalized apikey to craw/status route by @rafaelsideguide in #12
- [Feat] improving reative paths by @rafaelsideguide in #4
- Fix typos by @szepeviktor in #9
- [Feat] Added html to markdown table parser by @rafaelsideguide in #11
- Option to extract only the main content, excluding headers, navs, footers etc. by @nickscamara in #14
- [Feat] Adding pdf parser by @rafaelsideguide in #17
- adding ci-cd workflow by @rafaelsideguide in #20
- adding workflow by @rafaelsideguide in #21
- adding env secrets by @rafaelsideguide in #22
- [Feat] Added TSDocs and types for js-sdk by @rafaelsideguide in #28
- Added option to replace all relative paths with absolute paths by @rafaelsideguide in #25
- [Bugfix] Fixed scrape preview test by @rafaelsideguide in #30
- Caleb: fixing some documentation and rebuilding the server by @calebpeffer in #32
- Rate limit fixes for crawl status by @nickscamara in #36
- Better logging by @nickscamara in #35
- [Feat] Added type declarations by @rafaelsideguide in #31
- Refactor api routes by @nickscamara in #37
- Logging by @nickscamara in #38
- Cjp/making db auth optional <> Running project locally by @calebpeffer in #40
- chore: add context.close by @mattzcarey in #46
- Fixes table parsing for websites such as news.ycombinator.com (HN) by @nickscamara in #52
- [Feat] Server health check + slack message by @rafaelsideguide in #53
- [Feat] Added blocklist for social media urls by @rafaelsideguide in #55
- [Feat:mvp] Search Endpoint => serp api + firecrawl => 🔥 🔍 by @nickscamara in #56
- [Feat] Added anthropic vision api by @rafaelsideguide in #5
- [Bugfix] Trim and Lowercase all urls by @rafaelsideguide in #13
- Implements the ability for the crawler to output all the links it found, without scraping by @nickscamara in #34
- Serper params by @nickscamara in #62
- Support for tbs, filter, lang, country and location with Serper search. by @rogerserper in #61
- [Feat] Added allowed urls by @rafaelsideguide in #64
- /search support in node sdk by @nickscamara in #72
- Free credits increase by @nickscamara in #75
- [Bugfix] JS-SDK: Remove dotenv and add tests by @mdp in #68
- [Feat] Coupon system by @rafaelsideguide in #66
- Specific website params support by @nickscamara in #83
- Greenpay fixes by @nickscamara in #84
- [Feat] Implemented retry attempts to handle 502 errors by @rafaelsideguide in #67
- feat: LLM Extraction (mvp) by @nickscamara in #90
- Update README.md by @bllchmbrs in #110
- Add Posthog Logging by @ericciarla in #109
- Refactor of main web scraper + Partial data streaming by @nickscamara in #120
- [Feat] Added includeHTML option by @rafaelsideguide in #126
- Cancel Job Route by @nickscamara in #129
- [Feat] Added max depth option by @rafaelsideguide in #130
- Add keyAuth endpoint by @ericciarla in #131
- [Test] Added integration tests suite by @rafaelsideguide in #118
- Adds Zod Integration for LLM Extraction in the Firecrawl JS SDK by @nickscamara in #135
- [Docs] Updated examples by @rafaelsideguide in #137
- Switching to AGPL - We Need Your Consent! by @calebpeffer in #134
- Nsc/refactor scraping order by @nickscamara in #139
- Update models.ts by @ericciarla in #144
- Timeout on /scrape by @nickscamara in #145
- [Doc] Added default value for crawlOptions.limit by @rafaelsideguide in #142
- feat: 4x-5x faster crawler (fast mode) by @nickscamara in #149
- Add Docker Compose for easy self hosting by @chand1012 in #119
- refactor: fix typo in WebScraper/index.ts by @eltociear in #27
- [Tests] Added crawl test suite -> crawl improvements by @rafaelsideguide in #153
- feat: Docx Support by @nickscamara in #158
- Fixes pdfs not found if .pdf is not present by @nickscamara in #29
- Update README.md: Typo fix by @elimisteve in #160
- [Feat] Added rate limits by @rafaelsideguide in #151
- Allow override of API URL by @mattjoyce in #166
- feat: HyperDX Integration by @nickscamara in #167
- beta: Fire-Engine fallback by @nickscamara in #174
- Add additional file extensions to crawler.ts by @tractorjuice in #77
- [Bug] Fixing /crawl limit by @rafaelsideguide in #143
- Update issue templates by @rafaelsideguide in #180
- [Feat] Added proxy and media blocking support for Playwright by @JakobStadlhuber in #181
- update: wait until body attached in playwright-service by @qyou in #170
- feat: Allow privacy/legal/ other pages in social media websites by @nickscamara in #168
- [Bug] Added data check for python SDK by @rafaelsideguide in #176
- Fix FIRECRAWL_API_URL bug, also various PyLint fixes by @mattjoyce in #178
- [Feat] Added idempotency key to crawl route by @rafaelsideguide in #132
- Feat: Provide more details for 429 error msg by @simonha9 in #190
- Limit on /search is not deterministic by @Keredu in #186
- Various PyPi Metadata by @mattjoyce in #191
- [Test] Added sdk e2e tests by @rafaelsideguide in #183
- Allow users to manually set the waitFor param on /scrape by @nickscamara in #200
- [Feat] Added custom scraping conditions for readme docs by @rafaelsideguide in #204
- Feat/screenshot support by @ericciarla in #207
- feat: New pricing/limits changes by @nickscamara in #216
- [sdk] Fixes waiting status not being present on check status by @nickscamara in #218
- Fixed fire-engine content bug by @rafaelsideguide in #228
- Use @ instead of # for default BULL_AUTH_KEY. Hash mark is reserved...