Nicolas
98d82c4cec
Update search.ts
2024-06-06 20:02:21 -07:00
Nicolas
5e80f8af87
Nick: llm extract 50
2024-06-06 18:35:44 -07:00
rafaelsideguide
7b7a6f8a39
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-06-06 17:51:28 -03:00
rafaelsideguide
f2695df215
updated sdk versions
2024-06-06 17:51:12 -03:00
rafaelsideguide
560f256a35
fixing minor problems on workflow
2024-06-06 17:36:48 -03:00
rafaelsideguide
f5318ea7d7
Update index.test.ts
2024-06-06 16:50:20 -03:00
rafaelsideguide
cd7f9abcec
Update index.test.ts
2024-06-06 16:44:46 -03:00
rafaelsideguide
7b9b668b95
Update index.test.ts
2024-06-06 16:36:51 -03:00
rafaelsideguide
82e0ed4cd3
Update index.test.ts
2024-06-06 16:33:27 -03:00
rafaelsideguide
dac7612be2
Merge branch 'main' of https://github.com/mendableai/firecrawl into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk
2024-06-06 16:07:25 -03:00
Nicolas
c2ad358390
Nick:
2024-06-06 12:05:20 -07:00
rafaelsideguide
79ec9f04dc
Merge branch 'main' of https://github.com/mendableai/firecrawl into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk
2024-06-06 15:58:14 -03:00
Nicolas
de06b13deb
Update rate-limiter.ts
2024-06-06 11:56:22 -07:00
Nicolas
27a8fd0c3c
Update rate-limiter.ts
2024-06-06 11:56:00 -07:00
Nicolas
1129d33321
Update rate-limiter.ts
2024-06-06 11:53:12 -07:00
rafaelsideguide
b234b4be5a
Merge branch 'main' into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk
2024-06-06 15:44:29 -03:00
rafaelsideguide
af0bfca847
Merge branch 'main' into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk
2024-06-06 15:36:28 -03:00
rafaelsideguide
8132f22c73
nice
2024-06-06 15:36:20 -03:00
Nicolas
f1b5ec8517
Nick: fixes
2024-06-06 11:23:10 -07:00
Nicolas
deae7dcd61
Update email_notification.ts
2024-06-06 10:41:54 -07:00
Nicolas
f725fa5a97
Update email_notification.ts
2024-06-06 10:41:23 -07:00
rafaelsideguide
fb758fa05e
go
2024-06-06 14:01:16 -03:00
Nicolas
0310da6729
Update rate-limiter.ts
2024-06-06 09:31:44 -07:00
Nicolas
01503c1fbf
Nick:
2024-06-06 09:29:25 -07:00
rafaelsideguide
b3cae4c858
adding js and testing twine
2024-06-06 13:27:31 -03:00
rafaelsideguide
bc1c1e5053
updating version to check if it runs
2024-06-06 11:41:01 -03:00
Rafael Miller
7686ad5702
Merge pull request #196 from mattjoyce/main
...
Python-SDK transitional build setup for pyproject.toml
2024-06-06 10:26:16 -03:00
Nicolas
525b4f2a83
Update rate-limiter.ts
2024-06-05 14:38:10 -07:00
Nicolas
d7f8208cdb
Update email_notification.ts
2024-06-05 13:53:31 -07:00
Nicolas
ec10eb09f3
Update credit_billing.ts
2024-06-05 13:22:03 -07:00
Nicolas
5991000d2b
Update credit_billing.ts
2024-06-05 13:21:15 -07:00
Nicolas
5683bb2cc8
Nick:
2024-06-05 13:20:26 -07:00
rafaelsideguide
164676c70a
bugfix screenshot for readme pages
2024-06-05 15:34:42 -03:00
rafaelsideguide
935406b96a
Merge branch 'main' into pr/196
2024-06-05 15:19:25 -03:00
Nicolas
b4c6819a54
Nick:
2024-06-05 11:11:09 -07:00
rafaelsideguide
0d51b11dcd
missing breaks
2024-06-05 15:02:28 -03:00
Rafael Miller
64423441b2
Merge branch 'main' into main
2024-06-05 14:44:29 -03:00
Nicolas
beb7526d1d
Update webhook.ts
2024-06-05 10:38:05 -07:00
Nicolas
1a16378fe8
Merge pull request #234 from JakobStadlhuber/feat/webhook-self-hosted
...
Add support for Self-Hosted Webhook URL Usage and added project_id into the webhook payload
2024-06-05 10:25:05 -07:00
Nicolas
7cb14edec8
Nick:
2024-06-05 10:13:52 -07:00
Rafael Miller
9e000ded03
Merge branch 'main' into feat/better-gdrive-pdf-fetch
2024-06-05 14:07:56 -03:00
rafaelsideguide
ccc55127d6
Added scroll xpaths on fire-engine for handling readme docs
2024-06-05 11:48:41 -03:00
rafaelsideguide
b5045d1661
[feat] improved the scrape for gdrive pdfs
2024-06-04 17:47:28 -03:00
Nicolas
96257b7b17
Update handleCustomScraping.ts
2024-06-04 12:22:46 -07:00
Nicolas
674500affa
Nick:
2024-06-04 12:15:39 -07:00
rafaelsideguide
5ae4d1caf5
Update single_url.ts
2024-06-04 15:28:09 -03:00
Jakob Stadlhuber
9e5ddec207
Remove default webhook URL from .env.example
...
The default value for the SELF_HOSTED_WEBHOOK_URL in the .env.example file was removed to prevent unintentional exposure or usage. The users are now required to explicitly specify
2024-06-04 19:56:35 +02:00
Jakob Stadlhuber
6208f4207d
Add support for Self-Hosted Webhook URL Usage and added project_id into the webhook payload
...
This commit adds support for a Self-Hosted Webhook URL. The application now checks for a self-hosted URL before querying the database for the webhook settings. If a Self-Hosted Webhook URL is set in the environment variables, it is used directly, avoiding an unnecessary database query.
2024-06-04 19:55:07 +02:00
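The lookup order described in the commit above can be sketched as follows. This is a minimal illustration in Python, not the project's own code: only the `SELF_HOSTED_WEBHOOK_URL` variable name comes from the commit, while `resolve_webhook_url`, `team_id`, and `lookup_in_database` are hypothetical names introduced for the example.

```python
import os
from typing import Callable, Optional


def resolve_webhook_url(
    team_id: str,
    lookup_in_database: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the webhook URL, preferring the self-hosted environment setting.

    `lookup_in_database` is a hypothetical callback standing in for the
    database query that fetches a team's webhook settings.
    """
    self_hosted_url = os.environ.get("SELF_HOSTED_WEBHOOK_URL")
    if self_hosted_url:
        # The environment value wins, so the database is never queried.
        return self_hosted_url
    # No self-hosted URL configured: fall back to the stored settings.
    return lookup_in_database(team_id)
```

Passing the database lookup in as a callback keeps the sketch self-contained and makes it easy to confirm that the query is skipped whenever the environment variable is set.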
rafaelsideguide
93f3098672
build files
2024-06-04 14:54:54 -03:00
rafaelsideguide
64a4338ff0
Update single_url.ts
2024-06-04 14:40:05 -03:00
Rafael Miller
02fe470e20
Merge pull request #148 from mendableai/nsc/improvemnts-fixes-misc
...
Better fallbacks for initial crawl start
2024-06-04 14:31:10 -03:00
Rafael Miller
665a40d9f4
Merge pull request #212 from mendableai/bugfix/partial-data-js-sdk
...
[Bug] Improved js response and test for getting partial_data
2024-06-04 14:05:23 -03:00
rafaelsideguide
1f4c6b7a87
Update package.json
2024-06-04 13:59:48 -03:00
Rafael Miller
19c67916d4
Merge pull request #211 from mendableai/fix/rename-variables
...
[Fix] Changed timeout parameter name on js sdk
2024-06-04 13:57:58 -03:00
Rafael Miller
f4f87b5374
Merge branch 'main' into bugfix/partial-data-js-sdk
2024-06-04 13:40:42 -03:00
rafaelsideguide
4e3a0495d7
updated version 0.0.12 -> 0.0.13
...
- [ ] publish
2024-06-04 12:03:55 -03:00
Rafael Miller
b80fb374e5
Merge branch 'main' into playwright-service-bug-222
2024-06-04 11:57:17 -03:00
rafaelsideguide
6920ec8a61
bugfixing. already on main
2024-06-04 11:05:50 -03:00
Nicolas
d91b725c6f
Update fly.toml
2024-06-04 00:41:15 -07:00
Nicolas
cbf8d79cce
Update pdfProcessor.ts
2024-06-04 00:13:37 -07:00
Nicolas
3fc9004ba8
Update fly.toml
2024-06-03 23:49:46 -07:00
Nicolas
2ea01f1456
Update single_url.ts
2024-06-03 23:42:39 -07:00
Nicolas
854d5b3cb3
Update single_url.ts
2024-06-03 23:32:55 -07:00
Nicolas
99059814a8
Nick:
2024-06-03 21:32:48 -07:00
Nicolas
918059ee9e
Merge branch 'main' into nsc/improvemnts-fixes-misc
2024-06-03 16:46:02 -07:00
Nicolas
38e583f66c
Update socialBlockList.test.ts
2024-06-03 16:44:23 -07:00
Nicolas
c69c89f838
Nick:
2024-06-03 16:42:42 -07:00
Nicolas
48d1ec05b2
Merge branch 'main' into nsc/improved-blocklist
2024-06-03 16:38:03 -07:00
Nicolas
d30ced4394
Merge pull request #221 from mendableai/nsc/fwd-header-auth
...
feat: Ability to forward headers to reliable providers for auth etc...
2024-06-03 16:33:40 -07:00
Romain Bruyère
4987f901d1
Merge branch 'mendableai:main' into main
2024-06-03 21:29:33 +02:00
rafaelsideguide
4100cc9223
Update index.test.ts
2024-06-03 16:29:16 -03:00
rombru
3ff91ddd1f
fix: use @ instead of # for default BULL_AUTH_KEY. hash mark is reserved for URI fragments.
2024-06-03 21:28:25 +02:00
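The reason given in the commit above can be checked with the Python standard library. In this small sketch the host, port, `/admin/.../queues` path layout, and key value are assumptions chosen for illustration. Everything after a `#` is parsed as the URI fragment, which clients do not send to the server, so a key that starts with `#` never becomes part of the request path.

```python
from urllib.parse import urlsplit


def server_visible_path(url: str) -> str:
    """Return the path component of a URL; the fragment after '#' is excluded."""
    return urlsplit(url).path


# Hypothetical URLs: the same key prefixed with '#' and with '@'.
with_hash = "http://localhost:3002/admin/#example-key/queues"
with_at = "http://localhost:3002/admin/@example-key/queues"

print(server_visible_path(with_hash))  # /admin/
print(server_visible_path(with_at))    # /admin/@example-key/queues
```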
rafaelsideguide
c1aed1360e
Update index.test.ts
2024-06-03 15:51:07 -03:00
rafaelsideguide
1fc3a15149
Update single_url.ts
2024-06-03 15:24:40 -03:00
Nicolas
fde522c3e1
Update single_url.ts
2024-06-02 20:23:45 -07:00
Matt Joyce
deefe65cbe
Change the way the playwright response is parsed
...
It was failing with a type error, although the response actually looked OK.
This fixes the type error and stops the scraper from falling back.
2024-06-01 19:16:56 +10:00
Matt Joyce
14896a9fdd
Fix PLAYWRIGHT_MICROSERVICE_URL
...
It needs to end in html, otherwise scrape will 404
2024-06-01 19:03:16 +10:00
Matt Joyce
1eacad4ef3
Clarifying wait type and name
2024-06-01 18:53:03 +10:00
Matt Joyce
c516140bfb
Various Linting
...
Pylint
C0114: Missing module docstring
C0115: Missing class docstring
C0116: Missing function or method docstring
C0303: Trailing whitespace
Import ordering
2024-06-01 18:53:03 +10:00
Matt Joyce
2a39b5382b
Add timeout to class and provide default.
2024-06-01 18:52:42 +10:00
Nicolas
8cb62dde92
Update website_params.ts
2024-05-31 16:09:39 -07:00
Nicolas
3b8059edb6
Update single_url.ts
2024-05-31 15:43:06 -07:00
Nicolas
6bea803120
Nick:
2024-05-31 15:39:54 -07:00
Nicolas
2139129296
Nick: v12
2024-05-31 11:39:55 -07:00
Nicolas
260e31c68b
Merge branch 'nsc/new-pricing'
2024-05-30 16:08:31 -07:00
Nicolas
aa8133ca7f
Update load-testing-example.ts
2024-05-30 16:07:14 -07:00
Nicolas
0c115c6181
Merge pull request #216 from mendableai/nsc/new-pricing
...
feat: New pricing/limits changes
2024-05-30 15:36:59 -07:00
Nicolas
6860ace4af
Nick:
2024-05-30 15:07:49 -07:00
Nicolas
6ceb7ff50a
Nick:
2024-05-30 14:46:55 -07:00
Nicolas
33f10a7f91
Nick: fixes
2024-05-30 14:42:32 -07:00
Nicolas
ace46f340b
Nick: new limits, new pricing
2024-05-30 14:31:36 -07:00
Matt Joyce
5c4b3e8f8a
Initial pyproject.toml
...
This enables building with 'python -m build' without affecting the usefulness of setup.py, and provides a base for other build tools and automation.
2024-05-30 21:48:40 +10:00
Matt Joyce
dec225d368
Move version to __init__.py
...
Setup.py does not need to be edited when building the package.
2024-05-30 21:48:40 +10:00
rafaelsideguide
2b763d848b
improved js response and test for getting partial_data
2024-05-30 08:44:38 -03:00
rafaelsideguide
5b8b6902e7
Update index.ts
2024-05-30 08:25:13 -03:00
Nicolas
6c939d534d
Nick: small refactor
2024-05-29 19:43:51 -07:00
Eric Ciarla
37915e11e8
Final push
2024-05-29 21:18:24 -04:00
Eric Ciarla
a0e404f94e
init commit
2024-05-29 18:56:57 -04:00
rafaelsideguide
ee9a2184e2
Added custom scraping conditions for readme docs
2024-05-29 13:39:43 -03:00
Nicolas
c20c38721d
Update index.test.ts
2024-05-28 17:17:20 -07:00
Nicolas
0f43a12906
Update index.test.ts
2024-05-28 17:17:12 -07:00
Nicolas
f53d25efac
Merge branch 'main' into nsc/wait-for-param
2024-05-28 12:56:28 -07:00
Nicolas
1b3547dcf2
Nick:
2024-05-28 12:56:24 -07:00
rafaelsideguide
71187b03a2
added timeout
2024-05-27 16:48:08 -03:00
rafaelsideguide
d5c83803cd
fixing idempotency test
2024-05-27 16:35:01 -03:00
rafaelsideguide
41c4ef6a82
dotenv was missing
2024-05-27 16:23:57 -03:00
rafaelsideguide
127d2db1dd
added js/ts sdk tests
2024-05-27 15:54:09 -03:00
rafaelsideguide
a9b68d95d8
Update test.py
2024-05-27 14:28:44 -03:00
rafaelsideguide
667d3e4c4f
Merge branch 'test-sdks' of https://github.com/mendableai/firecrawl into test-sdks
2024-05-27 14:23:39 -03:00
rafaelsideguide
19decd1062
fixing workflow
2024-05-27 14:21:33 -03:00
Rafael Miller
3c8edf683c
Merge branch 'main' into test-sdks
2024-05-27 14:15:18 -03:00
rafaelsideguide
63772ea711
added github action workflow
2024-05-27 14:14:00 -03:00
Nicolas
1ef307cb6f
Nick: checks
2024-05-27 10:01:12 -07:00
Nicolas
01cc91c53d
Update fly.staging.toml
2024-05-27 10:00:52 -07:00
Nicolas
1de53cc4d0
Nick: fixes
2024-05-26 18:15:05 -07:00
Nicolas
efb821d63b
Merge branch 'main' into main
2024-05-26 18:12:23 -07:00
Nicolas
ed4226fd1f
Update setup.py
2024-05-26 18:11:54 -07:00
Nicolas
1bbfb98d7e
Merge pull request #186 from Keredu/main
...
Limit on /search is not deterministic
2024-05-26 18:08:16 -07:00
Nicolas
67a53a9ae0
Merge pull request #190 from simonha9/simonha9/improve-rate-limit-error-msg
...
Feat: Provide more details for 429 error msg
2024-05-26 18:07:42 -07:00
Nicolas
7e2df7bd5e
Update auth.ts
2024-05-26 18:07:21 -07:00
Nicolas
7948c6cee2
Nick: fixed pip issues
2024-05-26 18:03:37 -07:00
Matt Joyce
b061e12030
added python versions requirement
...
This is in line with the requests module, a critical dependency.
2024-05-26 11:37:47 +10:00
Matt Joyce
f00dffbbb1
added misc PyPi keys
...
These help potential users find the project and understand its purpose and status.
2024-05-26 11:36:29 +10:00
Matt Joyce
cd7f260288
Added PyPi classifiers
...
These classifiers will help potential users find the project and understand its purpose and status. Uses Python 3.8 as the base, because that is what the 'requests' module needs.
2024-05-26 11:33:28 +10:00
Matt Joyce
e5c6ac23fe
Added long description to PyPi
...
https://packaging.python.org/en/latest/guides/making-a-pypi-friendly-readme/
2024-05-26 10:01:35 +10:00
Simon H
115204e6b6
Feat: Provide more details for 429 error msg
...
- Added a more detailed error for when the rate limit is exceeded, including
consumed/remaining points, reset date and retry-after seconds
2024-05-25 12:03:20 -04:00
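As a rough illustration of the kind of message the commit above describes, the sketch below builds a 429 error string from a rate limiter's state. The `RateLimitState` class, its field names, and the message wording are all hypothetical. Only the four reported details (consumed points, remaining points, reset date, retry-after seconds) come from the commit.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RateLimitState:
    """Hypothetical snapshot of a rate limiter after a rejected request."""
    consumed_points: int
    remaining_points: int
    ms_before_next: int  # milliseconds until the limit window resets


def rate_limit_message(state: RateLimitState, now: datetime) -> str:
    """Format a 429 message with usage, retry-after seconds, and reset time."""
    # Round up so a client that waits this long is never too early.
    retry_after_s = math.ceil(state.ms_before_next / 1000)
    reset_at = now + timedelta(milliseconds=state.ms_before_next)
    return (
        f"Rate limit exceeded. Consumed points: {state.consumed_points}, "
        f"remaining points: {state.remaining_points}. "
        f"Retry after {retry_after_s}s; resets at {reset_at.isoformat()}."
    )
```

Taking `now` as a parameter rather than reading the clock inside the function keeps the output deterministic, which makes the message easy to test.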
Keredu
2192978f91
Limit on /search is not deterministic
2024-05-25 00:12:26 +02:00
Nicolas
e98434606d
Update blocklist.ts
2024-05-24 15:04:15 -07:00
Nicolas
e5c8719554
Update blocklist.ts
2024-05-24 14:53:04 -07:00
rafaelsideguide
397769c7e3
added python sdk e2e tests with pytest
...
some of them are still missing though
2024-05-24 17:56:27 -03:00
rafaelsideguide
d39860c08b
Merge branch 'main' into feat/idempotency-key
2024-05-24 14:15:37 -03:00
Nicolas
8c380d70a5
Update firecrawl.py
2024-05-24 09:48:48 -07:00
Nicolas
65fe9c4f80
Merge branch 'main' into main
2024-05-24 09:47:12 -07:00
Rafael Miller
53a7ec0f6e
Removed hard coded timeout
2024-05-24 13:46:16 -03:00
Nicolas
e0d979edad
Merge pull request #176 from mendableai/bug/data-check-in-python-sdk
...
[Bug] Added data check for python SDK
2024-05-24 09:45:39 -07:00
Nicolas
53a214cefb
Merge pull request #168 from mendableai/nsc/allowed-keywords-in-blocklist
...
feat: Allow privacy/legal/ other pages in social media websites
2024-05-24 09:43:15 -07:00
Nicolas
e166c07690
Merge pull request #170 from qyou/fix-hardcode-timeout
...
update: wait until body attached in playwright-service
2024-05-24 09:41:27 -07:00
Jakob Stadlhuber
9fc5a0ff98
Update comment in .env.example for proxy settings
...
This commit modifies the comment in .env.example to specify that the proxy settings are for Playwright. The clarification gives users clearer context about when and why these proxy settings are used.
2024-05-24 17:45:59 +02:00
Jakob Stadlhuber
b001aded46
Add proxy and media blocking configurations
...
Updated environment variables and application settings to include proxy configuration and a media-blocking option. The proxy settings allow users to route requests through a proxy service, while media blocking is an optional feature that can help save bandwidth. Changes have been made in the .env.example, docker-compose.yaml, and main.py files.
2024-05-24 17:41:34 +02:00
rafaelsideguide
7ca431b202
crawl load tests 7 and 8
2024-05-23 16:36:05 -03:00
rafaelsideguide
c201ea1986
added idempotency key to python sdk
2024-05-23 12:52:59 -03:00
rafaelsideguide
35927a65a5
Merge branch 'main' into feat/idempotency-key
2024-05-23 12:20:06 -03:00
rafaelsideguide
184e4678f1
bugfix on idempotency key check
2024-05-23 11:47:04 -03:00
Matt Joyce
96630154d3
Merge pull request #1 from mendableai/main
...
Fix FIRECRAWL_API_URL bug, also various PyLint fixes
2024-05-23 09:16:03 +10:00
Matt Joyce
106c18d11f
Use truthiness check for 'success' key in API response
...
PyLint C0121
2024-05-23 08:57:53 +10:00
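PyLint C0121 (singleton-comparison) flags comparisons such as `== True`. A minimal sketch of the change described above, assuming the API response has already been decoded into a dictionary. The function name `is_successful` is hypothetical.

```python
def is_successful(response: dict) -> bool:
    """Report whether a decoded API response signals success."""
    # Flagged by PyLint C0121 (singleton-comparison):
    #     return response.get("success") == True
    # Truthiness check instead. A missing key yields None, which is falsy.
    return bool(response.get("success"))
```

The two forms are not strictly equivalent: a truthiness check also accepts truthy non-boolean values such as a non-empty string, whereas `== True` matches only `True` and values equal to it, such as `1`.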
Matt Joyce
5c21aed9c7
adding pylintrc to allow longer lines
2024-05-23 08:45:56 +10:00
Matt Joyce
48e91c89e7
Removed unnecessary If block
...
PyLint R1731
2024-05-23 08:42:07 +10:00
Matt Joyce
7d2efe5acb
Added request timeouts
...
Set the connection timeout to 5 seconds and the response timeout to 10 seconds.
PyLint W3101
2024-05-23 08:39:19 +10:00
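PyLint W3101 (missing-timeout) warns about `requests` calls made without a `timeout` argument. The `requests` library accepts `timeout` as a `(connect, read)` tuple, so the values in the commit above could be applied roughly as in this sketch. The helper name `get_with_timeout` and the constants are hypothetical, and `session` is assumed to be a `requests.Session` or any object with a compatible `get` method.

```python
from typing import Any, Tuple

CONNECT_TIMEOUT_S = 5   # seconds allowed to establish the connection
READ_TIMEOUT_S = 10     # seconds allowed to wait for the server's response
DEFAULT_TIMEOUT: Tuple[int, int] = (CONNECT_TIMEOUT_S, READ_TIMEOUT_S)


def get_with_timeout(session: Any, url: str, **kwargs: Any) -> Any:
    """Issue a GET request, applying the default timeouts unless overridden."""
    # setdefault leaves an explicit caller-supplied timeout untouched.
    kwargs.setdefault("timeout", DEFAULT_TIMEOUT)
    return session.get(url, **kwargs)
```

Without a timeout, a `requests` call can block indefinitely on an unresponsive server, which is the situation the warning is meant to prevent.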
Matt Joyce
96b19172a1
Removed trailing whitespace
...
PyLint C0303: Trailing whitespace (trailing-whitespace)
2024-05-23 08:30:23 +10:00
Matt Joyce
6216c85322
Time module already imported
...
Pylint
W0404: Reimport 'time' (imported line 16) (reimported)
C0415: Import outside toplevel (time) (import-outside-toplevel)
2024-05-23 08:21:32 +10:00