Commit Graph

585 Commits

Author SHA1 Message Date
Nicolas
6fc1ee32fd
Merge pull request #275 from mendableai/feat/issue-273
Added pageOptions.removeTags
2024-06-13 13:27:01 -07:00
rafaelsideguide
bb859ae9a7 Added metadata.pageStatusCode and metadata.pageError properties to the responses 2024-06-13 17:08:40 -03:00
rafaelsideguide
676d6e8ab5 Added pageOptions.removeTags 2024-06-13 10:51:05 -03:00
Nicolas
182f8d4d6c Update index.ts 2024-06-12 18:07:05 -07:00
Nicolas
11b6d5afa5 Update fly.toml 2024-06-12 18:00:22 -07:00
Nicolas
67dc46b454 Nick: clusters 2024-06-12 17:53:04 -07:00
rafaelsideguide
d20af257ba Added jobId to webhook data 2024-06-12 15:38:41 -03:00
rafaelsideguide
e37d151404 added parsePDF option to pageOptions
user can decide if they are going to let us take care of the parse or they are going to parse the pdf by themselves
2024-06-12 15:06:47 -03:00
rafaelsideguide
01c9f071fa fixed 2024-06-12 11:27:06 -03:00
rafaelsideguide
dc6acbf1f0 Merge remote-tracking branch 'origin/main' into feat/allowbackwardcrawling-option 2024-06-12 11:01:05 -03:00
Nicolas
f93231499f
Merge pull request #265 from mendableai/feat/issue-264
[Feat] Added route to clean completed jobs and a github action cron that triggers every 24h
2024-06-11 21:33:52 -07:00
Nicolas
45dee63943
Merge pull request #262 from mendableai/nsc/webhook-self-host-fix
Only fetch webhook from db if self host webhook not set and using db auth
2024-06-11 15:46:57 -07:00
rafaelsideguide
157fbe4a1e added bull auth key 2024-06-11 17:52:01 -03:00
rafaelsideguide
df3a678cf4 getting back the cancel test, this should work 2024-06-11 17:46:56 -03:00
rafaelsideguide
def2ba9987 added tests 2024-06-11 17:46:25 -03:00
Nicolas
1e3e06a1d5 Update replacePaths.test.ts 2024-06-11 13:02:39 -07:00
Nicolas
2239e03269 Update replacePaths.test.ts 2024-06-11 12:54:02 -07:00
Nicolas
520739c9f4 Nick: fixed bugs associated with absolute path replacements 2024-06-11 12:43:16 -07:00
Nicolas
b87725c683 Update openapi.json 2024-06-11 12:08:49 -07:00
rafaelsideguide
ee282c3d55 Added allowBackwardCrawling option 2024-06-11 15:24:39 -03:00
rafaelsideguide
a9f93c2f1e Added route to clean completed jobs and a github action cron that triggers every 24h 2024-06-11 14:18:05 -03:00
Nicolas
da38dad9a7 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-06-10 18:26:31 -07:00
Nicolas
9390816c1b Update openapi.json 2024-06-10 18:26:25 -07:00
Nicolas
f6b06ac27a Nick: ignoreSitemap, better crawling algo 2024-06-10 18:12:41 -07:00
Nicolas
1bd0327e1a Merge branch 'main' into nsc/pageoptions-crawler 2024-06-10 17:15:10 -07:00
Nicolas
99f2ffd6d5 Update webhook.ts 2024-06-10 17:03:10 -07:00
Nicolas
7ae9778642 Update single_url.ts 2024-06-10 16:57:31 -07:00
Nicolas
913c1dd568 Nick: fetch -> axios and fix timeouts 2024-06-10 16:49:03 -07:00
Nicolas
3091f0134c Nick: 2024-06-10 16:27:10 -07:00
Matt Joyce
827354a116 Added logging to python sdk FIRECRAWL_LOGGING_LEVEL
Instantiates the logger early and depends on env to set.
2024-06-10 21:21:23 +10:00
Nicolas
aafd23fa8a
Merge pull request #252 from mattjoyce/fix-208-py-sdk-interval-poll-name
Fix 208 py sdk interval poll name
2024-06-08 21:33:17 -07:00
Matt Joyce
6fd9ce1c89 type hints and linting 2024-06-08 11:46:52 +10:00
Matt Joyce
7477c5e5bd Use error handler consistently 2024-06-08 11:28:51 +10:00
Matt Joyce
9f306736af More detailed error handling 2024-06-08 11:18:30 +10:00
Matt Joyce
c71ea7a795 Prepare headers consistently 2024-06-08 11:08:26 +10:00
Matt Joyce
8f9a165c2f Lint - whitespace 2024-06-08 08:03:02 +10:00
Matt Joyce
5f0df596ec Align param name with JS SDK
timeout becomes poll_interval
2024-06-08 07:37:08 +10:00
Nicolas
f24ca76618 Nick: removing rate limit emails for now 2024-06-07 10:39:11 -07:00
Nicolas
98d82c4cec Update search.ts 2024-06-06 20:02:21 -07:00
Nicolas
5e80f8af87 Nick: llm extract 50 2024-06-06 18:35:44 -07:00
rafaelsideguide
7b7a6f8a39 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-06-06 17:51:28 -03:00
rafaelsideguide
f2695df215 updated sdk versions 2024-06-06 17:51:12 -03:00
rafaelsideguide
560f256a35 fixing minor problems on workflow 2024-06-06 17:36:48 -03:00
rafaelsideguide
f5318ea7d7 Update index.test.ts 2024-06-06 16:50:20 -03:00
rafaelsideguide
cd7f9abcec Update index.test.ts 2024-06-06 16:44:46 -03:00
rafaelsideguide
7b9b668b95 Update index.test.ts 2024-06-06 16:36:51 -03:00
rafaelsideguide
82e0ed4cd3 Update index.test.ts 2024-06-06 16:33:27 -03:00
rafaelsideguide
dac7612be2 Merge branch 'main' of https://github.com/mendableai/firecrawl into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk 2024-06-06 16:07:25 -03:00
Nicolas
c2ad358390 Nick: 2024-06-06 12:05:20 -07:00
rafaelsideguide
79ec9f04dc Merge branch 'main' of https://github.com/mendableai/firecrawl into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk 2024-06-06 15:58:14 -03:00
Nicolas
de06b13deb Update rate-limiter.ts 2024-06-06 11:56:22 -07:00
Nicolas
27a8fd0c3c Update rate-limiter.ts 2024-06-06 11:56:00 -07:00
Nicolas
1129d33321 Update rate-limiter.ts 2024-06-06 11:53:12 -07:00
rafaelsideguide
b234b4be5a Merge branch 'main' into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk 2024-06-06 15:44:29 -03:00
rafaelsideguide
af0bfca847 Merge branch 'main' into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk 2024-06-06 15:36:28 -03:00
rafaelsideguide
8132f22c73 nice 2024-06-06 15:36:20 -03:00
Nicolas
f1b5ec8517 Nick: fixes 2024-06-06 11:23:10 -07:00
Nicolas
deae7dcd61 Update email_notification.ts 2024-06-06 10:41:54 -07:00
Nicolas
f725fa5a97 Update email_notification.ts 2024-06-06 10:41:23 -07:00
rafaelsideguide
fb758fa05e go 2024-06-06 14:01:16 -03:00
Nicolas
0310da6729 Update rate-limiter.ts 2024-06-06 09:31:44 -07:00
Nicolas
01503c1fbf Nick: 2024-06-06 09:29:25 -07:00
rafaelsideguide
b3cae4c858 adding js and testing twine 2024-06-06 13:27:31 -03:00
rafaelsideguide
bc1c1e5053 updating version to check if it runs 2024-06-06 11:41:01 -03:00
Rafael Miller
7686ad5702
Merge pull request #196 from mattjoyce/main
Python-SDK transitional build setup for pyproject.toml
2024-06-06 10:26:16 -03:00
Nicolas
525b4f2a83 Update rate-limiter.ts 2024-06-05 14:38:10 -07:00
Nicolas
d7f8208cdb Update email_notification.ts 2024-06-05 13:53:31 -07:00
Nicolas
ec10eb09f3 Update credit_billing.ts 2024-06-05 13:22:03 -07:00
Nicolas
5991000d2b Update credit_billing.ts 2024-06-05 13:21:15 -07:00
Nicolas
5683bb2cc8 Nick: 2024-06-05 13:20:26 -07:00
rafaelsideguide
164676c70a bugfix screenshot for readme pages 2024-06-05 15:34:42 -03:00
rafaelsideguide
935406b96a Merge branch 'main' into pr/196 2024-06-05 15:19:25 -03:00
Nicolas
b4c6819a54 Nick: 2024-06-05 11:11:09 -07:00
rafaelsideguide
0d51b11dcd missing breaks 2024-06-05 15:02:28 -03:00
Rafael Miller
64423441b2
Merge branch 'main' into main 2024-06-05 14:44:29 -03:00
Nicolas
beb7526d1d Update webhook.ts 2024-06-05 10:38:05 -07:00
Nicolas
1a16378fe8
Merge pull request #234 from JakobStadlhuber/feat/webhook-self-hosted
Add support for Self-Hosted Webhook URL Usage and added project_id into the webhook payload
2024-06-05 10:25:05 -07:00
Nicolas
7cb14edec8 Nick: 2024-06-05 10:13:52 -07:00
Rafael Miller
9e000ded03
Merge branch 'main' into feat/better-gdrive-pdf-fetch 2024-06-05 14:07:56 -03:00
rafaelsideguide
ccc55127d6 Added scroll xpaths on fire-engine for handling readme docs 2024-06-05 11:48:41 -03:00
rafaelsideguide
b5045d1661 [feat] improved the scrape for gdrive pdfs 2024-06-04 17:47:28 -03:00
Nicolas
96257b7b17 Update handleCustomScraping.ts 2024-06-04 12:22:46 -07:00
Nicolas
674500affa Nick: 2024-06-04 12:15:39 -07:00
rafaelsideguide
5ae4d1caf5 Update single_url.ts 2024-06-04 15:28:09 -03:00
Jakob Stadlhuber
9e5ddec207 Remove default webhook URL from .env.example
The default value for the SELF_HOSTED_WEBHOOK_URL in the .env.example file was removed to prevent unintentional exposure or usage. The users are now required to explicitly specify
2024-06-04 19:56:35 +02:00
Jakob Stadlhuber
6208f4207d Add support for Self-Hosted Webhook URL Usage and added project_id into the webhook payload
This commit introduces the capability of using a Self-Hosted Webhook URL. The application now checks for a self-hosted URL before querying the database for the webhook settings. If a Self-Hosted Webhook URL is set in the environment variables, it will be used directly, diminishing unnecessary database queries.
2024-06-04 19:55:07 +02:00
rafaelsideguide
93f3098672 build files 2024-06-04 14:54:54 -03:00
rafaelsideguide
64a4338ff0 Update single_url.ts 2024-06-04 14:40:05 -03:00
Rafael Miller
02fe470e20
Merge pull request #148 from mendableai/nsc/improvemnts-fixes-misc
Better fallbacks for initial crawl start
2024-06-04 14:31:10 -03:00
Rafael Miller
665a40d9f4
Merge pull request #212 from mendableai/bugfix/partial-data-js-sdk
[Bug] Improved js response and test for getting partial_data
2024-06-04 14:05:23 -03:00
rafaelsideguide
1f4c6b7a87 Update package.json 2024-06-04 13:59:48 -03:00
Rafael Miller
19c67916d4
Merge pull request #211 from mendableai/fix/rename-variables
[Fix] Changed timeout parameter name on js sdk
2024-06-04 13:57:58 -03:00
Rafael Miller
f4f87b5374
Merge branch 'main' into bugfix/partial-data-js-sdk 2024-06-04 13:40:42 -03:00
rafaelsideguide
4e3a0495d7 updated version 0.0.12 -> 0.0.13
- [ ] publish
2024-06-04 12:03:55 -03:00
Rafael Miller
b80fb374e5
Merge branch 'main' into playwright-service-bug-222 2024-06-04 11:57:17 -03:00
rafaelsideguide
6920ec8a61 bugfixing. already on main 2024-06-04 11:05:50 -03:00
Nicolas
d91b725c6f Update fly.toml 2024-06-04 00:41:15 -07:00
Nicolas
cbf8d79cce Update pdfProcessor.ts 2024-06-04 00:13:37 -07:00
Nicolas
3fc9004ba8 Update fly.toml 2024-06-03 23:49:46 -07:00
Nicolas
2ea01f1456 Update single_url.ts 2024-06-03 23:42:39 -07:00
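
The messages for commits 6208f4207d and 45dee63943 above describe an order of precedence for choosing the webhook URL: use the self-hosted URL from the environment if it is set, and only fall back to a database lookup when it is not set and DB auth is in use. A minimal sketch of that precedence follows. Only `SELF_HOSTED_WEBHOOK_URL` comes from the log; `resolveWebhookUrl`, `ResolveOptions`, `useDbAuth`, and `lookupFromDb` are hypothetical names chosen for illustration and are not taken from the firecrawl source.

```typescript
// Hypothetical sketch of the precedence described in the commit messages.
// None of these names are from the actual firecrawl code base.
type WebhookLookup = (teamId: string) => Promise<string | undefined>;

interface ResolveOptions {
  // Assumed to be read from process.env.SELF_HOSTED_WEBHOOK_URL by the caller.
  selfHostedWebhookUrl?: string;
  // Assumed flag: whether database-backed authentication is enabled.
  useDbAuth: boolean;
  // Assumed callback that queries the database for a team's webhook URL.
  lookupFromDb: WebhookLookup;
}

async function resolveWebhookUrl(
  teamId: string,
  opts: ResolveOptions
): Promise<string | undefined> {
  // 1. A non-empty self-hosted URL wins and skips the database entirely.
  const selfHosted = opts.selfHostedWebhookUrl?.trim();
  if (selfHosted) {
    return selfHosted;
  }
  // 2. Without DB auth there is nothing to query.
  if (!opts.useDbAuth) {
    return undefined;
  }
  // 3. Otherwise fall back to the stored webhook settings.
  return opts.lookupFromDb(teamId);
}
```

Passing the environment value and the lookup in as parameters, rather than reading them inside the function, keeps the sketch easy to exercise without a real database or environment.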