Commit Graph

827 Commits

Author SHA1 Message Date
Jakob Stadlhuber
07246d0e1c Update README file in k8n directory
Removed a redundant list item and updated section title for deleting Firecrawl. The "Delete Firecrawl" section is now more concise and does not refer specifically to an environment, making it more generally applicable.
2024-06-04 20:59:04 +02:00
Jakob Stadlhuber
078d4c8d41 Add Kubernetes configuration for Firecrawl deployment
Added new files for setting up Firecrawl on a Kubernetes cluster. The files include Kubernetes manifests for deploying the API, worker, Playwright service, and Redis, along with the associated ConfigMap and Secret resources. Also updated the self-host documentation to include instructions for Kubernetes deployment.
2024-06-04 20:52:08 +02:00
Nicolas
fc04d5b033
Merge pull request #235 from mendableai/feat/gdrive-pdfs
[Feat] Added custom scraping for google-drive pdf usecase
2024-06-04 11:31:53 -07:00
rafaelsideguide
5ae4d1caf5 Update single_url.ts 2024-06-04 15:28:09 -03:00
Jakob Stadlhuber
9e5ddec207 Remove default webhook URL from .env.example
The default value for the SELF_HOSTED_WEBHOOK_URL in the .env.example file was removed to prevent unintentional exposure or usage. The users are now required to explicitly specify
2024-06-04 19:56:35 +02:00
Jakob Stadlhuber
6208f4207d Add support for Self-Hosted Webhook URL Usage and added project_id into the webhook payload
This commit introduces the capability of using a Self-Hosted Webhook URL. The application now checks for a self-hosted URL before querying the database for the webhook settings. If a Self-Hosted Webhook URL is set in the environment variables, it is used directly, avoiding unnecessary database queries.
2024-06-04 19:55:07 +02:00
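The lookup order described in 6208f4207d can be sketched roughly as follows. This is an illustration only, not the project's code: the function names `resolve_webhook_url` and `build_payload`, the `fetch_from_db` callback, and the payload shape are hypothetical. Only the `SELF_HOSTED_WEBHOOK_URL` variable name and the `project_id` field come from the commit messages.

```python
from typing import Callable, Mapping, Optional


def resolve_webhook_url(
    env: Mapping[str, str],
    fetch_from_db: Callable[[], Optional[str]],
) -> Optional[str]:
    """Prefer the self-hosted URL from the environment; otherwise ask the database."""
    # SELF_HOSTED_WEBHOOK_URL is the variable named in the commit message.
    self_hosted = env.get("SELF_HOSTED_WEBHOOK_URL", "").strip()
    if self_hosted:
        # Environment value wins, so the database is never queried.
        return self_hosted
    # Hypothetical fallback: the caller supplies the database lookup.
    return fetch_from_db()


def build_payload(project_id: str, data: dict) -> dict:
    """Hypothetical payload shape that carries project_id alongside the data."""
    return {"project_id": project_id, "data": data}
```

Passing the database lookup in as a callback keeps the sketch self-contained and makes it easy to confirm that the lookup is skipped when the environment variable is set.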
rafaelsideguide
93f3098672 build files 2024-06-04 14:54:54 -03:00
rafaelsideguide
64a4338ff0 Update single_url.ts 2024-06-04 14:40:05 -03:00
Rafael Miller
02fe470e20
Merge pull request #148 from mendableai/nsc/improvemnts-fixes-misc
Better fallbacks for initial crawl start
2024-06-04 14:31:10 -03:00
Rafael Miller
665a40d9f4
Merge pull request #212 from mendableai/bugfix/partial-data-js-sdk
[Bug] Improved js response and test for getting partial_data
2024-06-04 14:05:23 -03:00
rafaelsideguide
1f4c6b7a87 Update package.json 2024-06-04 13:59:48 -03:00
Rafael Miller
19c67916d4
Merge pull request #211 from mendableai/fix/rename-variables
[Fix] Changed timeout parameter name on js sdk
2024-06-04 13:57:58 -03:00
Rafael Miller
f4f87b5374
Merge branch 'main' into bugfix/partial-data-js-sdk 2024-06-04 13:40:42 -03:00
Rafael Miller
f17cb1a0d4
Merge pull request #224 from mattjoyce/playwright-service-bug-222
Playwright service bugs #222 #179 #197
2024-06-04 12:05:56 -03:00
rafaelsideguide
4e3a0495d7 updated version 0.0.12 -> 0.0.13
- [ ] publish
2024-06-04 12:03:55 -03:00
Rafael Miller
b80fb374e5
Merge branch 'main' into playwright-service-bug-222 2024-06-04 11:57:17 -03:00
rafaelsideguide
6920ec8a61 bugfixing. already on main 2024-06-04 11:05:50 -03:00
Nicolas
d6762386f8
Update fly-direct.yml 2024-06-04 01:09:04 -07:00
Nicolas
d10c0839b0
Update fly-direct.yml 2024-06-04 01:04:06 -07:00
Nicolas
5d50b259b7 Create fly-direct.yml 2024-06-04 00:42:07 -07:00
Nicolas
d91b725c6f Update fly.toml 2024-06-04 00:41:15 -07:00
Nicolas
cbf8d79cce Update pdfProcessor.ts 2024-06-04 00:13:37 -07:00
Nicolas
3fc9004ba8 Update fly.toml 2024-06-03 23:49:46 -07:00
Nicolas
0cc7031acb Update fly.yml 2024-06-03 23:47:10 -07:00
Nicolas
2ea01f1456 Update single_url.ts 2024-06-03 23:42:39 -07:00
Nicolas
3563e3ae45 Update fly.yml 2024-06-03 23:34:52 -07:00
Nicolas
854d5b3cb3 Update single_url.ts 2024-06-03 23:32:55 -07:00
Nicolas
99059814a8 Nick: 2024-06-03 21:32:48 -07:00
Nicolas
918059ee9e Merge branch 'main' into nsc/improvemnts-fixes-misc 2024-06-03 16:46:02 -07:00
Nicolas
93bb53271e Merge branch 'nsc/improved-blocklist' 2024-06-03 16:44:33 -07:00
Nicolas
38e583f66c Update socialBlockList.test.ts 2024-06-03 16:44:23 -07:00
Nicolas
b26c5f1588
Merge pull request #185 from mendableai/nsc/improved-blocklist
Improvements to the blocklist regex
2024-06-03 16:43:34 -07:00
Nicolas
c69c89f838 Nick: 2024-06-03 16:42:42 -07:00
Nicolas
48d1ec05b2 Merge branch 'main' into nsc/improved-blocklist 2024-06-03 16:38:03 -07:00
Nicolas
d30ced4394
Merge pull request #221 from mendableai/nsc/fwd-header-auth
feat: Ability to forward headers to reliable providers for auth etc...
2024-06-03 16:33:40 -07:00
Nicolas
d865b0c5c8
Merge pull request #229 from rombru/main
Use @ instead of # for default BULL_AUTH_KEY. Hash mark is reserved for URI fragments.
2024-06-03 12:38:34 -07:00
Romain Bruyère
4987f901d1
Merge branch 'mendableai:main' into main 2024-06-03 21:29:33 +02:00
rafaelsideguide
4100cc9223 Update index.test.ts 2024-06-03 16:29:16 -03:00
rombru
3ff91ddd1f fix: use @ instead of # for default BULL_AUTH_KEY. hash mark is reserved for URI fragments. 2024-06-03 21:28:25 +02:00
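The reasoning behind 3ff91ddd1f can be checked with a standard URL parser. The sketch below assumes, for illustration only, that the key is embedded in a URL path; the host, port, path layout, and the `path_and_fragment` helper are hypothetical and not taken from the repository.

```python
from typing import Tuple
from urllib.parse import urlsplit


def path_and_fragment(url: str) -> Tuple[str, str]:
    """Return the (path, fragment) pair that a URL parser sees."""
    parts = urlsplit(url)
    return parts.path, parts.fragment


# Hypothetical URLs: the same key placed in the path, prefixed by '#' or '@'.
with_hash = path_and_fragment("http://localhost:3002/admin/#secret/queues")
with_at = path_and_fragment("http://localhost:3002/admin/@secret/queues")

print(with_hash)  # ('/admin/', 'secret/queues')
print(with_at)    # ('/admin/@secret/queues', '')
```

With `#`, everything after the mark is treated as a fragment, which clients do not send to the server, so the path the server receives stops at `/admin/`. With `@`, the whole key stays in the path.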
rafaelsideguide
c1aed1360e Update index.test.ts 2024-06-03 15:51:07 -03:00
Nicolas
30a0c5de1a
Merge pull request #228 from mendableai/bugfix/fire-engine-content
Fixed fire-engine content bug
2024-06-03 11:42:03 -07:00
rafaelsideguide
1fc3a15149 Update single_url.ts 2024-06-03 15:24:40 -03:00
Eric Ciarla
3ea801d9dd Commit Roast My Website 2024-06-02 20:40:19 -07:00
Eric Ciarla
ea04fe2e3f Add Roast My Website Example 2024-06-02 20:38:05 -07:00
Nicolas
fde522c3e1 Update single_url.ts 2024-06-02 20:23:45 -07:00
Matt Joyce
deefe65cbe Change the way the playwright response is parsed
Parsing was failing with a TypeError even though the response itself looked OK.
This fixes the type error and stops the scraper from falling back.
2024-06-01 19:16:56 +10:00
Matt Joyce
14896a9fdd Fix PLAYWRIGHT_MICROSERVICE_URL
It needs to end in html, otherwise scrape will 404
2024-06-01 19:03:16 +10:00
Matt Joyce
1eacad4ef3 Clarifying wait type and name 2024-06-01 18:53:03 +10:00
Matt Joyce
c516140bfb Various Linting
Pylint
C0114: Missing module docstring
C0115: Missing class docstring
C0116: Missing function or method docstring
C0303: Trailing whitespace
Import ordering
2024-06-01 18:53:03 +10:00
Matt Joyce
2a39b5382b Add timeout to class and provide default. 2024-06-01 18:52:42 +10:00