Commit Graph

368 Commits

Author SHA1 Message Date
Jakob Stadlhuber
9fc5a0ff98 Update comment in .env.example for proxy settings
This commit modifies the comment in .env.example to specify that the proxy settings are for Playwright. The clarification gives users clearer context about when and why these proxy settings are used.
2024-05-24 17:45:59 +02:00
Jakob Stadlhuber
b001aded46 Add proxy and media blocking configurations
Updated environment variables and application settings to include proxy configurations and media blocking option. The proxy settings allow users to use a proxy service, while the media blocking is an optional feature that can help save bandwidth. Changes have been made in the .env.example, docker-compose.yaml, and main.py files.
2024-05-24 17:41:34 +02:00
rafaelsideguide
35927a65a5 Merge branch 'main' into feat/idempotency-key 2024-05-23 12:20:06 -03:00
rafaelsideguide
184e4678f1 bugfix on idempotency key check 2024-05-23 11:47:04 -03:00
rafaelsideguide
4dfc371241 Update index.test.ts 2024-05-22 14:38:41 -03:00
rafaelsideguide
f4a3469b9e Merge branch 'main' into bug/crawl-limit 2024-05-22 14:27:28 -03:00
Nicolas
0d187f0425 Merge pull request #77 from tractorjuice/patch-1
Add additional file extensions to crawler.ts
2024-05-22 10:16:49 -07:00
Nicolas
cb2bd0e71f Update index.test.ts 2024-05-21 19:03:32 -07:00
Nicolas
253abb849f Update rate-limiter.ts 2024-05-21 18:53:58 -07:00
Nicolas
229b9908d2 Nick: only enable hyper dx in prod 2024-05-21 18:52:46 -07:00
Nicolas
a8ff295977 Update single_url.ts 2024-05-21 18:50:42 -07:00
Nicolas
a5e718b084 Nick: improvements 2024-05-21 18:34:23 -07:00
Nicolas
6285f12cd1 Merge pull request #167 from mendableai/nsc/hyper-dx-integration
feat: HyperDX Integration
2024-05-21 13:19:38 -07:00
Nicolas
7f64fe884a Update blocklist.ts 2024-05-20 17:26:01 -07:00
Nicolas
756f54466d Nick: allowed keywords for now 2024-05-20 17:24:21 -07:00
Nicolas
01783dc336 Update openapi.json 2024-05-20 17:10:55 -07:00
Nicolas
77a79b5a79 Nick: max num tokens for llm extract (for now) + slice the max 2024-05-20 17:07:38 -07:00
Nicolas
2644e1c029 Update .env.example 2024-05-20 13:36:51 -07:00
Nicolas
9e61d431f0 Nick: hyper dx integration init 2024-05-20 13:36:34 -07:00
Nicolas
c74f757b53 Update rate-limiter.ts 2024-05-19 13:05:36 -07:00
Nicolas
98a39b39ab Nick: increased rate limits 2024-05-19 12:59:29 -07:00
Nicolas
18fa15df25 Update index.test.ts 2024-05-19 12:50:06 -07:00
Nicolas
614c073af0 Nick: improvements 2024-05-19 12:45:46 -07:00
Nicolas
f473793ba3 Merge branch 'main' into feat/rate-limits 2024-05-19 12:23:34 -07:00
rafaelsideguide
a480595aa7 Update index.test.ts 2024-05-17 15:41:27 -03:00
rafaelsideguide
54049be539 Added e2e tests 2024-05-17 15:37:47 -03:00
Nicolas
6feb21cc35 Update website_params.ts 2024-05-17 11:21:26 -07:00
Nicolas
5be208f595 Nick: fixed 2024-05-17 10:40:44 -07:00
Nicolas
eb88447e8b Update index.test.ts 2024-05-17 10:00:05 -07:00
Nicolas
df6c3d1e7d Merge branch 'main' into detect-pdfs 2024-05-17 09:55:51 -07:00
Nicolas
9d635cb2a3 Nick: docx support 2024-05-16 11:48:02 -07:00
Nicolas
bcce0544e7 Update openapi.json 2024-05-16 11:03:32 -07:00
Nicolas
80250fb54f Update index.test.ts 2024-05-15 17:40:46 -07:00
Nicolas
098db17913 Update index.ts 2024-05-15 17:37:09 -07:00
Nicolas
93b1f0334e Update index.test.ts 2024-05-15 17:35:06 -07:00
Nicolas
123fb784ca Update index.test.ts 2024-05-15 17:29:22 -07:00
Nicolas
4a6cfb6097 Update index.test.ts 2024-05-15 17:22:29 -07:00
Nicolas
6ca368327f Merge branch 'main' into test/crawl-options 2024-05-15 17:18:25 -07:00
Nicolas
24be4866c5 Nick: 2024-05-15 17:16:20 -07:00
Nicolas
ade4e05cff Nick: working 2024-05-15 17:13:04 -07:00
Nicolas
bfccaf670d Nick: fixes most of it 2024-05-15 15:30:37 -07:00
rafaelsideguide
d91043376c not working yet 2024-05-15 18:54:40 -03:00
rafaelsideguide
fa014defc7 Fixing child links only bug 2024-05-15 18:35:09 -03:00
Nicolas
2ba743fb1a Merge pull request #27 from eltociear/patch-1
refactor: fix typo in WebScraper/index.ts
2024-05-15 13:28:38 -07:00
Nicolas
0663d78324 Merge pull request #119 from chand1012/main
Add Docker Compose for easy self hosting
2024-05-15 13:27:40 -07:00
Nicolas
58053eb423 Update rate-limiter.ts 2024-05-15 12:47:35 -07:00
Nicolas
1601e93d69 Merge branch 'main' into test/crawl-options 2024-05-15 12:34:47 -07:00
Nicolas
3678d3c986 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-05-15 12:11:18 -07:00
Nicolas
fd82982a31 Nick: 2024-05-15 12:11:16 -07:00
rafaelsideguide
4925ee59f6 added crawl test suite 2024-05-15 15:50:50 -03:00
Nicolas
1b0d6341d3 Update index.ts 2024-05-15 11:48:12 -07:00
Nicolas
d10f81e7fe Nick: fixes 2024-05-15 11:28:20 -07:00
Nicolas
87570bdfa1 Update index.ts 2024-05-15 11:06:03 -07:00
rafaelsideguide
d4574851be Added rpc definition 2024-05-15 08:40:21 -03:00
rafaelsideguide
47c20c80ab Update auth.ts 2024-05-15 08:34:49 -03:00
Ikko Eltociear Ashimine
e91c122c69 Merge branch 'main' into patch-1 2024-05-15 12:14:52 +09:00
Nicolas
7d8ceab6de Merge branch 'feat/rate-limits' of https://github.com/mendableai/firecrawl into feat/rate-limits 2024-05-14 14:48:01 -07:00
Nicolas
0e0faa28b3 Update auth.ts 2024-05-14 14:47:36 -07:00
rafaelsideguide
672eddb999 updated rpc 2024-05-14 18:47:21 -03:00
Nicolas
4761ea510b Update rate-limiter.ts 2024-05-14 14:26:42 -07:00
rafaelsideguide
40ad97dee8 added rate limits 2024-05-14 18:08:31 -03:00
Nicolas
27e1e22a0a Update index.test.ts 2024-05-14 12:28:25 -07:00
Nicolas
a0fdc6f7c6 Nick: 2024-05-14 12:12:40 -07:00
Nicolas
7f31959be7 Nick: 2024-05-14 12:04:36 -07:00
Nicolas
8a72cf556b Nick: 2024-05-13 21:10:58 -07:00
Nicolas
26a092f780 Update index.ts 2024-05-13 21:04:49 -07:00
Nicolas
8101cbee37 Update index.ts 2024-05-13 21:02:47 -07:00
Nicolas
86b8439844 Nick: 2024-05-13 20:51:42 -07:00
Nicolas
a96fc5b96d Nick: 4x speed 2024-05-13 20:45:11 -07:00
Nicolas
bd27b0e17e Merge pull request #142 from mendableai/doc/crawl-limit-default
[Doc] Added default value for crawlOptions.limit
2024-05-13 18:38:09 -07:00
Nicolas
999176d576 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-05-13 13:57:34 -07:00
Nicolas
f3ec21d9c4 Update runWebScraper.ts 2024-05-13 13:57:22 -07:00
Nicolas
65d89afba9 Nick: 2024-05-13 13:01:43 -07:00
Eric Ciarla
4cc46d4af8 Update models.ts 2024-05-13 15:23:31 -04:00
rafaelsideguide
8eb2e95f19 Cleaned up 2024-05-13 16:13:10 -03:00
Nicolas
2ce045912f Nick: disable vision right now 2024-05-13 10:56:08 -07:00
rafaelsideguide
f4348024c6 Added check during scraping to deal with pdfs
Checks if the URL is a PDF during the scraping process (single_url.ts).

TODO: Run integration tests - Does this strat affect the running time?

ps. Some comments need to be removed if we decide to proceed with this strategy.
2024-05-13 09:13:42 -03:00
Rafael Miller
5a2712fa5a Merge branch 'main' into detect-pdfs 2024-05-10 15:53:13 -03:00
rafaelsideguide
bc6b929b43 [Bug] Fixing /crawl limit 2024-05-10 12:15:54 -03:00
rafaelsideguide
df16890f84 Added default value for crawlOptions.limit 2024-05-10 11:59:33 -03:00
rafaelsideguide
18480b2005 Removed .env.example, improved docs and docker compose envs 2024-05-10 11:38:17 -03:00
Nicolas
66bd1e4020 Update website_params.ts 2024-05-09 18:41:15 -07:00
Nicolas
d21091bb06 Update single_url.ts 2024-05-09 17:52:46 -07:00
Nicolas
be85008622 Nick: better 2024-05-09 17:48:11 -07:00
Nicolas
be5661a768 Nick: a lot better 2024-05-09 17:45:16 -07:00
Nicolas
fce17e6beb Update credit_billing.ts 2024-05-09 15:29:58 -07:00
Nicolas
9541ff6b30 Nick: 429 addressed 2024-05-08 15:14:39 -07:00
Nicolas
4a5f87623c Merge pull request #118 from mendableai/feat/test-suite
[Test] Added integration tests suite
2024-05-08 12:47:17 -07:00
Nicolas
b7e3104c7b Ni 2024-05-08 12:18:53 -07:00
rafaelsideguide
3f460af6c5 Added idempotency key to crawl route 2024-05-07 15:29:27 -03:00
Eric Ciarla
d280bcadf3 Add keyAuth 2024-05-07 13:52:42 -04:00
Nicolas
dcedb8d798 Merge branch 'main' into feat/max-depth 2024-05-07 10:20:49 -07:00
Nicolas
6505bf6bf2 Merge branch 'main' into feat/max-depth 2024-05-07 10:20:44 -07:00
Nicolas
bdbee963f7 Merge branch 'main' into nsc/cancel-job 2024-05-07 10:13:43 -07:00
rafaelsideguide
61d615c04b Added tests 2024-05-07 14:03:00 -03:00
rafaelsideguide
e1f52c538f nested includeHtml inside pageOptions 2024-05-07 13:40:24 -03:00
Nicolas
f46bf19fa5 Nick: 2024-05-07 09:26:52 -07:00
rafaelsideguide
83f3408634 Added max depth option 2024-05-07 11:06:26 -03:00
Nicolas
2e3ff85509 Update crawl-cancel.ts 2024-05-06 17:22:16 -07:00
Nicolas
6d5da358cc Nick: cancel job 2024-05-06 17:16:43 -07:00
rafaelsideguide
509250c4ef changed to includeHtml 2024-05-06 19:45:56 -03:00
rafaelsideguide
538355f1af Added toMarkdown option 2024-05-06 11:36:44 -03:00
Nicolas
d1b6f6dcde Update fly.toml 2024-05-04 13:49:09 -07:00
Nicolas
cd9a0840b5 Update search.ts 2024-05-04 13:13:15 -07:00
Nicolas
5229a4902b Update search.ts 2024-05-04 13:09:11 -07:00
Nicolas
ce7bab7b35 Update status.ts 2024-05-04 13:00:38 -07:00
Nicolas
15b774e974 Update index.ts 2024-05-04 12:44:30 -07:00
Nicolas
67f135a5b6 Update crawl-status.ts 2024-05-04 12:31:28 -07:00
Nicolas
2aa09a3000 Nick: partial docs working, cleaner 2024-05-04 12:30:12 -07:00
Nicolas
00373228fa Update index.ts 2024-05-04 11:53:16 -07:00
Nicolas
21cdaf5996 Update log_job.ts 2024-05-02 12:40:49 -07:00
Eric Ciarla
caf3f9eede Add Posthog Logging 2024-05-02 15:30:22 -04:00
Nicolas
8a95cb42f0 Update models.ts 2024-04-30 18:36:21 -07:00
Nicolas
4967536501 Update index.ts 2024-04-30 18:19:55 -07:00
Nicolas
768166b066 Update single_url.ts 2024-04-30 16:57:44 -07:00
Nicolas
a386259511 Update scrape.ts 2024-04-30 16:35:44 -07:00
Nicolas
dfcf39f4c0 Update scrape.ts 2024-04-30 16:19:59 -07:00
Nicolas
3c7030dbb1 Nick: improvements 2024-04-30 16:19:32 -07:00
Nicolas
cbd9e88b77 Merge branch 'main' into llm-extraction 2024-04-30 14:49:20 -07:00
Nicolas
4f526cff92 Nick: cleanup 2024-04-30 12:19:43 -07:00
Caleb Peffer
d9d206aff6 Caleb: 2024-04-30 10:27:39 -07:00
Caleb Peffer
d1235a0029 Caleb: switched back to markdown for extraction 2024-04-30 10:23:12 -07:00
Caleb Peffer
ad9c8e77d1 Caleb: commented out massive test 2024-04-30 10:22:09 -07:00
Caleb Peffer
a32f2b37b6 Caleb: logs work 2024-04-30 10:21:41 -07:00
Caleb Peffer
3ca9e5153f Caleb: trying to get logging working 2024-04-30 09:20:15 -07:00
rafaelsideguide
a095e1b63d Resolve merge conflicts with main 2024-04-30 10:54:18 -03:00
rafaelsideguide
35480bd2ad Update index.test.ts 2024-04-30 10:40:32 -03:00
rafaelsideguide
d3c36adaa7 Update index.ts 2024-04-29 17:58:47 -03:00
Caleb Peffer
79cd7d2ebc Merge branch 'llm-extraction' of https://github.com/mendableai/firecrawl into llm-extraction 2024-04-29 12:12:58 -07:00
Caleb Peffer
4f7737c922 Caleb: added ajv json schema validation. 2024-04-29 12:12:55 -07:00
rafaelsideguide
f8b207793f changed the request to do a HEAD to check for a PDF instead 2024-04-29 15:15:32 -03:00
Nicolas
b69feab916 Merge branch 'main' into llm-extraction 2024-04-29 08:40:44 -07:00
Caleb Peffer
667f740315 Caleb: converted llm response to json 2024-04-28 19:28:28 -07:00
Caleb Peffer
2ad7a58eb7 Caleb: first test passing 2024-04-28 17:38:20 -07:00
Caleb Peffer
06497729e2 Caleb: got it to a testable state I believe 2024-04-28 15:52:09 -07:00
Caleb Peffer
6ee1f2d3bc Caleb: initially pulled inspiration code from https://github.com/mishushakov/llm-scraper 2024-04-28 13:59:35 -07:00
Nicolas
68838c9e0d Update single_url.ts 2024-04-28 12:44:00 -07:00
Nicolas
d8ee4e90d6 Update website_params.ts 2024-04-28 11:47:25 -07:00
Nicolas
8e44696c4d Nick: 2024-04-28 11:34:25 -07:00
Nicolas
1dc6458c6a Update crawler.ts 2024-04-27 11:17:10 -07:00
Nicolas
0f694e0608 Update crawler.ts 2024-04-27 11:14:52 -07:00
tractorjuice
a5d38039f2 Add additional file extensions to crawler.ts
Add additional file extensions.
2024-04-27 11:03:27 +01:00
Nicolas
7689c31d35 Update credit_billing.ts 2024-04-26 14:36:19 -07:00
Nicolas
0a607b9efa Merge branch 'main' into feat/coupons 2024-04-26 14:23:35 -07:00
Nicolas
fdf913e0f1 Update index.test.ts 2024-04-26 13:06:48 -07:00
Nicolas
8e32453424 Update auth.ts 2024-04-26 12:57:49 -07:00
rafaelsideguide
1f48998970 done 2024-04-26 16:27:31 -03:00
Nicolas
d210a57a9b Update credit_billing.ts 2024-04-26 10:24:36 -07:00
Nicolas
24e1bdec1b Update credit_billing.ts 2024-04-26 10:14:29 -07:00
rafaelsideguide
06675d1fe3 almost finished 2024-04-26 11:42:49 -03:00