rafaelsideguide
c201ea1986
added idempotency key to python sdk
2024-05-23 12:52:59 -03:00
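A minimal sketch of how an idempotency key might be attached to a request from a Python client. The helper name, the header names, and the bearer-token scheme are assumptions made for illustration; the log does not show the SDK's actual code.

```python
import uuid
from typing import Optional


def build_headers(api_key: str, idempotency_key: Optional[str] = None) -> dict:
    """Build request headers, optionally carrying an idempotency key.

    Hypothetical helper: the header names below are assumptions.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    # Only send the key when the caller supplied one, so ordinary
    # requests are unaffected.
    if idempotency_key:
        headers["x-idempotency-key"] = idempotency_key
    return headers


# A caller would typically generate one key per logical operation and
# reuse it on retries so the server can recognise duplicates.
example_key = str(uuid.uuid4())
example_headers = build_headers("fc-EXAMPLE", example_key)
```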
rafaelsideguide
35927a65a5
Merge branch 'main' into feat/idempotency-key
2024-05-23 12:20:06 -03:00
rafaelsideguide
184e4678f1
bugfix on idempotency key check
2024-05-23 11:47:04 -03:00
Matt Joyce
96630154d3
Merge pull request #1 from mendableai/main
Fix FIRECRAWL_API_URL bug, also various PyLint fixes
2024-05-23 09:16:03 +10:00
Matt Joyce
106c18d11f
Use truthiness check for 'success' key in API response
PyLint C0121
2024-05-23 08:57:53 +10:00
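The truthiness check named in the commit above (PyLint C0121, singleton comparison) can be sketched as follows. The helper name and the shape of the response dictionary are hypothetical, and `.get` is used here so that a missing key also counts as failure, which may differ from the SDK's own handling.

```python
def is_successful(response_json: dict) -> bool:
    """Return True when the API response reports success.

    Hypothetical helper for illustration only.
    """
    # PyLint C0121 flags comparisons such as
    #     response_json["success"] == True
    # A plain truthiness check is the idiomatic replacement.
    return bool(response_json.get("success"))
```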
Matt Joyce
5c21aed9c7
adding pylintrc to allow longer lines
2024-05-23 08:45:56 +10:00
Matt Joyce
48e91c89e7
Removed unnecessary If block
PyLint R1731
2024-05-23 08:42:07 +10:00
Matt Joyce
7d2efe5acb
Added request timeouts
Set the connection timeout to 5 seconds and the response timeout to 10 seconds.
PyLint W3101
2024-05-23 08:39:19 +10:00
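The timeouts described in the commit above could look roughly like this, assuming the SDK issues its HTTP calls through the `requests` library, which accepts a `(connect, read)` tuple for its `timeout` argument. The function name and the session parameter are hypothetical.

```python
from typing import Any, Optional

CONNECT_TIMEOUT_SECONDS = 5   # time allowed to establish the connection
READ_TIMEOUT_SECONDS = 10     # time allowed to wait for the response


def get_with_timeouts(session: Any, url: str, headers: Optional[dict] = None) -> Any:
    """Issue a GET through a requests-style session with explicit timeouts.

    Hypothetical helper: `session` is expected to behave like
    `requests.Session`. Without a timeout, such a call can block
    indefinitely, which is what PyLint W3101 warns about.
    """
    return session.get(
        url,
        headers=headers,
        timeout=(CONNECT_TIMEOUT_SECONDS, READ_TIMEOUT_SECONDS),
    )
```

With `requests` installed, a call would be `get_with_timeouts(requests.Session(), url)`.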
Matt Joyce
96b19172a1
Removed trailing whitespace
PyLint C0303: Trailing whitespace (trailing-whitespace)
2024-05-23 08:30:23 +10:00
Matt Joyce
6216c85322
Time module already imported
Pylint
W0404: Reimport 'time' (imported line 16) (reimported)
C0415: Import outside toplevel (time) (import-outside-toplevel)
2024-05-23 08:21:32 +10:00
Matt Joyce
8adf2b7132
Added Docstrings for functions
PyLint C0116: Missing function or method docstring (missing-function-docstring)
2024-05-23 08:20:32 +10:00
Matt Joyce
971e1f85c4
Added module docstring
PyLint C0114 - missing-module-docstring
2024-05-23 08:03:58 +10:00
Matt Joyce
8d041c05b4
rearranged logic for FIRECRAWL_API_URL
Previously the environment variable was used only when the parameter was explicitly set to None, which was counter-intuitive.
2024-05-23 08:00:56 +10:00
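One plausible reading of the rearranged logic in the commit above is: an explicit argument wins, then the FIRECRAWL_API_URL environment variable, then a built-in default. The sketch below assumes that ordering; the function name and the default URL are assumptions, not taken from the log.

```python
import os
from typing import Optional

# Assumed default for illustration; the log does not state the real value.
DEFAULT_API_URL = "https://api.firecrawl.dev"


def resolve_api_url(api_url: Optional[str] = None) -> str:
    """Pick the API base URL: argument first, then environment, then default.

    Hypothetical helper for illustration only.
    """
    if api_url:
        return api_url
    # Falls back to the default when the variable is unset.
    return os.getenv("FIRECRAWL_API_URL", DEFAULT_API_URL)
```

With this ordering, a local install can be targeted either by exporting FIRECRAWL_API_URL or by passing the URL directly, and the caller no longer has to pass None to make the environment variable take effect.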
Nicolas
4e39701644
Update main.py
2024-05-22 12:59:56 -07:00
Nicolas
3aa5f26627
Update main.py
2024-05-22 10:45:43 -07:00
Nicolas
3e63985e53
Update main.py
2024-05-22 10:40:47 -07:00
rafaelsideguide
4dfc371241
Update index.test.ts
2024-05-22 14:38:41 -03:00
rafaelsideguide
f4a3469b9e
Merge branch 'main' into bug/crawl-limit
2024-05-22 14:27:28 -03:00
Nicolas
0d187f0425
Merge pull request #77 from tractorjuice/patch-1
Add additional file extensions to crawler.ts
2024-05-22 10:16:49 -07:00
rafaelsideguide
f9ae1729b6
Update firecrawl.py
2024-05-22 09:40:38 -03:00
Nicolas
cb2bd0e71f
Update index.test.ts
2024-05-21 19:03:32 -07:00
Nicolas
253abb849f
Update rate-limiter.ts
2024-05-21 18:53:58 -07:00
Nicolas
229b9908d2
Nick: only enable hyper dx in prod
2024-05-21 18:52:46 -07:00
Nicolas
a8ff295977
Update single_url.ts
2024-05-21 18:50:42 -07:00
Nicolas
a5e718b084
Nick: improvements
2024-05-21 18:34:23 -07:00
Nicolas
6285f12cd1
Merge pull request #167 from mendableai/nsc/hyper-dx-integration
feat: HyperDX Integration
2024-05-21 13:19:38 -07:00
youqiang
c47dae13a9
update: wait until body attached in playwright-service
2024-05-21 14:53:57 +08:00
Nicolas
7f64fe884a
Update blocklist.ts
2024-05-20 17:26:01 -07:00
Nicolas
756f54466d
Nick: allowed keywords for now
2024-05-20 17:24:21 -07:00
Nicolas
01783dc336
Update openapi.json
2024-05-20 17:10:55 -07:00
Nicolas
77a79b5a79
Nick: max num tokens for llm extract (for now) + slice the max
2024-05-20 17:07:38 -07:00
Nicolas
2644e1c029
Update .env.example
2024-05-20 13:36:51 -07:00
Nicolas
9e61d431f0
Nick: hyper dx integration init
2024-05-20 13:36:34 -07:00
Nicolas
d5d0d48848
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-05-20 10:06:52 -07:00
Nicolas
60002e79b8
Nick: python sdk bump
2024-05-20 10:06:48 -07:00
Matt Joyce
7e5ef4dec4
Allow override of API URL
Allows python sdk to be used with local installs.
2024-05-20 18:46:32 +10:00
Nicolas
c74f757b53
Update rate-limiter.ts
2024-05-19 13:05:36 -07:00
Nicolas
98a39b39ab
Nick: increased rate limits
2024-05-19 12:59:29 -07:00
Nicolas
18fa15df25
Update index.test.ts
2024-05-19 12:50:06 -07:00
Nicolas
614c073af0
Nick: improvements
2024-05-19 12:45:46 -07:00
Nicolas
f473793ba3
Merge branch 'main' into feat/rate-limits
2024-05-19 12:23:34 -07:00
rafaelsideguide
a480595aa7
Update index.test.ts
2024-05-17 15:41:27 -03:00
rafaelsideguide
54049be539
Added e2e tests
2024-05-17 15:37:47 -03:00
Nicolas
6feb21cc35
Update website_params.ts
2024-05-17 11:21:26 -07:00
Nicolas
5be208f595
Nick: fixed
2024-05-17 10:40:44 -07:00
Nicolas
eb88447e8b
Update index.test.ts
2024-05-17 10:00:05 -07:00
Nicolas
df6c3d1e7d
Merge branch 'main' into detect-pdfs
2024-05-17 09:55:51 -07:00
Nicolas
9d635cb2a3
Nick: docx support
2024-05-16 11:48:02 -07:00
Nicolas
bcce0544e7
Update openapi.json
2024-05-16 11:03:32 -07:00
Nicolas
80250fb54f
Update index.test.ts
2024-05-15 17:40:46 -07:00
Nicolas
098db17913
Update index.ts
2024-05-15 17:37:09 -07:00
Nicolas
93b1f0334e
Update index.test.ts
2024-05-15 17:35:06 -07:00
Nicolas
123fb784ca
Update index.test.ts
2024-05-15 17:29:22 -07:00
Nicolas
4a6cfb6097
Update index.test.ts
2024-05-15 17:22:29 -07:00
Nicolas
6ca368327f
Merge branch 'main' into test/crawl-options
2024-05-15 17:18:25 -07:00
Nicolas
24be4866c5
Nick:
2024-05-15 17:16:20 -07:00
Nicolas
ade4e05cff
Nick: working
2024-05-15 17:13:04 -07:00
Nicolas
bfccaf670d
Nick: fixes most of it
2024-05-15 15:30:37 -07:00
rafaelsideguide
d91043376c
not working yet
2024-05-15 18:54:40 -03:00
rafaelsideguide
fa014defc7
Fixing child links only bug
2024-05-15 18:35:09 -03:00
Nicolas
2ba743fb1a
Merge pull request #27 from eltociear/patch-1
refactor: fix typo in WebScraper/index.ts
2024-05-15 13:28:38 -07:00
Nicolas
0663d78324
Merge pull request #119 from chand1012/main
Add Docker Compose for easy self hosting
2024-05-15 13:27:40 -07:00
rafaelsideguide
da8d94105d
fixed for testing the crawl algorithm only
2024-05-15 17:16:03 -03:00
Nicolas
95ffaa2236
Update crawl.test.ts
2024-05-15 12:58:02 -07:00
Nicolas
f15b8f855e
Update crawl.json
2024-05-15 12:57:24 -07:00
Nicolas
98dd672d0a
Update crawl.json
2024-05-15 12:55:04 -07:00
Nicolas
499671c87f
Update crawl.test.ts
2024-05-15 12:50:13 -07:00
Nicolas
58053eb423
Update rate-limiter.ts
2024-05-15 12:47:35 -07:00
Nicolas
4745d114be
Update crawl.test.ts
2024-05-15 12:42:14 -07:00
Nicolas
1601e93d69
Merge branch 'main' into test/crawl-options
2024-05-15 12:34:47 -07:00
Nicolas
3678d3c986
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-05-15 12:11:18 -07:00
Nicolas
fd82982a31
Nick:
2024-05-15 12:11:16 -07:00
rafaelsideguide
4925ee59f6
added crawl test suite
2024-05-15 15:50:50 -03:00
Nicolas
1b0d6341d3
Update index.ts
2024-05-15 11:48:12 -07:00
Nicolas
d10f81e7fe
Nick: fixes
2024-05-15 11:28:20 -07:00
Nicolas
87570bdfa1
Update index.ts
2024-05-15 11:06:03 -07:00
rafaelsideguide
d4574851be
Added rpc definition
2024-05-15 08:40:21 -03:00
rafaelsideguide
47c20c80ab
Update auth.ts
2024-05-15 08:34:49 -03:00
Ikko Eltociear Ashimine
e91c122c69
Merge branch 'main' into patch-1
2024-05-15 12:14:52 +09:00
Nicolas
7d8ceab6de
Merge branch 'feat/rate-limits' of https://github.com/mendableai/firecrawl into feat/rate-limits
2024-05-14 14:48:01 -07:00
Nicolas
0e0faa28b3
Update auth.ts
2024-05-14 14:47:36 -07:00
rafaelsideguide
672eddb999
updated rpc
2024-05-14 18:47:21 -03:00
Nicolas
4761ea510b
Update rate-limiter.ts
2024-05-14 14:26:42 -07:00
rafaelsideguide
40ad97dee8
added rate limits
2024-05-14 18:08:31 -03:00
Nicolas
27e1e22a0a
Update index.test.ts
2024-05-14 12:28:25 -07:00
Nicolas
a0fdc6f7c6
Nick:
2024-05-14 12:12:40 -07:00
Nicolas
7f31959be7
Nick:
2024-05-14 12:04:36 -07:00
Nicolas
8a72cf556b
Nick:
2024-05-13 21:10:58 -07:00
Nicolas
a96fc5b96d
Nick: 4x speed
2024-05-13 20:45:11 -07:00
Nicolas
e26008a833
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-05-13 19:54:13 -07:00
Nicolas
512449e1aa
Nick: v21
2024-05-13 19:54:12 -07:00
Nicolas
bd27b0e17e
Merge pull request #142 from mendableai/doc/crawl-limit-default
[Doc] Added default value for crawlOptions.limit
2024-05-13 18:38:09 -07:00
Nicolas
aa0c8188c9
Nick: 408 handling
2024-05-13 18:34:00 -07:00
Nicolas
999176d576
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-05-13 13:57:34 -07:00
Nicolas
f3ec21d9c4
Update runWebScraper.ts
2024-05-13 13:57:22 -07:00
Nicolas
65d89afba9
Nick:
2024-05-13 13:01:43 -07:00
Eric Ciarla
4cc46d4af8
Update models.ts
2024-05-13 15:23:31 -04:00
rafaelsideguide
8eb2e95f19
Cleaned up
2024-05-13 16:13:10 -03:00
Nicolas
2ce045912f
Nick: disable vision right now
2024-05-13 10:56:08 -07:00
rafaelsideguide
f4348024c6
Added check during scraping to deal with pdfs
Checks if the URL is a PDF during the scraping process (single_url.ts).
TODO: Run integration tests. Does this strategy affect the running time?
P.S. Some comments need to be removed if we decide to proceed with this strategy.
2024-05-13 09:13:42 -03:00