Commit Graph

101 Commits

Author SHA1 Message Date
rafaelsideguide
d3c36adaa7 Update index.ts 2024-04-29 17:58:47 -03:00
rafaelsideguide
f8b207793f changed the request to do a HEAD to check for a PDF instead 2024-04-29 15:15:32 -03:00
Nicolas
b69feab916 Merge branch 'main' into llm-extraction 2024-04-29 08:40:44 -07:00
Caleb Peffer
2ad7a58eb7 Caleb: first test passing 2024-04-28 17:38:20 -07:00
Caleb Peffer
06497729e2 Caleb: got it to a testable state I believe 2024-04-28 15:52:09 -07:00
Caleb Peffer
6ee1f2d3bc Caleb: initially pulled inspiration code from https://github.com/mishushakov/llm-scraper 2024-04-28 13:59:35 -07:00
Nicolas
68838c9e0d Update single_url.ts 2024-04-28 12:44:00 -07:00
Nicolas
d8ee4e90d6 Update website_params.ts 2024-04-28 11:47:25 -07:00
Nicolas
8e44696c4d Nick: 2024-04-28 11:34:25 -07:00
rafaelsideguide
75597f72a1 [Feat] Added allowed urls
FireCrawl should be able to scrape LinkedIn Articles (/pulse/*)
2024-04-25 08:39:45 -03:00
Rafael Miller
f189589da4 Merge pull request #34 from mendableai/nsc/returnOnlyUrls
Implements the ability for the crawler to output all the links it found, without scraping
2024-04-24 10:34:42 -03:00
rafaelsideguide
942ac3b41c Resolved merge conflicts between feat/added-anthropic-vision-api and main 2024-04-24 09:57:45 -03:00
Nicolas
8939ca570b Merge branch 'main' into nsc/returnOnlyUrls 2024-04-23 18:05:48 -07:00
Nicolas
fdb2789eaa Nick: added url as return param 2024-04-23 17:14:34 -07:00
Nicolas
734c76fc56 Merge branch 'main' into nsc/mvp-search 2024-04-23 17:04:31 -07:00
Nicolas
f0695c7123 Update single_url.ts 2024-04-23 17:04:10 -07:00
Nicolas
0146157876 Nick: mvp 2024-04-23 15:28:32 -07:00
rafaelsideguide
849c0b6ebf [Feat] Added blocklist for social media urls 2024-04-23 18:50:35 -03:00
Nicolas
306cfe4ce1 Nick: 2024-04-23 11:15:11 -07:00
Nicolas
ddf9ff9c9a Nick: 2024-04-20 11:46:06 -07:00
Nicolas
f1dd97af0f Update index.ts 2024-04-19 15:37:27 -07:00
Nicolas
84cebf618b Nick: 2024-04-19 15:36:00 -07:00
Nicolas
5b93799149 Nick: a bit faster 2024-04-19 15:13:17 -07:00
Nicolas
c5cb268b61 Update pdfProcessor.ts 2024-04-19 13:13:42 -07:00
Nicolas
43cfcec326 Nick: disabling in crawl and sitemap for now 2024-04-19 13:12:08 -07:00
Nicolas
140529c609 Nick: fixes pdfs not found 2024-04-19 13:05:21 -07:00
Ikko Eltociear Ashimine
9e9d66f7a3 refactor: fix typo in WebScraper/index.ts
breakign -> breaking
2024-04-20 02:27:53 +09:00
rafaelsideguide
72e1dadccd adding option to replace all relative paths with absolute paths 2024-04-19 11:47:20 -03:00
rafaelsideguide
c4cc4b9262 fixing document response 2024-04-18 14:12:39 -03:00
Rafael Miller
704a059448 Update index.ts 2024-04-18 13:53:11 -03:00
rafaelsideguide
57e5b36014 [Feat] Adding pdf parser 2024-04-18 11:43:57 -03:00
Nicolas
ca2bf9cc12 Update single_url.ts 2024-04-17 18:27:08 -07:00
Nicolas
36abe0f7f9 Nick: 2024-04-17 18:24:46 -07:00
Nicolas
460763ba5f Merge pull request #11 from mendableai/feat/parse-to-markdown-tables
[Feat] Added html to markdown table parser
2024-04-17 15:52:43 -04:00
Nicolas
52fb28bc1a Update index.ts 2024-04-17 12:52:15 -07:00
Nicolas
de439f6529 Update index.ts 2024-04-17 12:51:29 -07:00
Nicolas
871d5d91b0 Update index.ts 2024-04-17 12:51:12 -07:00
Nicolas
08ed68ff55 Nick: fixes 2024-04-17 12:44:23 -07:00
rafaelsideguide
ee8a097252 adding unit tests and fixing the parse function 2024-04-17 15:56:01 -03:00
Nicolas
2eb81545fa Update index.test.ts 2024-04-17 11:04:03 -07:00
rafaelsideguide
b375ce3e39 adding unit tests and bugfixing 2024-04-17 14:54:54 -03:00
Nicolas
db15724b0c Update imageDescription.ts 2024-04-17 10:39:29 -07:00
Nicolas
27674a624d Update index.ts 2024-04-17 10:39:00 -07:00
rafaelsideguide
ff622739b7 Added a html to markdown table parser 2024-04-17 11:01:19 -03:00
rafaelsideguide
ed5dc808c7 Update imageDescription.ts 2024-04-16 18:05:07 -03:00
rafaelsideguide
00941d94a4 Added anthropic vision to getImageDescription function 2024-04-16 18:03:48 -03:00
rafaelsideguide
d23a7ae591 improving relative paths 2024-04-16 16:34:01 -03:00
rafaelsideguide
a04610302a Spliting relative paths for images 2024-04-16 16:31:33 -03:00
Nicolas
4c4775e0b8 Nick: 2024-04-16 12:49:14 -04:00
Nicolas
93627ae87c Nick: 2024-04-16 12:06:46 -04:00
Nicolas
a6c2a87811 Initial commit 2024-04-15 17:01:47 -04:00