Resolve merge conflicts with main

2024-04-30 10:54:18 -03:00 · 2024-04-30 10:54:18 -03:00 · a095e1b63d
commit a095e1b63d
parent 35480bd2ad d3c36adaa7
73 changed files with 7136 additions and 1052 deletions
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -54,5 +54,5 @@ jobs:
        id: start_workers
      - name: Run E2E tests
        run: |
-          npx jest --detectOpenHandles --forceExit --openHandlesTimeout=120000 --watchAll=false
+          npm run test:prod
        working-directory: ./apps/api
--- a/.github/workflows/fly.yml
+++ b/.github/workflows/fly.yml
@@ -54,7 +54,7 @@ jobs:
        id: start_workers
      - name: Run E2E tests
        run: |
-          npx jest --detectOpenHandles --forceExit --openHandlesTimeout=120000 --watchAll=false
+          npm run test:prod
        working-directory: ./apps/api
  deploy:
    name: Deploy app
--- a/.gitignore
+++ b/.gitignore
@@ -6,3 +6,5 @@
 dump.rdb
 /mongo-data
 apps/js-sdk/node_modules/
 apps/api/.env.local
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,4 +1,114 @@
-# Contributing
+# Contributors guide: 
 Welcome to [Firecrawl](https://firecrawl.dev) 🔥! Here are some instructions on how to get the project running locally, so you can run it on your own (and contribute).
 If you're contributing, note that the process is similar to other open source repos, i.e. fork firecrawl, make changes, run tests, and open a PR. If you have any questions or would like help getting on board, reach out to hello@mendable.ai or submit an issue!
 ## Running the project locally
 First, start by installing dependencies
 1. node.js [instructions](https://nodejs.org/en/learn/getting-started/how-to-install-nodejs)
 2. pnpm [instructions](https://pnpm.io/installation)
 3. redis [instructions](https://redis.io/docs/latest/operate/oss_and_stack/install/install-redis/) 
 Set environment variables in a .env file in the /apps/api/ directory. You can copy over the template in .env.example.
 To start, we won't set up authentication or any optional sub-services (PDF parsing, JS blocking support, AI features).
 .env:
 ```
 # ===== Required ENVS ======
 NUM_WORKERS_PER_QUEUE=8 
 PORT=3002
 HOST=0.0.0.0
 REDIS_URL=redis://localhost:6379
 ## To turn on DB authentication, you need to set up supabase.
 USE_DB_AUTHENTICATION=false
 # ===== Optional ENVS ======
 # Supabase Setup (used to support DB authentication, advanced logging, etc.)
 SUPABASE_ANON_TOKEN= 
 SUPABASE_URL= 
 SUPABASE_SERVICE_TOKEN=
 # Other Optionals
 TEST_API_KEY= # use if you've set up authentication and want to test with a real API key
 SCRAPING_BEE_API_KEY= #Set if you'd like to use ScrapingBee to handle JS blocking
 OPENAI_API_KEY= # add for LLM-dependent features (image alt generation, etc.)
 BULL_AUTH_KEY= #
 LOGTAIL_KEY= # Use if you're configuring basic logging with logtail
 PLAYWRIGHT_MICROSERVICE_URL=  # set if you'd like to run a playwright fallback
 LLAMAPARSE_API_KEY= #Set if you have a llamaparse key you'd like to use to parse pdfs
 ```
 ### Installing dependencies
 First, install the dependencies using pnpm.
 ```bash
 pnpm install
 ```
 ### Running the project
 You're going to need to open 3 terminals. 
 ### Terminal 1 - setting up redis
 Run the command anywhere within your project
 ```bash
 redis-server
 ```
 ### Terminal 2 - setting up workers
 Now, navigate to the apps/api/ directory and run:
 ```bash
 pnpm run workers
 ```
 This will start the workers, which are responsible for processing crawl jobs.
 ### Terminal 3 - setting up the main server
 To do this, navigate to the apps/api/ directory. If you don’t have pnpm installed already, install it here: https://pnpm.io/installation
 Next, run your server with:
 ```bash
 pnpm run start
 ```
 ### Terminal 3 - sending our first request.
 Alright: now let’s send our first request.
 ```bash
 curl -X GET http://localhost:3002/test
 ``` 
 This should return the response Hello, world!
 If you’d like to test the crawl endpoint, you can run this 
 ```bash
 curl -X POST http://localhost:3002/v0/crawl \
    -H 'Content-Type: application/json' \
    -d '{
      "url": "https://mendable.ai"
    }'
 ```   
 ## Tests:
 Run the tests with `npm run test:local-no-auth` if you'd like to run them without authentication.
 If you'd like to run the tests with authentication, run `npm run test:prod`.
 We love contributions! Please read our [contributing guide](CONTRIBUTING.md) before submitting a pull request.
--- a/README.md
+++ b/README.md
@@ -2,26 +2,29 @@
 Crawl and convert any website into LLM-ready markdown. Built by [Mendable.ai](https://mendable.ai?ref=gfirecrawl)
-
+_This repository is currently in its early stages of development. We are in the process of merging custom modules into this mono repository. The primary objective is to enhance the accuracy of LLM responses by utilizing clean data. It is not ready for full self-host yet - we're working on it_
 *This repository is currently in its early stages of development. We are in the process of merging custom modules into this mono repository. The primary objective is to enhance the accuracy of LLM responses by utilizing clean data. It is not ready for full self-host yet - we're working on it*
 ## What is Firecrawl?
 [Firecrawl](https://firecrawl.dev?ref=github) is an API service that takes a URL, crawls it, and converts it into clean markdown. We crawl all accessible subpages and give you clean markdown for each. No sitemap required.
 _Pst. hey, you, join our stargazers :)_
 <img src="https://github.com/mendableai/firecrawl/assets/44934913/53c4483a-0f0e-40c6-bd84-153a07f94d29" width="200">
 ## How to use it?
 We provide an easy to use API with our hosted version. You can find the playground and documentation [here](https://firecrawl.dev/playground). You can also self host the backend if you'd like.
 - [x] [API](https://firecrawl.dev/playground)
 - [x] [Python SDK](https://github.com/mendableai/firecrawl/tree/main/apps/python-sdk)
- [X] [Node SDK](https://github.com/mendableai/firecrawl/tree/main/apps/js-sdk)
+- [x] [Node SDK](https://github.com/mendableai/firecrawl/tree/main/apps/js-sdk)
 - [x] [Langchain Integration 🦜🔗](https://python.langchain.com/docs/integrations/document_loaders/firecrawl/)
 - [x] [Llama Index Integration 🦙](https://docs.llamaindex.ai/en/latest/examples/data_connectors/WebPageDemo/#using-firecrawl-reader)
- [ ] LangchainJS - Coming Soon
+- [X] [Langchain JS Integration 🦜🔗](https://js.langchain.com/docs/integrations/document_loaders/web_loaders/firecrawl)
 - [ ] Want an SDK or Integration? Let us know by opening an issue.
-
+To run locally, refer to guide [here](https://github.com/mendableai/firecrawl/blob/main/CONTRIBUTING.md).
 Self-host. To self-host refer to guide [here](https://github.com/mendableai/firecrawl/blob/main/SELF_HOST.md).
 ### API Key
@@ -70,13 +73,82 @@ curl -X GET https://api.firecrawl.dev/v0/crawl/status/1234-5678-9101 \
        "title": "Mendable | AI for CX and Sales",
        "description": "AI for CX and Sales",
        "language": null,
-             "sourceURL": "https://www.mendable.ai/",
+        "sourceURL": "https://www.mendable.ai/"
      }
    }
  ]
 }
 ```
 ### Scraping
 Used to scrape a URL and get its content.
 ```bash
 curl -X POST https://api.firecrawl.dev/v0/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "url": "https://mendable.ai"
    }'
 ```
 Response:
 ```json
 {
  "success": true,
  "data": {
    "content": "Raw Content ",
    "markdown": "# Markdown Content",
    "provider": "web-scraper",
    "metadata": {
      "title": "Mendable | AI for CX and Sales",
      "description": "AI for CX and Sales",
      "language": null,
      "sourceURL": "https://www.mendable.ai/"
    }
  }
 }
 ```
 ### Search (Beta)
 Used to search the web, get the most relevant results, scrape each page and return the markdown. Set `fetchPageContent` to `false` for a fast SERP API response without page content.
 ```bash
 curl -X POST https://api.firecrawl.dev/v0/search \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "query": "firecrawl",
      "pageOptions": {
        "fetchPageContent": true
      }
    }'
 ```
 ```json
 {
  "success": true,
  "data": [
    {
      "url": "https://mendable.ai",
      "markdown": "# Markdown Content",
      "provider": "web-scraper",
      "metadata": {
        "title": "Mendable | AI for CX and Sales",
        "description": "AI for CX and Sales",
        "language": null,
        "sourceURL": "https://www.mendable.ai/"
      }
    }
  ]
 }
 ```
 Coming soon to the Langchain and Llama Index integrations.
 ## Using Python SDK
 ### Installing Python SDK
@@ -108,6 +180,18 @@ url = 'https://example.com'
 scraped_data = app.scrape_url(url)
 ```
 ### Search for a query
 Performs a web search, retrieves the top results, extracts data from each page, and returns their markdown.
 ```python
 query = 'what is mendable?'
 search_result = app.search(query)
 ```
 ## Contributing
 We love contributions! Please read our [contributing guide](CONTRIBUTING.md) before submitting a pull request.
 *It is the sole responsibility of the end users to respect websites' policies when scraping, searching and crawling with Firecrawl. Users are advised to adhere to the applicable privacy policies and terms of use of the websites prior to initiating any scraping activities. By default, Firecrawl respects the directives specified in the websites' robots.txt files when crawling. By utilizing Firecrawl, you expressly agree to comply with these conditions.*
--- a/SELF_HOST.md
+++ b/SELF_HOST.md
@@ -1,6 +1,6 @@
 # Self-hosting Firecrawl
-Guide coming soon.
+Refer to [CONTRIBUTING.md](https://github.com/mendableai/firecrawl/blob/main/CONTRIBUTING.md) for instructions on how to run it locally.
 *This repository is currently in its early stages of development. We are in the process of merging custom modules into this mono repository. The primary objective is to enhance the accuracy of LLM responses by utilizing clean data. It is not ready for full self-host yet - we're working on it*
--- a/apps/api/.env.example
+++ b/apps/api/.env.example
@@ -0,0 +1,26 @@
 # ===== Required ENVS ======
 NUM_WORKERS_PER_QUEUE=8 
 PORT=3002
 HOST=0.0.0.0
 REDIS_URL=redis://localhost:6379
 ## To turn on DB authentication, you need to set up supabase.
 USE_DB_AUTHENTICATION=true
 # ===== Optional ENVS ======
 # Supabase Setup (used to support DB authentication, advanced logging, etc.)
 SUPABASE_ANON_TOKEN= 
 SUPABASE_URL= 
 SUPABASE_SERVICE_TOKEN=
 # Other Optionals
 TEST_API_KEY= # use if you've set up authentication and want to test with a real API key
 SCRAPING_BEE_API_KEY= #Set if you'd like to use ScrapingBee to handle JS blocking
 OPENAI_API_KEY= # add for LLM-dependent features (image alt generation, etc.)
 BULL_AUTH_KEY= #
 LOGTAIL_KEY= # Use if you're configuring basic logging with logtail
 PLAYWRIGHT_MICROSERVICE_URL=  # set if you'd like to run a playwright fallback
 LLAMAPARSE_API_KEY= #Set if you have a llamaparse key you'd like to use to parse pdfs
 SERPER_API_KEY= #Set if you have a serper key you'd like to use as a search api
 SLACK_WEBHOOK_URL= # set if you'd like to send slack server health status messages
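Review note: the template above ships with `USE_DB_AUTHENTICATION=true`, while the CONTRIBUTING.md walkthrough uses `false` and leaves the Supabase values empty. A minimal sketch of a pre-flight check for a copied `.env` is shown below. The helper names `parse_env` and `check_env` are hypothetical, and the rule that all three Supabase values are needed when DB authentication is on is an assumption drawn from the template comment, not confirmed server behavior.

```python
# Keys listed under "Required ENVS" in the template.
REQUIRED_KEYS = ("NUM_WORKERS_PER_QUEUE", "PORT", "HOST", "REDIS_URL", "USE_DB_AUTHENTICATION")
# Assumption: all three are needed once DB authentication is switched on.
SUPABASE_KEYS = ("SUPABASE_ANON_TOKEN", "SUPABASE_URL", "SUPABASE_SERVICE_TOKEN")


def parse_env(text):
    """Parse KEY=VALUE lines; skip blanks and comment lines, drop trailing '# ...' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Simplification: anything after the first '#' is treated as a comment,
        # so values that legitimately contain '#' are not supported here.
        values[key.strip()] = value.split("#", 1)[0].strip()
    return values


def check_env(values):
    """Return a list of human-readable problems; an empty list means the file looks usable."""
    problems = [f"{key} is required" for key in REQUIRED_KEYS if not values.get(key)]
    if values.get("USE_DB_AUTHENTICATION", "").lower() == "true":
        problems += [
            f"{key} should be set when USE_DB_AUTHENTICATION=true"
            for key in SUPABASE_KEYS
            if not values.get(key)
        ]
    return problems
```

Run against an unmodified copy of the template, `check_env` would report the three empty Supabase values; switching `USE_DB_AUTHENTICATION` to `false`, as the local setup guide does, clears them.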
--- a/apps/api/.env.local
+++ b/apps/api/.env.local
@@ -7,8 +7,8 @@ SUPABASE_SERVICE_TOKEN=
 REDIS_URL=
 SCRAPING_BEE_API_KEY=
 OPENAI_API_KEY=
 ANTHROPIC_API_KEY=
 BULL_AUTH_KEY=
 LOGTAIL_KEY=
 PLAYWRIGHT_MICROSERVICE_URL=
-LLAMAPARSE_API_KEY=
+
 TEST_API_KEY=
--- a/apps/api/jest.config.js
+++ b/apps/api/jest.config.js
@@ -2,4 +2,7 @@ module.exports = {
  preset: "ts-jest",
  testEnvironment: "node",
  setupFiles: ["./jest.setup.js"],
  // ignore dist folder root dir
  modulePathIgnorePatterns: ["<rootDir>/dist/"],
 };
--- a/apps/api/openapi.json
+++ b/apps/api/openapi.json
@@ -3,11 +3,11 @@
  "info": {
    "title": "Firecrawl API",
    "version": "1.0.0",
-      "description": "API for interacting with Firecrawl services to convert websites to LLM-ready data.",
+    "description": "API for interacting with Firecrawl services to perform web scraping and crawling tasks.",
    "contact": {
      "name": "Firecrawl Support",
      "url": "https://firecrawl.dev/support",
-        "email": "help@mendable.ai"
+      "email": "support@firecrawl.dev"
    }
  },
  "servers": [
@@ -37,6 +37,16 @@
                    "type": "string",
                    "format": "uri",
                    "description": "The URL to scrape"
                  },
                  "pageOptions": {
                    "type": "object",
                    "properties": {
                      "onlyMainContent": {
                        "type": "boolean",
                        "description": "Only return the main content of the page excluding headers, navs, footers, etc.",
                        "default": false
                      }
                    }
                  }
                },
                "required": ["url"]
@@ -111,6 +121,11 @@
                        "description": "Generate alt text for images using LLMs (must have a paid plan)",
                        "default": false
                      },
                      "returnOnlyUrls": {
                        "type": "boolean",
                        "description": "If true, returns only the URLs as a list on the crawl status. Attention: the return response will be a list of URLs inside the data, not a list of documents.",
                        "default": false
                      },
                      "limit": {
                        "type": "integer",
                        "description": "Maximum number of pages to crawl"
@@ -156,6 +171,81 @@
        }
      }
    },
    "/search": {
      "post": {
        "summary": "Search for a keyword in Google, returns top page results with markdown content for each page",
        "operationId": "searchGoogle",
        "tags": ["Search"],
        "security": [
          {
            "bearerAuth": []
          }
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "type": "object",
                "properties": {
                  "query": {
                    "type": "string",
                    "format": "uri",
                    "description": "The URL to scrape"
                  },
                  "pageOptions": {
                    "type": "object",
                    "properties": {
                      "onlyMainContent": {
                        "type": "boolean",
                        "description": "Only return the main content of the page excluding headers, navs, footers, etc.",
                        "default": false
                      },
                      "fetchPageContent": {
                        "type": "boolean",
                        "description": "Fetch the content of each page. If false, defaults to a basic fast serp API.",
                        "default": true
                      }
                    }
                  },
                  "searchOptions": {
                    "type": "object",
                    "properties": {
                      "limit": {
                        "type": "integer",
                        "description": "Maximum number of results. Max is 20 during beta."
                      }
                    }
                  }
                },
                "required": ["query"]
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "Successful response",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/SearchResponse"
                }
              }
            }
          },
          "402": {
            "description": "Payment required"
          },
          "429": {
            "description": "Too many requests"
          },
          "500": {
            "description": "Server error"
          }
        }
      }
    },
    "/crawl/status/{jobId}": {
      "get": {
        "tags": ["Crawl"],
@@ -247,10 +337,10 @@
          "data": {
            "type": "object",
            "properties": {
-                "content": {
+              "markdown": {
                "type": "string"
              },
-                "markdown": {
+              "content": {
                "type": "string"
              },
              "metadata": {
@@ -276,6 +366,50 @@
          }
        }
      },
      "SearchResponse": {
        "type": "object",
        "properties": {
          "success": {
            "type": "boolean"
          },
          "data": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "url": {
                  "type": "string"
                },
                "markdown": {
                  "type": "string"
                },
                "content": {
                  "type": "string"
                },
                "metadata": {
                  "type": "object",
                  "properties": {
                    "title": {
                      "type": "string"
                    },
                    "description": {
                      "type": "string"
                    },
                    "language": {
                      "type": "string",
                      "nullable": true
                    },
                    "sourceURL": {
                      "type": "string",
                      "format": "uri"
                    }
                  }
                }
              }
            }
          }
        }
      },
      "CrawlResponse": {
        "type": "object",
        "properties": {
@@ -292,4 +426,3 @@
    }
  ]
 }
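Review note: the `/search` operation added above takes a required `query`, an optional `pageOptions.fetchPageContent` flag, and an optional `searchOptions.limit` capped at 20 during beta. A minimal sketch of assembling that request is shown below. It only builds the URL, headers, and JSON body and does not send anything. The function name `build_search_request` is hypothetical, and the lower bound of 1 on `limit` is an assumption; the schema only states the maximum.

```python
import json

# Endpoint as used in the README curl example.
SEARCH_URL = "https://api.firecrawl.dev/v0/search"
# "Max is 20 during beta" per the searchOptions.limit description.
MAX_BETA_LIMIT = 20


def build_search_request(query, api_key, fetch_page_content=True, limit=None):
    """Return (url, headers, json_body) for a POST to the search endpoint."""
    if not query:
        raise ValueError("query is required")
    body = {
        "query": query,
        # False asks for the basic, faster SERP-style response without page content.
        "pageOptions": {"fetchPageContent": fetch_page_content},
    }
    if limit is not None:
        # Assumption: limits below 1 are rejected; only the upper bound is documented.
        if not 1 <= limit <= MAX_BETA_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_BETA_LIMIT}")
        body["searchOptions"] = {"limit": limit}
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return SEARCH_URL, headers, json.dumps(body)
```

The returned tuple can be handed to any HTTP client; keeping the payload construction separate from the network call makes the limit check easy to exercise without an API key.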
--- a/apps/api/package.json
+++ b/apps/api/package.json
@@ -10,7 +10,9 @@
    "flyio": "node dist/src/index.js",
    "start:dev": "nodemon --exec ts-node src/index.ts",
    "build": "tsc",
-    "test": "jest --verbose",
+    "test": "npx jest --detectOpenHandles --forceExit --openHandlesTimeout=120000 --watchAll=false --testPathIgnorePatterns='src/__tests__/e2e_noAuth/*'",
    "test:local-no-auth": "npx jest --detectOpenHandles --forceExit --openHandlesTimeout=120000 --watchAll=false --testPathIgnorePatterns='src/__tests__/e2e_withAuth/*'",
    "test:prod": "npx jest --detectOpenHandles --forceExit --openHandlesTimeout=120000 --watchAll=false --testPathIgnorePatterns='src/__tests__/e2e_noAuth/*'",
    "workers": "nodemon --exec ts-node src/services/queue-worker.ts",
    "worker:production": "node dist/src/services/queue-worker.js",
    "mongo-docker": "docker run -d -p 2717:27017 -v ./mongo-data:/data/db --name mongodb mongo:latest",
@@ -39,6 +41,7 @@
    "typescript": "^5.4.2"
  },
  "dependencies": {
    "@anthropic-ai/sdk": "^0.20.5",
    "@brillout/import": "^0.2.2",
    "@bull-board/api": "^5.14.2",
    "@bull-board/express": "^5.8.0",
@@ -64,6 +67,7 @@
    "glob": "^10.3.12",
    "gpt3-tokenizer": "^1.1.5",
    "ioredis": "^5.3.2",
    "joplin-turndown-plugin-gfm": "^1.0.12",
    "keyword-extractor": "^0.0.25",
    "langchain": "^0.1.25",
    "languagedetect": "^2.0.0",
--- a/apps/api/pnpm-lock.yaml
+++ b/apps/api/pnpm-lock.yaml
@@ -5,6 +5,9 @@ settings:
  excludeLinksFromLockfile: false
 dependencies:
  '@anthropic-ai/sdk':
    specifier: ^0.20.5
    version: 0.20.5
  '@brillout/import':
    specifier: ^0.2.2
    version: 0.2.3
@@ -80,6 +83,9 @@ dependencies:
  ioredis:
    specifier: ^5.3.2
    version: 5.3.2
  joplin-turndown-plugin-gfm:
    specifier: ^1.0.12
    version: 1.0.12
  keyword-extractor:
    specifier: ^0.0.25
    version: 0.0.25
@@ -222,6 +228,21 @@ packages:
      '@jridgewell/trace-mapping': 0.3.25
    dev: true
  /@anthropic-ai/sdk@0.20.5:
    resolution: {integrity: sha512-d0ch+zp6/gHR4+2wqWV7JU1EJ7PpHc3r3F6hebovJTouY+pkaId1FuYYaVsG3l/gyqhOZUwKCMSMqcFNf+ZmWg==}
    dependencies:
      '@types/node': 18.19.22
      '@types/node-fetch': 2.6.11
      abort-controller: 3.0.0
      agentkeepalive: 4.5.0
      form-data-encoder: 1.7.2
      formdata-node: 4.4.1
      node-fetch: 2.7.0
      web-streams-polyfill: 3.3.3
    transitivePeerDependencies:
      - encoding
    dev: false
  /@anthropic-ai/sdk@0.9.1:
    resolution: {integrity: sha512-wa1meQ2WSfoY8Uor3EdrJq0jTiZJoKoSii2ZVWRY1oN4Tlr5s59pADg9T79FTbPe1/se5c3pBeZgJL63wmuoBA==}
    dependencies:
@@ -3923,6 +3944,10 @@ packages:
      - ts-node
    dev: true
  /joplin-turndown-plugin-gfm@1.0.12:
    resolution: {integrity: sha512-qL4+1iycQjZ1fs8zk3jSRk7cg3ROBUHk7GKtiLAQLFzLPKErnILUvz5DLszSQvz3s1sTjPbywLDISVUtBY6HaA==}
    dev: false
  /js-tiktoken@1.0.10:
    resolution: {integrity: sha512-ZoSxbGjvGyMT13x6ACo9ebhDha/0FHdKA+OsQcMOWcm1Zs7r90Rhk5lhERLzji+3rA7EKpXCgwXcM5fF3DMpdA==}
    dependencies:
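Review note: the `test`, `test:local-no-auth`, and `test:prod` scripts in apps/api/package.json differ only in their `--testPathIgnorePatterns` value, and `test` is identical to `test:prod`. Jest documents these patterns as regular expressions matched against each test path, so the trailing `/*` means "zero or more slashes" rather than a glob. The sketch below approximates that selection with Python's `re.search` to show which suite each script skips. The helper names and the `e2e_withAuth/index.test.ts` file name are assumptions made for illustration.

```python
import re

# Ignore patterns copied from the package.json scripts.
SCRIPTS = {
    "test": ["src/__tests__/e2e_noAuth/*"],
    "test:local-no-auth": ["src/__tests__/e2e_withAuth/*"],
    "test:prod": ["src/__tests__/e2e_noAuth/*"],
}


def is_ignored(test_path, patterns):
    """Approximate Jest: a path is skipped if any pattern matches anywhere in it."""
    return any(re.search(pattern, test_path) for pattern in patterns)


def suites_run(script, test_paths):
    """Return the test paths that the given npm script would still execute."""
    return [path for path in test_paths if not is_ignored(path, SCRIPTS[script])]
```

With one suite per directory, `suites_run("test:prod", paths)` keeps only the `e2e_withAuth` path and `suites_run("test:local-no-auth", paths)` keeps only the `e2e_noAuth` path.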
--- a/apps/api/requests.http
+++ b/apps/api/requests.http
@@ -13,13 +13,22 @@ GET http://localhost:3002/v0/jobs/active HTTP/1.1
 ### Scrape Website
-POST https://api.firecrawl.dev/v0/scrape HTTP/1.1
+POST http://localhost:3002/v0/crawl HTTP/1.1
 Authorization: Bearer 
 content-type: application/json
 {
-    "url":"https://www.mendable.ai"
+    "url":"https://www.mendable.ai",
    "crawlerOptions": {
        "returnOnlyUrls": true
    }
 }
 ### Scrape Website
@@ -34,7 +43,7 @@ content-type: application/json
 ### Check Job Status
-GET http://localhost:3002/v0/crawl/status/333ab225-dc3e-418b-9d4b-8fb833cbaf89 HTTP/1.1
+GET http://localhost:3002/v0/crawl/status/a6053912-d602-4709-841f-3d2cb46fea0a HTTP/1.1
 Authorization: Bearer 
 ### Get Job Result
@@ -50,3 +59,12 @@ content-type: application/json
 ### Check Job Status
 GET https://api.firecrawl.dev/v0/crawl/status/cfcb71ac-23a3-4da5-bd85-d4e58b871d66
 Authorization: Bearer 
 ### Get Active Jobs Count
 GET http://localhost:3002/serverHealthCheck
 content-type: application/json
 ### Notify Server Health Check
 GET http://localhost:3002/serverHealthCheck/notify
 content-type: application/json
--- a/apps/api/src/__tests__/e2e/index.test.ts
+++ b/apps/api/src/__tests__/e2e/index.test.ts
@@ -1,254 +0,0 @@
 import request from 'supertest';
 import { app } from '../../index';
 import dotenv from 'dotenv';
 dotenv.config();
 const TEST_URL = 'http://localhost:3002'
 describe('E2E Tests for API Routes', () => {
  describe('GET /', () => {
    it('should return Hello, world! message', async () => {
      const response = await request(TEST_URL).get('/');
      expect(response.statusCode).toBe(200);
      expect(response.text).toContain('SCRAPERS-JS: Hello, world! Fly.io');
    });
  });
  describe('GET /test', () => {
    it('should return Hello, world! message', async () => {
      const response = await request(TEST_URL).get('/test');
      expect(response.statusCode).toBe(200);
      expect(response.text).toContain('Hello, world!');
    });
  });
  describe('POST /v0/scrape', () => {
    it('should require authorization', async () => {
      const response = await request(app).post('/v0/scrape');
      expect(response.statusCode).toBe(401);
    });
    it('should return an error response with an invalid API key', async () => {
      const response = await request(TEST_URL)
        .post('/v0/scrape')
        .set('Authorization', `Bearer invalid-api-key`)
        .set('Content-Type', 'application/json')
        .send({ url: 'https://firecrawl.dev' });
      expect(response.statusCode).toBe(401);
    });
    it('should return a successful response with a valid preview token', async () => {
      const response = await request(TEST_URL)
        .post('/v0/scrape')
        .set('Authorization', `Bearer this_is_just_a_preview_token`)
        .set('Content-Type', 'application/json')
        .send({ url: 'https://firecrawl.dev' });
      expect(response.statusCode).toBe(200);
    }, 10000); // 10 seconds timeout
    it('should return a successful response with a valid API key', async () => {
      const response = await request(TEST_URL)
        .post('/v0/scrape')
        .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`)
        .set('Content-Type', 'application/json')
        .send({ url: 'https://firecrawl.dev' });
        await new Promise((r) => setTimeout(r, 2000));
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty('data');
      expect(response.body.data).toHaveProperty('content');
      expect(response.body.data).toHaveProperty('markdown');
      expect(response.body.data).toHaveProperty('metadata');
      expect(response.body.data.content).toContain('🔥 FireCrawl');
    }, 30000); // 30 seconds timeout
    it('should return a successful response for a valid scrape with PDF file', async () => {
      const response = await request(TEST_URL)
        .post('/v0/scrape')
        .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`)
        .set('Content-Type', 'application/json')
        .send({ url: 'https://arxiv.org/pdf/astro-ph/9301001.pdf' });
      await new Promise((r) => setTimeout(r, 6000));
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty('data');
      expect(response.body.data).toHaveProperty('content');
      expect(response.body.data).toHaveProperty('metadata');
      expect(response.body.data.content).toContain('We present spectrophotometric observations of the Broad Line Radio Galaxy');
    }, 30000); // 30 seconds
    it('should return a successful response for a valid scrape with PDF file without explicit .pdf extension', async () => {
      const response = await request(TEST_URL)
        .post('/v0/scrape')
        .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`)
        .set('Content-Type', 'application/json')
        .send({ url: 'https://arxiv.org/pdf/astro-ph/9301001' });
      await new Promise((r) => setTimeout(r, 6000));
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty('data');
      expect(response.body.data).toHaveProperty('content');
      expect(response.body.data).toHaveProperty('metadata');
      expect(response.body.data.content).toContain('We present spectrophotometric observations of the Broad Line Radio Galaxy');
    }, 30000); // 30 seconds
  });
  describe('POST /v0/crawl', () => {
    it('should require authorization', async () => {
      const response = await request(TEST_URL).post('/v0/crawl');
      expect(response.statusCode).toBe(401);
    });
    it('should return an error response with an invalid API key', async () => {
      const response = await request(TEST_URL)
        .post('/v0/crawl')
        .set('Authorization', `Bearer invalid-api-key`)
        .set('Content-Type', 'application/json')
        .send({ url: 'https://firecrawl.dev' });
        expect(response.statusCode).toBe(401);
    });
    it('should return a successful response with a valid API key', async () => {
      const response = await request(TEST_URL)
        .post('/v0/crawl')
        .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`)
        .set('Content-Type', 'application/json')
        .send({ url: 'https://firecrawl.dev' });
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty('jobId');
      expect(response.body.jobId).toMatch(/^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$/);
    });
    // Additional tests for insufficient credits?
  });
  describe('POST /v0/crawlWebsitePreview', () => {
    it('should require authorization', async () => {
      const response = await request(TEST_URL).post('/v0/crawlWebsitePreview');
      expect(response.statusCode).toBe(401);
    });
    it('should return an error response with an invalid API key', async () => {
      const response = await request(TEST_URL)
        .post('/v0/crawlWebsitePreview')
        .set('Authorization', `Bearer invalid-api-key`)
        .set('Content-Type', 'application/json')
        .send({ url: 'https://firecrawl.dev' });
      expect(response.statusCode).toBe(401);
    });
    it('should return a successful response with a valid API key', async () => {
      const response = await request(TEST_URL)
        .post('/v0/crawlWebsitePreview')
        .set('Authorization', `Bearer this_is_just_a_preview_token`)
        .set('Content-Type', 'application/json')
        .send({ url: 'https://firecrawl.dev' });
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty('jobId');
      expect(response.body.jobId).toMatch(/^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$/);
    });
  });
  describe('GET /v0/crawl/status/:jobId', () => {
    it('should require authorization', async () => {
      const response = await request(TEST_URL).get('/v0/crawl/status/123');
      expect(response.statusCode).toBe(401);
    });
    it('should return an error response with an invalid API key', async () => {
      const response = await request(TEST_URL)
        .get('/v0/crawl/status/123')
        .set('Authorization', `Bearer invalid-api-key`);
      expect(response.statusCode).toBe(401);
    });
    it('should return Job not found for invalid job ID', async () => {
      const response = await request(TEST_URL)
        .get('/v0/crawl/status/invalidJobId')
        .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`);
      expect(response.statusCode).toBe(404);
    });
    it('should return a successful response for a valid crawl job', async () => {
      const crawlResponse = await request(TEST_URL)
        .post('/v0/crawl')
        .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`)
        .set('Content-Type', 'application/json')
        .send({ url: 'https://firecrawl.dev' });
      expect(crawlResponse.statusCode).toBe(200);
      const response = await request(TEST_URL)
        .get(`/v0/crawl/status/${crawlResponse.body.jobId}`)
        .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`);
        expect(response.statusCode).toBe(200);
        expect(response.body).toHaveProperty('status');
        expect(response.body.status).toBe('active');
        // wait for 30 seconds
        await new Promise((r) => setTimeout(r, 30000));
        const completedResponse = await request(TEST_URL)
        .get(`/v0/crawl/status/${crawlResponse.body.jobId}`)
        .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`);
        expect(completedResponse.statusCode).toBe(200);
        expect(completedResponse.body).toHaveProperty('status');
        expect(completedResponse.body.status).toBe('completed');
        expect(completedResponse.body).toHaveProperty('data');
        expect(completedResponse.body.data[0]).toHaveProperty('content');
        expect(completedResponse.body.data[0]).toHaveProperty('markdown');
        expect(completedResponse.body.data[0]).toHaveProperty('metadata');
        expect(completedResponse.body.data[0].content).toContain('🔥 FireCrawl');
    }, 60000); // 60 seconds
    // it('should return a successful response for a valid crawl job with PDF content', async () => {
    // });
    it('should return a successful response for a valid crawl job with PDF files without explicit .pdf extension', async () => {
      const crawlResponse = await request(TEST_URL)
        .post('/v0/crawl')
        .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`)
        .set('Content-Type', 'application/json')
        .send({ url: 'https://arxiv.org/abs/astro-ph/9301001', crawlerOptions: { limit: 10, excludes: [ 'list/*', 'login', 'abs/*', 'static/*', 'about/*', 'archive/*' ] }});
      expect(crawlResponse.statusCode).toBe(200);
      const response = await request(TEST_URL)
        .get(`/v0/crawl/status/${crawlResponse.body.jobId}`)
        .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`);
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty('status');
      expect(response.body.status).toBe('active');
      // wait for 30 seconds
      await new Promise((r) => setTimeout(r, 30000));
      const completedResponse = await request(TEST_URL)
        .get(`/v0/crawl/status/${crawlResponse.body.jobId}`)
        .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`);
      expect(completedResponse.statusCode).toBe(200);
      expect(completedResponse.body).toHaveProperty('status');
      expect(completedResponse.body.status).toBe('completed');
      expect(completedResponse.body).toHaveProperty('data');
      expect(completedResponse.body.data.length).toBeGreaterThan(1);
      expect(completedResponse.body.data).toEqual(
        expect.arrayContaining([
          expect.objectContaining({
            content: expect.stringContaining('asymmetries might represent, for instance, preferred source orientations to our line of sight.')
          })
        ])
      );
    }, 60000); // 60 seconds
  });
  describe('GET /is-production', () => {
    it('should return the production status', async () => {
      const response = await request(TEST_URL).get('/is-production');
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty('isProduction');
    });
  });
 });
--- a/apps/api/src/tests/e2e_noAuth/index.test.ts
+++ b/apps/api/src/tests/e2e_noAuth/index.test.ts
@ -0,0 +1,213 @@
 import request from "supertest";
 import { app } from "../../index";
 import dotenv from "dotenv";
 const fs = require("fs");
 const path = require("path");
 dotenv.config();
 const TEST_URL = "http://127.0.0.1:3002";
 describe("E2E Tests for API Routes with No Authentication", () => {
  let originalEnv: NodeJS.ProcessEnv;
  // save original process.env
  beforeAll(() => {
    originalEnv = { ...process.env };
    process.env.USE_DB_AUTHENTICATION = "false";
    process.env.SUPABASE_ANON_TOKEN = "";
    process.env.SUPABASE_URL = "";
    process.env.SUPABASE_SERVICE_TOKEN = "";
    process.env.SCRAPING_BEE_API_KEY = "";
    process.env.OPENAI_API_KEY = "";
    process.env.BULL_AUTH_KEY = "";
    process.env.LOGTAIL_KEY = "";
    process.env.PLAYWRIGHT_MICROSERVICE_URL = "";
    process.env.LLAMAPARSE_API_KEY = "";
    process.env.TEST_API_KEY = "";
  });
  // restore original process.env
  afterAll(() => {
    process.env = originalEnv;
  });
  describe("GET /", () => {
    it("should return Hello, world! message", async () => {
      const response = await request(TEST_URL).get("/");
      expect(response.statusCode).toBe(200);
      expect(response.text).toContain("SCRAPERS-JS: Hello, world! Fly.io");
    });
  });
  describe("GET /test", () => {
    it("should return Hello, world! message", async () => {
      const response = await request(TEST_URL).get("/test");
      expect(response.statusCode).toBe(200);
      expect(response.text).toContain("Hello, world!");
    });
  });
  describe("POST /v0/scrape", () => {
    it("should not require authorization", async () => {
      const response = await request(TEST_URL).post("/v0/scrape");
      expect(response.statusCode).not.toBe(401);
    });
    it("should return an error for a blocklisted URL without requiring authorization", async () => {
      const blocklistedUrl = "https://facebook.com/fake-test";
      const response = await request(TEST_URL)
        .post("/v0/scrape")
        .set("Content-Type", "application/json")
        .send({ url: blocklistedUrl });
      expect(response.statusCode).toBe(403);
      expect(response.body.error).toContain("Firecrawl currently does not support social media scraping due to policy restrictions. We're actively working on building support for it.");
    });
    it("should return a successful response", async () => {
      const response = await request(TEST_URL)
        .post("/v0/scrape")
        .set("Content-Type", "application/json")
        .send({ url: "https://firecrawl.dev" });
      expect(response.statusCode).toBe(200);
    }, 10000); // 10 seconds timeout
  });
  describe("POST /v0/crawl", () => {
    it("should not require authorization", async () => {
      const response = await request(TEST_URL).post("/v0/crawl");
      expect(response.statusCode).not.toBe(401);
    });
    it("should return an error for a blocklisted URL", async () => {
      const blocklistedUrl = "https://twitter.com/fake-test";
      const response = await request(TEST_URL)
        .post("/v0/crawl")
        .set("Content-Type", "application/json")
        .send({ url: blocklistedUrl });
      expect(response.statusCode).toBe(403);
      expect(response.body.error).toContain("Firecrawl currently does not support social media scraping due to policy restrictions. We're actively working on building support for it.");
    });
    it("should return a successful response", async () => {
      const response = await request(TEST_URL)
        .post("/v0/crawl")
        .set("Content-Type", "application/json")
        .send({ url: "https://firecrawl.dev" });
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty("jobId");
      expect(response.body.jobId).toMatch(
        /^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$/
      );
    });
  });
  describe("POST /v0/crawlWebsitePreview", () => {
    it("should not require authorization", async () => {
      const response = await request(TEST_URL).post("/v0/crawlWebsitePreview");
      expect(response.statusCode).not.toBe(401);
    });
    it("should return an error for a blocklisted URL", async () => {
      const blocklistedUrl = "https://instagram.com/fake-test";
      const response = await request(TEST_URL)
        .post("/v0/crawlWebsitePreview")
        .set("Content-Type", "application/json")
        .send({ url: blocklistedUrl });
      expect(response.statusCode).toBe(403);
      expect(response.body.error).toContain("Firecrawl currently does not support social media scraping due to policy restrictions. We're actively working on building support for it.");
    });
    it("should return a successful response", async () => {
      const response = await request(TEST_URL)
        .post("/v0/crawlWebsitePreview")
        .set("Content-Type", "application/json")
        .send({ url: "https://firecrawl.dev" });
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty("jobId");
      expect(response.body.jobId).toMatch(
        /^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$/
      );
    });
  });
  describe("POST /v0/search", () => {
    it("should not require authorization", async () => {
      const response = await request(TEST_URL).post("/v0/search");
      expect(response.statusCode).not.toBe(401);
    });
    it("should not return an authorization error with an invalid API key", async () => {
      const response = await request(TEST_URL)
        .post("/v0/search")
        .set("Authorization", `Bearer invalid-api-key`)
        .set("Content-Type", "application/json")
        .send({ query: "test" });
      expect(response.statusCode).not.toBe(401);
    });
    it("should return a successful response without a valid API key", async () => {
      const response = await request(TEST_URL)
        .post("/v0/search")
        .set("Content-Type", "application/json")
        .send({ query: "test" });
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty("success");
      expect(response.body.success).toBe(true);
      expect(response.body).toHaveProperty("data");
    }, 20000);
  });
  describe("GET /v0/crawl/status/:jobId", () => {
    it("should not require authorization", async () => {
      const response = await request(TEST_URL).get("/v0/crawl/status/123");
      expect(response.statusCode).not.toBe(401);
    });
    it("should return Job not found for invalid job ID", async () => {
      const response = await request(TEST_URL).get(
        "/v0/crawl/status/invalidJobId"
      );
      expect(response.statusCode).toBe(404);
    });
    it("should return a successful response for a valid crawl job", async () => {
      const crawlResponse = await request(TEST_URL)
        .post("/v0/crawl")
        .set("Content-Type", "application/json")
        .send({ url: "https://firecrawl.dev" });
      expect(crawlResponse.statusCode).toBe(200);
      const response = await request(TEST_URL).get(
        `/v0/crawl/status/${crawlResponse.body.jobId}`
      );
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty("status");
      expect(response.body.status).toBe("active");
      // wait for 30 seconds
      await new Promise((r) => setTimeout(r, 30000));
      const completedResponse = await request(TEST_URL).get(
        `/v0/crawl/status/${crawlResponse.body.jobId}`
      );
      expect(completedResponse.statusCode).toBe(200);
      expect(completedResponse.body).toHaveProperty("status");
      expect(completedResponse.body.status).toBe("completed");
      expect(completedResponse.body).toHaveProperty("data");
      expect(completedResponse.body.data[0]).toHaveProperty("content");
      expect(completedResponse.body.data[0]).toHaveProperty("markdown");
      expect(completedResponse.body.data[0]).toHaveProperty("metadata");
      expect(completedResponse.body.data[0].content).toContain("🔥 FireCrawl");
    }, 60000); // 60 seconds
  });
  describe("GET /is-production", () => {
    it("should return the production status", async () => {
      const response = await request(TEST_URL).get("/is-production");
      expect(response.statusCode).toBe(200);
      expect(response.body).toHaveProperty("isProduction");
    });
  });
 });
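The crawl-status tests above sleep for a fixed 30 seconds and then assert that the job is `completed`, which can fail on a slow crawl and wastes time on a fast one. A minimal sketch of a polling alternative follows. `pollUntil` and its parameters are hypothetical names, not part of this commit, and the default interval and timeout are assumptions.

```typescript
// Hypothetical helper (not part of this commit): call `fetchOnce` repeatedly
// until `isDone` accepts the result, or throw once the timeout would be exceeded.
async function pollUntil<T>(
  fetchOnce: () => Promise<T>,
  isDone: (value: T) => boolean,
  intervalMs = 1000, // assumed default delay between attempts
  timeoutMs = 60000 // assumed default overall budget
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await fetchOnce();
    if (isDone(value)) {
      return value;
    }
    // Stop before sleeping if the next attempt would start past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`pollUntil: condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a status test, `fetchOnce` could wrap the `request(TEST_URL).get(...)` call and `isDone` could check `response.body.status === "completed"`. The Jest timeout passed to `it` would still need to exceed `timeoutMs`.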
--- a/apps/api/src/tests/e2e_withAuth/index.test.ts
+++ b/apps/api/src/tests/e2e_withAuth/index.test.ts
@ -0,0 +1,328 @@
 import request from "supertest";
 import { app } from "../../index";
 import dotenv from "dotenv";
 dotenv.config();
 // const TEST_URL = 'http://localhost:3002'
 const TEST_URL = "http://127.0.0.1:3002";
  describe("E2E Tests for API Routes", () => {
    beforeAll(() => {
      process.env.USE_DB_AUTHENTICATION = "true";
    });
    afterAll(() => {
      delete process.env.USE_DB_AUTHENTICATION;
    });
    describe("GET /", () => {
      it("should return Hello, world! message", async () => {
        const response = await request(TEST_URL).get("/");
        expect(response.statusCode).toBe(200);
        expect(response.text).toContain("SCRAPERS-JS: Hello, world! Fly.io");
      });
    });
    describe("GET /test", () => {
      it("should return Hello, world! message", async () => {
        const response = await request(TEST_URL).get("/test");
        expect(response.statusCode).toBe(200);
        expect(response.text).toContain("Hello, world!");
      });
    });
    describe("POST /v0/scrape", () => {
      it("should require authorization", async () => {
        const response = await request(TEST_URL).post("/v0/scrape");
        expect(response.statusCode).toBe(401);
      });
      it("should return an error response with an invalid API key", async () => {
        const response = await request(TEST_URL)
          .post("/v0/scrape")
          .set("Authorization", `Bearer invalid-api-key`)
          .set("Content-Type", "application/json")
          .send({ url: "https://firecrawl.dev" });
        expect(response.statusCode).toBe(401);
      });
      it("should return an error for a blocklisted URL", async () => {
        const blocklistedUrl = "https://facebook.com/fake-test";
        const response = await request(TEST_URL)
          .post("/v0/scrape")
          .set("Authorization", `Bearer ${process.env.TEST_API_KEY}`)
          .set("Content-Type", "application/json")
          .send({ url: blocklistedUrl });
        expect(response.statusCode).toBe(403);
        expect(response.body.error).toContain("Firecrawl currently does not support social media scraping due to policy restrictions. We're actively working on building support for it.");
      });
      it("should return a successful response with a valid preview token", async () => {
        const response = await request(TEST_URL)
          .post("/v0/scrape")
          .set("Authorization", `Bearer this_is_just_a_preview_token`)
          .set("Content-Type", "application/json")
          .send({ url: "https://firecrawl.dev" });
        expect(response.statusCode).toBe(200);
      }, 10000); // 10 seconds timeout
      it("should return a successful response with a valid API key", async () => {
        const response = await request(TEST_URL)
          .post("/v0/scrape")
          .set("Authorization", `Bearer ${process.env.TEST_API_KEY}`)
          .set("Content-Type", "application/json")
          .send({ url: "https://firecrawl.dev" });
        expect(response.statusCode).toBe(200);
        expect(response.body).toHaveProperty("data");
        expect(response.body.data).toHaveProperty("content");
        expect(response.body.data).toHaveProperty("markdown");
        expect(response.body.data).toHaveProperty("metadata");
        expect(response.body.data.content).toContain("🔥 FireCrawl");
      }, 30000); // 30 seconds timeout
      it('should return a successful response for a valid scrape with PDF file', async () => {
        const response = await request(TEST_URL)
          .post('/v0/scrape')
          .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`)
          .set('Content-Type', 'application/json')
          .send({ url: 'https://arxiv.org/pdf/astro-ph/9301001.pdf' });
        await new Promise((r) => setTimeout(r, 6000));
        expect(response.statusCode).toBe(200);
        expect(response.body).toHaveProperty('data');
        expect(response.body.data).toHaveProperty('content');
        expect(response.body.data).toHaveProperty('metadata');
        expect(response.body.data.content).toContain('We present spectrophotometric observations of the Broad Line Radio Galaxy');
      }, 30000); // 30 seconds
      it('should return a successful response for a valid scrape with PDF file without explicit .pdf extension', async () => {
        const response = await request(TEST_URL)
          .post('/v0/scrape')
          .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`)
          .set('Content-Type', 'application/json')
          .send({ url: 'https://arxiv.org/pdf/astro-ph/9301001' });
        await new Promise((r) => setTimeout(r, 6000));
        expect(response.statusCode).toBe(200);
        expect(response.body).toHaveProperty('data');
        expect(response.body.data).toHaveProperty('content');
        expect(response.body.data).toHaveProperty('metadata');
        expect(response.body.data.content).toContain('We present spectrophotometric observations of the Broad Line Radio Galaxy');
      }, 30000); // 30 seconds
    });
    describe("POST /v0/crawl", () => {
      it("should require authorization", async () => {
        const response = await request(TEST_URL).post("/v0/crawl");
        expect(response.statusCode).toBe(401);
      });
      it("should return an error response with an invalid API key", async () => {
        const response = await request(TEST_URL)
          .post("/v0/crawl")
          .set("Authorization", `Bearer invalid-api-key`)
          .set("Content-Type", "application/json")
          .send({ url: "https://firecrawl.dev" });
        expect(response.statusCode).toBe(401);
      });
      it("should return an error for a blocklisted URL", async () => {
        const blocklistedUrl = "https://twitter.com/fake-test";
        const response = await request(TEST_URL)
          .post("/v0/crawl")
          .set("Authorization", `Bearer ${process.env.TEST_API_KEY}`)
          .set("Content-Type", "application/json")
          .send({ url: blocklistedUrl });
        expect(response.statusCode).toBe(403);
        expect(response.body.error).toContain("Firecrawl currently does not support social media scraping due to policy restrictions. We're actively working on building support for it.");
      });
      it("should return a successful response with a valid API key", async () => {
        const response = await request(TEST_URL)
          .post("/v0/crawl")
          .set("Authorization", `Bearer ${process.env.TEST_API_KEY}`)
          .set("Content-Type", "application/json")
          .send({ url: "https://firecrawl.dev" });
        expect(response.statusCode).toBe(200);
        expect(response.body).toHaveProperty("jobId");
        expect(response.body.jobId).toMatch(
          /^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$/
        );
      });
      // Additional tests for insufficient credits?
    });
    describe("POST /v0/crawlWebsitePreview", () => {
      it("should require authorization", async () => {
        const response = await request(TEST_URL).post(
          "/v0/crawlWebsitePreview"
        );
        expect(response.statusCode).toBe(401);
      });
      it("should return an error response with an invalid API key", async () => {
        const response = await request(TEST_URL)
          .post("/v0/crawlWebsitePreview")
          .set("Authorization", `Bearer invalid-api-key`)
          .set("Content-Type", "application/json")
          .send({ url: "https://firecrawl.dev" });
        expect(response.statusCode).toBe(401);
      });
      it("should return an error for a blocklisted URL", async () => {
        const blocklistedUrl = "https://instagram.com/fake-test";
        const response = await request(TEST_URL)
          .post("/v0/crawlWebsitePreview")
          .set("Authorization", `Bearer ${process.env.TEST_API_KEY}`)
          .set("Content-Type", "application/json")
          .send({ url: blocklistedUrl });
        expect(response.statusCode).toBe(403);
        expect(response.body.error).toContain("Firecrawl currently does not support social media scraping due to policy restrictions. We're actively working on building support for it.");
      });
      it("should return a successful response with a valid preview token", async () => {
        const response = await request(TEST_URL)
          .post("/v0/crawlWebsitePreview")
          .set("Authorization", `Bearer this_is_just_a_preview_token`)
          .set("Content-Type", "application/json")
          .send({ url: "https://firecrawl.dev" });
        expect(response.statusCode).toBe(200);
        expect(response.body).toHaveProperty("jobId");
        expect(response.body.jobId).toMatch(
          /^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$/
        );
      });
    });
    describe("POST /v0/search", () => {
      it("should require authorization", async () => {
        const response = await request(TEST_URL).post("/v0/search");
        expect(response.statusCode).toBe(401);
      });
      it("should return an error response with an invalid API key", async () => {
        const response = await request(TEST_URL)
          .post("/v0/search")
          .set("Authorization", `Bearer invalid-api-key`)
          .set("Content-Type", "application/json")
          .send({ query: "test" });
        expect(response.statusCode).toBe(401);
      });
      it("should return a successful response with a valid API key", async () => {
        const response = await request(TEST_URL)
          .post("/v0/search")
          .set("Authorization", `Bearer ${process.env.TEST_API_KEY}`)
          .set("Content-Type", "application/json")
          .send({ query: "test" });
        expect(response.statusCode).toBe(200);
        expect(response.body).toHaveProperty("success");
        expect(response.body.success).toBe(true);
        expect(response.body).toHaveProperty("data");
      }, 30000); // 30 seconds timeout
    });
    describe("GET /v0/crawl/status/:jobId", () => {
      it("should require authorization", async () => {
        const response = await request(TEST_URL).get("/v0/crawl/status/123");
        expect(response.statusCode).toBe(401);
      });
      it("should return an error response with an invalid API key", async () => {
        const response = await request(TEST_URL)
          .get("/v0/crawl/status/123")
          .set("Authorization", `Bearer invalid-api-key`);
        expect(response.statusCode).toBe(401);
      });
      it("should return Job not found for invalid job ID", async () => {
        const response = await request(TEST_URL)
          .get("/v0/crawl/status/invalidJobId")
          .set("Authorization", `Bearer ${process.env.TEST_API_KEY}`);
        expect(response.statusCode).toBe(404);
      });
      it("should return a successful response for a valid crawl job", async () => {
        const crawlResponse = await request(TEST_URL)
          .post("/v0/crawl")
          .set("Authorization", `Bearer ${process.env.TEST_API_KEY}`)
          .set("Content-Type", "application/json")
          .send({ url: "https://firecrawl.dev" });
        expect(crawlResponse.statusCode).toBe(200);
        const response = await request(TEST_URL)
          .get(`/v0/crawl/status/${crawlResponse.body.jobId}`)
          .set("Authorization", `Bearer ${process.env.TEST_API_KEY}`);
        expect(response.statusCode).toBe(200);
        expect(response.body).toHaveProperty("status");
        expect(response.body.status).toBe("active");
        // wait for 30 seconds
        await new Promise((r) => setTimeout(r, 30000));
        const completedResponse = await request(TEST_URL)
          .get(`/v0/crawl/status/${crawlResponse.body.jobId}`)
          .set("Authorization", `Bearer ${process.env.TEST_API_KEY}`);
        expect(completedResponse.statusCode).toBe(200);
        expect(completedResponse.body).toHaveProperty("status");
        expect(completedResponse.body.status).toBe("completed");
        expect(completedResponse.body).toHaveProperty("data");
        expect(completedResponse.body.data[0]).toHaveProperty("content");
        expect(completedResponse.body.data[0]).toHaveProperty("markdown");
        expect(completedResponse.body.data[0]).toHaveProperty("metadata");
        expect(completedResponse.body.data[0].content).toContain(
          "🔥 FireCrawl"
        );
      }, 60000); // 60 seconds
      it('should return a successful response for a valid crawl job with PDF files without explicit .pdf extension', async () => {
        const crawlResponse = await request(TEST_URL)
          .post('/v0/crawl')
          .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`)
          .set('Content-Type', 'application/json')
          .send({ url: 'https://arxiv.org/abs/astro-ph/9301001', crawlerOptions: { limit: 10, excludes: [ 'list/*', 'login', 'abs/*', 'static/*', 'about/*', 'archive/*' ] }});
        expect(crawlResponse.statusCode).toBe(200);
        const response = await request(TEST_URL)
          .get(`/v0/crawl/status/${crawlResponse.body.jobId}`)
          .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`);
        expect(response.statusCode).toBe(200);
        expect(response.body).toHaveProperty('status');
        expect(response.body.status).toBe('active');
        // wait for 30 seconds
        await new Promise((r) => setTimeout(r, 30000));
        const completedResponse = await request(TEST_URL)
          .get(`/v0/crawl/status/${crawlResponse.body.jobId}`)
          .set('Authorization', `Bearer ${process.env.TEST_API_KEY}`);
        expect(completedResponse.statusCode).toBe(200);
        expect(completedResponse.body).toHaveProperty('status');
        expect(completedResponse.body.status).toBe('completed');
        expect(completedResponse.body).toHaveProperty('data');
        expect(completedResponse.body.data.length).toBeGreaterThan(1);
        expect(completedResponse.body.data).toEqual(
          expect.arrayContaining([
            expect.objectContaining({
              content: expect.stringContaining('asymmetries might represent, for instance, preferred source orientations to our line of sight.')
            })
          ])
        );
      }, 60000); // 60 seconds
    });
    describe("GET /is-production", () => {
      it("should return the production status", async () => {
        const response = await request(TEST_URL).get("/is-production");
        expect(response.statusCode).toBe(200);
        expect(response.body).toHaveProperty("isProduction");
      });
    });
  });
--- a/apps/api/src/controllers/auth.ts
+++ b/apps/api/src/controllers/auth.ts
@ -0,0 +1,84 @@
 import { parseApi } from "../../src/lib/parseApi";
 import { getRateLimiter } from "../../src/services/rate-limiter";
 import { AuthResponse, RateLimiterMode } from "../../src/types";
 import { supabase_service } from "../../src/services/supabase";
 import { withAuth } from "../../src/lib/withAuth";
 export async function authenticateUser(req, res, mode?: RateLimiterMode) : Promise<AuthResponse> {
  return withAuth(supaAuthenticateUser)(req, res, mode);
 }
 export async function supaAuthenticateUser(
  req,
  res,
  mode?: RateLimiterMode
 ): Promise<{
  success: boolean;
  team_id?: string;
  error?: string;
  status?: number;
 }> {
  const authHeader = req.headers.authorization;
  if (!authHeader) {
    return { success: false, error: "Unauthorized", status: 401 };
  }
  const token = authHeader.split(" ")[1]; // Extract the token from "Bearer <token>"
  if (!token) {
    return {
      success: false,
      error: "Unauthorized: Token missing",
      status: 401,
    };
  }
  try {
    const incomingIP = (req.headers["x-forwarded-for"] ||
      req.socket.remoteAddress) as string;
    const iptoken = incomingIP + token;
    await getRateLimiter(
      token === "this_is_just_a_preview_token" ? RateLimiterMode.Preview : mode
    ).consume(iptoken);
  } catch (rateLimiterRes) {
    console.error(rateLimiterRes);
    return {
      success: false,
      error: "Rate limit exceeded. Too many requests, try again in 1 minute.",
      status: 429,
    };
  }
  if (
    token === "this_is_just_a_preview_token" &&
    (mode === RateLimiterMode.Scrape || mode === RateLimiterMode.Preview || mode === RateLimiterMode.Search)
  ) {
    return { success: true, team_id: "preview" };
    // check the origin of the request and make sure its from firecrawl.dev
    // const origin = req.headers.origin;
    // if (origin && origin.includes("firecrawl.dev")){
    //   return { success: true, team_id: "preview" };
    // }
    // if(process.env.ENV !== "production") {
    //   return { success: true, team_id: "preview" };
    // }
    // return { success: false, error: "Unauthorized: Invalid token", status: 401 };
  }
  const normalizedApi = parseApi(token);
  // make sure api key is valid, based on the api_keys table in supabase
  const { data, error } = await supabase_service
    .from("api_keys")
    .select("*")
    .eq("key", normalizedApi);
  if (error || !data || data.length === 0) {
    return {
      success: false,
      error: "Unauthorized: Invalid token",
      status: 401,
    };
  }
  return { success: true, team_id: data[0].team_id };
 }
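`supaAuthenticateUser` takes the token as the second space-separated part of the header, so a header such as `Basic abc` also yields a token. A minimal sketch of a stricter parse, assuming only the `Bearer` scheme should be accepted, is shown below. `extractBearerToken` is a hypothetical helper, not part of this commit, and it is deliberately stricter than the committed behavior.

```typescript
// Hypothetical helper (not part of this commit): return the token from an
// "Authorization: Bearer <token>" header value, or null if the header is
// missing, uses another scheme, or carries no token.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) {
    return null;
  }
  // Scheme match is case-insensitive; the token must be one run of non-whitespace.
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}
```

A `null` result corresponds to the two 401 branches at the top of `supaAuthenticateUser`, which this helper would collapse into a single check.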
--- a/apps/api/src/controllers/crawl-status.ts
+++ b/apps/api/src/controllers/crawl-status.ts
@ -0,0 +1,36 @@
 import { Request, Response } from "express";
 import { authenticateUser } from "./auth";
 import { RateLimiterMode } from "../../src/types";
 import { addWebScraperJob } from "../../src/services/queue-jobs";
 import { getWebScraperQueue } from "../../src/services/queue-service";
 export async function crawlStatusController(req: Request, res: Response) {
  try {
    const { success, team_id, error, status } = await authenticateUser(
      req,
      res,
      RateLimiterMode.CrawlStatus
    );
    if (!success) {
      return res.status(status).json({ error });
    }
    const job = await getWebScraperQueue().getJob(req.params.jobId);
    if (!job) {
      return res.status(404).json({ error: "Job not found" });
    }
    const { current, current_url, total, current_step } = await job.progress();
    res.json({
      status: await job.getState(),
      // progress: job.progress(),
      current: current,
      current_url: current_url,
      current_step: current_step,
      total: total,
      data: job.returnvalue,
    });
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 }
--- a/apps/api/src/controllers/crawl.ts
+++ b/apps/api/src/controllers/crawl.ts
@ -0,0 +1,83 @@
 import { Request, Response } from "express";
 import { WebScraperDataProvider } from "../../src/scraper/WebScraper";
 import { billTeam, checkTeamCredits } from "../../src/services/billing/credit_billing";
 import { authenticateUser } from "./auth";
 import { RateLimiterMode } from "../../src/types";
 import { addWebScraperJob } from "../../src/services/queue-jobs";
 import { isUrlBlocked } from "../../src/scraper/WebScraper/utils/blocklist";
 export async function crawlController(req: Request, res: Response) {
  try {
    const { success, team_id, error, status } = await authenticateUser(
      req,
      res,
      RateLimiterMode.Crawl
    );
    if (!success) {
      return res.status(status).json({ error });
    }
    const { success: creditsCheckSuccess, message: creditsCheckMessage } =
      await checkTeamCredits(team_id, 1);
    if (!creditsCheckSuccess) {
      return res.status(402).json({ error: "Insufficient credits" });
    }
    const url = req.body.url;
    if (!url) {
      return res.status(400).json({ error: "Url is required" });
    }
    if (isUrlBlocked(url)) {
      return res.status(403).json({ error: "Firecrawl currently does not support social media scraping due to policy restrictions. We're actively working on building support for it." });
    }
    const mode = req.body.mode ?? "crawl";
    const crawlerOptions = req.body.crawlerOptions ?? {};
    const pageOptions = req.body.pageOptions ?? { onlyMainContent: false };
    if (mode === "single_urls" && !url.includes(",")) {
      try {
        const a = new WebScraperDataProvider();
        await a.setOptions({
          mode: "single_urls",
          urls: [url],
          crawlerOptions: {
            returnOnlyUrls: true,
          },
          pageOptions: pageOptions,
        });
        // No queue job exists on this synchronous path (`job` is only declared
        // further down), so no progress callback is passed here.
        const docs = await a.getDocuments(false);
        return res.json({
          success: true,
          documents: docs,
        });
      } catch (error) {
        console.error(error);
        return res.status(500).json({ error: error.message });
      }
    }
    const job = await addWebScraperJob({
      url: url,
      mode: mode ?? "crawl", // fix for single urls not working
      crawlerOptions: { ...crawlerOptions },
      team_id: team_id,
      pageOptions: pageOptions,
      origin: req.body.origin ?? "api",
    });
    res.json({ jobId: job.id });
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 }
--- a/apps/api/src/controllers/crawlPreview.ts
+++ b/apps/api/src/controllers/crawlPreview.ts
@ -0,0 +1,45 @@
 import { Request, Response } from "express";
 import { authenticateUser } from "./auth";
 import { RateLimiterMode } from "../../src/types";
 import { addWebScraperJob } from "../../src/services/queue-jobs";
 import { isUrlBlocked } from "../../src/scraper/WebScraper/utils/blocklist";
 export async function crawlPreviewController(req: Request, res: Response) {
  try {
    const { success, team_id, error, status } = await authenticateUser(
      req,
      res,
      RateLimiterMode.Preview
    );
    if (!success) {
      return res.status(status).json({ error });
    }
    // authenticate on supabase
    const url = req.body.url;
    if (!url) {
      return res.status(400).json({ error: "Url is required" });
    }
    if (isUrlBlocked(url)) {
      return res.status(403).json({ error: "Firecrawl currently does not support social media scraping due to policy restrictions. We're actively working on building support for it." });
    }
    const mode = req.body.mode ?? "crawl";
    const crawlerOptions = req.body.crawlerOptions ?? {};
    const pageOptions = req.body.pageOptions ?? { onlyMainContent: false };
    const job = await addWebScraperJob({
      url: url,
      mode: mode ?? "crawl", // fix for single urls not working
      crawlerOptions: { ...crawlerOptions, limit: 5, maxCrawledLinks: 5 },
      team_id: "preview",
      pageOptions: pageOptions,
      origin: "website-preview",
    });
    res.json({ jobId: job.id });
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 }
--- a/apps/api/src/controllers/scrape.ts
+++ b/apps/api/src/controllers/scrape.ts
@ -0,0 +1,121 @@
 import { Request, Response } from "express";
 import { WebScraperDataProvider } from "../scraper/WebScraper";
 import { billTeam, checkTeamCredits } from "../services/billing/credit_billing";
 import { authenticateUser } from "./auth";
 import { RateLimiterMode } from "../types";
 import { logJob } from "../services/logging/log_job";
 import { Document } from "../lib/entities";
 import { isUrlBlocked } from "../scraper/WebScraper/utils/blocklist"; // Import the isUrlBlocked function
 export async function scrapeHelper(
  req: Request,
  team_id: string,
  crawlerOptions: any,
  pageOptions: any
 ): Promise<{
  success: boolean;
  error?: string;
  data?: Document;
  returnCode: number;
 }> {
  const url = req.body.url;
  if (!url) {
    return { success: false, error: "Url is required", returnCode: 400 };
  }
  if (isUrlBlocked(url)) {
    return { success: false, error: "Firecrawl currently does not support social media scraping due to policy restrictions. We're actively working on building support for it.", returnCode: 403 };
  }
  const a = new WebScraperDataProvider();
  await a.setOptions({
    mode: "single_urls",
    urls: [url],
    crawlerOptions: {
      ...crawlerOptions,
    },
    pageOptions: pageOptions,
  });
  const docs = await a.getDocuments(false);
  // make sure doc.content is not empty
  const filteredDocs = docs.filter(
    (doc: { content?: string }) => doc.content && doc.content.trim().length > 0
  );
  if (filteredDocs.length === 0) {
    return { success: true, error: "No page found", returnCode: 200 };
  }
  const billingResult = await billTeam(
    team_id,
    filteredDocs.length
  );
  if (!billingResult.success) {
    return {
      success: false,
      error:
        "Failed to bill team. Insufficient credits or subscription not found.",
      returnCode: 402,
    };
  }
  return {
    success: true,
    data: filteredDocs[0],
    returnCode: 200,
  };
 }
 export async function scrapeController(req: Request, res: Response) {
  try {
    // make sure to authenticate user first, Bearer <token>
    const { success, team_id, error, status } = await authenticateUser(
      req,
      res,
      RateLimiterMode.Scrape
    );
    if (!success) {
      return res.status(status).json({ error });
    }
    const crawlerOptions = req.body.crawlerOptions ?? {};
    const pageOptions = req.body.pageOptions ?? { onlyMainContent: false };
    const origin = req.body.origin ?? "api";
    try {
      const { success: creditsCheckSuccess, message: creditsCheckMessage } =
        await checkTeamCredits(team_id, 1);
      if (!creditsCheckSuccess) {
        return res.status(402).json({ error: "Insufficient credits" });
      }
    } catch (error) {
      console.error(error);
      return res.status(500).json({ error: "Internal server error" });
    }
    const startTime = new Date().getTime();
    const result = await scrapeHelper(
      req,
      team_id,
      crawlerOptions,
      pageOptions
    );
    const endTime = new Date().getTime();
    const timeTakenInSeconds = (endTime - startTime) / 1000;
    logJob({
      success: result.success,
      message: result.error,
      num_docs: 1,
      docs: [result.data],
      time_taken: timeTakenInSeconds,
      team_id: team_id,
      mode: "scrape",
      url: req.body.url,
      crawlerOptions: crawlerOptions,
      pageOptions: pageOptions,
      origin: origin,
    });
    return res.status(result.returnCode).json(result);
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 }
--- a/apps/api/src/controllers/search.ts
+++ b/apps/api/src/controllers/search.ts
@ -0,0 +1,165 @@
 import { Request, Response } from "express";
 import { WebScraperDataProvider } from "../scraper/WebScraper";
 import { billTeam, checkTeamCredits } from "../services/billing/credit_billing";
 import { authenticateUser } from "./auth";
 import { RateLimiterMode } from "../types";
 import { logJob } from "../services/logging/log_job";
 import { PageOptions, SearchOptions } from "../lib/entities";
 import { search } from "../search";
 import { isUrlBlocked } from "../scraper/WebScraper/utils/blocklist";
 export async function searchHelper(
  req: Request,
  team_id: string,
  crawlerOptions: any,
  pageOptions: PageOptions,
  searchOptions: SearchOptions
 ): Promise<{
  success: boolean;
  error?: string;
  data?: any;
  returnCode: number;
 }> {
  const query = req.body.query;
  const advanced = false;
  if (!query) {
    return { success: false, error: "Query is required", returnCode: 400 };
  }
  const tbs = searchOptions.tbs ?? null;
  const filter = searchOptions.filter ?? null;
  let res = await search({
    query: query,
    advanced: advanced,
    num_results: searchOptions.limit ?? 7,
    tbs: tbs,
    filter: filter,
    lang: searchOptions.lang ?? "en",
    country: searchOptions.country ?? "us",
    location: searchOptions.location,
  });
  let justSearch = pageOptions.fetchPageContent === false;
  if (justSearch) {
    return { success: true, data: res, returnCode: 200 };
  }
  // filter out social media links
  res = res.filter((r) => !isUrlBlocked(r.url));
  if (res.length === 0) {
    return { success: true, error: "No search results found", returnCode: 200 };
  }
  const a = new WebScraperDataProvider();
  await a.setOptions({
    mode: "single_urls",
    urls: res.map((r) => r.url),
    crawlerOptions: {
      ...crawlerOptions,
    },
    pageOptions: {
      ...pageOptions,
      onlyMainContent: pageOptions?.onlyMainContent ?? true,
      fetchPageContent: pageOptions?.fetchPageContent ?? true,
      fallback: false,
    },
  });
  const docs = await a.getDocuments(true);
  if (docs.length === 0) {
    return { success: true, error: "No search results found", returnCode: 200 };
  }
  // make sure doc.content is not empty
  const filteredDocs = docs.filter(
    (doc: { content?: string }) => doc.content && doc.content.trim().length > 0
  );
  if (filteredDocs.length === 0) {
    return { success: true, error: "No page found", returnCode: 200 };
  }
  const billingResult = await billTeam(
    team_id,
    filteredDocs.length
  );
  if (!billingResult.success) {
    return {
      success: false,
      error:
        "Failed to bill team. Insufficient credits or subscription not found.",
      returnCode: 402,
    };
  }
  return {
    success: true,
    data: filteredDocs,
    returnCode: 200,
  };
 }
 export async function searchController(req: Request, res: Response) {
  try {
    // make sure to authenticate user first, Bearer <token>
    const { success, team_id, error, status } = await authenticateUser(
      req,
      res,
      RateLimiterMode.Search
    );
    if (!success) {
      return res.status(status).json({ error });
    }
    const crawlerOptions = req.body.crawlerOptions ?? {};
    const pageOptions = req.body.pageOptions ?? {
      onlyMainContent: true,
      fetchPageContent: true,
      fallback: false,
    };
    const origin = req.body.origin ?? "api";
    const searchOptions = req.body.searchOptions ?? { limit: 7 };
    try {
      const { success: creditsCheckSuccess, message: creditsCheckMessage } =
        await checkTeamCredits(team_id, 1);
      if (!creditsCheckSuccess) {
        return res.status(402).json({ error: "Insufficient credits" });
      }
    } catch (error) {
      console.error(error);
      return res.status(500).json({ error: "Internal server error" });
    }
    const startTime = new Date().getTime();
    const result = await searchHelper(
      req,
      team_id,
      crawlerOptions,
      pageOptions,
      searchOptions
    );
    const endTime = new Date().getTime();
    const timeTakenInSeconds = (endTime - startTime) / 1000;
    logJob({
      success: result.success,
      message: result.error,
      num_docs: result.data?.length ?? 0, // data is undefined on error results
      docs: result.data,
      time_taken: timeTakenInSeconds,
      team_id: team_id,
      mode: "search",
      url: req.body.query,
      crawlerOptions: crawlerOptions,
      pageOptions: pageOptions,
      origin: origin,
    });
    return res.status(result.returnCode).json(result);
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 }
--- a/apps/api/src/controllers/status.ts
+++ b/apps/api/src/controllers/status.ts
@ -0,0 +1,25 @@
 import { Request, Response } from "express";
 import { getWebScraperQueue } from "../../src/services/queue-service";
 export async function crawlJobStatusPreviewController(req: Request, res: Response) {
  try {
    const job = await getWebScraperQueue().getJob(req.params.jobId);
    if (!job) {
      return res.status(404).json({ error: "Job not found" });
    }
    const { current, current_url, total, current_step } = await job.progress();
    res.json({
      status: await job.getState(),
      // progress: job.progress(),
      current: current,
      current_url: current_url,
      current_step: current_step,
      total: total,
      data: job.returnvalue,
    });
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 }
--- a/apps/api/src/index.ts
+++ b/apps/api/src/index.ts
@ -3,20 +3,14 @@ import bodyParser from "body-parser";
 import cors from "cors";
 import "dotenv/config";
 import { getWebScraperQueue } from "./services/queue-service";
-import { addWebScraperJob } from "./services/queue-jobs";
+import { redisClient } from "./services/rate-limiter";
-import { supabase_service } from "./services/supabase";
+import { v0Router } from "./routes/v0";
 import { WebScraperDataProvider } from "./scraper/WebScraper";
 import { billTeam, checkTeamCredits } from "./services/billing/credit_billing";
 import { getRateLimiter, redisClient } from "./services/rate-limiter";
 import { parseApi } from "./lib/parseApi";
 const { createBullBoard } = require("@bull-board/api");
 const { BullAdapter } = require("@bull-board/api/bullAdapter");
 const { ExpressAdapter } = require("@bull-board/express");
 export const app = express();
 global.isProduction = process.env.IS_PRODUCTION === "true";
 app.use(bodyParser.urlencoded({ extended: true }));
@ -46,275 +40,20 @@ app.get("/test", async (req, res) => {
  res.send("Hello, world!");
 });
-async function authenticateUser(req, res, mode?: string): Promise<{ success: boolean, team_id?: string, error?: string, status?: number }> {
+// register router
-  const authHeader = req.headers.authorization;
+app.use(v0Router);
  if (!authHeader) {
    return { success: false, error: "Unauthorized", status: 401 };
  }
  const token = authHeader.split(" ")[1]; // Extract the token from "Bearer <token>"
  if (!token) {
    return { success: false, error: "Unauthorized: Token missing", status: 401 };
  }
  try {
    const incomingIP = (req.headers["x-forwarded-for"] ||
      req.socket.remoteAddress) as string;
    const iptoken = incomingIP + token;
    await getRateLimiter(
      token === "this_is_just_a_preview_token" ? true : false
    ).consume(iptoken);
  } catch (rateLimiterRes) {
    console.error(rateLimiterRes);
    return { success: false, error: "Rate limit exceeded. Too many requests, try again in 1 minute.", status: 429 };
  }
  if (token === "this_is_just_a_preview_token" && mode === "scrape") {
    return { success: true, team_id: "preview" };
  }
  const normalizedApi = parseApi(token);
  // make sure api key is valid, based on the api_keys table in supabase
  const { data, error } = await supabase_service
    .from("api_keys")
    .select("*")
    .eq("key", normalizedApi);
  if (error || !data || data.length === 0) {
    return { success: false, error: "Unauthorized: Invalid token", status: 401 };
  }
  return { success: true, team_id: data[0].team_id };
 }
 app.post("/v0/scrape", async (req, res) => {
  try {
    // make sure to authenticate user first, Bearer <token>
    const { success, team_id, error, status } = await authenticateUser(req, res, "scrape");
    if (!success) {
      return res.status(status).json({ error });
    }
    const crawlerOptions = req.body.crawlerOptions ?? {};
    try {
      const { success: creditsCheckSuccess, message: creditsCheckMessage } =
        await checkTeamCredits(team_id, 1);
      if (!creditsCheckSuccess) {
        return res.status(402).json({ error: "Insufficient credits" });
      }
    } catch (error) {
      console.error(error);
      return res.status(500).json({ error: "Internal server error" });
    }
    // authenticate on supabase
    const url = req.body.url;
    if (!url) {
      return res.status(400).json({ error: "Url is required" });
    }
    const pageOptions = req.body.pageOptions ?? { onlyMainContent: false };
    try {
      const a = new WebScraperDataProvider();
      await a.setOptions({
        mode: "single_urls",
        urls: [url],
        crawlerOptions: {
          ...crawlerOptions,
        },
        pageOptions: pageOptions,
      });
      const docs = await a.getDocuments(false);
      // make sure doc.content is not empty
      const filteredDocs = docs.filter(
        (doc: { content?: string }) =>
          doc.content && doc.content.trim().length > 0
      );
      if (filteredDocs.length === 0) {
        return res.status(200).json({ success: true, data: [] });
      }
      const { success, credit_usage } = await billTeam(
        team_id,
        filteredDocs.length
      );
      if (!success) {
        // throw new Error("Failed to bill team, no subscription was found");
        // return {
        //   success: false,
        //   message: "Failed to bill team, no subscription was found",
        //   docs: [],
        // };
        return res
          .status(402)
          .json({ error: "Failed to bill, no subscription was found" });
      }
      return res.json({
        success: true,
        data: filteredDocs[0],
      });
    } catch (error) {
      console.error(error);
      return res.status(500).json({ error: error.message });
    }
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 });
 app.post("/v0/crawl", async (req, res) => {
  try {
    const { success, team_id, error, status } = await authenticateUser(req, res, "crawl");
    if (!success) {
      return res.status(status).json({ error });
    }
    const { success: creditsCheckSuccess, message: creditsCheckMessage } =
      await checkTeamCredits(team_id, 1);
    if (!creditsCheckSuccess) {
      return res.status(402).json({ error: "Insufficient credits" });
    }
    // authenticate on supabase
    const url = req.body.url;
    if (!url) {
      return res.status(400).json({ error: "Url is required" });
    }
    const mode = req.body.mode ?? "crawl";
    const crawlerOptions = req.body.crawlerOptions ?? {};
    const pageOptions = req.body.pageOptions ?? { onlyMainContent: false };
    if (mode === "single_urls" && !url.includes(",")) {
      try {
        const a = new WebScraperDataProvider();
        await a.setOptions({
          mode: "single_urls",
          urls: [url],
          crawlerOptions: {
            returnOnlyUrls: true,
          },
          pageOptions: pageOptions,
        });
        const docs = await a.getDocuments(false, (progress) => {
          job.progress({
            current: progress.current,
            total: progress.total,
            current_step: "SCRAPING",
            current_url: progress.currentDocumentUrl,
          });
        });
        return res.json({
          success: true,
          documents: docs,
        });
      } catch (error) {
        console.error(error);
        return res.status(500).json({ error: error.message });
      }
    }
    const job = await addWebScraperJob({
      url: url,
      mode: mode ?? "crawl", // fix for single urls not working
      crawlerOptions: { ...crawlerOptions },
      team_id: team_id,
      pageOptions: pageOptions,
    });
    res.json({ jobId: job.id });
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 });
 app.post("/v0/crawlWebsitePreview", async (req, res) => {
  try {
    const { success, team_id, error, status } = await authenticateUser(req, res, "scrape");
    if (!success) {
      return res.status(status).json({ error });
    } 
    // authenticate on supabase
    const url = req.body.url;
    if (!url) {
      return res.status(400).json({ error: "Url is required" });
    }
    const mode = req.body.mode ?? "crawl";
    const crawlerOptions = req.body.crawlerOptions ?? {};
    const pageOptions = req.body.pageOptions ?? { onlyMainContent: false };
    const job = await addWebScraperJob({
      url: url,
      mode: mode ?? "crawl", // fix for single urls not working
      crawlerOptions: { ...crawlerOptions, limit: 5, maxCrawledLinks: 5 },
      team_id: "preview",
      pageOptions: pageOptions,
    });
    res.json({ jobId: job.id });
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 });
 app.get("/v0/crawl/status/:jobId", async (req, res) => {
  try {
    const { success, team_id, error, status } = await authenticateUser(req, res, "scrape");
    if (!success) {
      return res.status(status).json({ error });
    }
    const job = await getWebScraperQueue().getJob(req.params.jobId);
    if (!job) {
      return res.status(404).json({ error: "Job not found" });
    }
    const { current, current_url, total, current_step } = await job.progress();
    res.json({
      status: await job.getState(),
      // progress: job.progress(),
      current: current,
      current_url: current_url,
      current_step: current_step,
      total: total,
      data: job.returnvalue,
    });
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 });
 app.get("/v0/checkJobStatus/:jobId", async (req, res) => {
  try {
    const job = await getWebScraperQueue().getJob(req.params.jobId);
    if (!job) {
      return res.status(404).json({ error: "Job not found" });
    }
    const { current, current_url, total, current_step } = await job.progress();
    res.json({
      status: await job.getState(),
      // progress: job.progress(),
      current: current,
      current_url: current_url,
      current_step: current_step,
      total: total,
      data: job.returnvalue,
    });
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 });
 const DEFAULT_PORT = process.env.PORT ?? 3002;
 const HOST = process.env.HOST ?? "localhost";
 redisClient.connect();
 export function startServer(port = DEFAULT_PORT) {
  const server = app.listen(Number(port), HOST, () => {
    console.log(`Server listening on port ${port}`);
-    console.log(`For the UI, open http://${HOST}:${port}/admin/${process.env.BULL_AUTH_KEY}/queues`);
+    console.log(
      `For the UI, open http://${HOST}:${port}/admin/${process.env.BULL_AUTH_KEY}/queues`
    );
    console.log("");
    console.log("1. Make sure Redis is running on port 6379 by default");
    console.log(
@ -348,7 +87,77 @@ app.get(`/admin/${process.env.BULL_AUTH_KEY}/queues`, async (req, res) => {
  }
 });
 app.get(`/serverHealthCheck`, async (req, res) => {
  try {
    const webScraperQueue = getWebScraperQueue();
    const [waitingJobs] = await Promise.all([
      webScraperQueue.getWaitingCount(),
    ]);
    const noWaitingJobs = waitingJobs === 0;
    // 200 if no jobs are waiting, 500 if any jobs are waiting
    return res.status(noWaitingJobs ? 200 : 500).json({
      waitingJobs,
    });
  } catch (error) {
    console.error(error);
    return res.status(500).json({ error: error.message });
  }
 });
 app.get('/serverHealthCheck/notify', async (req, res) => {
  if (process.env.SLACK_WEBHOOK_URL) {
    const treshold = 1; // Threshold for the number of waiting jobs
    const timeout = 60000; // Delay before the re-check, in milliseconds (1 minute)
    const getWaitingJobsCount = async () => {
      const webScraperQueue = getWebScraperQueue();
      const [waitingJobsCount] = await Promise.all([
        webScraperQueue.getWaitingCount(),
      ]);
      return waitingJobsCount;
    };
    res.status(200).json({ message: "Check initiated" });
    const checkWaitingJobs = async () => {
      try {
        let waitingJobsCount = await getWaitingJobsCount();
        if (waitingJobsCount >= treshold) {
          setTimeout(async () => {
            // Re-check the waiting jobs count after the timeout
            waitingJobsCount = await getWaitingJobsCount(); 
            if (waitingJobsCount >= treshold) {
              const slackWebhookUrl = process.env.SLACK_WEBHOOK_URL;
              const message = {
                text: `⚠️ Warning: The number of waiting jobs (${waitingJobsCount}) has reached or exceeded the threshold (${treshold}) for more than ${timeout/60000} minute(s).`,
              };
              const response = await fetch(slackWebhookUrl, {
                method: 'POST',
                headers: {
                  'Content-Type': 'application/json',
                },
                body: JSON.stringify(message),
              })
              if (!response.ok) {
                console.error('Failed to send Slack notification')
              }
            }
          }, timeout);
        }
      } catch (error) {
        console.error(error);
      }
    };
    checkWaitingJobs();
  }
 });
 app.get("/is-production", (req, res) => {
  res.send({ isProduction: global.isProduction });
 });
--- a/apps/api/src/lib/entities.ts
+++ b/apps/api/src/lib/entities.ts
@ -11,7 +11,20 @@ export interface Progress {
 export type PageOptions = {
  onlyMainContent?: boolean;
  fallback?: boolean;
  fetchPageContent?: boolean;
 };
 export type SearchOptions = {
  limit?: number;
  tbs?: string;
  filter?: string;
  lang?: string;
  country?: string;
  location?: string;
 };
 export type WebScraperOptions = {
  urls: string[];
  mode: "single_urls" | "sitemap" | "crawl";
@ -28,8 +41,13 @@ export type WebScraperOptions = {
  concurrentRequests?: number;
 };
 export interface DocumentUrl {
  url: string;
 }
 export class Document {
  id?: string;
  url?: string; // Used only in /search for now
  content: string;
  markdown?: string;
  createdAt?: Date;
@ -56,3 +74,20 @@ export class Document {
    this.provider = data.provider || undefined;
  }
 }
 export class SearchResult {
  url: string;
  title: string;
  description: string;
  constructor(url: string, title: string, description: string) {
    this.url = url;
    this.title = title;
    this.description = description;
  }
  toString(): string {
    return `SearchResult(url=${this.url}, title=${this.title}, description=${this.description})`;
  }
 }
--- a/apps/api/src/lib/html-to-markdown.ts
+++ b/apps/api/src/lib/html-to-markdown.ts
@ -1,6 +1,8 @@
 export function parseMarkdown(html: string) {
  var TurndownService = require("turndown");
-  var turndownPluginGfm = require("turndown-plugin-gfm");
+  var turndownPluginGfm = require("joplin-turndown-plugin-gfm");
  const turndownService = new TurndownService();
  turndownService.addRule("inlineLink", {
--- a/apps/api/src/lib/withAuth.ts
+++ b/apps/api/src/lib/withAuth.ts
@ -0,0 +1,24 @@
 import { AuthResponse } from "../../src/types";
 let warningCount = 0;
 export function withAuth<T extends AuthResponse, U extends any[]>(
  originalFunction: (...args: U) => Promise<T>
 ) {
  return async function (...args: U): Promise<T> {
    if (process.env.USE_DB_AUTHENTICATION === "false") {
      if (warningCount < 5) {
        console.warn("WARNING - You're bypassing authentication");
        warningCount++;
      }
      return { success: true } as T;
    } else {
      try {
        return await originalFunction(...args);
      } catch (error) {
        console.error("Error in withAuth function: ", error);
        return { success: false, error: error.message } as T;
      }
    }
  };
 }
--- a/apps/api/src/main/runWebScraper.ts
+++ b/apps/api/src/main/runWebScraper.ts
@ -1,8 +1,9 @@
 import { Job } from "bull";
 import { CrawlResult, WebScraperOptions } from "../types";
 import { WebScraperDataProvider } from "../scraper/WebScraper";
-import { Progress } from "../lib/entities";
+import { DocumentUrl, Progress } from "../lib/entities";
 import { billTeam } from "../services/billing/credit_billing";
 import { Document } from "../lib/entities";
 export async function startWebScraperPipeline({
  job,
@ -24,7 +25,7 @@ export async function startWebScraperPipeline({
      job.moveToFailed(error);
    },
    team_id: job.data.team_id,
-  })) as { success: boolean; message: string; docs: CrawlResult[] };
+  })) as { success: boolean; message: string; docs: Document[] };
 }
 export async function runWebScraper({
  url,
@ -44,7 +45,11 @@ export async function runWebScraper({
  onSuccess: (result: any) => void;
  onError: (error: any) => void;
  team_id: string;
-}): Promise<{ success: boolean; message: string; docs: CrawlResult[] }> {
+}): Promise<{
  success: boolean;
  message: string;
  docs: Document[] | DocumentUrl[];
 }> {
  try {
    const provider = new WebScraperDataProvider();
    if (mode === "crawl") {
@ -64,34 +69,45 @@ export async function runWebScraper({
    }
    const docs = (await provider.getDocuments(false, (progress: Progress) => {
      inProgress(progress);
-    })) as CrawlResult[];
+    })) as Document[];
    if (docs.length === 0) {
      return {
        success: true,
        message: "No pages found",
-        docs: [],
+        docs: []
      };
    }
    // remove docs with empty content
-    const filteredDocs = docs.filter((doc) => doc.content.trim().length > 0);
+    const filteredDocs = crawlerOptions.returnOnlyUrls
-    onSuccess(filteredDocs);
+      ? docs.map((doc) => {
          if (doc.metadata.sourceURL) {
            return { url: doc.metadata.sourceURL };
          }
        })
      : docs.filter((doc) => doc.content.trim().length > 0);
-    const { success, credit_usage } = await billTeam(
+
    const billingResult = await billTeam(
      team_id,
      filteredDocs.length
    );
-    if (!success) {
+
    if (!billingResult.success) {
      // throw new Error("Failed to bill team, no subscription was found");
      return {
        success: false,
        message: "Failed to bill team, no subscription was found",
-        docs: [],
+        docs: []
      };
    }
-    return { success: true, message: "", docs: filteredDocs as CrawlResult[] };
+    // This is where the returnvalue from the job is set
    onSuccess(filteredDocs);
    // this return doesn't matter too much for the job completion result
    return { success: true, message: "", docs: filteredDocs };
  } catch (error) {
    console.error("Error running web scraper", error);
    onError(error);
--- a/apps/api/src/routes/v0.ts
+++ b/apps/api/src/routes/v0.ts
@ -0,0 +1,19 @@
 import express from "express";
 import { crawlController } from "../../src/controllers/crawl";
 import { crawlStatusController } from "../../src/controllers/crawl-status";
 import { scrapeController } from "../../src/controllers/scrape";
 import { crawlPreviewController } from "../../src/controllers/crawlPreview";
 import { crawlJobStatusPreviewController } from "../../src/controllers/status";
 import { searchController } from "../../src/controllers/search";
 export const v0Router = express.Router();
 v0Router.post("/v0/scrape", scrapeController);
 v0Router.post("/v0/crawl", crawlController);
 v0Router.post("/v0/crawlWebsitePreview", crawlPreviewController);
 v0Router.get("/v0/crawl/status/:jobId", crawlStatusController);
 v0Router.get("/v0/checkJobStatus/:jobId", crawlJobStatusPreviewController);
 // Search routes
 v0Router.post("/v0/search", searchController);
--- a/apps/api/src/scraper/WebScraper/index.ts
+++ b/apps/api/src/scraper/WebScraper/index.ts
@ -4,12 +4,9 @@ import { scrapSingleUrl } from "./single_url";
 import { SitemapEntry, fetchSitemapData, getLinksFromSitemap } from "./sitemap";
 import { WebCrawler } from "./crawler";
 import { getValue, setValue } from "../../services/redis";
-import { getImageDescription } from "./utils/gptVision";
+import { getImageDescription } from "./utils/imageDescription";
 import { fetchAndProcessPdf, isUrlAPdf } from "./utils/pdfProcessor";
-import {
+import { replaceImgPathsWithAbsolutePaths, replacePathsWithAbsolutePaths } from "./utils/replacePaths";
  replaceImgPathsWithAbsolutePaths,
  replacePathsWithAbsolutePaths,
 } from "./utils/replacePaths";
 export class WebScraperDataProvider {
  private urls: string[] = [""];
@ -23,6 +20,7 @@ export class WebScraperDataProvider {
  private generateImgAltText: boolean = false;
  private pageOptions?: PageOptions;
  private replaceAllPathsWithAbsolutePaths?: boolean = false;
  private generateImgAltTextModel: "gpt-4-turbo" | "claude-3-opus" = "gpt-4-turbo";
  authorize(): void {
    throw new Error("Method not implemented.");
@ -64,6 +62,7 @@ export class WebScraperDataProvider {
    useCaching: boolean = false,
    inProgress?: (progress: Progress) => void
  ): Promise<Document[]> {
    if (this.urls[0].trim() === "") {
      throw new Error("Url is required");
    }
@ -80,11 +79,16 @@ export class WebScraperDataProvider {
        });
        let links = await crawler.start(inProgress, 5, this.limit);
        if (this.returnOnlyUrls) {
          inProgress({
            current: links.length,
            total: links.length,
            status: "COMPLETED",
            currentDocumentUrl: this.urls[0],
          });
          return links.map((url) => ({
            content: "",
            markdown: "",
            metadata: { sourceURL: url },
            provider: "web",
            type: "text",
          }));
        }
@ -466,7 +470,7 @@ export class WebScraperDataProvider {
                imageUrl,
                backText,
                frontText
-              );
+              , this.generateImgAltTextModel);
            }
            document.content = document.content.replace(
--- a/apps/api/src/scraper/WebScraper/single_url.ts
+++ b/apps/api/src/scraper/WebScraper/single_url.ts
@ -4,12 +4,34 @@ import { extractMetadata } from "./utils/metadata";
 import dotenv from "dotenv";
 import { Document, PageOptions } from "../../lib/entities";
 import { parseMarkdown } from "../../lib/html-to-markdown";
 import { parseTablesToMarkdown } from "./utils/parseTable";
 import { excludeNonMainTags } from "./utils/excludeTags";
-// import puppeteer from "puppeteer";
+import { urlSpecificParams } from "./utils/custom/website_params";
 dotenv.config();
 export async function generateRequestParams(
  url: string,
  wait_browser: string = "domcontentloaded",
  timeout: number = 15000
 ): Promise<any> {
  const defaultParams = {
    url: url,
    params: { timeout: timeout, wait_browser: wait_browser },
    headers: { "ScrapingService-Request": "TRUE" },
  };
  try {
    const urlKey = new URL(url).hostname;
    if (urlSpecificParams.hasOwnProperty(urlKey)) {
      return { ...defaultParams, ...urlSpecificParams[urlKey] };
    } else {
      return defaultParams;
    }
  } catch (error) {
    console.error(`Error generating URL key: ${error}`);
    return defaultParams;
  }
 }
 export async function scrapWithCustomFirecrawl(
  url: string,
  options?: any
@ -25,15 +47,18 @@ export async function scrapWithCustomFirecrawl(
 export async function scrapWithScrapingBee(
  url: string,
-  wait_browser: string = "domcontentloaded"
+  wait_browser: string = "domcontentloaded",
+  timeout: number = 15000
 ): Promise<string> {
  try {
    const client = new ScrapingBeeClient(process.env.SCRAPING_BEE_API_KEY);
-    const response = await client.get({
-      url: url,
-      params: { timeout: 15000, wait_browser: wait_browser },
-      headers: { "ScrapingService-Request": "TRUE" },
-    });
+    const clientParams = await generateRequestParams(
+      url,
+      wait_browser,
+      timeout
+    );
+    const response = await client.get(clientParams);
    if (response.status !== 200 && response.status !== 404) {
      console.error(
@ -112,7 +137,11 @@ export async function scrapSingleUrl(
        break;
      case "scrapingBee":
        if (process.env.SCRAPING_BEE_API_KEY) {
-          text = await scrapWithScrapingBee(url);
+          text = await scrapWithScrapingBee(
+            url,
+            "domcontentloaded",
+            pageOptions.fallback === false ? 7000 : 15000
+          );
        }
        break;
      case "playwright":
@ -142,6 +171,7 @@ export async function scrapSingleUrl(
        break;
    }
    let cleanedHtml = removeUnwantedElements(text, pageOptions);
    return [await parseMarkdown(cleanedHtml), text];
  };
@ -154,6 +184,17 @@ export async function scrapSingleUrl(
    // }
    let [text, html] = await attemptScraping(urlToScrap, "scrapingBee");
+    // Basically means that it is using /search endpoint
+    if (pageOptions.fallback === false) {
+      const soup = cheerio.load(html);
+      const metadata = extractMetadata(soup, urlToScrap);
+      return {
+        url: urlToScrap,
+        content: text,
+        markdown: text,
+        metadata: { ...metadata, sourceURL: urlToScrap },
+      } as Document;
+    }
    if (!text || text.length < 100) {
      console.log("Falling back to playwright");
      [text, html] = await attemptScraping(urlToScrap, "playwright");
--- a/apps/api/src/scraper/WebScraper/utils/blocklist.ts
+++ b/apps/api/src/scraper/WebScraper/utils/blocklist.ts
@ -0,0 +1,27 @@
 const socialMediaBlocklist = [
  'facebook.com',
  'twitter.com',
  'instagram.com',
  'linkedin.com',
  'pinterest.com',
  'snapchat.com',
  'tiktok.com',
  'reddit.com',
  'tumblr.com',
  'flickr.com',
  'whatsapp.com',
  'wechat.com',
  'telegram.org',
 ];
 const allowedUrls = [
  'linkedin.com/pulse'
 ];
 export function isUrlBlocked(url: string): boolean {
  if (allowedUrls.some(allowedUrl => url.includes(allowedUrl))) {
    return false;
  }
  return socialMediaBlocklist.some(domain => url.includes(domain));
 }
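Note on `isUrlBlocked`: it matches with `url.includes(domain)`, so a URL that merely mentions a blocked domain in its path or query string (for example `https://example.com/?ref=twitter.com`) would also be rejected. A minimal sketch of a hostname-based check follows. The name `isHostBlocked` and the shortened lists are hypothetical and not part of this commit, and treating unparseable input as "not blocked" is an assumption.

```typescript
// Hypothetical variant of isUrlBlocked that compares the parsed hostname
// instead of searching the raw URL string. Lists are shortened for illustration.
const blockedDomains: string[] = ["facebook.com", "twitter.com", "linkedin.com"];
const allowedPrefixes: string[] = ["linkedin.com/pulse"];

function isHostBlocked(rawUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    // Assumption: input that is not a valid absolute URL is not blocked.
    return false;
  }
  const host = parsed.hostname.replace(/^www\./, "");
  // Allow-list entries are compared against host + path.
  const hostAndPath = host + parsed.pathname;
  if (allowedPrefixes.some((prefix) => hostAndPath.startsWith(prefix))) {
    return false;
  }
  // Block the exact domain and any of its subdomains.
  return blockedDomains.some((d) => host === d || host.endsWith("." + d));
}
```

With this shape, `linkedin.com/pulse` articles stay allowed while profile pages on the same host are still blocked.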
--- a/apps/api/src/scraper/WebScraper/utils/custom/website_params.ts
+++ b/apps/api/src/scraper/WebScraper/utils/custom/website_params.ts
@ -0,0 +1,42 @@
 export const urlSpecificParams = {
  "platform.openai.com": {
    params: {
      wait_browser: "networkidle2",
      block_resources: false,
    },
    headers: {
      "User-Agent":
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
      "sec-fetch-site": "same-origin",
      "sec-fetch-mode": "cors",
      "sec-fetch-dest": "empty",
      referer: "https://www.google.com/",
      "accept-language": "en-US,en;q=0.9",
      "accept-encoding": "gzip, deflate, br",
      accept:
        "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    },
    cookies: {
      __cf_bm:
        "mC1On8P2GWT3A5UeSYH6z_MP94xcTAdZ5jfNi9IT2U0-1714327136-1.0.1.1-ILAP5pSX_Oo9PPo2iHEYCYX.p9a0yRBNLr58GHyrzYNDJ537xYpG50MXxUYVdfrD.h3FV5O7oMlRKGA0scbxaQ",
    },
  },
  "support.greenpay.me":{
    params: {
        wait_browser: "networkidle2",
        block_resources: false,
      },
      headers: {
        "User-Agent":
          "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "sec-fetch-site": "same-origin",
        "sec-fetch-mode": "cors",
        "sec-fetch-dest": "empty",
        referer: "https://www.google.com/",
        "accept-language": "en-US,en;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        accept:
          "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
      },
  }
 };
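Note on how these entries are consumed: `generateRequestParams` combines them with the defaults through object spread, which is shallow. A site entry's `params` and `headers` therefore replace the default objects wholesale, so the default `timeout` and the `ScrapingService-Request` header are not sent for these hosts. Whether that is intended is not stated in the commit. The sketch below contrasts the shallow result with a per-key merge; `mergeRequestParams` and the two types are hypothetical names introduced only for illustration.

```typescript
// Illustrative types; the code in the diff uses untyped objects.
type RequestParams = {
  url: string;
  params: Record<string, unknown>;
  headers: Record<string, string>;
  cookies?: Record<string, string>;
};
type SiteOverride = Partial<Omit<RequestParams, "url">>;

// Hypothetical helper: merges nested `params` and `headers` key by key.
function mergeRequestParams(base: RequestParams, site: SiteOverride): RequestParams {
  return {
    ...base,
    ...site,
    params: { ...base.params, ...site.params },
    headers: { ...base.headers, ...site.headers },
  };
}

const baseParams: RequestParams = {
  url: "https://platform.openai.com/docs",
  params: { timeout: 15000, wait_browser: "domcontentloaded" },
  headers: { "ScrapingService-Request": "TRUE" },
};
const siteParams: SiteOverride = {
  params: { wait_browser: "networkidle2", block_resources: false },
  headers: { referer: "https://www.google.com/" },
};

// Shallow spread, as in the diff: the default timeout and header are dropped.
const shallowResult = { ...baseParams, ...siteParams };
// Per-key merge: defaults survive unless a site entry overrides them.
const mergedResult = mergeRequestParams(baseParams, siteParams);
```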
--- a/apps/api/src/scraper/WebScraper/utils/gptVision.ts
+++ b/apps/api/src/scraper/WebScraper/utils/gptVision.ts
@ -1,41 +0,0 @@
 export async function getImageDescription(
  imageUrl: string,
  backText: string,
  frontText: string
 ): Promise<string> {
  const { OpenAI } = require("openai");
  const openai = new OpenAI();
  try {
    const response = await openai.chat.completions.create({
      model: "gpt-4-turbo",
      messages: [
        {
          role: "user",
          content: [
            {
              type: "text",
              text:
                "What's in the image? You need to answer with the content for the alt tag of the image. To help you with the context, the image is in the following text: " +
                backText +
                " and the following text: " +
                frontText +
                ". Be super concise.",
            },
            {
              type: "image_url",
              image_url: {
                url: imageUrl,
              },
            },
          ],
        },
      ],
    });
    return response.choices[0].message.content;
  } catch (error) {
    console.error("Error generating image alt text:", error?.message);
    return "";
  }
 }
--- a/apps/api/src/scraper/WebScraper/utils/imageDescription.ts
+++ b/apps/api/src/scraper/WebScraper/utils/imageDescription.ts
@ -0,0 +1,88 @@
 import Anthropic from '@anthropic-ai/sdk';
 import axios from 'axios';
 export async function getImageDescription(
  imageUrl: string,
  backText: string,
  frontText: string,
  model: string = "gpt-4-turbo"
 ): Promise<string> {
  try {
    const prompt = "What's in the image? You need to answer with the content for the alt tag of the image. To help you with the context, the image is in the following text: " +
      backText +
      " and the following text: " +
      frontText +
      ". Be super concise."
    switch (model) {
      case 'claude-3-opus': {
        if (!process.env.ANTHROPIC_API_KEY) {
          throw new Error("No Anthropic API key provided");
        }
        const imageRequest = await axios.get(imageUrl, { responseType: 'arraybuffer' });
        const imageMediaType = 'image/png';
        const imageData = Buffer.from(imageRequest.data, 'binary').toString('base64');
        const anthropic = new Anthropic();
        const response = await anthropic.messages.create({
          model: "claude-3-opus-20240229",
          max_tokens: 1024,
          messages: [
            {
              role: "user",
              content: [
                {
                  type: "image",
                  source: {
                    type: "base64",
                    media_type: imageMediaType,
                    data: imageData,
                  },
                },
                {
                  type: "text",
                  text: prompt
                }
              ],
            }
          ]
        });
        return response.content[0].text;
      }
      default: {
        if (!process.env.OPENAI_API_KEY) {
          throw new Error("No OpenAI API key provided");
        }
        const { OpenAI } = require("openai");
        const openai = new OpenAI();
        const response = await openai.chat.completions.create({
          model: "gpt-4-turbo",
          messages: [
            {
              role: "user",
              content: [
                {
                  type: "text",
                  text: prompt,
                },
                {
                  type: "image_url",
                  image_url: {
                    url: imageUrl,
                  },
                },
              ],
            },
          ],
        });
        return response.choices[0].message.content;
      }
    }
  } catch (error) {
    console.error("Error generating image alt text:", error?.message);
    return "";
  }
 }
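Note on the Claude branch: `imageMediaType` is fixed to `image/png`, while the downloaded image may be a JPEG, GIF, or WebP. A minimal sketch of picking the media type from the response's `Content-Type` header, with the file extension as a fallback, is shown below. The function name `inferImageMediaType` is hypothetical, and the list of four accepted types is an assumption to check against the Anthropic documentation.

```typescript
// Assumed set of media types accepted for base64 image input.
const SUPPORTED_MEDIA_TYPES = ["image/jpeg", "image/png", "image/gif", "image/webp"] as const;
type ImageMediaType = (typeof SUPPORTED_MEDIA_TYPES)[number];

// Lower-cased path of the URL, or the raw string if it cannot be parsed.
function pathOf(imageUrl: string): string {
  try {
    return new URL(imageUrl).pathname.toLowerCase();
  } catch {
    return imageUrl.toLowerCase();
  }
}

// Hypothetical helper: prefer the server-reported type, then the extension,
// then fall back to PNG (the value hard-coded in the diff).
function inferImageMediaType(contentType: string | undefined, imageUrl: string): ImageMediaType {
  const reported = (contentType ?? "").toLowerCase();
  const fromHeader = SUPPORTED_MEDIA_TYPES.find((t) => reported.startsWith(t));
  if (fromHeader) return fromHeader;

  const path = pathOf(imageUrl);
  if (/\.jpe?g$/.test(path)) return "image/jpeg";
  if (/\.gif$/.test(path)) return "image/gif";
  if (/\.webp$/.test(path)) return "image/webp";
  return "image/png";
}
```

In the function above, the header value would come from `imageRequest.headers["content-type"]` on the axios response.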
--- a/apps/api/src/scraper/WebScraper/utils/metadata.ts
+++ b/apps/api/src/scraper/WebScraper/utils/metadata.ts
@ -1,4 +1,3 @@
-// import * as cheerio from 'cheerio';
 import { CheerioAPI } from "cheerio";
 interface Metadata {
  title?: string;
@ -8,6 +7,14 @@ interface Metadata {
  robots?: string;
  ogTitle?: string;
  ogDescription?: string;
+  ogUrl?: string;
+  ogImage?: string;
+  ogAudio?: string;
+  ogDeterminer?: string;
+  ogLocale?: string;
+  ogLocaleAlternate?: string[];
+  ogSiteName?: string;
+  ogVideo?: string;
  dctermsCreated?: string;
  dcDateCreated?: string;
  dcDate?: string;
@ -17,7 +24,6 @@ interface Metadata {
  dctermsSubject?: string;
  dcSubject?: string;
  dcDescription?: string;
-  ogImage?: string;
  dctermsKeywords?: string;
  modifiedTime?: string;
  publishedTime?: string;
@ -33,6 +39,14 @@ export function extractMetadata(soup: CheerioAPI, url: string): Metadata {
  let robots: string | null = null;
  let ogTitle: string | null = null;
  let ogDescription: string | null = null;
+  let ogUrl: string | null = null;
+  let ogImage: string | null = null;
+  let ogAudio: string | null = null;
+  let ogDeterminer: string | null = null;
+  let ogLocale: string | null = null;
+  let ogLocaleAlternate: string[] | null = null;
+  let ogSiteName: string | null = null;
+  let ogVideo: string | null = null;
  let dctermsCreated: string | null = null;
  let dcDateCreated: string | null = null;
  let dcDate: string | null = null;
@ -42,7 +56,6 @@ export function extractMetadata(soup: CheerioAPI, url: string): Metadata {
  let dctermsSubject: string | null = null;
  let dcSubject: string | null = null;
  let dcDescription: string | null = null;
-  let ogImage: string | null = null;
  let dctermsKeywords: string | null = null;
  let modifiedTime: string | null = null;
  let publishedTime: string | null = null;
@ -62,11 +75,18 @@ export function extractMetadata(soup: CheerioAPI, url: string): Metadata {
    robots = soup('meta[name="robots"]').attr("content") || null;
    ogTitle = soup('meta[property="og:title"]').attr("content") || null;
    ogDescription = soup('meta[property="og:description"]').attr("content") || null;
+    ogUrl = soup('meta[property="og:url"]').attr("content") || null;
+    ogImage = soup('meta[property="og:image"]').attr("content") || null;
+    ogAudio = soup('meta[property="og:audio"]').attr("content") || null;
+    ogDeterminer = soup('meta[property="og:determiner"]').attr("content") || null;
+    ogLocale = soup('meta[property="og:locale"]').attr("content") || null;
+    ogLocaleAlternate = soup('meta[property="og:locale:alternate"]').map((i, el) => soup(el).attr("content")).get() || null;
+    ogSiteName = soup('meta[property="og:site_name"]').attr("content") || null;
+    ogVideo = soup('meta[property="og:video"]').attr("content") || null;
    articleSection = soup('meta[name="article:section"]').attr("content") || null;
    articleTag = soup('meta[name="article:tag"]').attr("content") || null;
    publishedTime = soup('meta[property="article:published_time"]').attr("content") || null;
    modifiedTime = soup('meta[property="article:modified_time"]').attr("content") || null;
-    ogImage = soup('meta[property="og:image"]').attr("content") || null;
    dctermsKeywords = soup('meta[name="dcterms.keywords"]').attr("content") || null;
    dcDescription = soup('meta[name="dc.description"]').attr("content") || null;
    dcSubject = soup('meta[name="dc.subject"]').attr("content") || null;
@ -90,6 +110,14 @@ export function extractMetadata(soup: CheerioAPI, url: string): Metadata {
    ...(robots ? { robots } : {}),
    ...(ogTitle ? { ogTitle } : {}),
    ...(ogDescription ? { ogDescription } : {}),
+    ...(ogUrl ? { ogUrl } : {}),
+    ...(ogImage ? { ogImage } : {}),
+    ...(ogAudio ? { ogAudio } : {}),
+    ...(ogDeterminer ? { ogDeterminer } : {}),
+    ...(ogLocale ? { ogLocale } : {}),
+    ...(ogLocaleAlternate ? { ogLocaleAlternate } : {}),
+    ...(ogSiteName ? { ogSiteName } : {}),
+    ...(ogVideo ? { ogVideo } : {}),
    ...(dctermsCreated ? { dctermsCreated } : {}),
    ...(dcDateCreated ? { dcDateCreated } : {}),
    ...(dcDate ? { dcDate } : {}),
@ -99,7 +127,6 @@ export function extractMetadata(soup: CheerioAPI, url: string): Metadata {
    ...(dctermsSubject ? { dctermsSubject } : {}),
    ...(dcSubject ? { dcSubject } : {}),
    ...(dcDescription ? { dcDescription } : {}),
-    ...(ogImage ? { ogImage } : {}),
    ...(dctermsKeywords ? { dctermsKeywords } : {}),
    ...(modifiedTime ? { modifiedTime } : {}),
    ...(publishedTime ? { publishedTime } : {}),
--- a/apps/api/src/search/googlesearch.ts
+++ b/apps/api/src/search/googlesearch.ts
@ -0,0 +1,115 @@
 import axios from 'axios';
 import * as cheerio from 'cheerio';
 import * as querystring from 'querystring';
 import { SearchResult } from '../../src/lib/entities';
 const _useragent_list = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.62',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0'
 ];
 function get_useragent(): string {
    return _useragent_list[Math.floor(Math.random() * _useragent_list.length)];
 }
 async function _req(term: string, results: number, lang: string, country: string, start: number, proxies: any, timeout: number, tbs: string = null, filter: string = null) {
    const params = {
        "q": term,
        "num": results,  // Number of results to return
        "hl": lang,
        "gl": country,
        "start": start,
    };
    if (tbs) {
        params["tbs"] = tbs;
    }
    if (filter) {
        params["filter"] = filter;
    }
    try {
        const resp = await axios.get("https://www.google.com/search", {
            headers: {
                "User-Agent": get_useragent()
            },
            params: params,
            proxy: proxies,
            timeout: timeout,
        });
        return resp;
    } catch (error) {
        if (error.response && error.response.status === 429) {
            throw new Error('Google Search: Too many requests, try again later.');
        }
        throw error;
    }
 }
 export async function google_search(term: string, advanced = false, num_results = 7, tbs = null, filter = null, lang = "en", country = "us", proxy = null, sleep_interval = 0, timeout = 5000, ) :Promise<SearchResult[]> {
    const escaped_term = querystring.escape(term);
    let proxies = null;
    if (proxy) {
        if (proxy.startsWith("https")) {
            proxies = {"https": proxy};
        } else {
            proxies = {"http": proxy};
        }
    }
    // TODO: knowledge graph, answer box, etc.
    let start = 0;
    let results : SearchResult[] = [];
    let attempts = 0;
    const maxAttempts = 20; // Define a maximum number of attempts to prevent infinite loop
    while (start < num_results && attempts < maxAttempts) {
        try {
            const resp = await _req(escaped_term, num_results - start, lang, country, start, proxies, timeout, tbs, filter);
            const $ = cheerio.load(resp.data);
            const result_block = $("div.g");
            if (result_block.length === 0) {
                start += 1;
                attempts += 1;
            } else {
                attempts = 0; // Reset attempts if we have results
            }
            result_block.each((index, element) => {
                const linkElement = $(element).find("a");
                const link = linkElement && linkElement.attr("href") ? linkElement.attr("href") : null;
                const title = $(element).find("h3");
                const ogImage = $(element).find("img").eq(1).attr("src");
                const description_box = $(element).find("div[style='-webkit-line-clamp:2']");
                const answerBox = $(element).find(".mod").text();
                if (description_box) {
                    const description = description_box.text();
                    if (link && title && description) {
                        start += 1;
                        results.push(new SearchResult(link, title.text(), description));
                    }
                }
            });
            await new Promise(resolve => setTimeout(resolve, sleep_interval * 1000));
        } catch (error) {
            // Must match the message thrown by _req on HTTP 429
            if (error.message === 'Google Search: Too many requests, try again later.') {
                console.warn('Too many requests, breaking the loop');
                break;
            }
            throw error;
        }
        if (start === 0) {
            return results;
        }
    }
    if (attempts >= maxAttempts) {
        console.warn('Max attempts reached, breaking the loop');
    }
    return results
 }
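Note on the rate-limit handling: `_req` throws an `Error` whose message is `'Google Search: Too many requests, try again later.'`, and the loop in `google_search` decides whether to stop by comparing `error.message` to a string literal. String comparison breaks as soon as either message is reworded. A minimal sketch of tagging the error with a machine-readable code instead is shown below; `CodedError`, `rateLimitError`, `isRateLimited`, and the code value `RATE_LIMITED` are all hypothetical names, not part of this commit.

```typescript
// Hypothetical error shape: a plain Error carrying an optional code.
type CodedError = Error & { code?: string };

// Would replace `throw new Error('Google Search: Too many requests, ...')`.
function rateLimitError(): CodedError {
  const err: CodedError = new Error("Google Search: Too many requests, try again later.");
  err.code = "RATE_LIMITED";
  return err;
}

// Would replace the `error.message === '...'` comparison in the loop.
function isRateLimited(error: unknown): boolean {
  return error instanceof Error && (error as CodedError).code === "RATE_LIMITED";
}
```

The message text stays free to change because only the code is inspected.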
--- a/apps/api/src/search/index.ts
+++ b/apps/api/src/search/index.ts
@ -0,0 +1,54 @@
 import { SearchResult } from "../../src/lib/entities";
 import { google_search } from "./googlesearch";
 import { serper_search } from "./serper";
 export async function search({
  query,
  advanced = false,
  num_results = 7,
  tbs = null,
  filter = null,
  lang = "en",
  country = "us",
  location = undefined,
  proxy = null,
  sleep_interval = 0,
  timeout = 5000,
 }: {
  query: string;
  advanced?: boolean;
  num_results?: number;
  tbs?: string;
  filter?: string;
  lang?: string;
  country?: string;
  location?: string;
  proxy?: string;
  sleep_interval?: number;
  timeout?: number;
 }) : Promise<SearchResult[]> {
  try {
    if (process.env.SERPER_API_KEY ) {
      return await serper_search(query, {num_results, tbs, filter, lang, country, location});
    }
    return await google_search(
      query,
      advanced,
      num_results,
      tbs,
      filter,
      lang,
      country,
      proxy,
      sleep_interval,
      timeout
    );
  } catch (error) {
    console.error("Error in search function: ", error);
    return []
  }
  // if process.env.SERPER_API_KEY is set, use serper
 }
--- a/apps/api/src/search/serper.ts
+++ b/apps/api/src/search/serper.ts
@ -0,0 +1,45 @@
 import axios from "axios";
 import dotenv from "dotenv";
 import { SearchResult } from "../../src/lib/entities";
 dotenv.config();
 export async function serper_search(q, options: {
    tbs?: string;
    filter?: string;
    lang?: string;
    country?: string;
    location?: string;
    num_results: number;
    page?: number;
 }): Promise<SearchResult[]> {
  let data = JSON.stringify({
    q: q,
    hl: options.lang,
    gl: options.country,
    location: options.location,
    tbs: options.tbs,
    num: options.num_results,
    page: options.page ?? 1,
  });
  let config = {
    method: "POST",
    url: "https://google.serper.dev/search",
    headers: {
      "X-API-KEY": process.env.SERPER_API_KEY,
      "Content-Type": "application/json",
    },
    data: data,
  };
  const response = await axios(config);
  if (response && response.data && Array.isArray(response.data.organic)) {
    return response.data.organic.map((a) => ({
      url: a.link,
      title: a.title,
      description: a.snippet,
    }));
  }else{
    return [];
  }
 }
--- a/apps/api/src/services/billing/credit_billing.ts
+++ b/apps/api/src/services/billing/credit_billing.ts
@ -1,7 +1,12 @@
 import { withAuth } from "../../lib/withAuth";
 import { supabase_service } from "../supabase";
-const FREE_CREDITS = 100;
+const FREE_CREDITS = 300;
 export async function billTeam(team_id: string, credits: number) {
  return withAuth(supaBillTeam)(team_id, credits);
 }
 export async function supaBillTeam(team_id: string, credits: number) {
  if (team_id === "preview") {
    return { success: true, message: "Preview team, no credits used" };
  }
@ -13,7 +18,6 @@ export async function billTeam(team_id: string, credits: number) {
  // created_at: The timestamp of the API usage.
  // 1. get the subscription
  const { data: subscription } = await supabase_service
    .from("subscriptions")
    .select("*")
@ -21,52 +25,162 @@ export async function billTeam(team_id: string, credits: number) {
    .eq("status", "active")
    .single();
  // 2. Check for available coupons
  const { data: coupons } = await supabase_service
    .from("coupons")
    .select("id, credits")
    .eq("team_id", team_id)
    .eq("status", "active");
  let couponCredits = 0;
  if (coupons && coupons.length > 0) {
    couponCredits = coupons.reduce((total, coupon) => total + coupon.credits, 0);
  }
  // coupons can be null when the query returns no data; fall back to an empty list
  let sortedCoupons = (coupons ?? []).sort((a, b) => b.credits - a.credits);
  // using coupon credits:
  if (couponCredits > 0) {
    // if there is no subscription and they have enough coupon credits
    if (!subscription) {
-    const { data: credit_usage } = await supabase_service
+      // using only coupon credits:
-      .from("credit_usage")
+      // if there are enough coupon credits
-      .insert([
+      if (couponCredits >= credits) {
-        {
+        // remove credits from coupon credits
-          team_id,
+        let usedCredits = credits;
-          credits_used: credits,
+        while (usedCredits > 0) {
-          created_at: new Date(),
+          // update coupons
-        },
+          if (sortedCoupons[0].credits < usedCredits) {
-      ])
+            usedCredits = usedCredits - sortedCoupons[0].credits;
-      .select();
+            // update coupon credits
            await supabase_service
            .from("coupons")
            .update({
              credits: 0
            })
            .eq("id", sortedCoupons[0].id);
            sortedCoupons.shift();
-    return { success: true, credit_usage };
+          } else {
            // update coupon credits
            await supabase_service
            .from("coupons")
            .update({
              credits: sortedCoupons[0].credits - usedCredits
            })
            .eq("id", sortedCoupons[0].id);
            usedCredits = 0;
          }
        }
-  // 2. add the credits to the credits_usage
+        return await createCreditUsage({ team_id, credits: 0 });
  const { data: credit_usage } = await supabase_service
    .from("credit_usage")
    .insert([
      {
        team_id,
        subscription_id: subscription.id,
        credits_used: credits,
        created_at: new Date(),
      },
    ])
    .select();
-  return { success: true, credit_usage };
+      // not enough coupon credits and no subscription
      } else {
        // update coupon credits
        const usedCredits = credits - couponCredits;
        for (let i = 0; i < sortedCoupons.length; i++) {
          await supabase_service
            .from("coupons")
            .update({
              credits: 0
            })
            .eq("id", sortedCoupons[i].id);
        }
        return await createCreditUsage({ team_id, credits: usedCredits });
      }
    }
    // with subscription
    // using coupon + subscription credits:
    if (credits > couponCredits) {
      // update coupon credits
      for (let i = 0; i < sortedCoupons.length; i++) {
        await supabase_service
          .from("coupons")
          .update({
            credits: 0
          })
          .eq("id", sortedCoupons[i].id);
      }
      const usedCredits = credits - couponCredits;
      return await createCreditUsage({ team_id, subscription_id: subscription.id, credits: usedCredits });
    } else { // using only coupon credits
      let usedCredits = credits;
      while (usedCredits > 0) {
        // update coupons
        if (sortedCoupons[0].credits < usedCredits) {
          usedCredits = usedCredits - sortedCoupons[0].credits;
          // update coupon credits
          await supabase_service
          .from("coupons")
          .update({
            credits: 0
          })
          .eq("id", sortedCoupons[0].id);
          sortedCoupons.shift();
        } else {
          // update coupon credits
          await supabase_service
          .from("coupons")
          .update({
            credits: sortedCoupons[0].credits - usedCredits
          })
          .eq("id", sortedCoupons[0].id);
          usedCredits = 0;
        }
      }
      return await createCreditUsage({ team_id, subscription_id: subscription.id, credits: 0 });
    }
  }
  // not using coupon credits
  if (!subscription) {
    return await createCreditUsage({ team_id, credits });
  }
  return await createCreditUsage({ team_id, subscription_id: subscription.id, credits });
 }
 // if team has enough credits for the operation, return true, else return false
 export async function checkTeamCredits(team_id: string, credits: number) {
  return withAuth(supaCheckTeamCredits)(team_id, credits);
 }
 // if team has enough credits for the operation, return true, else return false
 export async function supaCheckTeamCredits(team_id: string, credits: number) {
  if (team_id === "preview") {
    return { success: true, message: "Preview team, no credits used" };
  }
-  // 1. Retrieve the team's active subscription based on the team_id.
-  const { data: subscription, error: subscriptionError } =
-    await supabase_service
+
+  // Retrieve the team's active subscription
+  const { data: subscription, error: subscriptionError } = await supabase_service
    .from("subscriptions")
    .select("id, price_id, current_period_start, current_period_end")
    .eq("team_id", team_id)
    .eq("status", "active")
    .single();
  // Check for available coupons
  const { data: coupons } = await supabase_service
    .from("coupons")
    .select("credits")
    .eq("team_id", team_id)
    .eq("status", "active");
  let couponCredits = 0;
  if (coupons && coupons.length > 0) {
    couponCredits = coupons.reduce((total, coupon) => total + coupon.credits, 0);
  }
  // Free credits, no coupons
  if (subscriptionError || !subscription) {
    // If there is no active subscription but there are available coupons
    if (couponCredits >= credits) {
      return { success: true, message: "Sufficient credits available" };
    }
    const { data: creditUsages, error: creditUsageError } =
      await supabase_service
        .from("credit_usage")
@ -98,20 +212,7 @@ export async function checkTeamCredits(team_id: string, credits: number) {
    return { success: true, message: "Sufficient credits available" };
  }
-  // 2. Get the price_id from the subscription.
-  const { data: price, error: priceError } = await supabase_service
-    .from("prices")
-    .select("credits")
-    .eq("id", subscription.price_id)
-    .single();
-  if (priceError) {
-    throw new Error(
-      `Failed to retrieve price for price_id: ${subscription.price_id}`
-    );
-  }
-  // 4. Calculate the total credits used by the team within the current billing period.
+  // Calculate the total credits used by the team within the current billing period
  const { data: creditUsages, error: creditUsageError } = await supabase_service
    .from("credit_usage")
    .select("credits_used")
@ -120,18 +221,27 @@ export async function checkTeamCredits(team_id: string, credits: number) {
    .lte("created_at", subscription.current_period_end);
  if (creditUsageError) {
-    throw new Error(
-      `Failed to retrieve credit usage for subscription_id: ${subscription.id}`
-    );
+    throw new Error(`Failed to retrieve credit usage for subscription_id: ${subscription.id}`);
  }
-  const totalCreditsUsed = creditUsages.reduce(
-    (acc, usage) => acc + usage.credits_used,
-    0
-  );
-  // 5. Compare the total credits used with the credits allowed by the plan.
-  if (totalCreditsUsed + credits > price.credits) {
+  const totalCreditsUsed = creditUsages.reduce((acc, usage) => acc + usage.credits_used, 0);
+  // Adjust total credits used by subtracting coupon value
+  const adjustedCreditsUsed = Math.max(0, totalCreditsUsed - couponCredits);
+  // Get the price details
+  const { data: price, error: priceError } = await supabase_service
+    .from("prices")
+    .select("credits")
+    .eq("id", subscription.price_id)
+    .single();
+  if (priceError) {
+    throw new Error(`Failed to retrieve price for price_id: ${subscription.price_id}`);
+  }
+  // Compare the adjusted total credits used with the credits allowed by the plan
+  if (adjustedCreditsUsed + credits > price.credits) {
    return { success: false, message: "Insufficient credits, please upgrade!" };
  }
@ -150,9 +260,18 @@ export async function countCreditsAndRemainingForCurrentBillingPeriod(
      .eq("team_id", team_id)
      .single();
-  if (subscriptionError || !subscription) {
-    // throw new Error(`Failed to retrieve subscription for team_id: ${team_id}`);
+  const { data: coupons } = await supabase_service
+    .from("coupons")
+    .select("credits")
+    .eq("team_id", team_id)
+    .eq("status", "active");
+  let couponCredits = 0;
+  if (coupons && coupons.length > 0) {
+    couponCredits = coupons.reduce((total, coupon) => total + coupon.credits, 0);
+  }
+  if (subscriptionError || !subscription) {
    // Free
    const { data: creditUsages, error: creditUsageError } =
      await supabase_service
@ -160,13 +279,9 @@ export async function countCreditsAndRemainingForCurrentBillingPeriod(
        .select("credits_used")
        .is("subscription_id", null)
        .eq("team_id", team_id);
-    // .gte("created_at", subscription.current_period_start)
-    // .lte("created_at", subscription.current_period_end);
    if (creditUsageError || !creditUsages) {
-      throw new Error(
-        `Failed to retrieve credit usage for subscription_id: ${subscription.id}`
-      );
+      throw new Error(`Failed to retrieve credit usage for team_id: ${team_id}`);
    }
    const totalCreditsUsed = creditUsages.reduce(
@ -174,26 +289,10 @@ export async function countCreditsAndRemainingForCurrentBillingPeriod(
      0
    );
-    // 4. Calculate remaining credits.
-    const remainingCredits = FREE_CREDITS - totalCreditsUsed;
-    return { totalCreditsUsed, remainingCredits, totalCredits: FREE_CREDITS };
+    const remainingCredits = FREE_CREDITS + couponCredits - totalCreditsUsed;
+    return { totalCreditsUsed: totalCreditsUsed, remainingCredits, totalCredits: FREE_CREDITS + couponCredits };
  }
-  // 2. Get the price_id from the subscription to retrieve the total credits available.
-  const { data: price, error: priceError } = await supabase_service
-    .from("prices")
-    .select("credits")
-    .eq("id", subscription.price_id)
-    .single();
-  if (priceError || !price) {
-    throw new Error(
-      `Failed to retrieve price for price_id: ${subscription.price_id}`
-    );
-  }
-  // 3. Calculate the total credits used by the team within the current billing period.
  const { data: creditUsages, error: creditUsageError } = await supabase_service
    .from("credit_usage")
    .select("credits_used")
@ -202,18 +301,42 @@ export async function countCreditsAndRemainingForCurrentBillingPeriod(
    .lte("created_at", subscription.current_period_end);
  if (creditUsageError || !creditUsages) {
-    throw new Error(
-      `Failed to retrieve credit usage for subscription_id: ${subscription.id}`
-    );
+    throw new Error(`Failed to retrieve credit usage for subscription_id: ${subscription.id}`);
  }
-  const totalCreditsUsed = creditUsages.reduce(
-    (acc, usage) => acc + usage.credits_used,
-    0
-  );
-  // 4. Calculate remaining credits.
-  const remainingCredits = price.credits - totalCreditsUsed;
-  return { totalCreditsUsed, remainingCredits, totalCredits: price.credits };
+  const totalCreditsUsed = creditUsages.reduce((acc, usage) => acc + usage.credits_used, 0);
+  const { data: price, error: priceError } = await supabase_service
+    .from("prices")
+    .select("credits")
+    .eq("id", subscription.price_id)
+    .single();
+  if (priceError || !price) {
+    throw new Error(`Failed to retrieve price for price_id: ${subscription.price_id}`);
+  }
+  const remainingCredits = price.credits + couponCredits - totalCreditsUsed;
+  return {
+    totalCreditsUsed,
+    remainingCredits,
+    totalCredits: price.credits
+  };
 }
 async function createCreditUsage({ team_id, subscription_id, credits }: { team_id: string, subscription_id?: string, credits: number }) {
  const { data: credit_usage } = await supabase_service
    .from("credit_usage")
    .insert([
      {
        team_id,
        credits_used: credits,
        subscription_id: subscription_id || null,
        created_at: new Date(),
      },
    ])
    .select();
  return { success: true, credit_usage };
 }
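The credit arithmetic in this hunk can be sketched in isolation. This is a minimal sketch, not the function from the diff: `summarizeCredits`, `CreditUsageRow`, and `CreditSummary` are hypothetical names, and the sample numbers are made up. It assumes coupon credits are counted in `totalCredits`, as the free-plan branch above does; the subscription branch above returns `price.credits` alone for `totalCredits`.

```typescript
// Hypothetical row shape: only the column selected from "credit_usage" in the diff.
interface CreditUsageRow {
  credits_used: number;
}

// Hypothetical result shape mirroring the object returned in the diff.
interface CreditSummary {
  totalCreditsUsed: number;
  remainingCredits: number;
  totalCredits: number;
}

// Hypothetical helper: sums usage rows and subtracts them from plan + coupon credits.
function summarizeCredits(
  planCredits: number,
  couponCredits: number,
  usages: CreditUsageRow[],
): CreditSummary {
  const totalCreditsUsed = usages.reduce((acc, usage) => acc + usage.credits_used, 0);
  // Assumption: coupons count toward the reported total (free-plan behaviour in the diff).
  const totalCredits = planCredits + couponCredits;
  return {
    totalCreditsUsed,
    remainingCredits: totalCredits - totalCreditsUsed,
    totalCredits,
  };
}

// Made-up sample values, for illustration only.
const creditSummary = summarizeCredits(500, 100, [{ credits_used: 120 }, { credits_used: 30 }]);
console.log(creditSummary);
```

Keeping the arithmetic in a pure function like this would let both branches share one definition of `totalCredits`, if that is the intended behaviour.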
--- a/apps/api/src/services/logging/log_job.ts
+++ b/apps/api/src/services/logging/log_job.ts
@ -0,0 +1,34 @@
 import { supabase_service } from "../supabase";
 import { FirecrawlJob } from "../../types";
 import "dotenv/config";
 export async function logJob(job: FirecrawlJob) {
  try {
    // Only log jobs in production
    if (process.env.ENV !== "production") {
      return;
    }
    const { data, error } = await supabase_service
      .from("firecrawl_jobs")
      .insert([
        {
          success: job.success,
          message: job.message,
          num_docs: job.num_docs,
          docs: job.docs,
          time_taken: job.time_taken,
          team_id: job.team_id === "preview" ? null : job.team_id,
          mode: job.mode,
          url: job.url,
          crawler_options: job.crawlerOptions,
          page_options: job.pageOptions,
          origin: job.origin,
        },
      ]);
    if (error) {
      console.error("Error logging job:\n", error);
    }
  } catch (error) {
    console.error("Error logging job:\n", error);
  }
 }
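The row mapping in `logJob` can be separated from the database call, which makes the `"preview"` to `null` rule and the camelCase to snake_case renames easy to check. A minimal sketch, assuming the `FirecrawlJob` shape from `types.ts` in this diff; `toJobRow` and `shouldLogJob` are hypothetical helpers that do not exist in the source.

```typescript
// Shape copied from the FirecrawlJob interface in this diff (any narrowed to unknown).
interface FirecrawlJob {
  success: boolean;
  message: string;
  num_docs: number;
  docs: unknown[];
  time_taken: number;
  team_id: string;
  mode: string;
  url: string;
  crawlerOptions?: unknown;
  pageOptions?: unknown;
  origin: string;
}

// Hypothetical helper: builds the row that logJob passes to insert().
function toJobRow(job: FirecrawlJob) {
  return {
    success: job.success,
    message: job.message,
    num_docs: job.num_docs,
    docs: job.docs,
    time_taken: job.time_taken,
    // Preview requests have no real team, so the column is stored as null.
    team_id: job.team_id === "preview" ? null : job.team_id,
    mode: job.mode,
    url: job.url,
    crawler_options: job.crawlerOptions,
    page_options: job.pageOptions,
    origin: job.origin,
  };
}

// Hypothetical guard mirroring the ENV check; the value is passed in so it can be tested.
function shouldLogJob(env: string | undefined): boolean {
  return env === "production";
}

// Made-up sample job, for illustration only.
const previewRow = toJobRow({
  success: true,
  message: "ok",
  num_docs: 1,
  docs: [{}],
  time_taken: 1.5,
  team_id: "preview",
  mode: "crawl",
  url: "https://example.com",
  crawlerOptions: { limit: 10 },
  origin: "api",
});
console.log(previewRow.team_id, shouldLogJob("development"));
```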
--- a/apps/api/src/services/logtail.ts
+++ b/apps/api/src/services/logtail.ts
@ -1,4 +1,19 @@
-const { Logtail } = require("@logtail/node");
+import { Logtail } from "@logtail/node";
-//dot env
+import "dotenv/config";
-require("dotenv").config();
+
-export const logtail = new Logtail(process.env.LOGTAIL_KEY);
+// A mock Logtail class to handle cases where LOGTAIL_KEY is not provided
 class MockLogtail {
  info(message: string, context?: Record<string, any>): void {
    console.log(message, context);
  }
  error(message: string, context: Record<string, any> = {}): void {
    console.error(message, context);
  }
 }
 // Using the actual Logtail class if LOGTAIL_KEY exists, otherwise using the mock class
 // Additionally, print a warning to the terminal if LOGTAIL_KEY is not provided
 export const logtail = process.env.LOGTAIL_KEY ? new Logtail(process.env.LOGTAIL_KEY) : (() => {
   console.warn("LOGTAIL_KEY is not provided - your events will not be logged. Using MockLogtail as a fallback. See logtail.ts for more.");
  return new MockLogtail();
 })();
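The fallback selection above can be sketched without the `@logtail/node` dependency. This is a sketch under stated assumptions: `EventLogger`, `ConsoleLogger`, and `createLogger` are hypothetical names, and the real client is injected through a factory argument instead of being constructed directly.

```typescript
// Hypothetical interface covering the two methods MockLogtail implements in the diff.
interface EventLogger {
  info(message: string, context?: Record<string, unknown>): void;
  error(message: string, context?: Record<string, unknown>): void;
}

// Console-backed stand-in, equivalent in spirit to MockLogtail.
class ConsoleLogger implements EventLogger {
  info(message: string, context?: Record<string, unknown>): void {
    console.log(message, context);
  }
  error(message: string, context: Record<string, unknown> = {}): void {
    console.error(message, context);
  }
}

// Hypothetical factory: uses the remote logger when a key exists, else warns and falls back.
function createLogger(
  key: string | undefined,
  makeRemote: (key: string) => EventLogger,
): EventLogger {
  if (key) {
    return makeRemote(key);
  }
  console.warn("No logging key provided; falling back to console logging.");
  return new ConsoleLogger();
}

// With no key, the factory is never called and the console fallback is returned.
const fallbackLogger = createLogger(undefined, () => {
  throw new Error("remote factory should not be called without a key");
});
fallbackLogger.info("worker started", { workers: 8 });
```

Injecting the factory keeps the selection logic testable without a real key.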
--- a/apps/api/src/services/queue-worker.ts
+++ b/apps/api/src/services/queue-worker.ts
@ -4,6 +4,7 @@ import "dotenv/config";
 import { logtail } from "./logtail";
 import { startWebScraperPipeline } from "../main/runWebScraper";
 import { callWebhook } from "./webhook";
 import { logJob } from "./logging/log_job";
 getWebScraperQueue().process(
  Math.floor(Number(process.env.NUM_WORKERS_PER_QUEUE ?? 8)),
@ -15,7 +16,11 @@ getWebScraperQueue().process(
        current_step: "SCRAPING",
        current_url: "",
      });
      const start = Date.now();
      const { success, message, docs } = await startWebScraperPipeline({ job });
      const end = Date.now();
      const timeTakenInSeconds = (end - start) / 1000;
      const data = {
        success: success,
@ -29,6 +34,20 @@ getWebScraperQueue().process(
      };
      await callWebhook(job.data.team_id, data);
      await logJob({
        success: success,
        message: message,
        num_docs: docs.length,
        docs: docs,
        time_taken: timeTakenInSeconds,
        team_id: job.data.team_id,
        mode: "crawl",
        url: job.data.url,
        crawlerOptions: job.data.crawlerOptions,
        pageOptions: job.data.pageOptions,
        origin: job.data.origin,
      });
      done(null, data);
    } catch (error) {
      if (error instanceof CustomError) {
@ -55,6 +74,19 @@ getWebScraperQueue().process(
          "Something went wrong... Contact help@mendable.ai or try again." /* etc... */,
      };
      await callWebhook(job.data.team_id, data);
      await logJob({
        success: false,
        message: typeof error === 'string' ? error : (error.message ?? "Something went wrong... Contact help@mendable.ai"),
        num_docs: 0,
        docs: [],
        time_taken: 0,
        team_id: job.data.team_id,
        mode: "crawl",
        url: job.data.url,
        crawlerOptions: job.data.crawlerOptions,
        pageOptions: job.data.pageOptions,
        origin: job.data.origin,
      });
      done(null, data);
    }
  }
--- a/apps/api/src/services/rate-limiter.ts
+++ b/apps/api/src/services/rate-limiter.ts
@ -1,5 +1,6 @@
 import { RateLimiterRedis } from "rate-limiter-flexible";
 import * as redis from "redis";
 import { RateLimiterMode } from "../../src/types";
 const MAX_REQUESTS_PER_MINUTE_PREVIEW = 5;
 const MAX_CRAWLS_PER_MINUTE_STARTER = 2;
@ -8,6 +9,9 @@ const MAX_CRAWLS_PER_MINUTE_SCALE = 20;
 const MAX_REQUESTS_PER_MINUTE_ACCOUNT = 20;
 const MAX_REQUESTS_PER_MINUTE_CRAWL_STATUS = 120;
 export const redisClient = redis.createClient({
@ -29,6 +33,13 @@ export const serverRateLimiter = new RateLimiterRedis({
  duration: 60, // Duration in seconds
 });
 export const crawlStatusRateLimiter = new RateLimiterRedis({
  storeClient: redisClient,
  keyPrefix: "middleware",
  points: MAX_REQUESTS_PER_MINUTE_CRAWL_STATUS,
  duration: 60, // Duration in seconds
 });
 export function crawlRateLimit(plan: string){
  if(plan === "standard"){
@ -56,10 +67,15 @@ export function crawlRateLimit(plan: string){
 }
-export function getRateLimiter(preview: boolean){
+
-  if(preview){
+
 export function getRateLimiter(mode: RateLimiterMode){
  switch(mode) {
    case RateLimiterMode.Preview:
      return previewRateLimiter;
-  }else{
+    case RateLimiterMode.CrawlStatus:
      return crawlStatusRateLimiter;
    default:
      return serverRateLimiter;
  }
 }
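The mode-based dispatch in `getRateLimiter` can be illustrated without Redis. A minimal sketch: `LimiterConfig` and `pickLimiter` are hypothetical, a const object stands in for the `RateLimiterMode` enum, and the key prefixes and the server limit of 20 are assumptions (the diff does not show which constant `serverRateLimiter` uses). The preview and crawl-status limits are the constants shown above.

```typescript
// Stand-in for the RateLimiterMode enum from types.ts, written as a const object.
const RateLimiterMode = {
  Crawl: "crawl",
  CrawlStatus: "crawl-status",
  Scrape: "scrape",
  Preview: "preview",
  Search: "search",
} as const;
type RateLimiterMode = (typeof RateLimiterMode)[keyof typeof RateLimiterMode];

// Hypothetical plain-data description of a limiter (no Redis client involved).
interface LimiterConfig {
  keyPrefix: string; // assumption: one distinct prefix per limiter
  points: number; // allowed requests per window
  duration: number; // window length in seconds
}

const previewLimiter: LimiterConfig = { keyPrefix: "preview", points: 5, duration: 60 };
const crawlStatusLimiter: LimiterConfig = { keyPrefix: "crawl-status", points: 120, duration: 60 };
// Assumption: the server limiter uses the per-account limit of 20.
const serverLimiter: LimiterConfig = { keyPrefix: "server", points: 20, duration: 60 };

// Same switch structure as getRateLimiter: two special cases, everything else is default.
function pickLimiter(mode: RateLimiterMode): LimiterConfig {
  switch (mode) {
    case RateLimiterMode.Preview:
      return previewLimiter;
    case RateLimiterMode.CrawlStatus:
      return crawlStatusLimiter;
    default:
      return serverLimiter;
  }
}

console.log(pickLimiter(RateLimiterMode.CrawlStatus).points);
```

Because each limiter counts requests under its own key prefix, giving every limiter a distinct prefix keeps status polling from consuming another limiter's budget.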
--- a/apps/api/src/services/supabase.ts
+++ b/apps/api/src/services/supabase.ts
@ -1,6 +1,56 @@
-import { createClient } from "@supabase/supabase-js";
+import { createClient, SupabaseClient } from "@supabase/supabase-js";
-export const supabase_service = createClient<any>(
+// SupabaseService class initializes the Supabase client conditionally based on environment variables.
-  process.env.SUPABASE_URL,
+class SupabaseService {
-  process.env.SUPABASE_SERVICE_TOKEN,
+  private client: SupabaseClient | null = null;
  constructor() {
    const supabaseUrl = process.env.SUPABASE_URL;
    const supabaseServiceToken = process.env.SUPABASE_SERVICE_TOKEN;
     // Skip initialization when DB authentication is disabled; otherwise require both the URL and the service token.
     if (process.env.USE_DB_AUTHENTICATION === "false") {
       // Warn the user that authentication is disabled and leave the client as null
      console.warn(
        "\x1b[33mAuthentication is disabled. Supabase client will not be initialized.\x1b[0m"
      );
      this.client = null;
    } else if (!supabaseUrl || !supabaseServiceToken) {
      console.error(
        "\x1b[31mSupabase environment variables aren't configured correctly. Supabase client will not be initialized. Fix ENV configuration or disable DB authentication with USE_DB_AUTHENTICATION env variable\x1b[0m"
      );
    } else {
      this.client = createClient(supabaseUrl, supabaseServiceToken);
    }
  }
  // Provides access to the initialized Supabase client, if available.
  getClient(): SupabaseClient | null {
    return this.client;
  }
 }
 // Using a Proxy to handle dynamic access to the Supabase client or service methods.
 // This approach ensures that if Supabase is not configured, any attempt to use it will result in a clear error.
 export const supabase_service: SupabaseClient = new Proxy(
  new SupabaseService(),
  {
    get: function (target, prop, receiver) {
      const client = target.getClient();
      // If the Supabase client is not initialized, intercept property access to provide meaningful error feedback.
      if (client === null) {
        console.error(
          "Attempted to access Supabase client when it's not configured."
        );
        return () => {
          throw new Error("Supabase client is not configured.");
        };
      }
      // Direct access to SupabaseService properties takes precedence.
      if (prop in target) {
        return Reflect.get(target, prop, receiver);
      }
      // Otherwise, delegate access to the Supabase client.
      return Reflect.get(client, prop, receiver);
    },
  }
 ) as unknown as SupabaseClient;
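The Proxy guard above can be sketched generically, without Supabase. This is a sketch, not the source implementation: `ClientHolder`, `guardClient`, and `FakeDb` are hypothetical names. One deliberate difference from the diff: the delegated `Reflect.get` passes the client itself as the receiver, so getters defined on the client run against the client and not against the proxy.

```typescript
// Hypothetical holder: wraps a client that may be absent when configuration is missing.
class ClientHolder<T extends object> {
  private client: T | null;
  constructor(client: T | null) {
    this.client = client;
  }
  getClient(): T | null {
    return this.client;
  }
}

// Hypothetical guard: exposes the holder as if it were the client itself.
function guardClient<T extends object>(holder: ClientHolder<T>, name: string): T {
  const proxy = new Proxy(holder, {
    get(target, prop, receiver) {
      const client = target.getClient();
      if (client === null) {
        // As in the diff, the error surfaces when the returned function is called.
        return () => {
          throw new Error(`${name} client is not configured.`);
        };
      }
      // Properties of the holder take precedence over the client.
      if (prop in target) {
        return Reflect.get(target, prop, receiver);
      }
      // Otherwise delegate to the client, using the client as the receiver.
      return Reflect.get(client, prop, client);
    },
  });
  return proxy as unknown as T;
}

// Made-up client interface for illustration.
interface FakeDb {
  from(table: string): string;
}

const configuredDb = guardClient<FakeDb>(
  new ClientHolder<FakeDb>({ from: (table: string) => `query:${table}` }),
  "Fake",
);
const missingDb = guardClient<FakeDb>(new ClientHolder<FakeDb>(null), "Fake");

console.log(configuredDb.from("webhooks"));
```

In this sketch a read of a non-function property on an unconfigured client returns the throwing function and does not throw by itself, so callers see the error only on the first method call.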
--- a/apps/api/src/services/webhook.ts
+++ b/apps/api/src/services/webhook.ts
@ -1,6 +1,7 @@
 import { supabase_service } from "./supabase";
 export const callWebhook = async (teamId: string, data: any) => {
  try {
  const { data: webhooksData, error } = await supabase_service
    .from('webhooks')
    .select('url')
@ -38,4 +39,8 @@ export const callWebhook = async (teamId: string, data: any) => {
      error: data.error || undefined,
    }),
    });
  } catch (error) {
    console.error(`Error sending webhook for team ID: ${teamId}`, error.message);
  }
 };
--- a/apps/api/src/types.ts
+++ b/apps/api/src/types.ts
@ -22,7 +22,37 @@ export interface WebScraperOptions {
  crawlerOptions: any;
  pageOptions: any;
  team_id: string;
  origin?: string;
 }
 export interface FirecrawlJob {
  success: boolean;
  message: string;
  num_docs: number;
  docs: any[];
  time_taken: number;
  team_id: string;
  mode: string;
  url: string;
  crawlerOptions?: any;
  pageOptions?: any;
  origin: string;
 }
 export enum RateLimiterMode {
  Crawl = "crawl",
  CrawlStatus = "crawl-status",
  Scrape = "scrape",
  Preview = "preview",
  Search = "search",
 }
 export interface AuthResponse {
  success: boolean;
  team_id?: string;
  error?: string;
  status?: number;
 }
--- a/apps/js-sdk/firecrawl/build/index.js
+++ b/apps/js-sdk/firecrawl/build/index.js
@ -61,13 +61,50 @@ export default class FirecrawlApp {
            return { success: false, error: 'Internal server error.' };
        });
    }
    /**
     * Searches for a query using the Firecrawl API.
     * @param {string} query - The query to search for.
     * @param {Params | null} params - Additional parameters for the search request.
     * @returns {Promise<SearchResponse>} The response from the search operation.
     */
    search(query_1) {
        return __awaiter(this, arguments, void 0, function* (query, params = null) {
            const headers = {
                'Content-Type': 'application/json',
                'Authorization': `Bearer ${this.apiKey}`,
            };
            let jsonData = { query };
            if (params) {
                jsonData = Object.assign(Object.assign({}, jsonData), params);
            }
            try {
                const response = yield axios.post('https://api.firecrawl.dev/v0/search', jsonData, { headers });
                if (response.status === 200) {
                    const responseData = response.data;
                    if (responseData.success) {
                        return responseData;
                    }
                    else {
                        throw new Error(`Failed to search. Error: ${responseData.error}`);
                    }
                }
                else {
                    this.handleError(response, 'search');
                }
            }
            catch (error) {
                throw new Error(error.message);
            }
            return { success: false, error: 'Internal server error.' };
        });
    }
    /**
     * Initiates a crawl job for a URL using the Firecrawl API.
     * @param {string} url - The URL to crawl.
     * @param {Params | null} params - Additional parameters for the crawl request.
     * @param {boolean} waitUntilDone - Whether to wait for the crawl job to complete.
     * @param {number} timeout - Timeout in seconds for job status checks.
-     * @returns {Promise<CrawlResponse>} The response from the crawl operation.
+     * @returns {Promise<CrawlResponse | any>} The response from the crawl operation.
     */
    crawlUrl(url_1) {
        return __awaiter(this, arguments, void 0, function* (url, params = null, waitUntilDone = true, timeout = 2) {
--- a/apps/js-sdk/firecrawl/jest.config.cjs
+++ b/apps/js-sdk/firecrawl/jest.config.cjs
@ -0,0 +1,5 @@
 /** @type {import('ts-jest').JestConfigWithTsJest} */
 module.exports = {
  preset: 'ts-jest',
  testEnvironment: 'node',
 };
--- a/apps/js-sdk/firecrawl/package-lock.json
+++ b/apps/js-sdk/firecrawl/package-lock.json
--- a/apps/js-sdk/firecrawl/package.json
+++ b/apps/js-sdk/firecrawl/package.json
@ -1,11 +1,14 @@
 {
  "name": "@mendable/firecrawl-js",
-  "version": "0.0.10",
+  "version": "0.0.16",
  "description": "JavaScript SDK for Firecrawl API",
  "main": "build/index.js",
  "types": "types/index.d.ts",
  "type": "module",
  "scripts": {
-    "test": "echo \"Error: no test specified\" && exit 1"
+    "build": "tsc",
    "publish": "npm run build && npm publish --access public",
    "test": "jest src/**/*.test.ts"
  },
  "repository": {
    "type": "git",
@ -14,17 +17,18 @@
  "author": "Mendable.ai",
  "license": "MIT",
  "dependencies": {
-    "axios": "^1.6.8",
+    "axios": "^1.6.8"
    "dotenv": "^16.4.5"
  },
  "bugs": {
    "url": "https://github.com/mendableai/firecrawl/issues"
  },
  "homepage": "https://github.com/mendableai/firecrawl#readme",
  "devDependencies": {
    "@jest/globals": "^29.7.0",
    "@types/axios": "^0.14.0",
    "@types/dotenv": "^8.2.0",
    "@types/node": "^20.12.7",
    "jest": "^29.7.0",
    "ts-jest": "^29.1.2",
    "typescript": "^5.4.5"
  },
  "keywords": [
--- a/apps/js-sdk/firecrawl/src/tests/fixtures/scrape.json
+++ b/apps/js-sdk/firecrawl/src/tests/fixtures/scrape.json
--- a/apps/js-sdk/firecrawl/src/tests/index.test.ts
+++ b/apps/js-sdk/firecrawl/src/tests/index.test.ts
@ -0,0 +1,48 @@
 import { describe, test, expect, jest } from '@jest/globals';
 import axios from 'axios';
 import FirecrawlApp from '../index';
 import { readFile } from 'fs/promises';
 import { join } from 'path';
 // Mock axios and set the type
 jest.mock('axios');
 const mockedAxios = axios as jest.Mocked<typeof axios>;
 // Get the fixture data from the JSON file in ./fixtures
 async function loadFixture(name: string): Promise<string> {
  return await readFile(join(__dirname, 'fixtures', `${name}.json`), 'utf-8')
 }
 describe('the firecrawl JS SDK', () => {
  test('Should require an API key to instantiate FirecrawlApp', async () => {
    const fn = () => {
      new FirecrawlApp({ apiKey: undefined });
    };
    expect(fn).toThrow('No API key provided');
  });
  test('Should return scraped data from a /scrape API call', async () => {
    const mockData = await loadFixture('scrape');
    mockedAxios.post.mockResolvedValue({
      status: 200,
      data: JSON.parse(mockData),
    });
    const apiKey = 'YOUR_API_KEY'
    const app = new FirecrawlApp({ apiKey });
    // Scrape a single URL
    const url = 'https://mendable.ai';
    const scrapedData = await app.scrapeUrl(url);
    expect(mockedAxios.post).toHaveBeenCalledTimes(1);
    expect(mockedAxios.post).toHaveBeenCalledWith(
      expect.stringMatching(/^https:\/\/api.firecrawl.dev/),
      expect.objectContaining({ url }),
      expect.objectContaining({ headers: expect.objectContaining({'Authorization': `Bearer ${apiKey}`}) }),
    )
    expect(scrapedData.success).toBe(true);
    expect(scrapedData.data.metadata.title).toEqual('Mendable');
  });
 })
--- a/apps/js-sdk/firecrawl/src/index.ts
+++ b/apps/js-sdk/firecrawl/src/index.ts
@ -1,6 +1,4 @@
 import axios, { AxiosResponse, AxiosRequestHeaders } from 'axios';
 import dotenv from 'dotenv';
 dotenv.config();
 /**
 * Configuration interface for FirecrawlApp.
@ -25,6 +23,14 @@ export interface ScrapeResponse {
  error?: string;
 }
 /**
 * Response interface for searching operations.
 */
 export interface SearchResponse {
  success: boolean;
  data?: any;
  error?: string;
 }
 /**
 * Response interface for crawling operations.
 */
@ -57,7 +63,7 @@ export default class FirecrawlApp {
   * @param {FirecrawlAppConfig} config - Configuration options for the FirecrawlApp instance.
   */
  constructor({ apiKey = null }: FirecrawlAppConfig) {
-    this.apiKey = apiKey || process.env.FIRECRAWL_API_KEY || '';
+    this.apiKey = apiKey || '';
    if (!this.apiKey) {
      throw new Error('No API key provided');
    }
@ -96,15 +102,48 @@ export default class FirecrawlApp {
    return { success: false, error: 'Internal server error.' };
  }
  /**
   * Searches for a query using the Firecrawl API.
   * @param {string} query - The query to search for.
   * @param {Params | null} params - Additional parameters for the search request.
   * @returns {Promise<SearchResponse>} The response from the search operation.
   */
  async search(query: string, params: Params | null = null): Promise<SearchResponse> {
    const headers: AxiosRequestHeaders = {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${this.apiKey}`,
    } as AxiosRequestHeaders;
    let jsonData: Params = { query };
    if (params) {
      jsonData = { ...jsonData, ...params };
    }
    try {
      const response: AxiosResponse = await axios.post('https://api.firecrawl.dev/v0/search', jsonData, { headers });
      if (response.status === 200) {
        const responseData = response.data;
        if (responseData.success) {
          return responseData; 
        } else {
          throw new Error(`Failed to search. Error: ${responseData.error}`);
        }
      } else {
        this.handleError(response, 'search');
      }
    } catch (error: any) {
      throw new Error(error.message);
    }
    return { success: false, error: 'Internal server error.' };
  }
  /**
   * Initiates a crawl job for a URL using the Firecrawl API.
   * @param {string} url - The URL to crawl.
   * @param {Params | null} params - Additional parameters for the crawl request.
   * @param {boolean} waitUntilDone - Whether to wait for the crawl job to complete.
   * @param {number} timeout - Timeout in seconds for job status checks.
-   * @returns {Promise<CrawlResponse>} The response from the crawl operation.
+   * @returns {Promise<CrawlResponse | any>} The response from the crawl operation.
   */
-  async crawlUrl(url: string, params: Params | null = null, waitUntilDone: boolean = true, timeout: number = 2): Promise<CrawlResponse> {
+  async crawlUrl(url: string, params: Params | null = null, waitUntilDone: boolean = true, timeout: number = 2): Promise<CrawlResponse | any> {
    const headers = this.prepareHeaders();
    let jsonData: Params = { url };
    if (params) {
--- a/apps/js-sdk/firecrawl/tsconfig.json
+++ b/apps/js-sdk/firecrawl/tsconfig.json
@ -49,7 +49,7 @@
    // "maxNodeModuleJsDepth": 1,                        /* Specify the maximum folder depth used for checking JavaScript files from 'node_modules'. Only applicable with 'allowJs'. */
    /* Emit */
-    // "declaration": true,                              /* Generate .d.ts files from TypeScript and JavaScript files in your project. */
+    "declaration": true,                              /* Generate .d.ts files from TypeScript and JavaScript files in your project. */
    // "declarationMap": true,                           /* Create sourcemaps for d.ts files. */
    // "emitDeclarationOnly": true,                      /* Only output d.ts files and not JavaScript files. */
    // "sourceMap": true,                                /* Create source map files for emitted JavaScript files. */
@ -70,7 +70,7 @@
    // "noEmitHelpers": true,                            /* Disable generating custom helper functions like '__extends' in compiled output. */
    // "noEmitOnError": true,                            /* Disable emitting files if any type checking errors are reported. */
    // "preserveConstEnums": true,                       /* Disable erasing 'const enum' declarations in generated code. */
-    // "declarationDir": "./",                           /* Specify the output directory for generated declaration files. */
+    "declarationDir": "./types",                           /* Specify the output directory for generated declaration files. */
    // "preserveValueImports": true,                     /* Preserve unused imported values in the JavaScript output that would otherwise be removed. */
    /* Interop Constraints */
@ -105,5 +105,7 @@
    /* Completeness */
    // "skipDefaultLibCheck": true,                      /* Skip type checking .d.ts files that are included with TypeScript. */
    "skipLibCheck": true                                 /* Skip type checking all .d.ts files. */
-  }
+  },
  "include": ["src/**/*"],
  "exclude": ["node_modules", "dist", "**/__tests__/*"]
 }
--- a/apps/js-sdk/firecrawl/types/index.d.ts
+++ b/apps/js-sdk/firecrawl/types/index.d.ts
@ -0,0 +1,122 @@
 import { AxiosResponse, AxiosRequestHeaders } from 'axios';
 /**
 * Configuration interface for FirecrawlApp.
 */
 export interface FirecrawlAppConfig {
    apiKey?: string | null;
 }
 /**
 * Generic parameter interface.
 */
 export interface Params {
    [key: string]: any;
 }
 /**
 * Response interface for scraping operations.
 */
 export interface ScrapeResponse {
    success: boolean;
    data?: any;
    error?: string;
 }
 /**
 * Response interface for searching operations.
 */
 export interface SearchResponse {
    success: boolean;
    data?: any;
    error?: string;
 }
 /**
 * Response interface for crawling operations.
 */
 export interface CrawlResponse {
    success: boolean;
    jobId?: string;
    data?: any;
    error?: string;
 }
 /**
 * Response interface for job status checks.
 */
 export interface JobStatusResponse {
    success: boolean;
    status: string;
    jobId?: string;
    data?: any;
    error?: string;
 }
 /**
 * Main class for interacting with the Firecrawl API.
 */
 export default class FirecrawlApp {
    private apiKey;
    /**
     * Initializes a new instance of the FirecrawlApp class.
     * @param {FirecrawlAppConfig} config - Configuration options for the FirecrawlApp instance.
     */
    constructor({ apiKey }: FirecrawlAppConfig);
    /**
     * Scrapes a URL using the Firecrawl API.
     * @param {string} url - The URL to scrape.
     * @param {Params | null} params - Additional parameters for the scrape request.
     * @returns {Promise<ScrapeResponse>} The response from the scrape operation.
     */
    scrapeUrl(url: string, params?: Params | null): Promise<ScrapeResponse>;
    /**
     * Searches for a query using the Firecrawl API.
     * @param {string} query - The query to search for.
     * @param {Params | null} params - Additional parameters for the search request.
     * @returns {Promise<SearchResponse>} The response from the search operation.
     */
    search(query: string, params?: Params | null): Promise<SearchResponse>;
    /**
     * Initiates a crawl job for a URL using the Firecrawl API.
     * @param {string} url - The URL to crawl.
     * @param {Params | null} params - Additional parameters for the crawl request.
     * @param {boolean} waitUntilDone - Whether to wait for the crawl job to complete.
     * @param {number} timeout - Timeout in seconds for job status checks.
     * @returns {Promise<CrawlResponse | any>} The response from the crawl operation.
     */
    crawlUrl(url: string, params?: Params | null, waitUntilDone?: boolean, timeout?: number): Promise<CrawlResponse | any>;
    /**
     * Checks the status of a crawl job using the Firecrawl API.
     * @param {string} jobId - The job ID of the crawl operation.
     * @returns {Promise<JobStatusResponse>} The response containing the job status.
     */
    checkCrawlStatus(jobId: string): Promise<JobStatusResponse>;
    /**
     * Prepares the headers for an API request.
     * @returns {AxiosRequestHeaders} The prepared headers.
     */
    prepareHeaders(): AxiosRequestHeaders;
    /**
     * Sends a POST request to the specified URL.
     * @param {string} url - The URL to send the request to.
     * @param {Params} data - The data to send in the request.
     * @param {AxiosRequestHeaders} headers - The headers for the request.
     * @returns {Promise<AxiosResponse>} The response from the POST request.
     */
    postRequest(url: string, data: Params, headers: AxiosRequestHeaders): Promise<AxiosResponse>;
    /**
     * Sends a GET request to the specified URL.
     * @param {string} url - The URL to send the request to.
     * @param {AxiosRequestHeaders} headers - The headers for the request.
     * @returns {Promise<AxiosResponse>} The response from the GET request.
     */
    getRequest(url: string, headers: AxiosRequestHeaders): Promise<AxiosResponse>;
    /**
     * Monitors the status of a crawl job until completion or failure.
     * @param {string} jobId - The job ID of the crawl operation.
     * @param {AxiosRequestHeaders} headers - The headers for the request.
     * @param {number} timeout - Timeout in seconds for job status checks.
     * @returns {Promise<any>} The final job status or data.
     */
    monitorJobStatus(jobId: string, headers: AxiosRequestHeaders, timeout: number): Promise<any>;
    /**
     * Handles errors from API responses.
     * @param {AxiosResponse} response - The response from the API.
     * @param {string} action - The action being performed when the error occurred.
     */
    handleError(response: AxiosResponse, action: string): void;
 }
--- a/apps/js-sdk/package-lock.json
+++ b/apps/js-sdk/package-lock.json
@ -9,14 +9,14 @@
      "version": "1.0.0",
      "license": "ISC",
      "dependencies": {
-        "@mendable/firecrawl-js": "^0.0.8",
+        "@mendable/firecrawl-js": "^0.0.15",
        "axios": "^1.6.8"
      }
    },
    "node_modules/@mendable/firecrawl-js": {
-      "version": "0.0.8",
+      "version": "0.0.15",
-      "resolved": "https://registry.npmjs.org/@mendable/firecrawl-js/-/firecrawl-js-0.0.8.tgz",
+      "resolved": "https://registry.npmjs.org/@mendable/firecrawl-js/-/firecrawl-js-0.0.15.tgz",
-      "integrity": "sha512-dD7eA5X6UT8CM3z7qCqHgA4YbCsdwmmlaT/L0/ozM6gGvb0PnJMoB+e51+n4lAW8mxXOvHGbq9nrgBT1wEhhhw==",
+      "integrity": "sha512-e3iCCrLIiEh+jEDerGV9Uhdkn8ymo+sG+k3osCwPg51xW1xUdAnmlcHrcJoR43RvKXdvD/lqoxg8odUEsqyH+w==",
      "dependencies": {
        "axios": "^1.6.8",
        "dotenv": "^16.4.5"
--- a/apps/js-sdk/package.json
+++ b/apps/js-sdk/package.json
@ -11,7 +11,7 @@
  "author": "",
  "license": "ISC",
  "dependencies": {
-    "@mendable/firecrawl-js": "^0.0.8",
+    "@mendable/firecrawl-js": "^0.0.15",
    "axios": "^1.6.8"
  }
 }
--- a/apps/playwright-service/main.py
+++ b/apps/playwright-service/main.py
@ -1,28 +1,36 @@
-from fastapi import FastAPI, Response
+from fastapi import FastAPI
-from playwright.async_api import async_playwright
+from playwright.async_api import async_playwright, Browser
 import os
 from fastapi.responses import JSONResponse
 from pydantic import BaseModel
 app = FastAPI()
 from pydantic import BaseModel
 class UrlModel(BaseModel):
    url: str
@app.post("/html")  # Kept as POST to accept body parameters
 async def root(body: UrlModel):  # Using Pydantic model for request body
    async with async_playwright() as p:
        browser = await p.chromium.launch()
-        context = await browser.new_context()
+browser: Browser = None
        page = await context.new_page()
        await page.goto(body.url)  # Adjusted to use the url from the request body model
        page_content = await page.content()  # Get the HTML content of the page
@app.on_event("startup")
 async def startup_event():
    global browser
    playwright = await async_playwright().start()
    browser = await playwright.chromium.launch()
@app.on_event("shutdown")
 async def shutdown_event():
    await browser.close()
@app.post("/html")
 async def root(body: UrlModel):
    context = await browser.new_context()
    page = await context.new_page()
    await page.goto(body.url)
    page_content = await page.content()
    await context.close()
    json_compatible_item_data = {"content": page_content}
    return JSONResponse(content=json_compatible_item_data)
--- a/apps/python-sdk/README.md
+++ b/apps/python-sdk/README.md
@ -47,6 +47,15 @@ url = 'https://example.com'
 scraped_data = app.scrape_url(url)
 ```
 ### Search for a query
 Used to search the web, get the most relevant results, scrape each page, and return the markdown.
 ```python
 query = 'what is mendable?'
 search_result = app.search(query)
 ```
 ### Crawling a Website
 To crawl a website, use the `crawl_url` method. It takes the starting URL and optional parameters as arguments. The `params` argument allows you to specify additional options for the crawl job, such as the maximum number of pages to crawl, allowed domains, and the output format.
--- a/apps/python-sdk/build/lib/firecrawl/firecrawl.py
+++ b/apps/python-sdk/build/lib/firecrawl/firecrawl.py
@ -33,6 +33,32 @@ class FirecrawlApp:
        else:
            raise Exception(f'Failed to scrape URL. Status code: {response.status_code}')
    def search(self, query, params=None):
        headers = {
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {self.api_key}'
        }
        json_data = {'query': query}
        if params:
            json_data.update(params)
        response = requests.post(
            'https://api.firecrawl.dev/v0/search',
            headers=headers,
            json=json_data
        )
        if response.status_code == 200:
            response = response.json()
            if response['success'] == True:
                return response['data']
            else:
                raise Exception(f'Failed to search. Error: {response["error"]}')
        elif response.status_code in [402, 409, 500]:
            error_message = response.json().get('error', 'Unknown error occurred')
            raise Exception(f'Failed to search. Status code: {response.status_code}. Error: {error_message}')
        else:
            raise Exception(f'Failed to search. Status code: {response.status_code}')
    def crawl_url(self, url, params=None, wait_until_done=True, timeout=2):
        headers = self._prepare_headers()
        json_data = {'url': url}
--- a/apps/python-sdk/dist/firecrawl-py-0.0.5.tar.gz
+++ b/apps/python-sdk/dist/firecrawl-py-0.0.5.tar.gz
--- a/apps/python-sdk/dist/firecrawl-py-0.0.6.tar.gz
+++ b/apps/python-sdk/dist/firecrawl-py-0.0.6.tar.gz
--- a/apps/python-sdk/dist/firecrawl_py-0.0.5-py3-none-any.whl
+++ b/apps/python-sdk/dist/firecrawl_py-0.0.5-py3-none-any.whl
--- a/apps/python-sdk/dist/firecrawl_py-0.0.6-py3-none-any.whl
+++ b/apps/python-sdk/dist/firecrawl_py-0.0.6-py3-none-any.whl
--- a/apps/python-sdk/firecrawl/pycache/init.cpython-311.pyc
+++ b/apps/python-sdk/firecrawl/pycache/init.cpython-311.pyc
--- a/apps/python-sdk/firecrawl/pycache/firecrawl.cpython-311.pyc
+++ b/apps/python-sdk/firecrawl/pycache/firecrawl.cpython-311.pyc
--- a/apps/python-sdk/firecrawl/firecrawl.py
+++ b/apps/python-sdk/firecrawl/firecrawl.py
@ -1,5 +1,6 @@
 import os
 import requests
 import time
 class FirecrawlApp:
    def __init__(self, api_key=None):
@ -33,6 +34,32 @@ class FirecrawlApp:
        else:
            raise Exception(f'Failed to scrape URL. Status code: {response.status_code}')
    def search(self, query, params=None):
        headers = {
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {self.api_key}'
        }
        json_data = {'query': query}
        if params:
            json_data.update(params)
        response = requests.post(
            'https://api.firecrawl.dev/v0/search',
            headers=headers,
            json=json_data
        )
        if response.status_code == 200:
            response = response.json()
            if response['success'] == True:
                return response['data']
            else:
                raise Exception(f'Failed to search. Error: {response["error"]}')
        elif response.status_code in [402, 409, 500]:
            error_message = response.json().get('error', 'Unknown error occurred')
            raise Exception(f'Failed to search. Status code: {response.status_code}. Error: {error_message}')
        else:
            raise Exception(f'Failed to search. Status code: {response.status_code}')
    def crawl_url(self, url, params=None, wait_until_done=True, timeout=2):
        headers = self._prepare_headers()
        json_data = {'url': url}
@ -62,11 +89,23 @@ class FirecrawlApp:
            'Authorization': f'Bearer {self.api_key}'
        }
-    def _post_request(self, url, data, headers):
+    def _post_request(self, url, data, headers, retries=3, backoff_factor=0.5):
-        return requests.post(url, headers=headers, json=data)
+        for attempt in range(retries):
            response = requests.post(url, headers=headers, json=data)
            if response.status_code == 502:
                time.sleep(backoff_factor * (2 ** attempt))
            else:
                return response
        return response
-    def _get_request(self, url, headers):
+    def _get_request(self, url, headers, retries=3, backoff_factor=0.5):
-        return requests.get(url, headers=headers)
+        for attempt in range(retries):
            response = requests.get(url, headers=headers)
            if response.status_code == 502:
                time.sleep(backoff_factor * (2 ** attempt))
            else:
                return response
        return response
    def _monitor_job_status(self, job_id, headers, timeout):
        import time
--- a/apps/python-sdk/firecrawl_py.egg-info/PKG-INFO
+++ b/apps/python-sdk/firecrawl_py.egg-info/PKG-INFO
@ -1,7 +1,7 @@
 Metadata-Version: 2.1
 Name: firecrawl-py
-Version: 0.0.5
+Version: 0.0.6
 Summary: Python SDK for Firecrawl API
-Home-page: https://github.com/mendableai/firecrawl-py
+Home-page: https://github.com/mendableai/firecrawl
 Author: Mendable.ai
 Author-email: nick@mendable.ai
--- a/apps/python-sdk/setup.py
+++ b/apps/python-sdk/setup.py
@ -2,8 +2,8 @@ from setuptools import setup, find_packages
 setup(
    name='firecrawl-py',
-    version='0.0.5',
+    version='0.0.6',
-    url='https://github.com/mendableai/firecrawl-py',
+    url='https://github.com/mendableai/firecrawl',
    author='Mendable.ai',
    author_email='nick@mendable.ai',
    description='Python SDK for Firecrawl API',
--- a/tutorials/contradiction-testing-using-llms.mdx
+++ b/tutorials/contradiction-testing-using-llms.mdx
@ -0,0 +1,78 @@
 # Build an agent that checks your website for contradictions
 Learn how to use Firecrawl and Claude to scrape your website's data and look for contradictions and inconsistencies in a few lines of code. When you are shipping fast, data is bound to get stale. With Firecrawl and LLMs, you can check that your public web data stays consistent. We will use Opus's 200k-token context window and Firecrawl's parallelization to make this process accurate and fast.
 ## Setup
 Install the Python dependencies, anthropic and firecrawl-py.
 ```bash
 pip install firecrawl-py anthropic
 ```
 ## Getting your Claude and Firecrawl API Keys
 To use Claude Opus and Firecrawl, you will need to get your API keys. You can get your Anthropic API key from [here](https://www.anthropic.com/) and your Firecrawl API key from [here](https://firecrawl.dev).
 ## Load website with Firecrawl
 To get all the data from our website's pages and put it into a format that is easy for the LLM to read, we will use [Firecrawl](https://firecrawl.dev). It handles bypassing JS-blocked websites, extracting the main content, and outputting in an LLM-readable format for increased accuracy.
 Here is how we will crawl a website using firecrawl-py, excluding the blog and use-case pages:
 ```python
 from firecrawl import FirecrawlApp
 app = FirecrawlApp(api_key="YOUR-KEY")
 crawl_result = app.crawl_url('mendable.ai', {'crawlerOptions': {'excludes': ['blog/*','usecases/*']}})
 print(crawl_result)
 ```
 With all of the web data we want scraped and in a clean format, we can move on to the next step.
 ## Combination and Generation
 Now that we have the website data, let's pair up every page and run every combination through Opus for analysis.
 ```python
 from itertools import combinations
 page_combinations = []
 for first_page, second_page in combinations(crawl_result, 2):
    combined_string = "First Page:\n" + first_page['markdown'] + "\n\nSecond Page:\n" + second_page['markdown']
    page_combinations.append(combined_string)
 import anthropic
 client = anthropic.Anthropic(
    # defaults to os.environ.get("ANTHROPIC_API_KEY")
    api_key="YOUR-KEY",
 )
 final_output = []
 for page_combination in page_combinations:
    prompt = "Here are two pages from a company's website, your job is to find any contradictions or differences in opinion between the two pages, this could be caused by outdated information or other. If you find any contradictions, list them out and provide a brief explanation of why they are contradictory or differing. Make sure the explanation is specific and concise. It is okay if you don't find any contradictions, just say 'No contradictions found' and nothing else. Here are the pages: " + page_combination  # page_combination is already a single string
    message = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1000,
        temperature=0.0,
        system="You are an assistant that helps find contradictions or differences in opinion between pages in a company website and knowledge base. This could be caused by outdated information in the knowledge base.",
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    final_output.append(message.content[0].text)  # message.content is a list of content blocks; keep the text of the first one
 ```
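Because every page is compared with every other page, the number of Opus calls grows quadratically with the size of the crawl. The sketch below is only an illustration: the page names are made up and stand in for real crawl output. It counts the pairs that `combinations` produces, which can help estimate cost before running the loop above.

```python
from itertools import combinations
from math import comb

# Hypothetical page names standing in for the crawl results.
pages = ["home", "pricing", "docs", "about"]

# Every unordered pair of distinct pages, in input order.
pairs = list(combinations(pages, 2))

# n pages give n * (n - 1) / 2 pairs, which is what math.comb(n, 2) computes.
assert len(pairs) == comb(len(pages), 2)

print(len(pairs))   # pairs for 4 pages
print(comb(50, 2))  # pairs for a 50-page crawl
```

If the pair count gets too large, narrowing the crawl with `excludes` (as in the crawl step) is one way to keep it manageable.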
 ## That's about it!
 You have now built an agent that looks at your website and spots any inconsistencies it might have.
 If you have any questions or need help, feel free to reach out to us at [Firecrawl](https://firecrawl.dev).
--- a/tutorials/data-extraction-using-llms.mdx
+++ b/tutorials/data-extraction-using-llms.mdx
@ -0,0 +1,92 @@
 # Extract website data using LLMs
 Learn how to use Firecrawl and Groq to extract structured data from a web page in a few lines of code. With Groq's fast inference speeds and Firecrawl's parallelization, you can extract data from web pages *super* fast.
 ## Setup
 Install the Python dependencies, groq and firecrawl-py.
 ```bash
 pip install groq firecrawl-py
 ```
 ## Getting your Groq and Firecrawl API Keys
 To use Groq and Firecrawl, you will need to get your API keys. You can get your Groq API key from [here](https://groq.com) and your Firecrawl API key from [here](https://firecrawl.dev).   
 ## Load website with Firecrawl
 To get all the data from a web page in the cleanest possible format, we will use [Firecrawl](https://firecrawl.dev). It handles bypassing JS-blocked websites, extracting the main content, and outputting in an LLM-readable format for increased accuracy.
 Here is how we will scrape a URL using Firecrawl. We will also set `pageOptions` to extract only the main content (`onlyMainContent: True`) of the page, excluding the navs, footers, etc.
 ```python
 from firecrawl import FirecrawlApp  # Import the Firecrawl client
 url = "https://about.fb.com/news/2024/04/introducing-our-open-mixed-reality-ecosystem/"
 firecrawl = FirecrawlApp(
    api_key="fc-YOUR_FIRECRAWL_API_KEY",
 )
 page_content = firecrawl.scrape_url(url=url,  # Target URL to scrape
    params={
        "pageOptions":{
            "onlyMainContent": True # Ignore navs, footers, etc.
        }
    })
 print(page_content)
 ```
 Perfect, now we have clean data from the website - ready to be fed to the LLM for data extraction.
 ## Extraction and Generation
 Now that we have the website data, let's use Groq to pull out the information we need. We'll use Groq's Llama 3 model in JSON mode and pick out certain fields from the page content.
 We are using the Llama 3 8B model for this example. Feel free to use a bigger model for improved results.
 ```python
 import json
 from groq import Groq
 client = Groq(
    api_key="gsk_YOUR_GROQ_API_KEY",  # Note: Replace with your actual Groq API key
 )
 # Here we define the fields we want to extract from the page content
 extract = ["summary","date","companies_building_with_quest","title_of_the_article","people_testimonials"]
 completion = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[
        {
            "role": "system",
            "content": "You are a legal advisor who extracts information from documents in JSON."
        },
        {
            "role": "user",
            # Here we pass the page content and the fields we want to extract
            "content": f"Extract the following information from the provided documentation:\n\nPage content:\n\n{page_content}\n\nInformation to extract: {extract}"
        }
    ],
    temperature=0,
    max_tokens=1024,
    top_p=1,
    stream=False,
    stop=None,
    # We set the response format to JSON object
    response_format={"type": "json_object"}
 )
 # Parse the JSON string returned by the model, then pretty print it
 dataExtracted = json.dumps(json.loads(completion.choices[0].message.content), indent=4)
 print(dataExtracted)
 ```
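JSON mode asks the model for a JSON object, but it does not guarantee that every requested field is present. As a minimal sketch of a follow-up check, the hypothetical helper `missing_fields` below reports which expected keys are absent. The sample response is hard-coded (and invented) so the example runs without an API call.

```python
import json

def missing_fields(raw_json, expected_fields):
    """Return the expected fields that are absent from a JSON object string."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return [field for field in expected_fields if field not in data]

# Hypothetical model output, not a real response from Groq.
sample_response = '{"summary": "A short summary.", "date": "2024-04-22"}'
expected = ["summary", "date", "title_of_the_article"]

print(missing_fields(sample_response, expected))
```

In the tutorial's flow, you would pass `completion.choices[0].message.content` and the `extract` list instead of the sample values.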
 ## And Voila!
 You have now built a data extraction bot using Groq and Firecrawl. You can now use this bot to extract structured data from any website.
 If you have any questions or need help, feel free to reach out to us at [Firecrawl](https://firecrawl.dev).
--- a/tutorials/rag-llama3.mdx
+++ b/tutorials/rag-llama3.mdx
@ -0,0 +1,91 @@
 ---
 title: "Build a 'Chat with website' using Groq Llama 3"
 description: "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot."
 ---
 ## Setup
 Install the Python dependencies: langchain, groq, faiss, ollama, and firecrawl-py.
 ```bash
 pip install --upgrade --quiet langchain langchain-community groq faiss-cpu ollama firecrawl-py
 ```
 We will be using Ollama for the embeddings. You can download Ollama [here](https://ollama.com/), but feel free to use any other embeddings you prefer.
 ## Load website with Firecrawl
 To get all the data from a website in the cleanest possible format, we will use Firecrawl. Firecrawl integrates easily with LangChain as a document loader.
 Here is how you can load a website with Firecrawl:
 ```python
 from langchain_community.document_loaders import FireCrawlLoader  # Importing the FireCrawlLoader
 url = "https://firecrawl.dev"
 loader = FireCrawlLoader(
    api_key="fc-YOUR_API_KEY", # Note: Replace 'YOUR_API_KEY' with your actual FireCrawl API key
    url=url,  # Target URL to crawl
    mode="crawl"  # Mode set to 'crawl' to crawl all accessible subpages
 )
 docs = loader.load()
 ```
 ## Set up the Vectorstore
 Next, we will set up the vectorstore. The vectorstore is a data structure that lets us store and query embeddings. We will use Ollama embeddings and the FAISS vectorstore.
 We split the documents into chunks of 1000 characters each, with a 200-character overlap. This keeps each chunk large enough to carry useful context but small enough to fit in the LLM's context window when we query it.
 ```python
 from langchain_community.embeddings import OllamaEmbeddings
 from langchain_text_splitters import RecursiveCharacterTextSplitter
 from langchain_community.vectorstores import FAISS
 text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
 splits = text_splitter.split_documents(docs)
 vectorstore = FAISS.from_documents(documents=splits, embedding=OllamaEmbeddings())
 ```
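To make the chunk size and overlap concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and newlines). The function name `chunk_text` is hypothetical and only illustrates how consecutive chunks share their boundary text.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "x" * 2600
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

With these settings each window advances 800 characters, so the last 200 characters of one chunk reappear at the start of the next.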
 ## Retrieval
 Now that our documents are loaded and the vectorstore is set up, we can run a similarity search on the user's question to retrieve the most relevant documents. We can then feed these documents to the LLM.
 ```python
 question = "What is firecrawl?"
 docs = vectorstore.similarity_search(query=question)
 ```
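Conceptually, a similarity search embeds the question and ranks the stored chunks by how close their vectors are to it. The sketch below shows the idea with cosine similarity in plain Python. The three-dimensional vectors and document names are invented for illustration, and the real vectorstore may use a different distance metric and index structure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
doc_vecs = {
    "pricing": [0.9, 0.1, 0.0],
    "crawling": [0.1, 0.9, 0.2],
    "careers": [0.0, 0.1, 0.9],
}
query_vec = [0.2, 0.8, 0.1]

print(top_k(query_vec, doc_vecs))
```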
 ## Generation
 Last but not least, you can use Groq to generate a response to the question based on the documents we retrieved.
 ```python
 from groq import Groq
 client = Groq(
    api_key="YOUR_GROQ_API_KEY",
 )
 completion = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[
        {
            "role": "user",
            "content": f"You are a friendly assistant. Your job is to answer the users question based on the documentation provided below:\nDocs:\n\n{docs}\n\nQuestion: {question}"
        }
    ],
    temperature=1,
    max_tokens=1024,
    top_p=1,
    stream=False,
    stop=None,
 )
 print(completion.choices[0].message.content)  # Print only the generated text
 ```
 ## And Voila!
 You have now built a 'Chat with your website' bot using Groq Llama 3, LangChain, and Firecrawl. You can now use this bot to answer questions based on the documentation of your website.
 If you have any questions or need help, feel free to reach out to us at [Firecrawl](https://firecrawl.dev).