Coming soon to the Langchain and Llama Index integrations.

## Using Python SDK

### Installing Python SDK
### Extracting structured data from a URL

With LLM extraction, you can easily extract structured data from any URL. We support pydantic schemas to make it easier for you, too. Here is how to use it:

```python
class ArticleSchema(BaseModel):
    ...
```
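The diff shows only the first line of this schema. As a hedged sketch, such a pydantic model might look like the following; the field names are illustrative assumptions, not the README's actual schema:

```python
from pydantic import BaseModel


class ArticleSchema(BaseModel):
    # Illustrative fields only -- the actual fields used by the README
    # are elided in this diff.
    title: str
    points: int
    by: str
    comments_url: str
```

A schema like this would then be passed to the SDK's extraction options so the LLM returns data matching these fields.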
```python
query = 'What is Mendable?'
search_result = app.search(query)
```

## Using the Node SDK

### Installation

To install the Firecrawl Node SDK, you can use npm:

```bash
npm install @mendable/firecrawl-js
```

### Usage
1. Get an API key from [firecrawl.dev](https://firecrawl.dev)
2. Set the API key as an environment variable named `FIRECRAWL_API_KEY` or pass it as a parameter to the `FirecrawlApp` class.
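The key-resolution convention above (explicit parameter, else `FIRECRAWL_API_KEY` from the environment) can be sketched as a small standalone helper. Note that `resolveApiKey` is a hypothetical name for illustration, not part of the SDK:

```javascript
// Hypothetical helper mirroring the convention above: prefer an explicitly
// passed key, fall back to the FIRECRAWL_API_KEY environment variable.
function resolveApiKey(explicitKey) {
  const key = explicitKey ?? process.env.FIRECRAWL_API_KEY;
  if (!key) {
    throw new Error(
      'No API key: pass one to FirecrawlApp or set FIRECRAWL_API_KEY'
    );
  }
  return key;
}
```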
### Scraping a URL
To scrape a single URL with error handling, use the `scrapeUrl` method. It takes the URL as a parameter and returns the scraped data as an object.

```js
try {
  const url = 'https://example.com';
  const scrapedData = await app.scrapeUrl(url);
  console.log(scrapedData);
} catch (error) {
  console.error('Error occurred while scraping:', error.message);
}
```
### Crawling a Website

To crawl a website with error handling, use the `crawlUrl` method. It takes the starting URL and optional parameters as arguments. The `params` argument allows you to specify additional options for the crawl job, such as the maximum number of pages to crawl, allowed domains, and the output format.
```js
const crawlUrl = 'https://example.com';
const params = {
  crawlerOptions: {
    excludes: ['blog/'],
    includes: [], // leave empty for all pages
    limit: 1000,
  },
  pageOptions: {
    onlyMainContent: true
  }
};
const waitUntilDone = true;
const timeout = 5;
const crawlResult = await app.crawlUrl(
  crawlUrl,
  params,
  waitUntilDone,
  timeout
);
```
### Checking Crawl Status

To check the status of a crawl job with error handling, use the `checkCrawlStatus` method. It takes the job ID as a parameter and returns the current status of the crawl job.
```js
const status = await app.checkCrawlStatus(jobId);
console.log(status);
```
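Since crawl jobs run asynchronously, a common pattern is to poll this status call until the job reaches a terminal state. Here is a minimal sketch, with `checkStatus` standing in for a `checkCrawlStatus`-style call; the `'completed'` and `'failed'` status strings are assumptions for illustration, not documented values:

```javascript
// Poll an async status function until the job completes or fails.
// `checkStatus` is any function (jobId) => Promise<{ status, ... }>.
async function waitForCrawl(checkStatus, jobId, { intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await checkStatus(jobId);
    if (result.status === 'completed') return result; // assumed terminal state
    if (result.status === 'failed') {
      throw new Error(`Crawl job ${jobId} failed`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Crawl job ${jobId} did not finish within ${maxAttempts} polls`);
}
```

With the real SDK, `app.checkCrawlStatus.bind(app)` could be passed as `checkStatus`; alternatively, `crawlUrl` with `waitUntilDone: true` handles the waiting for you, as shown above.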
|
### Extracting structured data from a URL

With LLM extraction, you can easily extract structured data from any URL. We support zod schemas to make it easier for you, too. Here is how to use it:
```js
import FirecrawlApp from "@mendable/firecrawl-js";
import { z } from "zod";

const app = new FirecrawlApp({
  apiKey: "fc-YOUR_API_KEY",
});

// Define schema to extract contents into
const schema = z.object({
  top: z
    .array(
      z.object({
        title: z.string(),
        points: z.number(),
        by: z.string(),
        commentsURL: z.string(),
      })
    )
    .length(5)
    .describe("Top 5 stories on Hacker News"),
});

const scrapeResult = await app.scrapeUrl("https://news.ycombinator.com", {
  extractorOptions: { extractionSchema: schema },
});

console.log(scrapeResult.data["llm_extraction"]);
```
### Search for a query

With the `search` method, you can search for a query in a search engine and get the top results along with the page content for each result. The method takes the query as a parameter and returns the search results.
```js
const query = 'what is mendable?';
const searchResults = await app.search(query, {
  pageOptions: {
    fetchPageContent: true // Fetch the page content for each search result
  }
});
```
## Contributing

We love contributions! Please read our [contributing guide](CONTRIBUTING.md) before submitting a pull request.