
load tests for scrape route

rafaelsideguide 2024-05-22 09:30:32 -03:00
parent 75f4e34d8e
commit 068a240ab4
15 changed files with 5150 additions and 8 deletions

@@ -17,20 +17,20 @@ kill_timeout = '5s'
 [http_service]
 internal_port = 8080
 force_https = true
-auto_stop_machines = false
+auto_stop_machines = true
 auto_start_machines = true
 min_machines_running = 2
 processes = ['app']
 [http_service.concurrency]
 type = "requests"
-hard_limit = 1000
-soft_limit = 1000
+hard_limit = 100
+soft_limit = 50
 [[services]]
 protocol = 'tcp'
 internal_port = 8080
-processes = ['app']
+processes = ['worker']
 [[services.ports]]
 port = 80
@@ -43,8 +43,8 @@ kill_timeout = '5s'
 [services.concurrency]
 type = 'connections'
-hard_limit = 1000
-soft_limit = 1000
+hard_limit = 25
+soft_limit = 20
 [[vm]]
 size = 'performance-1x'

Six binary files added (contents not shown): 62 KiB, 201 KiB, 77 KiB, 102 KiB, 105 KiB, 101 KiB.

One file diff suppressed because it is too large.

@@ -0,0 +1,98 @@
# Scraping Load Testing - Test #1
## Summary
The load test successfully processed 600 requests in 60 seconds with all requests returning HTTP 200 status codes. The average response time was 1380.1 ms, with CPU utilization peaking at around 50% on both machines, indicating sufficient CPU resources. However, there was a significant increase in memory usage post-test, which did not return to pre-test levels, suggesting a potential memory leak. Further investigation and additional load tests are recommended to address this issue and optimize the system's performance.
## Table of Contents
- [Scraping Load Testing - Test #1](#scraping-load-testing---test-1)
- [Summary](#summary)
- [Table of Contents](#table-of-contents)
- [Test environment](#test-environment)
- [Machines](#machines)
- [Load #1 - 600 reqs 60 secs (initial load only)](#load-1---600-reqs-60-secs-initial-load-only)
- [Artillery Report](#artillery-report)
- [CPU Utilization](#cpu-utilization)
- [Memory Utilization](#memory-utilization)
- [Conclusions and Next Steps](#conclusions-and-next-steps)
- [Conclusions](#conclusions)
- [Next Steps](#next-steps)
## Test environment
### Machines
| Machine | Size/CPU |
|---|---|
| e286de4f711e86 mia (app) | performance-cpu-1x@2048MB |
| 73d8dd909c1189 mia (app) | performance-cpu-1x@2048MB |
---
## Load #1 - 600 reqs 60 secs (initial load only)
```yml
# load-test.yml
- duration: 60
  arrivalRate: 10 # Initial load
```
### Artillery Report
Date: 10:49:39(-0300)
| Metric | Value |
|---------------------------------------------|---------|
| http.codes.200 | 600 |
| http.downloaded_bytes | 0 |
| http.request_rate | 10/sec |
| http.requests | 600 |
| http.response_time.min | 984 |
| http.response_time.max | 2267 |
| http.response_time.mean | 1380.1 |
| http.response_time.median | 1353.1 |
| http.response_time.p95 | 1755 |
| http.response_time.p99 | 2059.5 |
| http.responses | 600 |
| vusers.completed | 600 |
| vusers.created | 600 |
| vusers.created_by_name.Scrape a URL | 600 |
| vusers.failed | 0 |
| vusers.session_length.min | 1053.7 |
| vusers.session_length.max | 2332.6 |
| vusers.session_length.mean | 1447.4 |
| vusers.session_length.median | 1436.8 |
| vusers.session_length.p95 | 1863.5 |
| vusers.session_length.p99 | 2143.5 |
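
The headline figures in this table can also be read from the JSON file that `artillery run --output` writes. The sketch below assumes the report keeps its totals under `aggregate.counters` and its latency statistics under `aggregate.summaries`; that layout and the `summarize` helper are assumptions for illustration, not something this commit confirms. The inline sample only mirrors a few rows of the table above.

```python
import json


def summarize(report: dict) -> dict:
    """Extract headline numbers from an Artillery-style report dict."""
    # Assumed layout: report["aggregate"]["counters"] and ["summaries"].
    aggregate = report.get("aggregate", {})
    counters = aggregate.get("counters", {})
    latency = aggregate.get("summaries", {}).get("http.response_time", {})
    requests = counters.get("http.requests", 0)
    ok = counters.get("http.codes.200", 0)
    return {
        "requests": requests,
        "ok": ok,
        "success_rate": ok / requests if requests else 0.0,
        "mean_ms": latency.get("mean"),
        "p95_ms": latency.get("p95"),
    }


# Sample mirroring rows of the Test #1 table; a real run would load the
# file written by --output with json.load() instead.
sample = json.loads("""
{"aggregate": {
  "counters": {"http.requests": 600, "http.codes.200": 600},
  "summaries": {"http.response_time": {"mean": 1380.1, "p95": 1755}}
}}
""")
summary = summarize(sample)
print(summary)
```

Missing keys fall back to defaults, so the helper does not raise if a run produced no `http.codes.200` counter at all.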
### CPU Utilization
![](./assets/CPU-utilization-report-test-1.png)
Both machines peaked at around 50% CPU utilization.
### Memory Utilization
![](./assets/memory-utilization-report-test-1.png)
| Machine | Before | After Load Test |
|---|---|---|
| e286de4f711e86 | 295 MiB | 358 MiB |
| 73d8dd909c1189 | 296 MiB | 355 MiB |
Memory utilization did not return to pre-test values during the check window, which may indicate a memory leak.
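
To put the increase in proportion, the growth per machine can be computed from the two readings in the table. A small sketch; the `growth` helper is a name introduced here for illustration:

```python
# Memory readings (MiB) copied from the table above: (before, after).
readings = {
    "e286de4f711e86": (295, 358),
    "73d8dd909c1189": (296, 355),
}


def growth(before: int, after: int) -> tuple[int, float]:
    """Return absolute (MiB) and relative (%) memory growth."""
    delta = after - before
    return delta, round(100 * delta / before, 1)


memory_growth = {m: growth(b, a) for m, (b, a) in readings.items()}
for machine, (delta, pct) in memory_growth.items():
    print(f"{machine}: +{delta} MiB ({pct}%)")
```

Both machines ended the check window roughly 20% above their baseline. A single post-test reading cannot separate a leak from memory the runtime has simply not released yet, so repeated runs would be needed to confirm the trend.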
---
## Conclusions and Next Steps
### Conclusions
1. **Performance:** The system handled 600 requests in 60 seconds with a mean response time of 1380.1 ms. All requests were successful (HTTP 200).
2. **CPU Utilization:** Both machines peaked at around 50% CPU utilization, indicating that the CPU resources were sufficient for the load.
3. **Memory Utilization:** There was a noticeable increase in memory usage on both machines post-test, and the memory did not re-stabilize to pre-test levels, suggesting a potential memory leak.
### Next Steps
1. **Investigate Memory Leak:** Conduct a detailed analysis to identify and fix the potential memory leak. This may involve profiling the application and reviewing the code for memory management issues.
2. **Additional Load Tests:** Perform additional load tests with varying request rates and durations to further assess the system's performance and stability.
3. **Optimize Performance:** Based on the findings, optimize the application to improve response times and resource utilization.
4. **Monitor in Production:** Implement monitoring in the production environment to ensure that similar issues do not occur under real-world conditions.
5. **Documentation:** Update the documentation with the findings and any changes made to the system as a result of this test.
By following these steps, we can ensure that the system is robust, efficient, and ready to handle production workloads.

@@ -0,0 +1,93 @@
# Scraping Load Testing - Test #2
## Summary
The load test encountered significant issues, processing 9000 requests with 5473 timeouts and a 61.6% failure rate. The average response time was 3682.1 ms, with a peak response time of 9919 ms. Both machines reached 100% CPU utilization, leading to severe performance bottlenecks and high failure rates. This indicates the need for substantial optimizations, autoscaling, and further investigation.
## Table of Contents
- [Scraping Load Testing - Test #2](#scraping-load-testing---test-2)
- [Summary](#summary)
- [Table of Contents](#table-of-contents)
- [Test environment](#test-environment)
- [Machines](#machines)
- [Load #2 - 9000 reqs 7 mins 11 secs (4 phases)](#load-2---9000-reqs-7-mins-11-secs-4-phases)
- [Artillery Report](#artillery-report)
- [Metrics](#metrics)
- [Conclusions and Next Steps](#conclusions-and-next-steps)
- [Conclusions](#conclusions)
- [Next Steps](#next-steps)
## Test environment
### Machines
| Machine | Size/CPU |
|---|---|
| e286de4f711e86 mia (app) | performance-cpu-1x@2048MB |
| 73d8dd909c1189 mia (app) | performance-cpu-1x@2048MB |
---
## Load #2 - 9000 reqs 7 mins 11 secs (4 phases)
```yml
# load-test.yml
- duration: 60
  arrivalRate: 10 # Initial load
- duration: 120
  arrivalRate: 20 # Increased load
- duration: 180
  arrivalRate: 30 # Peak load
- duration: 60
  arrivalRate: 10 # Cool down
```
### Artillery Report
Date: 13:50:08(-0300)
| Metric | Value |
|---------------------------------------------|---------|
| errors.ETIMEDOUT | 5473 |
| errors.Failed capture or match | 73 |
| http.codes.200 | 3454 |
| http.codes.401 | 64 |
| http.codes.402 | 9 |
| http.downloaded_bytes | 0 |
| http.request_rate | 21/sec |
| http.requests | 9000 |
| http.response_time.min | 929 |
| http.response_time.max | 9919 |
| http.response_time.mean | 3682.1 |
| http.response_time.median | 3395.5 |
| http.response_time.p95 | 8024.5 |
| http.response_time.p99 | 9607.1 |
| http.responses | 3527 |
| vusers.completed | 3454 |
| vusers.created | 9000 |
| vusers.created_by_name.Scrape a URL | 9000 |
| vusers.failed | 5546 |
| vusers.session_length.min | 1127.6 |
| vusers.session_length.max | 9982.2 |
| vusers.session_length.mean | 3730.6 |
| vusers.session_length.median | 3464.1 |
| vusers.session_length.p95 | 7865.6 |
| vusers.session_length.p99 | 9607.1 |
### Metrics
![](./assets/metrics-test-2.png)
Both machines reached 100% CPU utilization, which led to a significant number of request failures (61.6% failure rate).
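
The 61.6% figure follows directly from the report counters. A short sketch of the arithmetic, using only values from the table above:

```python
# Counters copied from the Artillery report above.
vusers_created = 9000
vusers_failed = 5546
timeouts = 5473          # errors.ETIMEDOUT
failed_capture = 73      # errors.Failed capture or match

# In this report the two error counters add up to the failed virtual users.
assert timeouts + failed_capture == vusers_failed

failure_rate = round(100 * vusers_failed / vusers_created, 1)
timeout_rate = round(100 * timeouts / vusers_created, 1)
print(f"failure rate: {failure_rate}%, timeouts alone: {timeout_rate}%")
```

Timeouts account for almost all of the failures; the remaining 73 equal the combined count of HTTP 401 and 402 responses in the same table.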
---
## Conclusions and Next Steps
### Conclusions
1. **Performance:** The system struggled with 9000 requests, resulting in 5473 timeouts and a mean response time of 3682.1 ms.
2. **CPU Utilization:** Both machines experienced 100% CPU utilization, causing severe performance degradation and high failure rates.
### Next Steps
Implement an autoscaling solution on Fly.io and conduct tests using the same configurations.

@@ -0,0 +1,107 @@
# Scraping Load Testing - Test #3
## Summary
The load test involved setting up an autoscaling option and adjusting the hard and soft limits for the Fly.io configuration. The test environment consisted of 5 machines, with 3 machines automatically scaling up during the test. Despite the scaling, there were 653 timeouts (7.3%) and 2 HTTP 502 responses (0.02%). The average response time was 3037.2 ms, with a peak response time of 9941 ms. Further adjustments to the soft limit are recommended to improve performance and reduce errors.
## Table of Contents
- [Scraping Load Testing - Test #3](#scraping-load-testing---test-3)
- [Summary](#summary)
- [Table of Contents](#table-of-contents)
- [Test environment](#test-environment)
- [Machines](#machines)
- [Load Test Phases](#load-test-phases)
- [Configuration](#configuration)
- [Results](#results)
- [Metrics](#metrics)
- [Conclusions and Next Steps](#conclusions-and-next-steps)
- [Conclusions](#conclusions)
- [Next Steps](#next-steps)
## Test environment
### Machines
| Machine | Size/CPU | Status |
|---|---|---|
| e286de4f711e86 mia (app) | performance-cpu-1x@2048MB | always on |
| 73d8dd909c1189 mia (app) | performance-cpu-1x@2048MB | always on |
| 6e82050c726358 mia (app) | performance-cpu-1x@2048MB | paused |
| 4d89505a6e5038 mia (app) | performance-cpu-1x@2048MB | paused |
| 48ed6e6b74e378 mia (app) | performance-cpu-1x@2048MB | paused |
---
## Load Test Phases
### Configuration
```toml
# fly.staging.toml
[http_service.concurrency]
type = "requests"
hard_limit = 100
soft_limit = 75
```
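As a rough mental model of these two settings, assume the Fly.io proxy treats `soft_limit` as the point where a machine is deprioritized in favor of others, and `hard_limit` as the point where it receives no new requests. The sketch below only illustrates that rule; it is not Fly.io's implementation, and `classify_load` and its state names are hypothetical.

```python
def classify_load(in_flight: int, soft_limit: int = 75, hard_limit: int = 100) -> str:
    """Classify one machine's load against its concurrency limits."""
    if in_flight >= hard_limit:
        return "saturated"      # assumed: no new requests are routed here
    if in_flight >= soft_limit:
        return "deprioritized"  # assumed: other machines are preferred
    return "available"


states = [classify_load(n) for n in (10, 75, 99, 100)]
print(states)
```

Under this model, a lower `soft_limit` moves traffic away from a busy machine earlier, while `hard_limit` remains the ceiling per machine.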
```yml
# load-test.yml
- duration: 60
  arrivalRate: 10 # Initial load
- duration: 120
  arrivalRate: 20 # Increased load
- duration: 180
  arrivalRate: 30 # Peak load
- duration: 60
  arrivalRate: 10 # Cool down
```
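The phases above determine both the scheduled duration and the number of virtual users created. A quick check of the arithmetic, treating `arrivalRate` as new virtual users per second:

```python
# (duration in seconds, arrival rate in new virtual users per second)
phases = [(60, 10), (120, 20), (180, 30), (60, 10)]

total_seconds = sum(duration for duration, _ in phases)
total_vusers = sum(duration * rate for duration, rate in phases)
print(total_seconds, total_vusers)
```

The total of 9000 matches the `vusers.created` counter in the results, and the four phases schedule 420 seconds (7 minutes) of arrivals.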
### Results
Date: 14:53:32(-0300)
| Metric | Value |
|---------------------------------------------|---------|
| errors.ETIMEDOUT | 653 |
| errors.Failed capture or match | 2 |
| http.codes.200 | 8345 |
| http.codes.502 | 2 |
| http.downloaded_bytes | 0 |
| http.request_rate | 11/sec |
| http.requests | 9000 |
| http.response_time.min | 979 |
| http.response_time.max | 9941 |
| http.response_time.mean | 3037.2 |
| http.response_time.median | 2059.5 |
| http.response_time.p95 | 7709.8 |
| http.response_time.p99 | 9416.8 |
| http.responses | 8347 |
| vusers.completed | 8345 |
| vusers.created | 9000 |
| vusers.created_by_name.Scrape a URL | 9000 |
| vusers.failed | 655 |
| vusers.session_length.min | 1044.5 |
| vusers.session_length.max | 9998.8 |
| vusers.session_length.mean | 3109.7 |
| vusers.session_length.median | 2143.5 |
| vusers.session_length.p95 | 7709.8 |
| vusers.session_length.p99 | 9416.8 |
### Metrics
![](./assets/metrics-test-3.png)
---
## Conclusions and Next Steps
### Conclusions
1. **Performance:** The system handled 9000 requests with a mean response time of 3037.2 ms. There were 653 timeouts and 2 HTTP 502 responses.
2. **Autoscaling:** Three machines automatically scaled up during the test, but the scaling was not sufficient to prevent all errors.
3. **Response Times:** The peak response time was 9941 ms, indicating that the system struggled under peak load conditions.
### Next Steps
1. **Adjust Soft Limit:** Lower the soft limit from 75 to 50 while keeping the hard limit at 100, to test whether machines start sooner and the number of 502 errors drops.
2. **Further Load Tests:** Conduct additional load tests with the new configuration to assess improvements.
By following these steps, we can enhance the system's performance and reliability under varying load conditions.

@@ -0,0 +1,103 @@
# Scraping Load Testing - Test #4
## Summary
The load test was conducted with the Fly.io configuration set to a hard limit of 100 and a soft limit of 50. The test involved four phases with varying arrival rates. Despite the adjustments, there were 1329 timeouts (14.8%) but no HTTP 502 responses. The average response time was 3547.9 ms, with a peak response time of 9935 ms. Further adjustments to the Artillery timeout configuration are recommended to reduce the number of timeouts.
## Table of Contents
- [Scraping Load Testing - Test #4](#scraping-load-testing---test-4)
- [Summary](#summary)
- [Table of Contents](#table-of-contents)
- [Test environment](#test-environment)
- [Machines](#machines)
- [Load Test Phases](#load-test-phases)
- [Configuration](#configuration)
- [Results](#results)
- [Metrics](#metrics)
- [Conclusions and Next Steps](#conclusions-and-next-steps)
- [Conclusions](#conclusions)
- [Next Steps](#next-steps)
## Test environment
### Machines
| Machine | Size/CPU | Status |
|---|---|---|
| e286de4f711e86 mia (app) | performance-cpu-1x@2048MB | always on |
| 73d8dd909c1189 mia (app) | performance-cpu-1x@2048MB | always on |
| 6e82050c726358 mia (app) | performance-cpu-1x@2048MB | paused |
| 4d89505a6e5038 mia (app) | performance-cpu-1x@2048MB | paused |
| 48ed6e6b74e378 mia (app) | performance-cpu-1x@2048MB | paused |
---
## Load Test Phases
### Configuration
```toml
# fly.staging.toml
[http_service.concurrency]
type = "requests"
hard_limit = 100
soft_limit = 50
```
```yml
# load-test.yml
- duration: 60
  arrivalRate: 10 # Initial load
- duration: 120
  arrivalRate: 20 # Increased load
- duration: 180
  arrivalRate: 30 # Peak load
- duration: 60
  arrivalRate: 10 # Cool down
```
### Results
Date: 15:43:26(-0300)
| Metric | Value |
|---------------------------------------------|---------|
| errors.ETIMEDOUT | 1329 |
| http.codes.200 | 7671 |
| http.downloaded_bytes | 0 |
| http.request_rate | 23/sec |
| http.requests | 9000 |
| http.response_time.min | 999 |
| http.response_time.max | 9935 |
| http.response_time.mean | 3547.9 |
| http.response_time.median | 2836.2 |
| http.response_time.p95 | 8352 |
| http.response_time.p99 | 9607.1 |
| http.responses | 7671 |
| vusers.completed | 7671 |
| vusers.created | 9000 |
| vusers.created_by_name.Scrape a URL | 9000 |
| vusers.failed | 1329 |
| vusers.session_length.min | 1063.4 |
| vusers.session_length.max | 10006.8 |
| vusers.session_length.mean | 3616 |
| vusers.session_length.median | 2893.5 |
| vusers.session_length.p95 | 8352 |
| vusers.session_length.p99 | 9607.1 |
### Metrics
![](./assets/metrics-test-4.png)
---
## Conclusions and Next Steps
### Conclusions
1. **Performance:** The system handled 9000 requests with a mean response time of 3547.9 ms. There were 1329 timeouts but no HTTP 502 responses.
2. **Response Times:** The peak response time was 9935 ms, indicating that the system struggled under peak load conditions.
### Next Steps
1. **Adjust Timeout Configuration:** Increase the Artillery request timeout so that slow responses are not cut off and reported as `ETIMEDOUT` errors.
2. **Further Load Tests:** Conduct additional load tests with the new timeout configuration to assess improvements.
By following these steps, we can enhance the system's performance and reliability under varying load conditions.

@@ -0,0 +1,94 @@
# Scraping Load Testing - Test #5
## Summary
The load test was conducted with a higher timeout configuration to address previous timeout issues. The test involved 9000 requests with a timeout set to 30 seconds. The system handled the load well, with only 4 HTTP 502 responses (0.04%). The average response time was 5661.8 ms, with a peak response time of 18924 ms. Further analysis is recommended to optimize response times.
## Table of Contents
- [Scraping Load Testing - Test #5](#scraping-load-testing---test-5)
- [Summary](#summary)
- [Table of Contents](#table-of-contents)
- [Test environment](#test-environment)
- [Machines](#machines)
- [Load Test Configuration](#load-test-configuration)
- [Configuration](#configuration)
- [Results](#results)
- [Metrics](#metrics)
- [Conclusions and Next Steps](#conclusions-and-next-steps)
- [Conclusions](#conclusions)
- [Next Steps](#next-steps)
## Test environment
### Machines
| Machine | Size/CPU | Status |
|---|---|---|
| e286de4f711e86 mia (app) | performance-cpu-1x@2048MB | always on |
| 73d8dd909c1189 mia (app) | performance-cpu-1x@2048MB | always on |
| 6e82050c726358 mia (app) | performance-cpu-1x@2048MB | paused |
| 4d89505a6e5038 mia (app) | performance-cpu-1x@2048MB | paused |
| 48ed6e6b74e378 mia (app) | performance-cpu-1x@2048MB | paused |
---
## Load Test Configuration
### Configuration
```yml
http:
  timeout: 30 # request timeout, in seconds
```
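To illustrate why the timeout value changes the error count without changing server behavior, the sketch below counts how many response times would be cut off at a given client timeout. The latency samples are made up for illustration and `count_timeouts` is a hypothetical helper. The 10-second comparison value is Artillery's default request timeout, used here on the assumption that the earlier tests ran with the default; the reports do not state it.

```python
def count_timeouts(latencies_ms: list[float], timeout_s: float) -> int:
    """Count requests whose latency exceeds the client timeout."""
    return sum(1 for ms in latencies_ms if ms > timeout_s * 1000)


# Hypothetical latencies in milliseconds, not measured data.
samples = [1200, 4800, 9900, 11000, 13000, 19000]

cut_at_10s = count_timeouts(samples, 10)
cut_at_30s = count_timeouts(samples, 30)
print(cut_at_10s, cut_at_30s)
```

With the longer timeout, the same slow requests complete and show up as high response times instead of errors, which is consistent with the higher mean and maximum reported below.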
### Results
Date: 15:59:50(-0300)
| Metric | Value |
|---------------------------------------------|---------|
| errors.Failed capture or match | 4 |
| http.codes.200 | 8996 |
| http.codes.502 | 4 |
| http.downloaded_bytes | 0 |
| http.request_rate | 23/sec |
| http.requests | 9000 |
| http.response_time.min | 62 |
| http.response_time.max | 18924 |
| http.response_time.mean | 5661.8 |
| http.response_time.median | 5378.9 |
| http.response_time.p95 | 11050.8 |
| http.response_time.p99 | 12968.3 |
| http.responses | 9000 |
| vusers.completed | 8996 |
| vusers.created | 9000 |
| vusers.created_by_name.Scrape a URL | 9000 |
| vusers.failed | 4 |
| vusers.session_length.min | 1079.2 |
| vusers.session_length.max | 18980.3 |
| vusers.session_length.mean | 5734.4 |
| vusers.session_length.median | 5487.5 |
| vusers.session_length.p95 | 11050.8 |
| vusers.session_length.p99 | 12968.3 |
### Metrics
![](./assets/metrics-test-5.png)
---
## Conclusions and Next Steps
### Conclusions
1. **Performance:** The system handled 9000 requests with a mean response time of 5661.8 ms. There were only 4 HTTP 502 responses, a 0.04% failure rate.
2. **Response Times:** The peak response time was 18924 ms, indicating that while the system handled the load, there is room for optimization.
### Next Steps
1. **Testing Scraping Strategies:** Conduct further testing on the Playwright instance to ensure it can handle increased load and identify any potential bottlenecks.
2. **Load Testing Other Functionalities:** Evaluate the performance of other critical routes, such as the crawl route, through additional load tests to ensure comprehensive system reliability.
3. **Optimize Response Times:** Investigate and implement strategies to reduce the peak response time from 18924 ms. This could involve optimizing database queries, improving server configurations, or enhancing caching mechanisms.
4. **Error Handling Improvements:** Analyze the causes of the 4 HTTP 502 responses and implement robust error handling and recovery mechanisms to minimize such occurrences in future tests.
5. **Scalability Assessment:** Assess the system's scalability by gradually increasing the load beyond 9000 requests to determine its breaking point and plan for necessary infrastructure upgrades.
By following these steps, we can further enhance the system's performance and reliability under varying load conditions.
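
For comparison, the failure rates of the four 9000-request runs can be recomputed from the `vusers.failed` counters in the Test #2 to Test #5 reports of this commit. A small sketch:

```python
# vusers.failed per run, copied from the Test #2 to Test #5 reports;
# each of these runs created 9000 virtual users.
CREATED = 9000
failed = {"test-2": 5546, "test-3": 655, "test-4": 1329, "test-5": 4}

rates = {name: round(100 * count / CREATED, 2) for name, count in failed.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}% failed")
```

The rates are not strictly comparable: Test #5 also raised the client timeout, so part of its improvement reflects requests that were allowed to finish slowly instead of timing out.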

@@ -1,8 +1,16 @@
 config:
 target: "https://staging-firecrawl-scraper-js.fly.dev/v0"
+http:
+timeout: 30
 phases:
 - duration: 60
-arrivalRate: 10
+arrivalRate: 10 # Initial load
+- duration: 120
+arrivalRate: 20 # Increased load
+- duration: 180
+arrivalRate: 30 # Peak load
+- duration: 60
+arrivalRate: 10 # Cool down
 defaults:
 headers:
 Authorization: "Bearer {{ $env.TEST_API_KEY }}"

@@ -4,7 +4,7 @@
 "description": "",
 "scripts": {
 "test:suite": "npx jest --detectOpenHandles --forceExit --openHandlesTimeout=120000 --watchAll=false",
-"test:load": "artillery run load-test.yml",
+"test:load": "artillery run --output ./load-test-results/test-1-5/assets/test-run-report.json load-test.yml",
 "test:scrape": "npx jest --detectOpenHandles --forceExit --openHandlesTimeout=120000 --watchAll=false --testPathPattern=tests/scrape.test.ts",
 "test:crawl": "npx jest --detectOpenHandles --forceExit --openHandlesTimeout=120000 --watchAll=false --testPathPattern=tests/crawl.test.ts"
 },