crawl load tests 6 and 7

2024-05-22 18:20:24 -03:00 · 2024-05-22 18:20:24 -03:00 · aa6df4305e
commit aa6df4305e
parent 73f1d09d39
9 changed files with 53287 additions and 692 deletions
--- a/apps/api/src/scraper/WebScraper/single_url.ts
+++ b/apps/api/src/scraper/WebScraper/single_url.ts
@@ -193,7 +193,7 @@ function getScrapingFallbackOrder(defaultScraper?: string) {
    }
  });
-  const defaultOrder = ["scrapingBee", "fire-engine", "playwright", "scrapingBeeLoad", "fetch"];
+  const defaultOrder = ["fire-engine", "scrapingBee", "playwright", "scrapingBeeLoad", "fetch"];
  const filteredDefaultOrder = defaultOrder.filter((scraper: typeof baseScrapers[number]) => availableScrapers.includes(scraper));
  const uniqueScrapers = new Set(defaultScraper ? [defaultScraper, ...filteredDefaultOrder, ...availableScrapers] : [...filteredDefaultOrder, ...availableScrapers]);
  const scrapersInOrder = Array.from(uniqueScrapers);
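The effect of the reordering above can be illustrated with a minimal standalone sketch. It is not the repository's function: the name `fallbackOrder` and the `availableScrapers` parameter are assumptions made for illustration, since the real code builds the available list from context this hunk does not show.

```typescript
// Hypothetical standalone version of the ordering logic in the hunk above.
// `availableScrapers` is passed in here; the real function derives it internally.
function fallbackOrder(availableScrapers: string[], defaultScraper?: string): string[] {
  const defaultOrder = ["fire-engine", "scrapingBee", "playwright", "scrapingBeeLoad", "fetch"];
  // Keep only the preferred scrapers that are actually available.
  const filtered = defaultOrder.filter((s) => availableScrapers.includes(s));
  // A Set keeps first-insertion order, so an explicit default stays first
  // and later duplicates are dropped.
  const unique = new Set(
    defaultScraper
      ? [defaultScraper, ...filtered, ...availableScrapers]
      : [...filtered, ...availableScrapers]
  );
  return Array.from(unique);
}

const available = ["fetch", "playwright", "fire-engine"];
console.log(fallbackOrder(available).join(" > "));               // fire-engine > playwright > fetch
console.log(fallbackOrder(available, "playwright").join(" > ")); // playwright > fire-engine > fetch
```

With this change, fire-engine is tried first whenever it is available and no explicit default scraper is passed.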
--- a/apps/test-suite/load-test-results/test-run-report.json
+++ b/apps/test-suite/load-test-results/test-run-report.json
--- a/apps/test-suite/load-test-results/tests-6-7/assets/metrics-fire-engine-2-test-7.png
+++ b/apps/test-suite/load-test-results/tests-6-7/assets/metrics-fire-engine-2-test-7.png
--- a/apps/test-suite/load-test-results/tests-6-7/assets/metrics-fire-engine-test-7.png
+++ b/apps/test-suite/load-test-results/tests-6-7/assets/metrics-fire-engine-test-7.png
--- a/apps/test-suite/load-test-results/tests-6-7/assets/metrics-test-6.png
+++ b/apps/test-suite/load-test-results/tests-6-7/assets/metrics-test-6.png
--- a/apps/test-suite/load-test-results/tests-6-7/assets/metrics-test-7.png
+++ b/apps/test-suite/load-test-results/tests-6-7/assets/metrics-test-7.png
--- a/apps/test-suite/load-test-results/tests-6-7/load-test-6.md
+++ b/apps/test-suite/load-test-results/tests-6-7/load-test-6.md
@@ -0,0 +1,104 @@
 # Load Testing Crawl Routes - Test #6
 ## Summary
 The load test ran for 10 seconds at an arrival rate of 10 virtual users per second, creating 100 virtual users that issued 200 requests in total. The system handled the load well, with no failed requests. The mean response time was 838.1 ms, with a peak response time of 1416 ms. Further analysis is recommended to optimize response times and assess the impact of higher loads.
 ## Table of Contents
 - [Load Testing Crawl Routes - Test #6](#load-testing-crawl-routes---test-6)
  - [Summary](#summary)
  - [Table of Contents](#table-of-contents)
  - [Test environment](#test-environment)
    - [Machines](#machines)
  - [Load Test Configuration](#load-test-configuration)
    - [Configuration](#configuration)
    - [Results](#results)
    - [Metrics](#metrics)
  - [Conclusions and Next Steps](#conclusions-and-next-steps)
    - [Conclusions](#conclusions)
    - [Next Steps](#next-steps)
 ## Test environment
 ### Machines
 | Machine | Size/CPU | Status |
 |---|---|---|
 | 06e825d0da2387 mia (worker) | performance-cpu-1x@2048MB | always on |
 | 178134db566489 mia (worker) | performance-cpu-1x@2048MB | always on |
 | 73d8dd909c1189 mia (app) | performance-cpu-1x@2048MB | always on |
 | e286de4f711e86 mia (app) | performance-cpu-1x@2048MB | always on |
 Other app machines with autoscaling shouldn't start during crawl tests.
 ---
 ## Load Test Configuration
 ### Configuration
 ```yml
 # load-test.yml
  - duration: 10
    arrivalRate: 10
 ```
 ### Results
 Date: 16:00:06(-0300)
 | Metric                                      | Value   |
 |---------------------------------------------|---------|
 | http.codes.200                              | 200     |
 | http.downloaded_bytes                       | 0       |
 | http.request_rate                           | 10/sec  |
 | http.requests                               | 200     |
 | http.response_time.min                      | 687     |
 | http.response_time.max                      | 1416    |
 | http.response_time.mean                     | 838.1   |
 | http.response_time.median                   | 788.5   |
 | http.response_time.p95                      | 1085.9  |
 | http.response_time.p99                      | 1274.3  |
 | http.responses                              | 200     |
 | vusers.completed                            | 100     |
 | vusers.created                              | 100     |
 | vusers.created_by_name.Crawl a URL          | 100     |
 | vusers.failed                               | 0       |
 | vusers.session_length.min                   | 11647.5 |
 | vusers.session_length.max                   | 12310   |
 | vusers.session_length.mean                  | 11812.7 |
 | vusers.session_length.median                | 11734.2 |
 | vusers.session_length.p95                   | 11971.2 |
 | vusers.session_length.p99                   | 12213.1 |
 ### Metrics
 ![](./assets/metrics-test-6.png)
 **CPU Utilization:**
 - **App machines:** Less than 2.3% CPU utilization with no changes in memory utilization.
 - **Worker machines:** High CPU utilization for over 4 minutes and 45 seconds, with 56% (peaking at 75.8%) on 178134db566489 and 40% (peaking at 62.7%) on 06e825d0da2387.
 **Memory Utilization:**
 - **App machines:** No relevant changes during the tests.
 - **Worker machines:**
  - 06e825d0da2387: Rose from 359 MiB to over 388 MiB for 4 minutes and 45 seconds (peaking at 461 MiB).
  - 178134db566489: Rose from 366 MiB to over 449 MiB for 4 minutes and 45 seconds (peaking at 523 MiB).
 ---
 ## Conclusions and Next Steps
 ### Conclusions
 1. **Performance:** The system handled 200 requests with a mean response time of 838.1 ms. There were no failed requests.
 2. **Response Times:** The peak response time was 1416 ms, indicating that while the system handled the load, there is room for optimization.
 ### Next Steps
 1. **Higher Load Testing:** Conduct further testing with higher loads to assess the system's performance under increased stress.
 2. **Optimize Response Times:** Investigate and implement strategies to reduce the peak response time from 1416 ms. This could involve optimizing database queries, improving server configurations, or enhancing caching mechanisms.
 3. **Scalability Assessment:** Assess the system's scalability by gradually increasing the load beyond the current configuration to determine its breaking point and plan for necessary infrastructure upgrades.
 By following these steps, we can further enhance the system's performance and reliability under varying load conditions.
--- a/apps/test-suite/load-test-results/tests-6-7/load-test-7.md
+++ b/apps/test-suite/load-test-results/tests-6-7/load-test-7.md
@@ -0,0 +1,125 @@
 # Load Testing Crawl Routes - Test #7
 ## Summary
 This load test ran for 7 minutes, followed by an extended observation period, to evaluate the system's performance under variable load. All requests were queued successfully and none failed, but the test was terminated prematurely when the fire-engine machines suffered a critical failure after 22 minutes. The incident revealed significant vulnerabilities in handling sustained load, specifically in resource management.
 ## Table of Contents
 - [Load Testing Crawl Routes - Test #7](#load-testing-crawl-routes---test-7)
  - [Summary](#summary)
  - [Table of Contents](#table-of-contents)
  - [Test environment](#test-environment)
    - [Machines](#machines)
  - [Load Test Configuration](#load-test-configuration)
    - [Configuration](#configuration)
    - [Results](#results)
    - [Metrics](#metrics)
  - [Conclusions and Next Steps](#conclusions-and-next-steps)
    - [Conclusions](#conclusions)
    - [Next Steps](#next-steps)
 ## Test environment
 ### Machines
 | Machine | Size/CPU | Status |
 |---|---|---|
 | 06e825d0da2387 mia (worker) | performance-cpu-1x@2048MB | always on |
 | 178134db566489 mia (worker) | performance-cpu-1x@2048MB | always on |
 | 73d8dd909c1189 mia (app) | performance-cpu-1x@2048MB | always on |
 | e286de4f711e86 mia (app) | performance-cpu-1x@2048MB | always on |
 Fire-engine machines:
 | Machine | Size/CPU | Status |
 |---|---|---|
 | 2874d0db0e5258 mia app | performance-cpu-2x@4096MB | always on |
 | 48ed194f7de258 mia app | performance-cpu-2x@4096MB | always on |
 | 56830d45f70218 sjc app | performance-cpu-2x@4096MB | initialized during the test |
 ---
 ## Load Test Configuration
 ### Configuration
 ```yml
 phases:
  - duration: 60
    arrivalRate: 1  # Initial load
  - duration: 120
    arrivalRate: 2  # Increased load
  - duration: 180
    arrivalRate: 3  # Peak load
  - duration: 60
    arrivalRate: 1  # Cool down
 ```
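As a quick cross-check of the phases above, the sketch below adds up the run length and the number of virtual users the configuration should create. It assumes Artillery's usual units (`duration` in seconds, `arrivalRate` in new virtual users per second); the `Phase` type and `summarizePhases` helper are hypothetical names used only for this illustration.

```typescript
// Phase shape mirrors the config above: duration in seconds,
// arrivalRate in new virtual users per second (assumed Artillery units).
interface Phase {
  duration: number;
  arrivalRate: number;
}

// Total run length in seconds and total virtual users created across all phases.
function summarizePhases(phases: Phase[]): { seconds: number; vusers: number } {
  return phases.reduce(
    (acc, p) => ({
      seconds: acc.seconds + p.duration,
      vusers: acc.vusers + p.duration * p.arrivalRate,
    }),
    { seconds: 0, vusers: 0 }
  );
}

const test7Phases: Phase[] = [
  { duration: 60, arrivalRate: 1 },  // Initial load
  { duration: 120, arrivalRate: 2 }, // Increased load
  { duration: 180, arrivalRate: 3 }, // Peak load
  { duration: 60, arrivalRate: 1 },  // Cool down
];

console.log(summarizePhases(test7Phases)); // { seconds: 420, vusers: 900 }
```

The 420 seconds correspond to the 7-minute run described in the summary, and 900 matches the `vusers.created` value in the results table.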
 Using fire-engine as the default scraping strategy, with:
 ```yml
 NUM_WORKERS_PER_QUEUE=8
 ```
 ### Results
 Date: 17:31:33(-0300)
 | Metric                                      | Value   |
 |---------------------------------------------|---------|
 | http.codes.200                              | 1800    |
 | http.downloaded_bytes                       | 0       |
 | http.request_rate                           | 3/sec   |
 | http.requests                               | 1800    |
 | http.response_time.min                      | 711     |
 | http.response_time.max                      | 5829    |
 | http.response_time.mean                     | 849.2   |
 | http.response_time.median                   | 804.5   |
 | http.response_time.p95                      | 1043.3  |
 | http.response_time.p99                      | 1274.3  |
 | http.responses                              | 1800    |
 | vusers.completed                            | 900     |
 | vusers.created                              | 900     |
 | vusers.created_by_name.Crawl a URL          | 900     |
 | vusers.failed                               | 0       |
 | vusers.session_length.min                   | 11637   |
 | vusers.session_length.max                   | 16726.1 |
 | vusers.session_length.mean                  | 11829.5 |
 | vusers.session_length.median                | 11734.2 |
 | vusers.session_length.p95                   | 12213.1 |
 | vusers.session_length.p99                   | 12213.1 |
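The figures in the table above, together with those from Test #6, allow a small consistency check. The sketch below derives the number of requests per virtual user and a rough estimate of the session time not spent waiting on HTTP responses. The `RunStats` type and helper names are hypothetical, and the second figure is only an approximation built from reported means, not something the reports state directly.

```typescript
// Values copied from the results tables of Test #6 and Test #7.
interface RunStats {
  name: string;
  requests: number;        // http.requests
  vusersCompleted: number; // vusers.completed
  meanResponseMs: number;  // http.response_time.mean
  meanSessionMs: number;   // vusers.session_length.mean
}

const runs: RunStats[] = [
  { name: "test-6", requests: 200, vusersCompleted: 100, meanResponseMs: 838.1, meanSessionMs: 11812.7 },
  { name: "test-7", requests: 1800, vusersCompleted: 900, meanResponseMs: 849.2, meanSessionMs: 11829.5 },
];

// Average number of HTTP requests issued by each virtual user.
function requestsPerVuser(r: RunStats): number {
  return r.requests / r.vusersCompleted;
}

// Rough estimate of session time not spent waiting on responses (approximation from means).
function nonRequestMs(r: RunStats): number {
  return r.meanSessionMs - requestsPerVuser(r) * r.meanResponseMs;
}

for (const r of runs) {
  console.log(r.name, requestsPerVuser(r), nonRequestMs(r).toFixed(1));
}
```

Both runs show two requests per virtual user and roughly ten seconds of non-request time per session. That would be consistent with a fixed pause between the two steps of the scenario, though the excerpt shown here does not confirm how the scenario is structured.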
 ### Metrics
 ![](./assets/metrics-fire-engine-test-7.png)
 ![](./assets/metrics-fire-engine-2-test-7.png)
 **CPU Utilization:**
 - **Fire-engine mia machines:** CPU utilization reached 100% after 22 minutes of processing the queue. The sjc machine received no requests during the test.
 - **Worker machines:** Maintained CPU utilization above 71% during the load testing time. Memory utilization was unaffected.
 **Memory Utilization:**
 - **Fire-engine mia machines:** Memory utilization reached 100% after 22 minutes of processing the queue.
 ![](./assets/metrics-test-7.png)
 ---
 ## Conclusions and Next Steps
 ### Conclusions
 1. **Request Handling:** The system queued all requests without any failures, showing that it can handle the initial intake of traffic.
 2. **Critical Failures:** The abrupt failure of the fire-engine machines after 22 minutes of processing the queue exposes a significant stability issue that directly prevents continued operation under load.
 3. **Resource Management Deficiencies:** The failure was linked to insufficient resource management, particularly memory handling, which needs immediate attention to prevent future disruptions.
 ### Next Steps
 1. **Increase Workers per Machine:** The number of workers per worker machine will be increased from 8 to 12. This change aims to enhance the processing capability of each machine, potentially reducing response times and handling larger volumes of requests more efficiently.
 2. **Implement Autoscaling:** Introduce autoscaling capabilities to dynamically adjust the number of active machines based on the current load. This will help in maintaining optimal performance and prevent system overloads by automatically scaling resources up during peak demands and down during low usage periods.
 3. **Enhanced Resource Management:** With the increase in workers and the implementation of autoscaling, it is crucial to optimize resource management strategies. This involves improving memory handling and cleanup processes to ensure that resource allocation and recovery are efficient and effective, particularly under sustained high loads.
 4. **Extended Duration Testing:** Conduct further tests with extended durations to evaluate the impact of the increased number of workers and autoscaling on system stability and performance. These tests should focus on assessing how well the system sustains operational efficiency over longer periods and under varying load conditions.
 5. **Monitor and Optimize:** Continuously monitor system performance during the new tests, particularly focusing on the effects of the increased worker count and autoscaling. Use the gathered data to optimize configurations and troubleshoot any new issues that arise, ensuring the system is fine-tuned for both high performance and reliability.
 By following these steps, we can further enhance the system's performance and reliability under varying load conditions.
--- a/apps/test-suite/load-test.yml
+++ b/apps/test-suite/load-test.yml
@@ -3,21 +3,17 @@ config:
  http:
    timeout: 30
  phases:
-    # /scrape
+    - duration: 60
-    # - duration: 60
+      arrivalRate: 1  # Initial load
-    #   arrivalRate: 10  # Initial load
+    - duration: 120
-    # - duration: 120
+      arrivalRate: 2  # Increased load
-    #   arrivalRate: 20  # Increased load
+    - duration: 180
-    # - duration: 180
+      arrivalRate: 3  # Peak load
-    #   arrivalRate: 30  # Peak load
+    - duration: 60
-    # - duration: 60
+      arrivalRate: 1  # Cool down
    #   arrivalRate: 10  # Cool down
    # /crawl
    - duration: 10
      arrivalRate: 1
  defaults:
    headers:
-      Authorization: "Bearer {{ $env.TEST_API_KEY }}"
+      Authorization: "Bearer YOUR_API_KEY"
 scenarios:
  # - name: Scrape a URL
  #   flow:
@@ -36,7 +32,7 @@ scenarios:
      - post:
          url: "/crawl"
          json:
-            url: "https://spider.cloud"
+            url: "https://rsseau.fr"
            crawlerOptions:
              limit: 100
            pageOptions: