Web Scraping Google Search Results: A Tutorial (2026)

2026/4/27

Artur Hvalei

Technical Support Specialist, Octo Browser

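Scraping Google SERPs helps you understand which sites and content users actually see, which keywords drive traffic, and which snippet formats perform best. In this article, we cover data collection methods, compare them by accuracy and complexity, and highlight the best solutions for tasks ranging from basic monitoring to large-scale analytics. You will also find a ready-to-use scraping script.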

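Why scrape Google search results

Google is a global database of consumer demand and competitor activity. Analyzing search engine results pages (SERPs) provides key insights: how sites actually rank for keywords, competitors' titles and meta descriptions, the presence and format of rich snippets, and data from the “People Also Ask” block and search suggestions. This data helps companies and marketers:

  • Track rankings and visibility: analyze SEO performance and monitor progress over time.

  • Research competitors: understand their keywords and content strategy, and identify market gaps.

  • Discover niches and trends: find new keywords and queries to create relevant content.

  • Analyze ads: study competitors' ads, headlines, copy, and strategy.

These insights are therefore most valuable to SEO specialists, marketers, analysts, business owners, and developers of online marketing tools.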

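Data collection tools and methods

1. Third-party SERP APIs (paid services)

These are specialized APIs that handle all the technical complexity of data collection. You send a request and receive structured JSON containing search results, ads, and other elements. The provider manages proxy rotation, solves CAPTCHAs, and renders JavaScript, delivering ready-to-use data.

  • Pros: easy to integrate, scalable, the provider handles blocks, clear data structure.

  • Cons: high cost at scale (for example, Bright Data starts at $1 per 1,000 requests), vendor lock-in, processing latency.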
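To illustrate what "structured JSON" means for rank tracking, here is a minimal sketch. The response shape (an organic array with position, title, and link fields) and the sample entries are assumptions made up for this example; each provider names its fields differently, so check your provider's documentation.

```javascript
// Hypothetical response shape and made-up sample data, for illustration only.
const sample_response = {
    query: "arch linux",
    organic: [
        { position: 1, title: "Arch Linux", link: "https://archlinux.org/" },
        { position: 2, title: "Arch Linux - Wikipedia", link: "https://en.wikipedia.org/wiki/Arch_Linux" },
        { position: 3, title: "ArchWiki", link: "https://wiki.archlinux.org/" }
    ]
};

// Return the best (lowest) position at which `domain` or one of its
// subdomains appears in the organic results, or null if it is absent.
function find_domain_position(response, domain) {
    const positions = (response.organic || [])
        .filter(item => {
            const host = new URL(item.link).hostname;
            return host === domain || host.endsWith(`.${domain}`);
        })
        .map(item => item.position);
    return positions.length > 0 ? Math.min(...positions) : null;
}

console.log(find_domain_position(sample_response, "archlinux.org")); // 1
console.log(find_domain_position(sample_response, "wikipedia.org")); // 2
console.log(find_domain_position(sample_response, "example.com"));   // null
```

Because the provider returns parsed data, the client code stays this small; the trade-off is the per-request cost and dependence on the provider's schema.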

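2. Official Google API (Custom Search JSON API)

This is a legitimate way to access search data by embedding Google Search into your website. However, it is fundamentally different, because it does not simulate a real user search and does not return a “live” SERP with ads and dynamic elements. Results are often less up to date and are structured differently.

  • Pros: legitimate, stable, easy to use, includes a free tier (100 requests per day).

  • Cons: does not return real SERP data. The API provides structured results from a limited set of predefined sites rather than the actual search page users see. It has quotas and limits, so it is not suitable for large-scale rank tracking or competitive analysis.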
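As a rough sketch of how a Custom Search JSON API call can be assembled in Node.js: the function names below are our own, and YOUR_API_KEY and YOUR_ENGINE_ID are placeholders for credentials you create in Google's consoles. The network helper assumes Node.js 18 or later, where fetch is available globally; it is defined here but not called.

```javascript
// Build a request URL for the Custom Search JSON API.
// `start` is the 1-based index of the first result to return.
function build_custom_search_url(api_key, engine_id, query, start = 1) {
    const params = new URLSearchParams({
        key: api_key,
        cx: engine_id,
        q: query,
        start: String(start)
    });
    return `https://www.googleapis.com/customsearch/v1?${params.toString()}`;
}

// Keep only the fields most useful for analysis from the response body.
function extract_items(body) {
    return (body.items || []).map(({ title, link, snippet }) => ({ title, link, snippet }));
}

// Network helper (requires real credentials, so it is not invoked here).
async function custom_search(api_key, engine_id, query) {
    const response = await fetch(build_custom_search_url(api_key, engine_id, query));
    if (!response.ok) {
        throw new Error(`Custom Search request failed with HTTP ${response.status}`);
    }
    return extract_items(await response.json());
}

console.log(build_custom_search_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "arch linux"));
// https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_ENGINE_ID&q=arch+linux&start=1
```

Each call counts against the daily quota mentioned above, so paging through results with start consumes it quickly.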

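3. Direct HTTP requests (scraping)

This method simulates a standard browser request. Your script (Python, Node.js, etc.) sends a GET request to a Google Search URL and receives HTML, which then has to be parsed. To avoid detection, you need to use proxies and to imitate and rotate browser headers.

  • Pros: full control over the process, low cost (only servers and proxies are needed), high flexibility.

  • Cons: complex and fragile. Google actively blocks non-browser requests, so you have to keep solving CAPTCHAs and rotating fingerprints. Even advanced setups with TLS and header imitation can fail. Any change to Google's layout can break your parser.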
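The request side of this approach can be sketched as follows. This is only an illustration under assumptions: the helper names are ours, the User-Agent strings are examples that go stale quickly, and headers alone are unlikely to get past Google's checks, as the drawbacks above explain.

```javascript
// Example User-Agent strings; replace them with current, consistent values.
const USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
];

// Build a search URL: q is the query, hl the interface language,
// start the zero-based offset of the first result.
function build_google_search_url(query, { hl = "en", start = 0 } = {}) {
    const params = new URLSearchParams({ q: query, hl, start: String(start) });
    return `https://www.google.com/search?${params}`;
}

// Pick browser-like headers in round-robin order by request number.
function pick_headers(request_number) {
    return {
        "User-Agent": USER_AGENTS[request_number % USER_AGENTS.length],
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9"
    };
}

console.log(build_google_search_url("arch linux"));
// https://www.google.com/search?q=arch+linux&hl=en&start=0
console.log(pick_headers(1)["User-Agent"]);
```

The URL and headers would then be passed to an HTTP client together with a proxy; the returned HTML still has to be parsed with selectors that may change without notice.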

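4. Browser automation (Puppeteer, Playwright, Selenium)

This method simulates real user behavior: opening a browser, typing a query, clicking, and scrolling. It closely mimics human interaction but requires more computing resources. Libraries such as Puppeteer control a Chrome instance and collect data from dynamic pages.

  • Pros: can bypass complex protection, executes JavaScript, highest data accuracy (you scrape exactly what users see), flexible and powerful.

  • Cons: high resource consumption (CPU, memory), slower than direct HTTP requests, complex to configure and maintain for large-scale projects.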

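Why proxies and anti-detect browsers are essential

Google actively protects its data and blocks automated requests. The two main obstacles are CAPTCHAs and IP-based bans, which are usually triggered when requests exceed a limit.

  • Proxies act as intermediaries that hide your real IP address. The core strategy is proxy rotation: changing IPs regularly to simulate traffic from different users and avoid triggering anti-bot systems.

  • Anti-detect browsers solve a more advanced problem: masking the digital fingerprint. They let you spoof environment parameters such as User-Agent, screen resolution, media devices, and GPU settings. This creates a realistic fingerprint for each new profile, which is essential for bypassing systems that analyze device fingerprints. Combining an anti-detect browser with high-quality proxies lets you create thousands of unique “users” and collect data at scale.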
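Proxy rotation can be sketched as a small queue: take the proxy at the front, move it to the back, and drop proxies that get banned. The ProxyPool class and its method names are hypothetical, and the addresses use documentation-only IP ranges.

```javascript
// A minimal rotating proxy pool (hypothetical helper, not part of any library).
class ProxyPool {
    constructor(proxies) {
        this.queue = [...proxies];
    }

    // Return the next proxy and move it to the back of the queue.
    next() {
        if (this.queue.length === 0) {
            throw new Error("No proxies left in the pool");
        }
        const proxy = this.queue.shift();
        this.queue.push(proxy);
        return proxy;
    }

    // Remove a proxy that triggered a CAPTCHA or an IP ban.
    retire(proxy) {
        this.queue = this.queue.filter(candidate => candidate !== proxy);
    }
}

const pool = new ProxyPool([
    "socks5://login:password@192.0.2.10:50000",
    "socks5://login:password@192.0.2.11:50000",
    "socks5://login:password@192.0.2.12:50000"
]);

const first = pool.next();   // ...2.10
const second = pool.next();  // ...2.11
pool.retire(second);         // 2.11 was banned, drop it
console.log(pool.next());    // socks5://login:password@192.0.2.12:50000
console.log(pool.next());    // socks5://login:password@192.0.2.10:50000
```

In an anti-detect setup, each proxy taken from the pool would be paired with a fresh browser profile so that the IP address and the fingerprint change together.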

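Octo Browser capabilities for Google SERP scraping

Octo Browser includes an API that allows you to fully automate data collection. Octo also provides detailed API documentation with request examples.

The documentation includes code snippets for integrating Puppeteer, Playwright, and Selenium, which control the browser over the CDP protocol.

Practical tips

  1. Study the official API documentation carefully.

  2. Check the FAQ on using the API.

  3. Read the detailed article on working with the Octo API.

  4. API requests in Octo Browser are limited according to your subscription tier, but the limits can be raised. Use a function that checks the API limits in the response headers. Ignoring HTTP 429 errors may lengthen the block. If you automate from several devices under one account, implement centralized request tracking (for example, with Redis).

  5. Do not use unpatched versions of automation libraries, because they contain detectable leaks. For Puppeteer/Playwright, use the rebrowser patches. For Selenium, use undetected-chromedriver.

  6. Use functions and libraries that simulate human behavior as closely as possible: mouse clicks, hovering, cursor movement, typing, scrolling, navigation flow, and random actions.

  7. Save profiles with a local cache to reduce proxy traffic. You can do this by passing "local_cache": true when creating a profile, or by using a shared cache directory via --disk-cache-dir, for example flags:["--disk-cache-dir=C:/Cache"].

  8. Limit image loading in the profile settings to save proxy traffic. When creating a profile, set "images_load_limit": 10240 to restrict images to no more than 10,240 bytes.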
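For the advice on HTTP 429 in tip 4, one possible way to avoid hammering the API is to pause and retry with a growing delay. The sketch below is an assumption-laden illustration: the helper names are ours, and whether the Octo API sends a Retry-After header is not confirmed by this article, so the code falls back to exponential backoff when the header is missing. The send_request callback must resolve to an object with status and headers instead of throwing on 429 (with axios, for example, by passing validateStatus: () => true).

```javascript
// Delay before the next attempt: honor Retry-After (in seconds) when present,
// otherwise use exponential backoff capped at max_delay_ms.
function backoff_delay_ms(attempt, retry_after_header, base_delay_ms = 1000, max_delay_ms = 60000) {
    const retry_after = Number.parseInt(retry_after_header, 10);
    if (Number.isFinite(retry_after) && retry_after >= 0) {
        return retry_after * 1000;
    }
    return Math.min(max_delay_ms, base_delay_ms * 2 ** attempt);
}

// Retry a request while the server answers with HTTP 429.
async function request_with_backoff(send_request, max_attempts = 5) {
    for (let attempt = 0; attempt < max_attempts; attempt++) {
        const response = await send_request();
        if (response.status !== 429) {
            return response;
        }
        const delay = backoff_delay_ms(attempt, response.headers["retry-after"]);
        console.log(`HTTP 429 received, waiting ${delay} ms before retrying`);
        await new Promise(resolve => setTimeout(resolve, delay));
    }
    throw new Error(`Still rate limited after ${max_attempts} attempts`);
}

console.log(backoff_delay_ms(0));      // 1000
console.log(backoff_delay_ms(3));      // 8000
console.log(backoff_delay_ms(0, "7")); // 7000
```

When several machines share one account, the same delay logic can sit behind a shared counter so that all workers slow down together.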

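Comparison of scraping methods

Method | Cost | Complexity | Ban risk | Data quality
Paid SERP API | High (from $1 per 1,000 requests) | - | Very low | -
Official API | Low / free | - | - | Low (not real SERP data)
HTTP requests | Medium (proxies required) | - | Very high | -
Automation with an anti-detect browser | Medium (subscription and proxies required) | Medium | Very low | Highest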

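A ready-made script for scraping Google SERPs

Below is an example scraping script that works with the Octo Browser API. You can use this script, or parts of it, as a starting point for a full project and adapt it as needed.

  1. Download and install VS Code.

  2. Download and install Node.js.

  3. Create a folder in a convenient location and name it, for example, octo_scraper.

  4. Open this folder in VS Code.

  5. Create a .js file. It is best to name it after its function, for example google_scraping.js.

  6. Paste the script code into the file.

  7. In the config variable in the code, add your proxies to the proxies array.

  8. In the same place, add your search queries to the google_search_queries array. In this example script, the number of queries must be greater than or equal to the number of proxies. You can easily modify the scraping logic to suit your needs.

Note: each array element must be enclosed in quotes. Separate elements with commas.

  9. Open a terminal and run npm i rebrowser-puppeteer axios fkill to install the Node.js dependencies.

  10. If VS Code shows an error, open Windows PowerShell as administrator, enter the command Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned, and confirm. Then repeat the previous step.

  11. Launch Octo Browser.

  12. Run the program in VS Code (Ctrl/Cmd + F5) and wait for the script to finish.

  13. The scraper creates a one-time profile for each proxy you added and runs the specified queries in sequence. The script simulates real user behavior to get past Google's anti-fraud system.

  14. You can monitor the process in the debug console. If a CAPTCHA appears, the script closes that profile and moves on to the profile for the next proxy; the remaining queries assigned to the closed profile are skipped.

  15. Search results are saved in the search_results folder in the project directory.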
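Before running the script, it can help to check the config for the two requirements described in the steps above: every proxy is a quoted string in the expected form, and there are at least as many queries as proxies. The validate_config helper below is a hypothetical addition, not part of the original script; its regular expression mirrors the one used by the script's parse_proxy function.

```javascript
// Same pattern as parse_proxy in the script: scheme://[login:password@]host:port
const PROXY_PATTERN = /^(\w+):\/\/(?:([^:]+):([^@]+)@)?([^:]+):(\d+)$/;

// Return a list of human-readable problems; an empty list means the config looks usable.
function validate_config(config) {
    const errors = [];

    config.proxies.forEach((proxy, index) => {
        if (typeof proxy !== "string" || !PROXY_PATTERN.test(proxy)) {
            errors.push(`proxies[${index}] is not in scheme://login:password@host:port form`);
        }
    });

    if (config.google_search_queries.length < config.proxies.length) {
        errors.push(
            `Need at least ${config.proxies.length} queries for ${config.proxies.length} proxies, ` +
            `got ${config.google_search_queries.length}`
        );
    }

    return errors;
}

const example_config = {
    proxies: ["socks5://login:password@127.0.0.1:50000", "127.0.0.1:50000"],
    google_search_queries: ["nodejs"]
};

console.log(validate_config(example_config));
// Reports two problems: the second proxy lacks a scheme, and there are fewer queries than proxies.
```

Calling such a check at the top of the main function and exiting when the list is non-empty avoids creating one-time profiles that would fail anyway.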

Script code

const axios = require('axios');
const puppeteer = require('rebrowser-puppeteer');
const fs = require('fs').promises;
const path = require('path');

const config = {
    octo_local_api_base_url: `http://localhost:58888/api/profiles`, //change port if you don't use default 58888
    headless_mode: false,
    proxies: [
        "socks5://login:password@127.0.0.1:50000", //paste your proxies
        "socks5://login:password@127.0.0.1:50000"
    ],
    google_search_queries: ["nodejs", "sidwudraq", "arch linux"] //change queries
}

// ============= HELPER FUNCTIONS =============
function random_range(min, max) {
    return min + Math.random() * (max - min);
}

async function sleep(seconds) {
    return new Promise(resolve => setTimeout(resolve, seconds * 1000));
}

async function human_delay(min_ms = 50, max_ms = 200) {
    const mu = Math.log((min_ms + max_ms) / 2);
    const sigma = random_range(0.3, 0.6);
    let delay = Math.exp(mu + sigma * (Math.random() - 0.5) * 2);
    delay = Math.min(max_ms, Math.max(min_ms, delay));
    await new Promise(resolve => setTimeout(resolve, delay));
}

async function kill_browser(pid) {
    const { default: fkill } = await import('fkill');
    await fkill(pid, { force: true });
    console.log(`✅ Process with PID ${pid} successfully stopped.`);
}

// ============= BEZIER CURVES FOR HUMAN-LIKE MOVEMENT =============
function bezier_curve(t, p0, p1, p2, p3) {
    const mt = 1 - t;
    const mt2 = mt * mt;
    const t2 = t * t;

    const x = mt2 * mt * p0.x + 3 * mt2 * t * p1.x + 3 * mt * t2 * p2.x + t2 * t * p3.x;
    const y = mt2 * mt * p0.y + 3 * mt2 * t * p1.y + 3 * mt * t2 * p2.y + t2 * t * p3.y;

    return { x, y };
}

function generate_bezier_points(start, end) {
    const distance = Math.hypot(end.x - start.x, end.y - start.y);
    const angle = Math.atan2(end.y - start.y, end.x - start.x);

    const deviation = random_range(distance * 0.2, distance * 0.5);
    const angle_variation = random_range(-Math.PI / 3, Math.PI / 3);

    const p1 = {
        x: start.x + Math.cos(angle + angle_variation) * deviation,
        y: start.y + Math.sin(angle + angle_variation) * deviation
    };

    const p2 = {
        x: end.x - Math.cos(angle - angle_variation) * deviation,
        y: end.y - Math.sin(angle - angle_variation) * deviation
    };

    return [start, p1, p2, end];
}

function generate_trajectory(start, end, steps = null) {
    const distance = Math.hypot(end.x - start.x, end.y - start.y);
    const actual_steps = steps || Math.max(20, Math.min(100, Math.floor(distance / 3)));

    const bezier_points = generate_bezier_points(start, end);
    const trajectory = [];

    for (let i = 0; i <= actual_steps; i++) {
        const t = i / actual_steps;
        const eased_t = Math.pow(t, 1 + Math.random() * 0.3);
        const point = bezier_curve(eased_t, ...bezier_points);

        const jitter = {
            x: (Math.random() - 0.5) * random_range(0.5, 2),
            y: (Math.random() - 0.5) * random_range(0.5, 2)
        };

        trajectory.push({
            x: Math.round(point.x + jitter.x),
            y: Math.round(point.y + jitter.y)
        });
    }

    return trajectory;
}

// ============= HUMAN-LIKE CLICK =============
async function human_click(page, selector_or_element, options = {}) {
    const {
        move_speed = 1.0,
        random_overshoot = true,
        click_delay = null,
        force_visible = true
    } = options;

    const element = typeof selector_or_element === 'string'
        ? await page.$(selector_or_element)
        : selector_or_element;

    if (!element) {
        throw new Error(`Element not found: ${selector_or_element}`);
    }

    if (force_visible) {
        await element.scrollIntoView();
        await human_delay(100, 300);
    }

    const current_mouse = await page.evaluate(() => ({
        x: window.mouseX || window.innerWidth / 2,
        y: window.mouseY || window.innerHeight / 2
    }));

    const box = await element.boundingBox();
    if (!box) throw new Error('Could not get element coordinates');

    const target = {
        x: box.x + random_range(box.width * 0.2, box.width * 0.8),
        y: box.y + random_range(box.height * 0.2, box.height * 0.8)
    };

    if (random_overshoot && Math.random() < 0.3) {
        const overshoot_x = (Math.random() - 0.5) * random_range(10, 30);
        const overshoot_y = (Math.random() - 0.5) * random_range(10, 30);

        const overshoot_target = {
            x: target.x + overshoot_x,
            y: target.y + overshoot_y
        };

        const overshoot_trajectory = generate_trajectory(current_mouse, overshoot_target);
        for (const point of overshoot_trajectory) {
            await page.mouse.move(point.x, point.y);
            await human_delay(1, 3);
        }

        const return_trajectory = generate_trajectory(overshoot_target, target);
        for (const point of return_trajectory) {
            await page.mouse.move(point.x, point.y);
            await human_delay(1, 3);
        }
    } else {
        const trajectory = generate_trajectory(current_mouse, target);
        for (const point of trajectory) {
            await page.mouse.move(point.x, point.y);
            const delay = Math.max(1, Math.min(5, 10 / move_speed));
            await human_delay(delay * 0.5, delay * 1.5);
        }
    }

    const final_delay = click_delay !== null ? click_delay : random_range(80, 250);
    await human_delay(final_delay * 0.8, final_delay * 1.2);

    if (Math.random() < 0.15) {
        const micro_offset_x = (Math.random() - 0.5) * random_range(1, 4);
        const micro_offset_y = (Math.random() - 0.5) * random_range(1, 4);
        await page.mouse.move(target.x + micro_offset_x, target.y + micro_offset_y);
        await human_delay(10, 30);
    }

    await page.mouse.down();
    // Hold the button for 50-150 ms before releasing (human_delay takes min and max directly)
    await human_delay(50, 150);

    if (Math.random() < 0.2) {
        await page.mouse.move(
            target.x + (Math.random() - 0.5) * 2,
            target.y + (Math.random() - 0.5) * 2
        );
    }

    await page.mouse.up();
    await human_delay(50, 150);

    await page.evaluate(({ x, y }) => {
        window.mouseX = x;
        window.mouseY = y;
    }, target);

    return { success: true, position: target };
}

// ============= HUMAN-LIKE TEXT INPUT =============
async function human_type(page, selector, text, options = {}) {
    const {
        typing_speed = null,
        random_mistakes = false,
        backspace_fix = false
    } = options;

    const element = typeof selector === 'string'
        ? await page.$(selector)
        : selector;

    if (!element) {
        throw new Error(`Element not found: ${selector}`);
    }

    // Focus the field with a human-like click (human_click has no pre_hover option)
    await human_click(page, element);

    // Clear the field
    await page.keyboard.down('Control');
    await page.keyboard.press('a');
    await page.keyboard.up('Control');
    await page.keyboard.press('Backspace');
    await human_delay(100, 200);

    for (let i = 0; i < text.length; i++) {
        const char = text[i];

        let delay;
        if (typing_speed) {
            delay = typing_speed;
        } else {
            const base_delay = random_range(50, 200);
            const is_space = char === ' ';
            delay = is_space ? base_delay * 2 : base_delay;
        }

        if (random_mistakes && Math.random() < 0.02) {
            const wrong_char = String.fromCharCode(
                char.charCodeAt(0) + (Math.random() > 0.5 ? 1 : -1)
            );
            await page.keyboard.type(wrong_char, { delay: delay * 0.5 });
            await human_delay(100, 200);

            if (backspace_fix) {
                await page.keyboard.press('Backspace');
                await human_delay(50, 100);
            } else {
                continue;
            }
        }

        await page.keyboard.type(char, { delay: delay });
    }

    await human_delay(100, 300);
    return true;
}

// ============= HUMAN-LIKE SCROLL =============
async function human_scroll(page, options = {}) {
    const {
        scrolls = null,
        min_scroll = 300,
        max_scroll = 800
    } = options;

    const num_scrolls = scrolls || Math.floor(random_range(3, 8));

    for (let i = 0; i < num_scrolls; i++) {
        const scroll_distance = random_range(min_scroll, max_scroll);
        await page.evaluate((distance) => {
            window.scrollBy({
                top: distance,
                behavior: 'smooth'
            });
        }, scroll_distance);

        await human_delay(800, 2000);

        if (Math.random() < 0.2) {
            const back_distance = random_range(100, 300);
            await page.evaluate((distance) => {
                window.scrollBy({
                    top: -distance,
                    behavior: 'smooth'
                });
            }, back_distance);
            await human_delay(500, 1000);
        }
    }
}

// ============= DISTRIBUTE QUERIES AMONG PROFILES =============
function distribute_queries(queries, numProxies) {
    const total = queries.length;
    const baseCount = Math.floor(total / numProxies);
    const remainder = total % numProxies;

    const batches = [];
    let start = 0;
    for (let i = 0; i < numProxies; i++) {
        const count = baseCount + (i < remainder ? 1 : 0);
        const batch = queries.slice(start, start + count);
        batches.push(batch);
        start += count;
    }
    return batches;
}

// ============= PARSE GOOGLE RESULTS =============
async function parse_search_results(page, query) {
    return await page.evaluate((query) => {
        const results = [];

        // Find all result containers
        const organic_results = document.querySelectorAll('div.tF2Cxc');

        console.log(`Found ${organic_results.length} result containers`);

        organic_results.forEach((result, index) => {
            try {
                // Title
                const title_element = result.querySelector('h3.LC20lb.MBeuO.DKV0Md');
                const title = title_element ? title_element.innerText : '';

                // Link
                let link_element = result.querySelector('a');
                let link = link_element ? link_element.href : '';

                // Clean Google redirect
                if (link && link.includes('/url?q=')) {
                    const url_match = link.match(/\/url\?q=([^&]+)/);
                    if (url_match) {
                        link = decodeURIComponent(url_match[1]);
                    }
                }

                // Description
                let desc_element = result.querySelector('div.VwiC3b.yXK7lf.p4wth.r025kc.Hdw6tb');
                let description = desc_element ? desc_element.innerText : '';

                // Fallback selector
                if (!description) {
                    const fallback_desc = result.querySelector('div.VwiC3b');
                    description = fallback_desc ? fallback_desc.innerText : '';
                }

                if (title && title.trim() && link) {
                    results.push({
                        position: results.length + 1,
                        title: title.trim(),
                        link: link,
                        description: description.trim().substring(0, 500)
                    });
                }
            } catch (error) {
                console.error(`Error parsing result ${index}:`, error);
            }
        });

        console.log(`Successfully parsed ${results.length} results`);

        return {
            query: query,
            timestamp: new Date().toISOString(),
            total_results: results.length,
            results: results
        };
    }, query);
}

// ============= SAVE RESULTS TO FILE =============
async function save_results_to_file(query, data, is_appending = false) {
    const filename = `${query.replace(/[^a-z0-9]/gi, '_').toLowerCase()}_results.txt`;
    const filepath = path.join(__dirname, 'search_results', filename);

    // Create directory if needed
    await fs.mkdir(path.join(__dirname, 'search_results'), { recursive: true });

    let content = '';

    if (!is_appending) {
        content += `=== GOOGLE SEARCH RESULTS ===\n`;
        content += `Query: ${data.query}\n`;
        content += `Time: ${data.timestamp}\n`;
        content += `Total results: ${data.total_results}\n`;
        content += `${'='.repeat(80)}\n\n`;
    }

    for (const result of data.results) {
        content += `${result.position}. ${result.title}\n`;
        content += `   URL: ${result.link}\n`;
        content += `   Description: ${result.description.substring(0, 200)}...\n`;
        content += `   ${'-'.repeat(80)}\n`;
    }

    content += `\n📄 Page saved: ${new Date().toISOString()}\n`;
    content += `${'='.repeat(80)}\n\n`;

    await fs.writeFile(filepath, content, { flag: is_appending ? 'a' : 'w' });
    console.log(`✅ Results saved to: ${filepath}`);
    return filepath;
}

// ============= OPEN RANDOM RESULT PAGE =============
async function open_random_result(page, results) {
    if (!results || results.length === 0) {
        console.log('No results to open');
        return false;
    }

    // Choose a random result (usually not the first)
    let result_index = 0;
    if (results.length > 1) {
        result_index = Math.random() < 0.7
            ? Math.floor(random_range(1, Math.min(5, results.length)))
            : Math.floor(random_range(0, results.length));
    }

    const selected_result = results[result_index];
    console.log(`Opening result ${result_index + 1}: ${selected_result.title.substring(0, 50)}...`);

    try {
        // Check for captcha before opening
        const has_captcha = await check_for_captcha(page);
        if (has_captcha) {
            console.log('🚫 Captcha detected, not opening result');
            return false;
        }

        // Open in a new tab
        const new_page = await page.browser().newPage();
        await new_page.goto(selected_result.link, {
            waitUntil: 'domcontentloaded',
            timeout: 20000
        });
        await human_delay(2000, 4000);

        // Check for captcha on the opened page
        const page_has_captcha = await check_for_captcha(new_page);
        if (page_has_captcha) {
            console.log('🚫 Captcha detected on opened page');
            await new_page.close();
            return false;
        }

        // Scroll on the opened page
        await human_scroll(new_page, { scrolls: random_range(2, 5) });
        await human_delay(1500, 3000);

        // Close the tab
        await new_page.close();
        console.log(`✅ Page viewed and closed`);

        return true;
    } catch (error) {
        console.log(`❌ Error opening page: ${error.message}`);
        return false;
    }
}

// ============= CAPTCHA CHECK =============
async function check_for_captcha(page) {
    const captcha_selectors = [
        '#captcha-form',
        '.g-recaptcha',
        'iframe[src*="recaptcha"]',
        'form[action*="captcha"]',
        '#captcha',
        '.captcha',
        'div[jsname="Jai8Rc"]',
        'form[action*="sorry"]'
    ];

    for (const selector of captcha_selectors) {
        const element = await page.$(selector);
        if (element) return true;
    }

    const current_url = page.url();
    if (current_url.includes('sorry') || current_url.includes('captcha')) {
        return true;
    }

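    // NOTE: generic keywords such as 'robot', 'verify', or 'confirm' can also appear in ordinary
    // page text and cause false positives; narrow this list if valid pages get flagged.
    const page_text = await page.evaluate(() => document.body.innerText);
    const captcha_keywords = ['captcha', 'robot', 'verify', 'unusual traffic', 'confirm', 'not a robot'];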

    for (const keyword of captcha_keywords) {
        if (page_text.toLowerCase().includes(keyword)) {
            return true;
        }
    }

    return false;
}

// ============= MAIN SEARCH FUNCTION =============
async function google_search_human(page, query, results_data, retry_count = 0) {
    const max_retries = 2;

    console.log(`🔍 Searching: ${query}${retry_count > 0 ? ` (attempt ${retry_count + 1})` : ''}`);

    try {
        // Go to Google homepage
        await page.goto('https://www.google.com', {
            waitUntil: 'domcontentloaded',
            timeout: 30000
        });
        await human_delay(1000, 2000);

        // Check for captcha
        let has_captcha = await check_for_captcha(page);
        if (has_captcha) {
            console.log('🚫 Captcha detected!');
            return { error: 'captcha', query: query };
        }

        // Accept cookies if present
        try {
            const cookie_button = await page.$('#L2AGLb');
            if (cookie_button) {
                await human_click(page, cookie_button);
                console.log('✅ Cookies accepted');
                await human_delay(500, 1000);
            }
        } catch (error) {
            console.log('No cookie button');
        }

        // Enter search query
        const search_input = await page.$('textarea[name="q"], input[name="q"]');
        if (!search_input) {
            throw new Error('Search input not found');
        }

        await human_type(page, search_input, query, {
            random_mistakes: true,
            backspace_fix: true
        });

        await human_delay(500, 1000);

        // Check for captcha before submitting
        has_captcha = await check_for_captcha(page);
        if (has_captcha) {
            console.log('🚫 Captcha detected before submission!');
            return { error: 'captcha', query: query };
        }

        // Press Enter
        console.log('📤 Submitting query...');

        await Promise.all([
            page.waitForNavigation({
                waitUntil: 'domcontentloaded',
                timeout: 15000
            }).catch(e => {
                console.log(`⚠️ Navigation warning: ${e.message}`);
                return null;
            }),
            page.keyboard.press('Enter'),
            human_delay(500, 1000)
        ]);

        // Check for captcha after search
        has_captcha = await check_for_captcha(page);
        if (has_captcha) {
            console.log('🚫 Captcha detected after search!');
            return { error: 'captcha', query: query };
        }

        console.log('⏳ Waiting for results to load...');

        // Wait for results to appear
        try {
            await page.waitForSelector('div.tF2Cxc', {
                timeout: 15000,
                visible: true
            });
            console.log('✅ Results loaded');
        } catch (error) {
            console.log('⚠️ Results not found, continuing...');
        }

        await human_delay(1500, 2500);

        // Scroll through results
        console.log('📜 Scrolling through results...');
        await human_scroll(page, { scrolls: random_range(4, 8) });

        // Parse results
        console.log('📊 Parsing results...');
        const parsed_results = await parse_search_results(page, query);

        if (parsed_results.results.length === 0 && retry_count < max_retries) {
            console.log('⚠️ No results found, retrying...');
            await human_delay(2000, 3000);
            return await google_search_human(page, query, results_data, retry_count + 1);
        }

        // Save results. Each query is written to its own file, so always start a fresh file with a header.
        await save_results_to_file(query, parsed_results, false);
        results_data.has_results = true;
        results_data.all_results.push(...parsed_results.results);

        // Open 1-2 random result pages
        if (parsed_results.results.length > 0) {
            const pages_to_open = Math.floor(random_range(1, Math.min(3, parsed_results.results.length)));
            console.log(`📖 Opening ${pages_to_open} result pages...`);

            for (let i = 0; i < pages_to_open; i++) {
                await open_random_result(page, parsed_results.results);
                await human_delay(1000, 2000);

                // Return to results page
                const current_url = page.url();
                if (!current_url.includes('google.com/search')) {
                    try {
                        await page.goBack({ waitUntil: 'domcontentloaded', timeout: 10000 });
                        await human_delay(1000, 1500);
                    } catch (error) {
                        console.log('⚠️ Could not go back');
                        await page.reload({ waitUntil: 'domcontentloaded' });
                    }
                }
            }
        }

        console.log(`✅ Search "${query}" completed, found ${parsed_results.results.length} results`);
        return { success: true, query: query, results: parsed_results.results };

    } catch (error) {
        console.error(`❌ Error during search "${query}": ${error.message}`);

        const has_captcha = await check_for_captcha(page).catch(() => false);
        if (has_captcha) {
            console.log('🚫 Error caused by captcha');
            return { error: 'captcha', query: query };
        }

        if (retry_count < max_retries) {
            console.log(`🔄 Retrying in 5 seconds...`);
            await sleep(5);
            return await google_search_human(page, query, results_data, retry_count + 1);
        }

        return { error: 'timeout', query: query };
    }
}

// ============= OCTO FUNCTIONS =============
async function check_limits(response) {
    function parse_int_safe(value) {
        const parsed = parseInt(value, 10);
        return isNaN(parsed) ? 0 : parsed;
    }
    const ratelimit_header = response.headers.ratelimit;
    if (!ratelimit_header) {
        console.warn('No ratelimit header found!');
        return;
    }
    const limit_entries = ratelimit_header.split(',').map(entry => entry.trim());
    for (const entry of limit_entries) {
        const name_match = entry.match(/^([^;]+)/);
        const r_match = entry.match(/;r=(\d+)/);
        const t_match = entry.match(/;t=(\d+)/);
        if (!r_match || !t_match) {
            console.warn(`Invalid ratelimit format: ${entry}`);
            continue;
        }
        const limit_name = name_match ? name_match[1] : 'unknown_limit';
        const remaining_quantity = parse_int_safe(r_match[1]);
        const window_seconds = parse_int_safe(t_match[1]);
        if (remaining_quantity < 5) {
            const wait_time = window_seconds + 1;
            console.log(`Waiting ${wait_time} seconds due to ${limit_name} limit`);
            await sleep(wait_time);
        }
    }
}

function parse_proxy(proxy) {
    const regex = /^(\w+):\/\/(?:([^:]+):([^@]+)@)?([^:]+):(\d+)$/;
    const match = proxy.match(regex);
    if (!match) return null;
    const [, type, login, password, host, port] = match;
    return { type, host, port, login: login || null, password: password || null };
}

async function octo_one_time_profile(config, proxy) {
    const one_time_profile_config = {
        method: "post",
        url: `${config.octo_local_api_base_url}/one_time/start`,
        headers: {
            'Content-Type': 'application/json'
        },
        data: {
            "profile_data": {
                "fingerprint": {
                    "os": Math.random() < 0.5 ? "win" : "mac"
                },
                "proxy": proxy,
                "images_load_limit": 10240,
            },
            "headless": config.headless_mode,
            "debug_port": true,
            "timeout": 60
        }
    }
    const response = await axios(one_time_profile_config);
    await check_limits(response);
    return response;
}


// ============= MAIN PROCESS =============
(async () => {
    console.log('🚀 Starting Google Scraper with Human-like Behavior...');
    console.log('🛡️ Captcha detection enabled - profiles with captcha will be skipped\n');

    const proxy_count = config.proxies.length;
    const all_queries = config.google_search_queries;
    const query_batches = distribute_queries(all_queries, proxy_count);

    console.log(`Total proxies: ${proxy_count}`);
    console.log(`Total search queries: ${all_queries.length}`);
    console.log('Query distribution:');
    query_batches.forEach((batch, idx) => {
        console.log(`  Profile ${idx + 1}: ${batch.length} queries - ${batch.join(', ')}`);
    });
    console.log('');

    let successful_profiles = 0;
    let skipped_profiles = 0;
    let failed_profiles = 0;

    for (let i = 0; i < proxy_count; i++) {
        console.log(`\n${'='.repeat(80)}`);
        console.log(`📋 Processing profile ${i + 1}/${proxy_count}`);
        console.log(`${'='.repeat(80)}`);

        const queries_for_this_profile = query_batches[i];
        if (queries_for_this_profile.length === 0) {
            console.log(`⚠️ No queries assigned to profile ${i + 1}, skipping.`);
            continue;
        }

        let parsed_proxy = parse_proxy(config.proxies[i]);
        if (!parsed_proxy) {
            console.error(`❌ Failed to parse proxy: ${config.proxies[i]}`);
            failed_profiles++;
            continue;
        }

        console.log(`🔧 Creating and starting One Time Profile with proxy: ${parsed_proxy.host}:${parsed_proxy.port}`);
        let ws_endpoint;

        try {
            ws_endpoint = await octo_one_time_profile(config, parsed_proxy);
        } catch (error) {
            console.error(`❌ Failed to create or start profile: ${error.message}`);
            failed_profiles++;
            continue;
        }

        if (!ws_endpoint || !ws_endpoint.data.ws_endpoint || !ws_endpoint.data.uuid) {
            console.error('❌ Failed to create or start profile');
            failed_profiles++;
            continue;
        }

        console.log(`✅ Profile created and started: ${ws_endpoint.data.uuid}`);

        console.log(`🌐 Connecting to browser`);

        let browser;
        try {
            browser = await puppeteer.connect({
                browserWSEndpoint: ws_endpoint.data.ws_endpoint,
                defaultViewport: null
            });
        } catch (error) {
            console.error(`❌ Failed to connect to browser: ${error.message}`);
            await kill_browser(ws_endpoint.data.browser_pid);
            failed_profiles++; // count the connection failure in the final statistics
            continue;
        }

        const page = await browser.newPage();

        const results_data = {
            has_results: false,
            all_results: []
        };

        let captcha_detected = false;

        // Execute only the queries assigned to this profile
        for (let j = 0; j < queries_for_this_profile.length; j++) {
            const query = queries_for_this_profile[j];

            try {
                const search_result = await google_search_human(page, query, results_data);

                if (search_result.error === 'captcha') {
                    console.log(`\n🚨 CAPTCHA DETECTED! Skipping profile ${ws_endpoint.data.uuid}`);
                    captcha_detected = true;
                    break;
                }

                if (j < queries_for_this_profile.length - 1 && !captcha_detected) {
                    const delay_between = random_range(5, 10);
                    console.log(`\n⏰ Waiting ${delay_between.toFixed(1)} seconds before next search...`);
                    await sleep(delay_between);
                }

            } catch (error) {
                console.error(`❌ Error during search "${query}": ${error.message}`);
            }
        }

        console.log(`🛑 Stopping profile...`);
        await kill_browser(ws_endpoint.data.browser_pid);

        if (captcha_detected) {
            console.log(`⏭️ Profile ${ws_endpoint.data.uuid} skipped due to captcha`);
            skipped_profiles++;
        } else if (results_data.all_results.length > 0) {
            const summary_filename = `summary_${ws_endpoint.data.uuid}_${Date.now()}.txt`;
            const summary_path = path.join(__dirname, 'search_results', summary_filename);

            let summary_content = `=== SEARCH SUMMARY ===\n`;
            summary_content += `Profile: ${ws_endpoint.data.uuid}\n`;
            summary_content += `Proxy: ${parsed_proxy.host}:${parsed_proxy.port}\n`;
            summary_content += `Queries executed: ${queries_for_this_profile.length}\n`;
            summary_content += `Queries: ${queries_for_this_profile.join(', ')}\n`;
            summary_content += `Total results collected: ${results_data.all_results.length}\n`;
            summary_content += `Time: ${new Date().toISOString()}\n`;
            summary_content += `${'='.repeat(80)}\n\n`;

            await fs.writeFile(summary_path, summary_content);
            console.log(`\n📊 Summary saved: ${summary_path}`);
            successful_profiles++;
        } else {
            console.log(`⚠️ Profile ${ws_endpoint.data.uuid} finished without results`);
            failed_profiles++;
        }

        console.log(`✅ Profile ${i + 1} completed`);

        if (i < proxy_count - 1) {
            const delay_between = random_range(10, 20);
            console.log(`\n⏰ Waiting ${delay_between.toFixed(1)} seconds before next profile...`);
            await sleep(delay_between);
        }
    }

    console.log(`\n${'='.repeat(80)}`);
    console.log(`📊 FINAL STATISTICS:`);
    console.log(`${'='.repeat(80)}`);
    console.log(`✅ Successful profiles: ${successful_profiles}`);
    console.log(`⏭️ Skipped due to captcha: ${skipped_profiles}`);
    console.log(`❌ Failed profiles: ${failed_profiles}`);
    console.log(`📁 All results saved in "search_results" folder`);
    console.log(`\n🎉 Google Scraper finished!`);
})();

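Octo Browser capabilities for Google SERP scraping

Octo Browser includes an API that lets you fully automate data collection. Octo also provides detailed API documentation with request examples.

The documentation includes code snippets for integrating Puppeteer, Playwright, and Selenium, which control the browser over the Chrome DevTools Protocol (CDP).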
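
As a rough illustration of what such a request looks like, the sketch below builds the call that starts a one-time profile through the Local API. The endpoint path and field names are copied from the script later in this article; the helper name build_one_time_start_request and the sample proxy values are hypothetical, so treat this as a sketch rather than an API reference.

```javascript
// Sketch only: builds the request object, it does not send anything.
// Field names mirror octo_one_time_profile() in this article's script.
function build_one_time_start_request(base_url, proxy, headless = false) {
    return {
        method: 'post',
        url: `${base_url}/one_time/start`,
        headers: { 'Content-Type': 'application/json' },
        data: {
            profile_data: {
                fingerprint: { os: 'win' },
                proxy: proxy
            },
            headless: headless,
            debug_port: true, // the script sets this and then reads ws_endpoint from the response
            timeout: 60
        }
    };
}

// Sample proxy in the shape produced by parse_proxy() in the script (placeholder values).
const request = build_one_time_start_request(
    'http://localhost:58888/api/profiles',
    { type: 'socks5', host: '127.0.0.1', port: '50000', login: 'login', password: 'password' }
);
console.log(request.url);
```

The resulting object can be handed to an HTTP client such as axios. In the script, the response carries a ws_endpoint value that Puppeteer then connects to.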

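Practical tips

  1. Study the official API documentation carefully.

  2. Review the frequently asked questions (FAQ) about using the API.

  3. Read the detailed article on working with the Octo API.

  4. API requests in Octo Browser are rate-limited according to your subscription tier, although the limits can be raised. Use a function that checks the API limits reported in the response headers. Ignoring HTTP 429 errors may lengthen the block. If you automate from several devices under one account, implement centralized request tracking (for example, with Redis).

  5. Don't use unpatched versions of automation libraries, because they contain flaws that anti-bot systems can detect. For Puppeteer/Playwright, use the rebrowser patches. For Selenium, use undetected-chromedriver.

  6. Use functions and libraries that imitate human behavior as closely as possible: mouse clicks, hovering, cursor movement, typing, scrolling, navigation flows, and random actions.

  7. Store the profile cache locally to reduce proxy traffic. You can do this by passing "local_cache": true when creating a profile, or by using a shared cache directory via --disk-cache-dir, for example flags:["--disk-cache-dir=C:/Cache"]

  8. Limit image loading in the profile settings to save proxy traffic. Set "images_load_limit": 10240 when creating a profile to load only images no larger than 10,240 bytes.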
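
Tip 4 can be sketched as follows. The parser assumes the header format that the check_limits function in this article's script expects (comma-separated entries of the form name;r=<remaining>;t=<seconds>). The sample header value, the function names, and the threshold of 5 remaining requests are illustrative assumptions.

```javascript
// Parse a ratelimit header into { name, remaining, window_seconds } entries.
// Entries that lack an r= or t= part are dropped.
function parse_ratelimit(header) {
    if (!header) return [];
    return header
        .split(',')
        .map(entry => entry.trim())
        .map(entry => {
            const name = entry.split(';')[0];
            const r = entry.match(/;r=(\d+)/);
            const t = entry.match(/;t=(\d+)/);
            if (!r || !t) return null;
            return { name, remaining: Number(r[1]), window_seconds: Number(t[1]) };
        })
        .filter(Boolean);
}

// Return how many seconds to pause before the next API call (0 if no pause is needed).
function seconds_to_wait(limits, min_remaining = 5) {
    const nearly_exhausted = limits.filter(limit => limit.remaining < min_remaining);
    if (nearly_exhausted.length === 0) return 0;
    return Math.max(...nearly_exhausted.map(limit => limit.window_seconds)) + 1;
}

const sample_header = 'default;r=3;t=42, burst;r=120;t=1'; // made-up value for illustration
const limits = parse_ratelimit(sample_header);
console.log(limits, seconds_to_wait(limits));
```

Unlike check_limits in the script, which sleeps once per nearly exhausted limit, this variant waits once for the longest window. When several machines share one account, the same numbers could be written to a shared store so that every worker sees them.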

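Comparison of scraping methods

| Method | Cost | Complexity | Block risk | Data quality |
|---|---|---|---|---|
| Paid SERP API | High (from $1 per 1,000 requests) | (missing) | Very low | (missing) |
| Official API | Low / free | (missing) | (missing) | Low (not real SERP data) |
| HTTP requests | Medium (proxies required) | (missing) | Very high | (missing) |
| Automation with an anti-detect browser | Medium (subscription and proxies required) | Medium | Very low | Highest |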
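
To put the cost column in perspective, here is a back-of-the-envelope estimate for a paid SERP API. It assumes one request per keyword check and uses the "from $1 per 1,000 requests" starting price quoted in this article; actual pricing depends on the provider and plan, and the function name is hypothetical.

```javascript
// Rough monthly cost of tracking keywords through a paid SERP API.
// Assumes one API request per keyword check.
function monthly_serp_api_cost(keywords, checks_per_day, price_per_1000 = 1, days = 30) {
    const requests = keywords * checks_per_day * days;
    return { requests, cost_usd: (requests / 1000) * price_per_1000 };
}

const estimate = monthly_serp_api_cost(10000, 1);
console.log(`${estimate.requests} requests, about $${estimate.cost_usd} per month`);
```

Checking 10,000 keywords once a day comes to 300,000 requests a month, or about $300 at the starting price.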

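Ready-made script for scraping Google SERPs

Below is an example scraping script that works with the Octo Browser API. You can use it, or parts of it, as a starting point for a full project and adapt it as needed.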

  1. Download and install VS Code.

  2. Download and install Node.js.

  3. Create a folder in a convenient location and name it, for example, octo_scraper

  4. Open this folder in VS Code.

  5. Create a .js file. It is best to name it after what it does, for example google_scraping.js

  6. Paste the script code into the file.

  7. In the config variable in the code, add your proxies to the proxies array.

  8. In the same place, add your search queries to the google_search_queries array. In this example script, the number of queries should be greater than or equal to the number of proxies; otherwise the surplus proxies receive no queries and are skipped. You can easily modify the scraping logic to suit your needs.

Note: each array element must be enclosed in quotes, and elements are separated by commas.

  9. Open the terminal and run the command npm i rebrowser-puppeteer axios fkill to install the Node.js dependencies.

  10. If VS Code shows an error, open Windows PowerShell as administrator, enter the command Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned, and confirm. Then repeat the previous step.

  11. Launch Octo Browser.

  12. Run the program in VS Code (Ctrl/Cmd + F5) and wait for the script to finish.

  13. The scraper creates a one-time profile for each proxy you added and runs the specified queries one after another. The script imitates real user behavior to get past Google's anti-fraud systems.

  14. You can monitor the process in the debug console. If a CAPTCHA appears, the script closes that profile and moves on to the next one; the remaining queries of the closed profile are not retried.

  15. Search results are saved in the search_results folder in the project directory.
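
Steps 7 and 8 can be checked before anything is launched. The sketch below is a hypothetical pre-flight check, not part of the original script: it reuses the proxy pattern from the script's parse_proxy function and flags a config in which some proxies would be left without queries. The function name validate_config is an assumption.

```javascript
// Hypothetical pre-flight check for the config object used by the script.
// Returns a list of human-readable problems; an empty list means the config looks usable.
function validate_config(config) {
    const problems = [];
    // Same pattern as parse_proxy() in the script: scheme://[login:password@]host:port
    const proxy_pattern = /^(\w+):\/\/(?:([^:]+):([^@]+)@)?([^:]+):(\d+)$/;

    config.proxies.forEach((proxy, index) => {
        if (typeof proxy !== 'string' || !proxy_pattern.test(proxy)) {
            problems.push(`proxies[${index}] is not in scheme://login:password@host:port form`);
        }
    });

    if (config.google_search_queries.length < config.proxies.length) {
        problems.push('fewer queries than proxies: some profiles would get no queries');
    }
    return problems;
}

const good_config = {
    proxies: [
        'socks5://login:password@127.0.0.1:50000',
        'socks5://login:password@127.0.0.1:50001'
    ],
    google_search_queries: ['nodejs', 'sidwudraq', 'arch linux']
};
const bad_config = { proxies: ['127.0.0.1:50000'], google_search_queries: [] };

console.log(validate_config(good_config));
console.log(validate_config(bad_config));
```

Calling a check like this at the top of the main function would let the script stop early with a clear message instead of failing one profile at a time.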

Script code

const axios = require('axios');
const puppeteer = require('rebrowser-puppeteer');
const fs = require('fs').promises;
const path = require('path');

const config = {
    octo_local_api_base_url: `http://localhost:58888/api/profiles`, //change port if you don't use default 58888
    headless_mode: false,
    proxies: [
        "socks5://login:password@127.0.0.1:50000", //paste your proxies
        "socks5://login:password@127.0.0.1:50000"
    ],
    google_search_queries: ["nodejs", "sidwudraq", "arch linux"] //change queries
}

// ============= HELPER FUNCTIONS =============
function random_range(min, max) {
    return min + Math.random() * (max - min);
}

async function sleep(seconds) {
    return new Promise(resolve => setTimeout(resolve, seconds * 1000));
}

async function human_delay(min_ms = 50, max_ms = 200) {
    const mu = Math.log((min_ms + max_ms) / 2);
    const sigma = random_range(0.3, 0.6);
    let delay = Math.exp(mu + sigma * (Math.random() - 0.5) * 2);
    delay = Math.min(max_ms, Math.max(min_ms, delay));
    await new Promise(resolve => setTimeout(resolve, delay));
}

async function kill_browser(pid) {
    const { default: fkill } = await import('fkill');
    await fkill(pid, { force: true });
    console.log(`✅ Process with PID ${pid} successfully stopped.`);
}

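// ============= BEZIER CURVES FOR HUMAN-LIKE MOVEMENT =============
// Cubic Bezier point at t in [0, 1]:
//   B(t) = (1-t)^3*p0 + 3*(1-t)^2*t*p1 + 3*(1-t)*t^2*p2 + t^3*p3
// p0 and p3 are the start and end of the path; p1 and p2 are control points that bend it.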
function bezier_curve(t, p0, p1, p2, p3) {
    const mt = 1 - t;
    const mt2 = mt * mt;
    const t2 = t * t;

    const x = mt2 * mt * p0.x + 3 * mt2 * t * p1.x + 3 * mt * t2 * p2.x + t2 * t * p3.x;
    const y = mt2 * mt * p0.y + 3 * mt2 * t * p1.y + 3 * mt * t2 * p2.y + t2 * t * p3.y;

    return { x, y };
}

function generate_bezier_points(start, end) {
    const distance = Math.hypot(end.x - start.x, end.y - start.y);
    const angle = Math.atan2(end.y - start.y, end.x - start.x);

    const deviation = random_range(distance * 0.2, distance * 0.5);
    const angle_variation = random_range(-Math.PI / 3, Math.PI / 3);

    const p1 = {
        x: start.x + Math.cos(angle + angle_variation) * deviation,
        y: start.y + Math.sin(angle + angle_variation) * deviation
    };

    const p2 = {
        x: end.x - Math.cos(angle - angle_variation) * deviation,
        y: end.y - Math.sin(angle - angle_variation) * deviation
    };

    return [start, p1, p2, end];
}

function generate_trajectory(start, end, steps = null) {
    const distance = Math.hypot(end.x - start.x, end.y - start.y);
    const actual_steps = steps || Math.max(20, Math.min(100, Math.floor(distance / 3)));

    const bezier_points = generate_bezier_points(start, end);
    const trajectory = [];

    for (let i = 0; i <= actual_steps; i++) {
        const t = i / actual_steps;
        const eased_t = Math.pow(t, 1 + Math.random() * 0.3);
        const point = bezier_curve(eased_t, ...bezier_points);

        const jitter = {
            x: (Math.random() - 0.5) * random_range(0.5, 2),
            y: (Math.random() - 0.5) * random_range(0.5, 2)
        };

        trajectory.push({
            x: Math.round(point.x + jitter.x),
            y: Math.round(point.y + jitter.y)
        });
    }

    return trajectory;
}

// ============= HUMAN-LIKE CLICK =============
async function human_click(page, selector_or_element, options = {}) {
    const {
        move_speed = 1.0,
        random_overshoot = true,
        click_delay = null,
        force_visible = true
    } = options;

    const element = typeof selector_or_element === 'string'
        ? await page.$(selector_or_element)
        : selector_or_element;

    if (!element) {
        throw new Error(`Element not found: ${selector_or_element}`);
    }

    if (force_visible) {
        await element.scrollIntoView();
        await human_delay(100, 300);
    }

    const current_mouse = await page.evaluate(() => ({
        x: window.mouseX || window.innerWidth / 2,
        y: window.mouseY || window.innerHeight / 2
    }));

    const box = await element.boundingBox();
    if (!box) throw new Error('Could not get element coordinates');

    const target = {
        x: box.x + random_range(box.width * 0.2, box.width * 0.8),
        y: box.y + random_range(box.height * 0.2, box.height * 0.8)
    };

    if (random_overshoot && Math.random() < 0.3) {
        const overshoot_x = (Math.random() - 0.5) * random_range(10, 30);
        const overshoot_y = (Math.random() - 0.5) * random_range(10, 30);

        const overshoot_target = {
            x: target.x + overshoot_x,
            y: target.y + overshoot_y
        };

        const overshoot_trajectory = generate_trajectory(current_mouse, overshoot_target);
        for (const point of overshoot_trajectory) {
            await page.mouse.move(point.x, point.y);
            await human_delay(1, 3);
        }

        const return_trajectory = generate_trajectory(overshoot_target, target);
        for (const point of return_trajectory) {
            await page.mouse.move(point.x, point.y);
            await human_delay(1, 3);
        }
    } else {
        const trajectory = generate_trajectory(current_mouse, target);
        for (const point of trajectory) {
            await page.mouse.move(point.x, point.y);
            const delay = Math.max(1, Math.min(5, 10 / move_speed));
            await human_delay(delay * 0.5, delay * 1.5);
        }
    }

    const final_delay = click_delay !== null ? click_delay : random_range(80, 250);
    await human_delay(final_delay * 0.8, final_delay * 1.2);

    if (Math.random() < 0.15) {
        const micro_offset_x = (Math.random() - 0.5) * random_range(1, 4);
        const micro_offset_y = (Math.random() - 0.5) * random_range(1, 4);
        await page.mouse.move(target.x + micro_offset_x, target.y + micro_offset_y);
        await human_delay(10, 30);
    }

    await page.mouse.down();
    await human_delay(50, 150); // hold the button for roughly 50-150 ms before releasing

    if (Math.random() < 0.2) {
        await page.mouse.move(
            target.x + (Math.random() - 0.5) * 2,
            target.y + (Math.random() - 0.5) * 2
        );
    }

    await page.mouse.up();
    await human_delay(50, 150);

    await page.evaluate(({ x, y }) => {
        window.mouseX = x;
        window.mouseY = y;
    }, target);

    return { success: true, position: target };
}

// ============= HUMAN-LIKE TEXT INPUT =============
async function human_type(page, selector, text, options = {}) {
    const {
        typing_speed = null,
        random_mistakes = false,
        backspace_fix = false
    } = options;

    const element = typeof selector === 'string'
        ? await page.$(selector)
        : selector;

    if (!element) {
        throw new Error(`Element not found: ${selector}`);
    }

    await human_click(page, element, { pre_hover: true });

    // Clear the field
    await page.keyboard.down('Control');
    await page.keyboard.press('a');
    await page.keyboard.up('Control');
    await page.keyboard.press('Backspace');
    await human_delay(100, 200);

    for (let i = 0; i < text.length; i++) {
        const char = text[i];

        let delay;
        if (typing_speed) {
            delay = typing_speed;
        } else {
            const base_delay = random_range(50, 200);
            const is_space = char === ' ';
            delay = is_space ? base_delay * 2 : base_delay;
        }

        if (random_mistakes && Math.random() < 0.02) {
            const wrong_char = String.fromCharCode(
                char.charCodeAt(0) + (Math.random() > 0.5 ? 1 : -1)
            );
            await page.keyboard.type(wrong_char, { delay: delay * 0.5 });
            await human_delay(100, 200);

            if (backspace_fix) {
                await page.keyboard.press('Backspace');
                await human_delay(50, 100);
            } else {
                continue;
            }
        }

        await page.keyboard.type(char, { delay: delay });
    }

    await human_delay(100, 300);
    return true;
}

// ============= HUMAN-LIKE SCROLL =============
async function human_scroll(page, options = {}) {
    const {
        scrolls = null,
        min_scroll = 300,
        max_scroll = 800
    } = options;

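    // Callers may pass a fractional count (e.g. from random_range), so round it down to a whole number.
    const num_scrolls = Math.max(1, Math.floor(scrolls || random_range(3, 8)));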

    for (let i = 0; i < num_scrolls; i++) {
        const scroll_distance = random_range(min_scroll, max_scroll);
        await page.evaluate((distance) => {
            window.scrollBy({
                top: distance,
                behavior: 'smooth'
            });
        }, scroll_distance);

        await human_delay(800, 2000);

        if (Math.random() < 0.2) {
            const back_distance = random_range(100, 300);
            await page.evaluate((distance) => {
                window.scrollBy({
                    top: -distance,
                    behavior: 'smooth'
                });
            }, back_distance);
            await human_delay(500, 1000);
        }
    }
}

// ============= DISTRIBUTE QUERIES AMONG PROFILES =============
function distribute_queries(queries, numProxies) {
    const total = queries.length;
    const baseCount = Math.floor(total / numProxies);
    const remainder = total % numProxies;

    const batches = [];
    let start = 0;
    for (let i = 0; i < numProxies; i++) {
        const count = baseCount + (i < remainder ? 1 : 0);
        const batch = queries.slice(start, start + count);
        batches.push(batch);
        start += count;
    }
    return batches;
}

// ============= PARSE GOOGLE RESULTS =============
async function parse_search_results(page, query) {
    return await page.evaluate((query) => {
        const results = [];

        // Find all result containers
        const organic_results = document.querySelectorAll('div.tF2Cxc');

        console.log(`Found ${organic_results.length} result containers`);

        organic_results.forEach((result, index) => {
            try {
                // Title
                const title_element = result.querySelector('h3.LC20lb.MBeuO.DKV0Md');
                const title = title_element ? title_element.innerText : '';

                // Link
                let link_element = result.querySelector('a');
                let link = link_element ? link_element.href : '';

                // Clean Google redirect
                if (link && link.includes('/url?q=')) {
                    const url_match = link.match(/\/url\?q=([^&]+)/);
                    if (url_match) {
                        link = decodeURIComponent(url_match[1]);
                    }
                }

                // Description
                let desc_element = result.querySelector('div.VwiC3b.yXK7lf.p4wth.r025kc.Hdw6tb');
                let description = desc_element ? desc_element.innerText : '';

                // Fallback selector
                if (!description) {
                    const fallback_desc = result.querySelector('div.VwiC3b');
                    description = fallback_desc ? fallback_desc.innerText : '';
                }

                if (title && title.trim() && link) {
                    results.push({
                        position: results.length + 1,
                        title: title.trim(),
                        link: link,
                        description: description.trim().substring(0, 500)
                    });
                }
            } catch (error) {
                console.error(`Error parsing result ${index}:`, error);
            }
        });

        console.log(`Successfully parsed ${results.length} results`);

        return {
            query: query,
            timestamp: new Date().toISOString(),
            total_results: results.length,
            results: results
        };
    }, query);
}

// ============= SAVE RESULTS TO FILE =============
async function save_results_to_file(query, data, is_appending = false) {
    const filename = `${query.replace(/[^a-z0-9]/gi, '_').toLowerCase()}_results.txt`;
    const filepath = path.join(__dirname, 'search_results', filename);

    // Create directory if needed
    await fs.mkdir(path.join(__dirname, 'search_results'), { recursive: true });

    let content = '';

    if (!is_appending) {
        content += `=== GOOGLE SEARCH RESULTS ===\n`;
        content += `Query: ${data.query}\n`;
        content += `Time: ${data.timestamp}\n`;
        content += `Total results: ${data.total_results}\n`;
        content += `${'='.repeat(80)}\n\n`;
    }

    for (const result of data.results) {
        content += `${result.position}. ${result.title}\n`;
        content += `   URL: ${result.link}\n`;
        content += `   Description: ${result.description.substring(0, 200)}...\n`;
        content += `   ${'-'.repeat(80)}\n`;
    }

    content += `\n📄 Page saved: ${new Date().toISOString()}\n`;
    content += `${'='.repeat(80)}\n\n`;

    await fs.writeFile(filepath, content, { flag: is_appending ? 'a' : 'w' });
    console.log(`✅ Results saved to: ${filepath}`);
    return filepath;
}

// ============= OPEN RANDOM RESULT PAGE =============
async function open_random_result(page, results) {
    if (!results || results.length === 0) {
        console.log('No results to open');
        return false;
    }

    // Choose a random result (usually not the first)
    let result_index = 0;
    if (results.length > 1) {
        result_index = Math.random() < 0.7
            ? Math.floor(random_range(1, Math.min(5, results.length)))
            : Math.floor(random_range(0, results.length));
    }

    const selected_result = results[result_index];
    console.log(`Opening result ${result_index + 1}: ${selected_result.title.substring(0, 50)}...`);

    try {
        // Check for captcha before opening
        const has_captcha = await check_for_captcha(page);
        if (has_captcha) {
            console.log('🚫 Captcha detected, not opening result');
            return false;
        }

        // Open in a new tab
        const new_page = await page.browser().newPage();
        await new_page.goto(selected_result.link, {
            waitUntil: 'domcontentloaded',
            timeout: 20000
        });
        await human_delay(2000, 4000);

        // Check for captcha on the opened page
        const page_has_captcha = await check_for_captcha(new_page);
        if (page_has_captcha) {
            console.log('🚫 Captcha detected on opened page');
            await new_page.close();
            return false;
        }

        // Scroll on the opened page
        await human_scroll(new_page, { scrolls: random_range(2, 5) });
        await human_delay(1500, 3000);

        // Close the tab
        await new_page.close();
        console.log(`✅ Page viewed and closed`);

        return true;
    } catch (error) {
        console.log(`❌ Error opening page: ${error.message}`);
        return false;
    }
}

// ============= CAPTCHA CHECK =============
async function check_for_captcha(page) {
    const captcha_selectors = [
        '#captcha-form',
        '.g-recaptcha',
        'iframe[src*="recaptcha"]',
        'form[action*="captcha"]',
        '#captcha',
        '.captcha',
        'div[jsname="Jai8Rc"]',
        'form[action*="sorry"]'
    ];

    for (const selector of captcha_selectors) {
        const element = await page.$(selector);
        if (element) return true;
    }

    const current_url = page.url();
    if (current_url.includes('sorry') || current_url.includes('captcha')) {
        return true;
    }

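    // Match only phrases that are specific to a block page. Generic words such as
    // "confirm" or "verify" also occur in ordinary result snippets and would
    // cause false positives on a normal results page.
    const page_text = await page.evaluate(() => document.body ? document.body.innerText : '');
    const captcha_keywords = ['unusual traffic', 'not a robot'];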

    for (const keyword of captcha_keywords) {
        if (page_text.toLowerCase().includes(keyword)) {
            return true;
        }
    }

    return false;
}

// ============= MAIN SEARCH FUNCTION =============
async function google_search_human(page, query, results_data, retry_count = 0) {
    const max_retries = 2;

    console.log(`🔍 Searching: ${query}${retry_count > 0 ? ` (attempt ${retry_count + 1})` : ''}`);

    try {
        // Go to Google homepage
        await page.goto('https://www.google.com', {
            waitUntil: 'domcontentloaded',
            timeout: 30000
        });
        await human_delay(1000, 2000);

        // Check for captcha
        let has_captcha = await check_for_captcha(page);
        if (has_captcha) {
            console.log('🚫 Captcha detected!');
            return { error: 'captcha', query: query };
        }

        // Accept cookies if present
        try {
            const cookie_button = await page.$('#L2AGLb');
            if (cookie_button) {
                await human_click(page, cookie_button);
                console.log('✅ Cookies accepted');
                await human_delay(500, 1000);
            }
        } catch (error) {
            console.log('No cookie button');
        }

        // Enter search query
        const search_input = await page.$('textarea[name="q"], input[name="q"]');
        if (!search_input) {
            throw new Error('Search input not found');
        }

        await human_type(page, search_input, query, {
            random_mistakes: true,
            backspace_fix: true
        });

        await human_delay(500, 1000);

        // Check for captcha before submitting
        has_captcha = await check_for_captcha(page);
        if (has_captcha) {
            console.log('🚫 Captcha detected before submission!');
            return { error: 'captcha', query: query };
        }

        // Press Enter
        console.log('📤 Submitting query...');

        await Promise.all([
            page.waitForNavigation({
                waitUntil: 'domcontentloaded',
                timeout: 15000
            }).catch(e => {
                console.log(`⚠️ Navigation warning: ${e.message}`);
                return null;
            }),
            page.keyboard.press('Enter'),
            human_delay(500, 1000)
        ]);

        // Check for captcha after search
        has_captcha = await check_for_captcha(page);
        if (has_captcha) {
            console.log('🚫 Captcha detected after search!');
            return { error: 'captcha', query: query };
        }

        console.log('⏳ Waiting for results to load...');

        // Wait for results to appear
        try {
            await page.waitForSelector('div.tF2Cxc', {
                timeout: 15000,
                visible: true
            });
            console.log('✅ Results loaded');
        } catch (error) {
            console.log('⚠️ Results not found, continuing...');
        }

        await human_delay(1500, 2500);

        // Scroll through results
        console.log('📜 Scrolling through results...');
        await human_scroll(page, { scrolls: random_range(4, 8) });

        // Parse results
        console.log('📊 Parsing results...');
        const parsed_results = await parse_search_results(page, query);

        if (parsed_results.results.length === 0 && retry_count < max_retries) {
            console.log('⚠️ No results found, retrying...');
            await human_delay(2000, 3000);
            return await google_search_human(page, query, results_data, retry_count + 1);
        }

        // Save results. Each query is written to its own file, so always start
        // that file with a header instead of appending to it.
        await save_results_to_file(query, parsed_results, false);
        results_data.has_results = true;
        results_data.all_results.push(...parsed_results.results);

        // Open 1-2 random result pages
        if (parsed_results.results.length > 0) {
            const pages_to_open = Math.floor(random_range(1, Math.min(3, parsed_results.results.length)));
            console.log(`📖 Opening ${pages_to_open} result pages...`);

            for (let i = 0; i < pages_to_open; i++) {
                await open_random_result(page, parsed_results.results);
                await human_delay(1000, 2000);

                // Return to results page
                const current_url = page.url();
                if (!current_url.includes('google.com/search')) {
                    try {
                        await page.goBack({ waitUntil: 'domcontentloaded', timeout: 10000 });
                        await human_delay(1000, 1500);
                    } catch (error) {
                        console.log('⚠️ Could not go back');
                        await page.reload({ waitUntil: 'domcontentloaded' });
                    }
                }
            }
        }

        console.log(`✅ Search "${query}" completed, found ${parsed_results.results.length} results`);
        return { success: true, query: query, results: parsed_results.results };

    } catch (error) {
        console.error(`❌ Error during search "${query}": ${error.message}`);

        const has_captcha = await check_for_captcha(page).catch(() => false);
        if (has_captcha) {
            console.log('🚫 Error caused by captcha');
            return { error: 'captcha', query: query };
        }

        if (retry_count < max_retries) {
            console.log(`🔄 Retrying in 5 seconds...`);
            await sleep(5);
            return await google_search_human(page, query, results_data, retry_count + 1);
        }

        return { error: 'timeout', query: query };
    }
}

// ============= OCTO FUNCTIONS =============
async function check_limits(response) {
    function parse_int_safe(value) {
        const parsed = parseInt(value, 10);
        return isNaN(parsed) ? 0 : parsed;
    }
    const ratelimit_header = response.headers.ratelimit;
    if (!ratelimit_header) {
        console.warn('No ratelimit header found!');
        return;
    }
    const limit_entries = ratelimit_header.split(',').map(entry => entry.trim());
    for (const entry of limit_entries) {
        const name_match = entry.match(/^([^;]+)/);
        const r_match = entry.match(/;r=(\d+)/);
        const t_match = entry.match(/;t=(\d+)/);
        if (!r_match || !t_match) {
            console.warn(`Invalid ratelimit format: ${entry}`);
            continue;
        }
        const limit_name = name_match ? name_match[1] : 'unknown_limit';
        const remaining_quantity = parse_int_safe(r_match[1]);
        const window_seconds = parse_int_safe(t_match[1]);
        if (remaining_quantity < 5) {
            const wait_time = window_seconds + 1;
            console.log(`Waiting ${wait_time} seconds due to ${limit_name} limit`);
            await sleep(wait_time);
        }
    }
}

function parse_proxy(proxy) {
    const regex = /^(\w+):\/\/(?:([^:]+):([^@]+)@)?([^:]+):(\d+)$/;
    const match = proxy.match(regex);
    if (!match) return null;
    const [, type, login, password, host, port] = match;
    return { type, host, port, login: login || null, password: password || null };
}

async function octo_one_time_profile(config, proxy) {
    const one_time_profile_config = {
        method: "post",
        url: `${config.octo_local_api_base_url}/one_time/start`,
        headers: {
            'Content-Type': 'application/json'
        },
        data: {
            "profile_data": {
                "fingerprint": {
                    "os": Math.random() < 0.5 ? "win" : "mac"
                },
                "proxy": proxy,
                "images_load_limit": 10240,
            },
            "headless": config.headless_mode,
            "debug_port": true,
            "timeout": 60
        }
    }
    const response = await axios(one_time_profile_config);
    await check_limits(response);
    return response;
}


// ============= MAIN PROCESS =============
(async () => {
    console.log('🚀 Starting Google Scraper with Human-like Behavior...');
    console.log('🛡️ Captcha detection enabled - profiles with captcha will be skipped\n');

    const proxy_count = config.proxies.length;
    const all_queries = config.google_search_queries;
    const query_batches = distribute_queries(all_queries, proxy_count);

    console.log(`Total proxies: ${proxy_count}`);
    console.log(`Total search queries: ${all_queries.length}`);
    console.log('Query distribution:');
    query_batches.forEach((batch, idx) => {
        console.log(`  Profile ${idx + 1}: ${batch.length} queries - ${batch.join(', ')}`);
    });
    console.log('');

    let successful_profiles = 0;
    let skipped_profiles = 0;
    let failed_profiles = 0;

    for (let i = 0; i < proxy_count; i++) {
        console.log(`\n${'='.repeat(80)}`);
        console.log(`📋 Processing profile ${i + 1}/${proxy_count}`);
        console.log(`${'='.repeat(80)}`);

        const queries_for_this_profile = query_batches[i];
        if (queries_for_this_profile.length === 0) {
            console.log(`⚠️ No queries assigned to profile ${i + 1}, skipping.`);
            continue;
        }

        let parsed_proxy = parse_proxy(config.proxies[i]);
        if (!parsed_proxy) {
            console.error(`❌ Failed to parse proxy: ${config.proxies[i]}`);
            failed_profiles++;
            continue;
        }

        console.log(`🔧 Creating and starting One Time Profile with proxy: ${parsed_proxy.host}:${parsed_proxy.port}`);
        let ws_endpoint;

        try {
            ws_endpoint = await octo_one_time_profile(config, parsed_proxy);
        } catch (error) {
            console.error(`❌ Failed to create or start profile: ${error.message}`);
            failed_profiles++;
            continue;
        }

        if (!ws_endpoint || !ws_endpoint.data.ws_endpoint || !ws_endpoint.data.uuid) {
            console.error('❌ Failed to create or start profile');
            failed_profiles++;
            continue;
        }

        console.log(`✅ Profile created and started: ${ws_endpoint.data.uuid}`);

        console.log(`🌐 Connecting to browser`);

        let browser;
        try {
            browser = await puppeteer.connect({
                browserWSEndpoint: ws_endpoint.data.ws_endpoint,
                defaultViewport: null
            });
        } catch (error) {
            console.error(`❌ Failed to connect to browser: ${error.message}`);
            await kill_browser(ws_endpoint.data.browser_pid);
            failed_profiles++; // count this profile as failed so the final statistics add up
            continue;
        }

        const page = await browser.newPage();

        const results_data = {
            has_results: false,
            all_results: []
        };

        let captcha_detected = false;

        // Execute only the queries assigned to this profile
        for (let j = 0; j < queries_for_this_profile.length; j++) {
            const query = queries_for_this_profile[j];

            try {
                const search_result = await google_search_human(page, query, results_data);

                if (search_result.error === 'captcha') {
                    console.log(`\n🚨 CAPTCHA DETECTED! Skipping profile ${ws_endpoint.data.uuid}`);
                    captcha_detected = true;
                    break;
                }

                if (j < queries_for_this_profile.length - 1 && !captcha_detected) {
                    const delay_between = random_range(5, 10);
                    console.log(`\n⏰ Waiting ${delay_between.toFixed(1)} seconds before next search...`);
                    await sleep(delay_between);
                }

            } catch (error) {
                console.error(`❌ Error during search "${query}": ${error.message}`);
            }
        }

        console.log(`🛑 Stopping profile...`);
        await kill_browser(ws_endpoint.data.browser_pid);

        if (captcha_detected) {
            console.log(`⏭️ Profile ${ws_endpoint.data.uuid} skipped due to captcha`);
            skipped_profiles++;
        } else if (results_data.all_results.length > 0) {
            const summary_filename = `summary_${ws_endpoint.data.uuid}_${Date.now()}.txt`;
            const summary_path = path.join(__dirname, 'search_results', summary_filename);

            let summary_content = `=== SEARCH SUMMARY ===\n`;
            summary_content += `Profile: ${ws_endpoint.data.uuid}\n`;
            summary_content += `Proxy: ${parsed_proxy.host}:${parsed_proxy.port}\n`;
            summary_content += `Queries executed: ${queries_for_this_profile.length}\n`;
            summary_content += `Queries: ${queries_for_this_profile.join(', ')}\n`;
            summary_content += `Total results collected: ${results_data.all_results.length}\n`;
            summary_content += `Time: ${new Date().toISOString()}\n`;
            summary_content += `${'='.repeat(80)}\n\n`;

            await fs.writeFile(summary_path, summary_content);
            console.log(`\n📊 Summary saved: ${summary_path}`);
            successful_profiles++;
        } else {
            console.log(`⚠️ Profile ${ws_endpoint.data.uuid} finished without results`);
            failed_profiles++;
        }

        console.log(`✅ Profile ${i + 1} completed`);

        if (i < proxy_count - 1) {
            const delay_between = random_range(10, 20);
            console.log(`\n⏰ Waiting ${delay_between.toFixed(1)} seconds before next profile...`);
            await sleep(delay_between);
        }
    }

    console.log(`\n${'='.repeat(80)}`);
    console.log(`📊 FINAL STATISTICS:`);
    console.log(`${'='.repeat(80)}`);
    console.log(`✅ Successful profiles: ${successful_profiles}`);
    console.log(`⏭️ Skipped due to captcha: ${skipped_profiles}`);
    console.log(`❌ Failed profiles: ${failed_profiles}`);
    console.log(`📁 All results saved in "search_results" folder`);
    console.log(`\n🎉 Google Scraper finished!`);
})();
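
To see how queries are shared between profiles, the distribution function can be run on its own. The function body below is copied from the script; only the sample calls are added for illustration.

```javascript
// distribute_queries as used in the script: splits the queries into one batch per proxy,
// giving the first (total % numProxies) batches one extra query.
function distribute_queries(queries, numProxies) {
    const total = queries.length;
    const baseCount = Math.floor(total / numProxies);
    const remainder = total % numProxies;

    const batches = [];
    let start = 0;
    for (let i = 0; i < numProxies; i++) {
        const count = baseCount + (i < remainder ? 1 : 0);
        batches.push(queries.slice(start, start + count));
        start += count;
    }
    return batches;
}

// The sample config from the script: three queries, two proxies.
const batches = distribute_queries(['nodejs', 'sidwudraq', 'arch linux'], 2);
console.log(batches); // [ [ 'nodejs', 'sidwudraq' ], [ 'arch linux' ] ]

// Fewer queries than proxies: the second profile gets an empty batch and is skipped.
const sparse = distribute_queries(['nodejs'], 2);
console.log(sparse); // [ [ 'nodejs' ], [] ]
```

Each profile therefore works through a contiguous slice of the query list, which is why the order of google_search_queries determines which proxy handles which query.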


© 2026 Octo Browser
