How to scrape Twitter (X.com) — a step-by-step guide

Posts on X.com let you track market trends and consumer behavior. The platform's official API, however, is limited, so web-scraping specialists often collect data from the site through browser automation instead. This article explains how to scrape tweets, profiles, search results, replies, and timelines on X.com with the Playwright library and a headless browser.

Why scrape Twitter (X.com)

Twitter is a source of data on user behavior, opinions, and current trends. By scraping X.com you can:

Analyze your competitors. Collect data from competitors’ profiles on Twitter to monitor their marketing strategies, product announcements, and audience reactions in real time.
Discover trends. Gather popular hashtags on Twitter to quickly adapt your content or product to new trends.
Study consumers. Scraping reviews, brand mentions and replies on Twitter lets you understand customers’ pain points and expectations, which helps improve the product and increase sales.

Legal and ethical aspects

X.com's terms of service forbid scraping its data without permission, and violating them can lead to account and IP bans. U.S. case law, however, has been favorable to scraping publicly available data. In 2022, in hiQ Labs v. LinkedIn, the U.S. Ninth Circuit Court of Appeals held that scraping publicly accessible information likely does not violate the Computer Fraud and Abuse Act (CFAA). That ruling concerns federal anti-hacking law; it does not override a platform's terms of service.

To reduce your risk further:

scrape only publicly available data;
avoid scraping private profiles and direct messages on Twitter;
keep your request rate moderate so you don't overload X.com's servers.
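A request-rate cap is easiest to respect when your scraper enforces it in code. Below is a minimal sketch of a sliding-window limiter; the class name `RateLimiter` and any limit you pass to it are hypothetical choices, since X.com does not publish a "safe" scraping rate.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_requests calls to wait() within any window of `window` seconds."""

    def __init__(self, max_requests: int, window: float, clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without real waiting
        self.sleep = sleep
        self.sent = deque()  # timestamps of recent requests, oldest first

    def wait(self) -> float:
        """Block until one more request fits in the window; return seconds slept."""
        slept = 0.0
        while True:
            now = self.clock()
            # forget requests that are older than the window
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.max_requests:
                self.sent.append(now)
                return slept
            # sleep until the oldest request in the window expires
            delay = self.window - (now - self.sent[0])
            self.sleep(delay)
            slept += delay
```

For example, create `limiter = RateLimiter(max_requests=20, window=60)` once and call `limiter.wait()` before every `page.goto(...)`; the numbers are placeholders to tune conservatively.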

Bypassing X.com blocks

X.com's security systems evaluate user behavior across many signals at once. To minimize the risk of being banned for web scraping, combine all of the following measures:

Pause between requests. Pauses keep you from overloading X.com's servers, and randomizing their length makes the traffic look more like human activity.
Use high-quality proxies. Security systems may block an IP address that sends too many requests, so spread your scraping across proxies and limit the number of requests sent from any single address.
Use an anti-detect browser. Twitter identifies users not only by behavior and IP address but also by digital fingerprints: unique combinations of dozens of device parameters, such as OS version, geolocation, timezone and languages, installed fonts and extensions. Therefore, combine proxies with an anti-detect browser when scraping. In such a browser you can create virtual profiles with different fingerprints, and X.com sees each one as a separate user rather than a single scraping bot.
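Random, human-like pauses take only a few lines with Python's `random` module. This is a sketch under assumptions: the helper name `human_pause` and all default durations are hypothetical, not thresholds published by X.com.

```python
import random
import time
from typing import Callable, Optional


def human_pause(
    min_s: float = 2.0,
    max_s: float = 6.0,
    long_break_chance: float = 0.05,
    long_break_s: tuple = (20.0, 60.0),
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a random, human-like interval and return its length in seconds.

    Most pauses are short; with probability long_break_chance the pause is
    longer, roughly like a person stopping to read.
    """
    if not 0 <= min_s <= max_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    rng = rng or random.Random()
    if rng.random() < long_break_chance:
        delay = rng.uniform(*long_break_s)
    else:
        delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

Call `human_pause()` between page loads and scrolls. Uniform jitter is only one simple choice; any distribution that avoids a fixed rhythm serves the same purpose.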

Important: use a separate virtual profile for each X.com account and give each profile its own proxy with a distinct IP address. This keeps Twitter's security systems from linking your accounts through identical device settings or IP addresses and banning them for scraping.
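In Octo you set the proxy in each profile's settings. If you drive plain Playwright instead, the same one-proxy-per-account rule can be enforced in code. A minimal sketch: the helper names, account names and proxy URLs are hypothetical, and the check compares proxy URLs, so it cannot detect two different proxy hosts that exit through the same IP.

```python
from urllib.parse import unquote, urlsplit


def playwright_proxy(proxy_url: str) -> dict:
    """Turn 'scheme://user:pass@host:port' into a Playwright proxy settings dict."""
    parts = urlsplit(proxy_url)
    if not parts.hostname or not parts.port:
        raise ValueError(f"proxy URL needs a host and port: {proxy_url!r}")
    proxy = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    if parts.username:
        proxy["username"] = unquote(parts.username)
        proxy["password"] = unquote(parts.password or "")
    return proxy


def assign_proxies(accounts: list, proxy_urls: list) -> dict:
    """Map every account to its own proxy; never let two accounts share one."""
    distinct = list(dict.fromkeys(proxy_urls))  # drop repeated URLs, keep order
    if len(distinct) < len(accounts):
        raise ValueError(f"{len(accounts)} accounts but only {len(distinct)} distinct proxies")
    return {account: playwright_proxy(url) for account, url in zip(accounts, distinct)}
```

Each resulting dict can be passed to Playwright's `browser.new_context(proxy=...)` (or `chromium.launch(proxy=...)`), one context per account.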

Properly prepare your profiles. X.com is more likely to trust accounts whose browser profiles have a realistic cookie history. If you are registering a new account from scratch, use Octo Browser's Cookie Robot to collect cookies in its virtual profile. If you scrape with existing X.com accounts, export their cookies from the browser you used before and import them into Octo.
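Octo imports cookies through its own interface. If you drive plain Playwright, as in the tweet example in this article, exported cookies must first be converted to the format accepted by Playwright's `BrowserContext.add_cookies()`. The sketch below assumes the JSON layout produced by common cookie-export browser extensions (fields such as `expirationDate`, `session` and `sameSite: "no_restriction"`); check what your exporter actually writes.

```python
# assumed sameSite spellings in the export -> values Playwright accepts
_SAME_SITE = {"no_restriction": "None", "none": "None", "lax": "Lax", "strict": "Strict"}


def to_playwright_cookies(exported: list) -> list:
    """Convert extension-exported cookie dicts to Playwright's add_cookies() format."""
    cookies = []
    for c in exported:
        session = c.get("session") or "expirationDate" not in c
        cookie = {
            "name": c["name"],
            "value": c["value"],
            "domain": c["domain"],
            "path": c.get("path", "/"),
            "httpOnly": bool(c.get("httpOnly", False)),
            "secure": bool(c.get("secure", False)),
            # Playwright uses -1 for session cookies (no expiry date)
            "expires": -1 if session else float(c["expirationDate"]),
        }
        same_site = _SAME_SITE.get(str(c.get("sameSite", "")).lower())
        if same_site:  # values like "unspecified" are left out
            cookie["sameSite"] = same_site
        cookies.append(cookie)
    return cookies
```

Usage: `context.add_cookies(to_playwright_cookies(json.load(f)))` with `f` an open export file. After a successful session, `context.storage_state(path=...)` saves the cookies again for the next run.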

How to scrape tweets

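X.com loads post data through background (XHR) requests while the page renders. To scrape posts, open pages in a browser running in headless mode (for example, Octo) and intercept those background requests. Here is what tweet scraping with the open-source Playwright library looks like; for simplicity, the example launches Playwright's bundled Chromium: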

```python
from playwright.sync_api import sync_playwright


def scrape_tweet(url: str) -> dict:
    """
    Scrape a single tweet page, e.g.:
    https://twitter.com/Scrapfly_dev/status/1667013143904567296
    Return the tweet data carried by the TweetResultByRestId background request.
    """
    _xhr_calls = []

    def intercept_response(response):
        """Capture all background (XHR) responses and save them for later."""
        if response.request.resource_type == "xhr":
            _xhr_calls.append(response)

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        try:
            context = browser.new_context(viewport={"width": 1920, "height": 1080})
            page = context.new_page()
            # enable background request interception
            page.on("response", intercept_response)
            # open the tweet and wait until it is rendered on the page
            page.goto(url)
            page.wait_for_selector("[data-testid='tweet']")
            # keep only the background requests that carry tweet data
            tweet_calls = [r for r in _xhr_calls if "TweetResultByRestId" in r.url]
            for xhr in tweet_calls:
                return xhr.json()["data"]["tweetResult"]["result"]
        finally:
            browser.close()
    raise RuntimeError(f"No TweetResultByRestId response captured for {url}")


if __name__ == "__main__":
    print(scrape_tweet("https://twitter.com/Scrapfly_dev/status/1664267318053179398"))
```

The script loads a tweet through a headless browser, intercepts its background requests, and keeps only those whose URL contains TweetResultByRestId, the request that carries the tweet data.

Note: before reading the captured requests, wait until the tweet appears on the page. Its appearance indicates that the background requests carrying its data have finished.
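The object returned by `scrape_tweet()` is X.com's internal data, which is undocumented and deeply nested. A sketch of flattening it into one record: the helper `parse_tweet` is hypothetical, and the field paths (`legacy.full_text`, `core.user_results`, `views.count` and so on) are assumptions based on responses observed at the time of writing that X.com can change without notice.

```python
from datetime import datetime

X_DATE = "%a %b %d %H:%M:%S %z %Y"  # e.g. "Fri Jun 09 12:00:00 +0000 2023"


def parse_tweet(result: dict) -> dict:
    """Flatten a TweetResultByRestId 'result' object into a simple record (assumed layout)."""
    # some tweets come wrapped, e.g. {"__typename": "TweetWithVisibilityResults", "tweet": {...}}
    if "tweet" in result and "legacy" not in result:
        result = result["tweet"]
    legacy = result.get("legacy", {})
    user = result.get("core", {}).get("user_results", {}).get("result", {})
    screen_name = user.get("legacy", {}).get("screen_name") or user.get("core", {}).get("screen_name")
    created = legacy.get("created_at")
    return {
        "id": result.get("rest_id"),
        "author": screen_name,
        "text": legacy.get("full_text"),
        "created_at": datetime.strptime(created, X_DATE).isoformat() if created else None,
        "likes": legacy.get("favorite_count", 0),
        "retweets": legacy.get("retweet_count", 0),
        "replies": legacy.get("reply_count", 0),
        "views": int(result.get("views", {}).get("count", 0)),  # sent as a string
    }
```

Usage: `record = parse_tweet(scrape_tweet(url))`. Reading every field with `.get()` keeps one missing key from crashing a long scraping run.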

How to scrape profiles

You can scrape X.com user profiles the same way as tweets: by capturing background requests in a headless browser. Use the following algorithm to get profile metadata:

Log into a Twitter account.
Open the user’s page on X.com.
Extract name, bio/description, follower count and account creation date.
Add delays so Twitter won’t flag the scraper for suspicious activity.
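The four steps above can be sketched with Playwright as follows. Assumptions, all hypothetical: the session file name `x_session.json` (created earlier with `context.storage_state(path=...)` after logging in manually), the `UserByScreenName` request name, and the JSON field paths, which reflect X.com's undocumented internal API at the time of writing.

```python
import random
import time
from datetime import datetime

X_DATE = "%a %b %d %H:%M:%S %z %Y"  # e.g. "Wed Jun 01 12:00:00 +0000 2022"


def parse_profile(user: dict) -> dict:
    """Flatten a UserByScreenName 'result' object (assumed layout, may change)."""
    legacy = user.get("legacy", {})
    core = user.get("core", {})  # some responses may keep name/handle/date here instead
    created = core.get("created_at") or legacy.get("created_at")
    return {
        "screen_name": core.get("screen_name") or legacy.get("screen_name"),
        "name": core.get("name") or legacy.get("name"),
        "bio": legacy.get("description", ""),
        "followers": legacy.get("followers_count", 0),
        "created_at": datetime.strptime(created, X_DATE).date().isoformat() if created else None,
    }


def scrape_profile(handle: str, storage_state: str = "x_session.json") -> dict:
    """Open https://x.com/<handle> with a logged-in session and parse its profile data."""
    # imported here so parse_profile() can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        try:
            # step 1: reuse the cookies of an account that is already logged in
            context = browser.new_context(storage_state=storage_state)
            page = context.new_page()
            # step 2: open the profile and capture the background request with its data
            with page.expect_response(lambda r: "UserByScreenName" in r.url) as info:
                page.goto(f"https://x.com/{handle}")
            # step 3: extract name, bio, follower count and creation date
            profile = parse_profile(info.value.json()["data"]["user"]["result"])
        finally:
            browser.close()
    # step 4: pause before the next profile so the pattern looks less automated
    time.sleep(random.uniform(3.0, 8.0))
    return profile
```

`page.expect_response()` waits for the matching response itself, so there is no race between rendering and reading the captured data.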

How to scrape search, replies, and timelines

Using the Playwright library you can scrape even the dynamic parts of X.com:

Search. The script simulates typing a query into Twitter’s search box and presses Enter. Then it scrolls and extracts data to scrape as many relevant posts as possible for the keyword.
Replies. To get replies to a specific post, Playwright opens that post’s page. The script focuses on the comments area, scrolls to load the full thread, and scrapes reply text and the authors’ names.
Timelines. The script opens the profile’s main page and scrolls in a loop to scrape all recent posts from the user.
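All three cases share one pattern: open a page, then scroll in a loop and collect the posts that appear. A sketch under assumptions: the helper names are hypothetical, the `data-testid` selectors and the "Search" placeholder reflect X.com's English UI at the time of writing and may change, and `page` must come from a logged-in Playwright context.

```python
import random
import time


def merge_batch(seen: dict, batch: list) -> int:
    """Add posts not collected yet (keyed by author and text); return how many were new."""
    added = 0
    for post in batch:
        key = (post["author"], post["text"])
        if key not in seen:
            seen[key] = post
            added += 1
    return added


def read_visible_posts(page) -> list:
    """Read the author and text of the posts currently rendered in the DOM."""
    posts = []
    for tweet in page.query_selector_all("[data-testid='tweet']"):
        user = tweet.query_selector("[data-testid='User-Name']")
        text = tweet.query_selector("[data-testid='tweetText']")
        posts.append({
            "author": user.inner_text() if user else "",
            "text": text.inner_text() if text else "",
        })
    return posts


def scroll_and_collect(page, max_posts: int = 200, max_idle_scrolls: int = 3) -> list:
    """Scroll a search, reply-thread or timeline page and collect posts."""
    seen, idle = {}, 0
    while len(seen) < max_posts and idle < max_idle_scrolls:
        # X.com may drop off-screen posts from the DOM, so merge every batch as we go
        idle = 0 if merge_batch(seen, read_visible_posts(page)) else idle + 1
        page.mouse.wheel(0, 3000)  # scroll down to trigger loading of more posts
        time.sleep(random.uniform(1.5, 4.0))
    return list(seen.values())[:max_posts]


def search(page, query: str) -> list:
    """Type a query into the search box, press Enter, then scroll through the results."""
    page.goto("https://x.com/explore")
    box = page.get_by_placeholder("Search").first  # assumed placeholder text
    box.fill(query)
    box.press("Enter")
    page.wait_for_selector("[data-testid='tweet']")
    return scroll_and_collect(page)
```

For replies, open the post's URL, wait for `[data-testid='tweet']` and call `scroll_and_collect(page)`; the first item is the parent post. For a timeline, do the same with `https://x.com/<handle>`. The loop stops after several scrolls that bring nothing new, which usually means the end of the feed.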

Data storage and export

After scraping, structure the collected data. Whether it comes from intercepted JSON responses or from the page's HTML, convert each item into a structured record (for example, a Python dictionary) and keep the records in a list. You can then export them to a spreadsheet format such as CSV or Excel and load them into analytics tools.
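A minimal CSV export using Python's standard `csv` module; `to_csv` is a hypothetical helper name. Because records from different sources may have different keys, the sketch uses the union of all keys as columns and leaves missing values empty.

```python
import csv


def to_csv(records: list, fh) -> None:
    """Write a list of dicts as CSV; columns are all keys, in first-seen order."""
    fieldnames = list(dict.fromkeys(key for rec in records for key in rec))
    writer = csv.DictWriter(fh, fieldnames=fieldnames)  # missing keys become ""
    writer.writeheader()
    writer.writerows(records)
```

Usage: `with open("tweets.csv", "w", newline="", encoding="utf-8") as fh: to_csv(records, fh)`. For Excel, `pandas.DataFrame(records).to_excel("tweets.xlsx", index=False)` does the same job if pandas and openpyxl are installed.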

Analyzing collected data

Once data is scraped, you need to analyze it. Depending on your goals, you can use different methods:

Sentiment analysis. Assess the emotional tone of posts and replies to understand how users feel about your product or competitors — positive, negative, or neutral.
Clustering. Group collected posts by topic. For a product company, clusters might look like “delivery complaints,” “positive product reviews,” and “feature requests.”
Influencer identification. Find users with large follower counts and high engagement who discuss your niche. You can then reach out to them for collaboration and potentially make them brand advocates.
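Influencer identification can start from a simple score. The sketch below uses one common heuristic, engagement rate = (likes + retweets + replies) / followers averaged over an author's posts, and keeps only authors above a follower threshold. The helper name and threshold are hypothetical, and each post record is assumed to already include its author's follower count (for example, joined in from scraped profile data).

```python
from collections import defaultdict


def rank_influencers(posts: list, min_followers: int = 10_000, top: int = 10) -> list:
    """Return (author, followers, avg engagement rate) tuples, highest rate first."""
    stats = defaultdict(lambda: {"followers": 0, "rates": []})
    for p in posts:
        if p["followers"] < min_followers:
            continue  # too small an audience to count as an influencer
        s = stats[p["author"]]
        s["followers"] = p["followers"]
        s["rates"].append((p["likes"] + p["retweets"] + p["replies"]) / p["followers"])
    ranked = [
        (author, s["followers"], sum(s["rates"]) / len(s["rates"]))
        for author, s in stats.items()
    ]
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked[:top]
```

Dividing by followers keeps very large accounts with passive audiences from automatically topping the list; the threshold then filters out small accounts whose rates are high only because their audience is tiny.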

FAQ

Is scraping Twitter (X.com) legal?

Scraping publicly available data is not inherently illegal. However, X.com's terms of service forbid scraping, so even though it is technically possible, X.com can block your account or IP address for violating its rules.

Can you scrape Twitter using Python?

Yes, Python is a popular language for web scraping automation. Libraries like Playwright drive a real browser, so you can collect data from the website itself instead of relying on the limited official Twitter API.

How to scrape Twitter without getting blocked?

To reduce the risk of bans for web scraping you should:

Use proxies.
Use an anti-detect browser (for example, Octo Browser) to create profiles with different digital fingerprints so X.com’s security systems can’t trace your activity back to a single user.
Add random, human-like delays between requests.
Keep each profile's cookies saved in the anti-detect browser so its sessions persist between runs.

Palina Zabela