Rilo’s SocialScrapingLibrary provides tools to scrape public data from major social media platforms. Perfect for social listening, content monitoring, and data collection.

Overview

SocialScrapingLibrary supports:
  • Reddit: Posts, comments, subreddits
  • Twitter/X: Tweets, profiles, searches
  • LinkedIn: Public profiles and posts
  • Instagram: Posts, profiles, reels, hashtags
  • TikTok: Videos, profiles, hashtags
  • YouTube: Videos, channels, comments
  • Facebook: Posts, profiles, groups
Social scraping tools can only access public data. Private accounts, DMs, and restricted content are not accessible. This is for privacy and terms-of-service compliance.

Supported Platforms

Reddit

Tool: RedditScraperTool from SocialScrapingLibrary
Capabilities:
  • Get best/hot/new/rising/controversial posts from subreddits
  • Search posts by keyword with filters
  • Get post comments with sorting
  • Filter by upvotes, time, and more
  • Pagination support
Available Methods:
Method | Description | Key Parameters
get_posts_best | Get best posts from subreddits | subreddits, limit, min_upvotes
get_posts_hot | Get hot posts | subreddits, limit
get_posts_new | Get newest posts | subreddits, limit
get_posts_rising | Get rising posts | subreddits, limit
get_posts_controversial | Get controversial posts | subreddits, time_filter, limit
search_posts | Search posts by keyword | query, subreddits, sort, time_filter
get_post_comments | Get post comments | post_url, sort, limit
Example: Get Best Posts
from library.social_scraping_library import SocialScrapingLibrary

scraper = SocialScrapingLibrary()
result = scraper.get_posts_best(
    subreddits=["r/python", "r/programming"],
    limit=10,
    min_upvotes=50
)
Example: Search with Filters
result = scraper.search_posts(
    query="machine learning",
    subreddits=["r/MachineLearning", "r/learnmachinelearning"],
    sort="top",
    time_filter="week",
    limit=20
)
Sorting Options:
  • relevance - Most relevant (default for search)
  • hot - Currently trending
  • top - Highest voted
  • new - Most recent
  • comments - Most commented
Time Filters:
  • hour, day, week, month, year, all
Important Notes:
  • Subreddits: Use “r/subreddit” format (e.g., “r/python”)
  • Pagination: Check has_more, use after cursor for next page
  • Rate limits: Reddit enforces API rate limits; responses are cached automatically

Twitter/X

Tool: TwitterScraperTool from SocialScrapingLibrary
Capabilities:
  • Get tweets from profiles with filtering
  • Advanced search with operators
  • Get tweet details and replies
  • Get trending topics
  • Track hashtags
  • Profile information
Available Methods:
Method | Description | Key Parameters
get_tweets | Get tweets from a profile | username, limit, replies, since, until
search_tweets | Advanced search with operators | query, limit, since, until
get_tweet_details | Get single tweet details | tweet_url
get_tweet_replies | Get tweet replies with pagination | tweet_url, cursor
get_profile_info | Get profile information | username
get_trending_topics | Get trending topics | country_code, limit
get_hashtag_tweets | Track hashtag | hashtag, limit
Twitter Search Operators:
Operator | Example | Description
"phrase" | "machine learning" | Exact phrase match
from: | from:OpenAI | Tweets by user
to: | to:elonmusk | Replies to user
since: | since:2024-01-01 | On or after date
until: | until:2024-12-31 | Before date
min_faves: | min_faves:100 | Minimum likes
min_retweets: | min_retweets:50 | Minimum retweets
filter:images | AI filter:images | Has images
filter:videos | AI filter:videos | Has videos
filter:links | AI filter:links | Has URLs
- | AI -crypto | Exclude term
OR | (ChatGPT OR Claude) | Either term
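Because operator queries are plain strings, typos are easy to make. A small helper like the one below (hypothetical, not part of SocialScrapingLibrary) can compose a query string from the operators in the table above:

```python
# Hypothetical helper: builds a Twitter search query string from keyword
# arguments, following the operator syntax in the table above.
def build_query(*terms, exclude=(), min_faves=None, min_retweets=None,
                since=None, until=None, filters=()):
    parts = list(terms)
    parts += [f"-{t}" for t in exclude]          # -crypto
    if min_faves is not None:
        parts.append(f"min_faves:{min_faves}")   # min_faves:100
    if min_retweets is not None:
        parts.append(f"min_retweets:{min_retweets}")
    if since:
        parts.append(f"since:{since}")           # since:2024-01-01
    if until:
        parts.append(f"until:{until}")
    parts += [f"filter:{f}" for f in filters]    # filter:images
    return " ".join(parts)

query = build_query('"machine learning"', exclude=["crypto"],
                    min_faves=100, since="2024-01-01", filters=["images"])
# query == '"machine learning" -crypto min_faves:100 since:2024-01-01 filter:images'
```

The resulting string can be passed directly as the query parameter of search_tweets.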
Example: Get User Tweets
result = scraper.get_tweets(
    username="OpenAI",
    limit=20,
    replies=False,  # Exclude replies
    since="2024-01-01"
)
Example: Advanced Search
result = scraper.search_tweets(
    query="artificial intelligence min_faves:100 filter:images since:2024-01-01",
    limit=50
)
Example: Track Hashtag
result = scraper.get_hashtag_tweets(
    hashtag="AI",  # Without # symbol
    limit=30
)
Example: Get Trending Topics
result = scraper.get_trending_topics(
    country_code="US",  # US, GB, CA, etc.
    limit=30
)
Example: Paginated Replies
# First page
result = scraper.get_tweet_replies(
    tweet_url="https://twitter.com/user/status/123456789"
)
replies = result["replies"]

# Next page
if result.get("has_next_page"):
    next_result = scraper.get_tweet_replies(
        tweet_url="https://twitter.com/user/status/123456789",
        cursor=result["next_cursor"]
    )
Important Notes:
  • Usernames: WITHOUT the @ symbol (use “OpenAI” not “@OpenAI”)
  • Hashtags in config: WITHOUT # symbol (use “AI” not “#AI”)
  • Hashtags in search query: WITH # symbol (use “#AI” in query string)
  • Pagination: Check has_next_page, use next_cursor for subsequent pages
Limitations:
  • Public tweets only
  • Rate limits apply (cached automatically)
  • Trending topics: Returns ~30 topics per country
  • Tweet replies: Pagination required for large threads

LinkedIn

Capabilities:
  • Get public profile information
  • Enrich company data
  • Search profiles
  • Get profile posts
Example:
result = scraper.get_profile_info(
    profile_url="https://linkedin.com/in/example"
)

Instagram

Capabilities:
  • Get profile posts and reels
  • Search reels by keyword
  • Get post comments
  • Search hashtags and locations
Example:
result = scraper.get_profile_posts(
    handle="example_user",
    limit=20
)

TikTok

Capabilities:
  • Get profile videos
  • Search videos by hashtag or keyword
  • Get video details and comments
  • Fetch profile information
Example:
result = scraper.get_profile_videos(
    username="example_user",
    limit=10
)

YouTube

Capabilities:
  • Get video metadata and transcripts
  • Fetch channel info and videos
  • Search videos by keyword
  • Get video comments
Example:
result = scraper.get_video_details(
    video_url="https://youtube.com/watch?v=..."
)

Facebook

Capabilities:
  • Get posts from profiles/pages
  • Get group posts with sorting
  • Get post comments
  • Fetch profile/page information
Example:
result = scraper.get_profile_posts(
    profile_url="https://facebook.com/example",
    limit=20
)

Public Data Only

All social scraping tools can only access public data. This is a hard limitation for privacy and terms-of-service compliance.

What You Can Access

✅ Public posts and tweets
✅ Public profiles and pages
✅ Public comments and replies
✅ Public groups and communities
✅ Public hashtags and searches

What You Cannot Access

❌ Private accounts
❌ Direct messages (DMs)
❌ Restricted content
❌ Private groups
❌ Content requiring login

Caching

Social scraping tools use caching to reduce API calls and improve performance.

Cache TTL

Different methods have different cache durations:
TTL | Methods
300s (5 min) | Profile info, post details
120s (2 min) | Posts, comments, feeds
60s (1 min) | Search results
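Conceptually, a TTL cache stores each value with a timestamp and treats it as missing once its age exceeds the TTL. The sketch below illustrates the idea only; it is not the library's actual implementation (the clock parameter is injectable purely so the behavior is easy to verify):

```python
import time

# Illustrative TTL cache: entries expire once older than ttl_seconds.
class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable clock, handy for testing
        self._store = {}

    def set(self, key, value):
        self._store[key] = (value, self.clock())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]    # expired: drop and report a miss
            return None
        return value

# Per the table above, search results would use the shortest TTL (60 s)
search_cache = TTLCache(ttl_seconds=60)
search_cache.set("query:ai", ["post1", "post2"])
```

A fresh entry is returned as-is; after the TTL elapses, the next get reports a miss and the caller fetches fresh data.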

Bypassing Cache

Force fresh data when needed:
result = scraper.get_profile_posts(
    handle="example_user",
    bypass_cache=True  # Force fresh data
)
Use bypass_cache=True when you need real-time data, such as checking for new posts immediately after they’re published.

Batch Operations

Use batch methods for multiple items - significantly faster than sequential calls.

Example: Batch Profile Info

# BAD - Sequential (slow)
for handle in ["user1", "user2", "user3"]:
    result = scraper.get_profile_info(handle=handle)

# GOOD - Parallel (10x faster)
results = scraper.get_profile_info_batch(
    handles=["user1", "user2", "user3"],
    max_workers=10
)
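Under the hood, batch methods typically fan single-item calls out across a thread pool rather than issuing them one at a time. A minimal sketch of that pattern (generic, not the library's internals; fetch_profile stands in for any single-item scraper call):

```python
from concurrent.futures import ThreadPoolExecutor

# Fan single-item calls out across a thread pool; results come back
# in the same order as the input handles.
def run_batch(fetch_profile, handles, max_workers=10):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_profile, handles))

# Usage with a stand-in fetcher:
results = run_batch(lambda h: {"handle": h}, ["user1", "user2", "user3"])
```

Because the calls are I/O-bound network requests, threads overlap the waiting time, which is where the speedup over a sequential loop comes from.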

Available Batch Methods

Most tools support batch operations:
  • get_profile_info_batch
  • get_profile_posts_batch
  • get_post_details_batch
  • get_post_comments_batch

Pagination

Many methods support pagination for large result sets.

Example: Paginated Posts

# First page
result1 = scraper.get_profile_posts(handle="user", limit=20)
posts = result1["items"]

# Next page if available
if result1.get("has_more") and result1.get("next_max_id"):
    result2 = scraper.get_profile_posts(
        handle="user",
        next_max_id=result1["next_max_id"]
    )
    posts.extend(result2["items"])
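The same loop works for any cursor-based endpoint; only the cursor field name changes (next_max_id above, Reddit's after, Twitter's next_cursor). A generic sketch, assuming each page exposes items, a has_more flag, and the cursor under a known key:

```python
# Generic cursor-pagination loop. fetch_page takes the previous cursor
# (None for the first page) and returns one page dict.
def fetch_all(fetch_page, cursor_key="next_max_id", max_pages=10):
    items, cursor = [], None
    for _ in range(max_pages):   # hard cap to avoid unbounded crawls
        page = fetch_page(cursor)
        items.extend(page.get("items", []))
        if not page.get("has_more") or not page.get(cursor_key):
            break
        cursor = page[cursor_key]
    return items
```

The max_pages cap is a deliberate safety limit; raise it consciously rather than looping until the platform cuts you off.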

Use Cases

Social Listening

Monitor mentions, hashtags, and keywords across platforms.

Content Research

Research trending content and topics.

Competitor Analysis

Track competitor posts and engagement.

Lead Generation

Find potential leads from public profiles.

Best Practices

  • Don’t fetch 100 items just to use 5; use reasonable limits based on actual needs.
  • Always use batch methods when processing multiple items. They’re much faster.
  • Don’t bypass the cache unless you need real-time data. Caching improves performance.
  • Use pagination for large result sets instead of high limits.

Limitations

Platform-Specific Limits

Each platform has its own limitations:
  • Instagram: Comments endpoint returns max ~300 entries
  • Facebook: Maximum 100 items per request
  • Reddit: Rate limits apply to API calls
  • Twitter: API rate limits and restrictions

General Limitations

  • Public data only: Cannot access private content
  • No authentication: Cannot use logged-in features
  • Rate limits: Platform-specific rate limits apply
  • Terms of service: Must comply with each platform’s ToS
Always comply with each platform’s Terms of Service. Rilo provides tools, but you’re responsible for how you use them.

Troubleshooting

No results or empty responses:
  • Verify the profile/account is public
  • Check that the URL/username is correct
  • Ensure you’re not hitting rate limits
  • Try bypassing cache
Rate limit errors:
  • Reduce request frequency
  • Use caching to avoid repeated calls
  • Implement retry logic with backoff
Private or restricted content:
  • Verify the content is public
  • Check account privacy settings
  • Use public alternatives if available
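The retry-with-backoff advice above can be sketched as follows. This is a generic wrapper, not a library feature; the exception handling is deliberately broad for illustration and should be narrowed to the errors your calls actually raise:

```python
import random
import time

# Retry a flaky call with exponential backoff plus jitter:
# waits roughly 1s, 2s, 4s, ... between attempts, then re-raises.
def with_backoff(call, retries=4, base_delay=1.0, sleep=time.sleep):
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise                      # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)                   # jitter avoids synchronized retries
```

Usage would wrap any scraper call, e.g. with_backoff(lambda: scraper.get_profile_info(username="OpenAI")).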

Social scraping is a powerful feature for data collection and monitoring. Always respect platform terms of service and user privacy.