Rilo’s SocialScrapingLibrary provides tools to scrape public data from major social media platforms. Perfect for social listening, content monitoring, and data collection.

Overview

SocialScrapingLibrary supports:
  • Reddit: Posts, comments, subreddits
  • Twitter/X: Tweets, profiles, searches
  • LinkedIn: Public profiles and posts
  • Instagram: Posts, profiles, reels, hashtags
  • TikTok: Videos, profiles, hashtags
  • YouTube: Videos, channels, comments
  • Facebook: Posts, profiles, groups
Social scraping tools can only access public data. Private accounts, DMs, and restricted content are not accessible. This is for privacy and terms-of-service compliance.

Supported Platforms

Reddit

Tool: RedditScraperTool from SocialScrapingLibrary
Capabilities:
  • Get best/hot/new/rising/controversial posts from subreddits
  • Search posts by keyword with filters
  • Get post comments with sorting
  • Filter by upvotes, time, and more
  • Pagination support
Available Methods:
Method | Description | Key Parameters
get_posts_best | Get best posts from subreddits | subreddits, limit, min_upvotes
get_posts_hot | Get hot posts | subreddits, limit
get_posts_new | Get newest posts | subreddits, limit
get_posts_rising | Get rising posts | subreddits, limit
get_posts_controversial | Get controversial posts | subreddits, time_filter, limit
search_posts | Search posts by keyword | query, subreddits, sort, time_filter
get_post_comments | Get post comments | post_url, sort, limit
Example: Get Best Posts
from library.social_scraping_library import SocialScrapingLibrary

scraper = SocialScrapingLibrary()
result = scraper.get_posts_best(
    subreddits=["r/python", "r/programming"],
    limit=10,
    min_upvotes=50
)
Example: Search with Filters
result = scraper.search_posts(
    query="machine learning",
    subreddits=["r/MachineLearning", "r/learnmachinelearning"],
    sort="top",
    time_filter="week",
    limit=20
)
Sorting Options:
  • relevance - Most relevant (default for search)
  • hot - Currently trending
  • top - Highest voted
  • new - Most recent
  • comments - Most commented
Time Filters:
  • hour, day, week, month, year, all
Important Notes:
  • Subreddits: Use “r/subreddit” format (e.g., “r/python”)
  • Pagination: Check has_more, use after cursor for next page
  • Rate limits: Reddit enforces API rate limits; responses are cached automatically

Twitter/X

Tool: TwitterScraperTool from SocialScrapingLibrary
Capabilities:
  • Get tweets from profiles with filtering
  • Advanced search with operators
  • Get tweet details and replies
  • Get trending topics
  • Track hashtags
  • Profile information
Available Methods:
Method | Description | Key Parameters
get_tweets | Get tweets from a profile | username, limit, replies, since, until
search_tweets | Advanced search with operators | query, limit, since, until
get_tweet_details | Get single tweet details | tweet_url
get_tweet_replies | Get tweet replies with pagination | tweet_url, cursor
get_profile_info | Get profile information | username
get_trending_topics | Get trending topics | country_code, limit
get_hashtag_tweets | Track hashtag | hashtag, limit
Twitter Search Operators:
Operator | Example | Description
"phrase" | "machine learning" | Exact phrase match
from: | from:OpenAI | Tweets by user
to: | to:elonmusk | Replies to user
since: | since:2024-01-01 | On or after date
until: | until:2024-12-31 | Before date
min_faves: | min_faves:100 | Minimum likes
min_retweets: | min_retweets:50 | Minimum retweets
filter:images | AI filter:images | Has images
filter:videos | AI filter:videos | Has videos
filter:links | AI filter:links | Has URLs
- | AI -crypto | Exclude term
OR | (ChatGPT OR Claude) | Either term
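Because operator queries are plain strings, typos are easy to make. A small helper like the one below (hypothetical, not part of SocialScrapingLibrary) can compose a query string from the operators in the table above:

```python
# Hypothetical helper: builds a Twitter search query string from keyword
# arguments, following the operator syntax in the table above.
def build_query(*terms, exclude=(), min_faves=None, min_retweets=None,
                since=None, until=None, filters=()):
    parts = list(terms)
    parts += [f"-{t}" for t in exclude]          # -crypto
    if min_faves is not None:
        parts.append(f"min_faves:{min_faves}")   # min_faves:100
    if min_retweets is not None:
        parts.append(f"min_retweets:{min_retweets}")
    if since:
        parts.append(f"since:{since}")           # since:2024-01-01
    if until:
        parts.append(f"until:{until}")
    parts += [f"filter:{f}" for f in filters]    # filter:images
    return " ".join(parts)

query = build_query('"machine learning"', exclude=["crypto"],
                    min_faves=100, since="2024-01-01", filters=["images"])
# query == '"machine learning" -crypto min_faves:100 since:2024-01-01 filter:images'
```

The resulting string can be passed directly as the query parameter of search_tweets.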
Example: Get User Tweets
result = scraper.get_tweets(
    username="OpenAI",
    limit=20,
    replies=False,  # Exclude replies
    since="2024-01-01"
)
Example: Advanced Search
result = scraper.search_tweets(
    query="artificial intelligence min_faves:100 filter:images since:2024-01-01",
    limit=50
)
Example: Track Hashtag
result = scraper.get_hashtag_tweets(
    hashtag="AI",  # Without # symbol
    limit=30
)
Example: Get Trending Topics
result = scraper.get_trending_topics(
    country_code="US",  # US, GB, CA, etc.
    limit=30
)
Example: Paginated Replies
# First page
result = scraper.get_tweet_replies(
    tweet_url="https://twitter.com/user/status/123456789"
)
replies = result["replies"]

# Next page
if result.get("has_next_page"):
    next_result = scraper.get_tweet_replies(
        tweet_url="https://twitter.com/user/status/123456789",
        cursor=result["next_cursor"]
    )
Important Notes:
  • Usernames: WITHOUT the @ symbol (use “OpenAI” not “@OpenAI”)
  • Hashtags in config: WITHOUT # symbol (use “AI” not “#AI”)
  • Hashtags in search query: WITH # symbol (use “#AI” in query string)
  • Pagination: Check has_next_page, use next_cursor for subsequent pages
Limitations:
  • Public tweets only
  • Rate limits apply (cached automatically)
  • Trending topics: Returns ~30 topics per country
  • Tweet replies: Pagination required for large threads

LinkedIn

Capabilities:
  • Get public profile information
  • Enrich company data
  • Search profiles
  • Get profile posts
Example:
result = scraper.get_profile_info(
    profile_url="https://linkedin.com/in/example"
)

Instagram

Capabilities:
  • Get profile posts and reels
  • Search reels by keyword
  • Get post comments
  • Search hashtags and locations
Example:
result = scraper.get_profile_posts(
    handle="example_user",
    limit=20
)

TikTok

Capabilities:
  • Get profile videos
  • Search videos by hashtag or keyword
  • Get video details and comments
  • Fetch profile information
Example:
result = scraper.get_profile_videos(
    username="example_user",
    limit=10
)

YouTube

Capabilities:
  • Get video metadata and transcripts
  • Fetch channel info and videos
  • Search videos by keyword
  • Get video comments
Example:
result = scraper.get_video_details(
    video_url="https://youtube.com/watch?v=..."
)

Facebook

Capabilities:
  • Get posts from profiles/pages
  • Get group posts with sorting
  • Get post comments
  • Fetch profile/page information
Example:
result = scraper.get_profile_posts(
    profile_url="https://facebook.com/example",
    limit=20
)

Public Data Only

All social scraping tools can only access public data. This is a hard limitation for privacy and terms-of-service compliance.

What You Can Access

✅ Public posts and tweets
✅ Public profiles and pages
✅ Public comments and replies
✅ Public groups and communities
✅ Public hashtags and searches

What You Cannot Access

❌ Private accounts
❌ Direct messages (DMs)
❌ Restricted content
❌ Private groups
❌ Content requiring login

Caching

Social scraping tools use caching to reduce API calls and improve performance.

Cache TTL

Different methods have different cache durations:
TTL | Methods
300s (5 min) | Profile info, post details
120s (2 min) | Posts, comments, feeds
60s (1 min) | Search results
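Conceptually, a TTL cache stores each value with a timestamp and treats it as missing once its age exceeds the TTL. The sketch below illustrates the idea only; it is not the library's actual implementation (the clock parameter is injectable purely so the behavior is easy to verify):

```python
import time

# Illustrative TTL cache: entries expire once older than ttl_seconds.
class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable clock, handy for testing
        self._store = {}

    def set(self, key, value):
        self._store[key] = (value, self.clock())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]    # expired: drop and report a miss
            return None
        return value

# Per the table above, search results would use the shortest TTL (60 s)
search_cache = TTLCache(ttl_seconds=60)
search_cache.set("query:ai", ["post1", "post2"])
```

A fresh entry is returned as-is; after the TTL elapses, the next get reports a miss and the caller fetches fresh data.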

Bypassing Cache

Force fresh data when needed:
result = scraper.get_profile_posts(
    handle="example_user",
    bypass_cache=True  # Force fresh data
)
Use bypass_cache=True when you need real-time data, such as checking for new posts immediately after they’re published.

Batch Operations

Use batch methods for multiple items - significantly faster than sequential calls.

Example: Batch Profile Info

# BAD - Sequential (slow)
for handle in ["user1", "user2", "user3"]:
    result = scraper.get_profile_info(handle=handle)

# GOOD - Parallel (10x faster)
results = scraper.get_profile_info_batch(
    handles=["user1", "user2", "user3"],
    max_workers=10
)
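Under the hood, batch methods typically fan single-item calls out across a thread pool rather than issuing them one at a time. A minimal sketch of that pattern (generic, not the library's internals; fetch_profile stands in for any single-item scraper call):

```python
from concurrent.futures import ThreadPoolExecutor

# Fan single-item calls out across a thread pool; results come back
# in the same order as the input handles.
def run_batch(fetch_profile, handles, max_workers=10):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_profile, handles))

# Usage with a stand-in fetcher:
results = run_batch(lambda h: {"handle": h}, ["user1", "user2", "user3"])
```

Because the calls are I/O-bound network requests, threads overlap the waiting time, which is where the speedup over a sequential loop comes from.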

Available Batch Methods

Most tools support batch operations:
  • get_profile_info_batch
  • get_profile_posts_batch
  • get_post_details_batch
  • get_post_comments_batch

Pagination

Many methods support pagination for large result sets.

Example: Paginated Posts

# First page
result1 = scraper.get_profile_posts(handle="user", limit=20)
posts = result1["items"]

# Next page if available
if result1.get("has_more") and result1.get("next_max_id"):
    result2 = scraper.get_profile_posts(
        handle="user",
        next_max_id=result1["next_max_id"]
    )
    posts.extend(result2["items"])
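The same loop works for any cursor-based endpoint; only the cursor field name changes (next_max_id above, Reddit's after, Twitter's next_cursor). A generic sketch, assuming each page exposes items, a has_more flag, and the cursor under a known key:

```python
# Generic cursor-pagination loop. fetch_page takes the previous cursor
# (None for the first page) and returns one page dict.
def fetch_all(fetch_page, cursor_key="next_max_id", max_pages=10):
    items, cursor = [], None
    for _ in range(max_pages):   # hard cap to avoid unbounded crawls
        page = fetch_page(cursor)
        items.extend(page.get("items", []))
        if not page.get("has_more") or not page.get(cursor_key):
            break
        cursor = page[cursor_key]
    return items
```

The max_pages cap is a deliberate safety limit; raise it consciously rather than looping until the platform cuts you off.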

Use Cases

Social Listening

Monitor mentions, hashtags, and keywords across platforms.

Content Research

Research trending content and topics.

Competitor Analysis

Track competitor posts and engagement.

Lead Generation

Find potential leads from public profiles.

Best Practices

  • Don’t fetch 100 items just to use 5; use reasonable limits based on actual needs.
  • Always use batch methods when processing multiple items. They’re much faster.
  • Don’t bypass the cache unless you need real-time data. Caching improves performance.
  • Use pagination for large result sets instead of high limits.

Limitations

Platform-Specific Limits

Each platform has its own limitations:
  • Instagram: Comments endpoint returns max ~300 entries
  • Facebook: Maximum 100 items per request
  • Reddit: Rate limits apply to API calls
  • Twitter: API rate limits and restrictions

General Limitations

  • Public data only: Cannot access private content
  • No authentication: Cannot use logged-in features
  • Rate limits: Platform-specific rate limits apply
  • Terms of service: Must comply with each platform’s ToS
Always comply with each platform’s Terms of Service. Rilo provides tools, but you’re responsible for how you use them.

Troubleshooting

No results or empty responses:
  • Verify the profile/account is public
  • Check that the URL/username is correct
  • Ensure you’re not hitting rate limits
  • Try bypassing cache
Rate limit errors:
  • Reduce request frequency
  • Use caching to avoid repeated calls
  • Implement retry logic with backoff
Private or restricted content:
  • Verify the content is public
  • Check account privacy settings
  • Use public alternatives if available
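The retry-with-backoff advice above can be sketched as follows. This is a generic wrapper, not a library feature; the exception handling is deliberately broad for illustration and should be narrowed to the errors your calls actually raise:

```python
import random
import time

# Retry a flaky call with exponential backoff plus jitter:
# waits roughly 1s, 2s, 4s, ... between attempts, then re-raises.
def with_backoff(call, retries=4, base_delay=1.0, sleep=time.sleep):
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise                      # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)                   # jitter avoids synchronized retries
```

Usage would wrap any scraper call, e.g. with_backoff(lambda: scraper.get_profile_info(username="OpenAI")).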

Social scraping is a powerful feature for data collection and monitoring. Always respect platform terms of service and user privacy.