Rilo’s SocialScrapingLibrary provides tools to scrape public data from major social media platforms. Perfect for social listening, content monitoring, and data collection.

Overview

SocialScrapingLibrary supports:
  • Reddit: Posts, comments, subreddits
  • Twitter/X: Tweets, profiles, searches
  • LinkedIn: Public profiles and posts
  • Instagram: Posts, profiles, reels, hashtags
  • TikTok: Videos, profiles, hashtags
  • YouTube: Videos, channels, comments
  • Facebook: Posts, profiles, groups
Social scraping tools can only access public data. Private accounts, DMs, and restricted content are not accessible. This is for privacy and terms-of-service compliance.

Supported Platforms

Reddit

Capabilities:
  • Get best posts from subreddits
  • Search posts by keyword
  • Get post comments
  • Filter by upvotes, time, etc.
Example:
from library.social_scraping_library import SocialScrapingLibrary

scraper = SocialScrapingLibrary()
result = scraper.get_posts_best(
    subreddits=["r/python", "r/programming"],
    limit=10,
    min_upvotes=50
)

Twitter/X

Capabilities:
  • Get tweets from profiles
  • Search tweets by keyword
  • Get tweet details and replies
  • Filter by engagement metrics
Example:
result = scraper.get_tweets(
    username="example_user",
    limit=20
)

LinkedIn

Capabilities:
  • Get public profile information
  • Enrich company data
  • Search profiles
  • Get profile posts
Example:
result = scraper.get_profile_info(
    profile_url="https://linkedin.com/in/example"
)

Instagram

Capabilities:
  • Get profile posts and reels
  • Search reels by keyword
  • Get post comments
  • Search hashtags and locations
Example:
result = scraper.get_profile_posts(
    handle="example_user",
    limit=20
)

TikTok

Capabilities:
  • Get profile videos
  • Search videos by hashtag or keyword
  • Get video details and comments
  • Fetch profile information
Example:
result = scraper.get_profile_videos(
    username="example_user",
    limit=10
)

YouTube

Capabilities:
  • Get video metadata and transcripts
  • Fetch channel info and videos
  • Search videos by keyword
  • Get video comments
Example:
result = scraper.get_video_details(
    video_url="https://youtube.com/watch?v=..."
)

Facebook

Capabilities:
  • Get posts from profiles/pages
  • Get group posts with sorting
  • Get post comments
  • Fetch profile/page information
Example:
result = scraper.get_profile_posts(
    profile_url="https://facebook.com/example",
    limit=20
)

Public Data Only

All social scraping tools can only access public data. This is a hard limitation for privacy and terms-of-service compliance.

What You Can Access

✅ Public posts and tweets
✅ Public profiles and pages
✅ Public comments and replies
✅ Public groups and communities
✅ Public hashtags and searches

What You Cannot Access

❌ Private accounts
❌ Direct messages (DMs)
❌ Restricted content
❌ Private groups
❌ Content requiring login

Caching

Social scraping tools use caching to reduce API calls and improve performance.

Cache TTL

Different methods have different cache durations:
TTL            Methods
300s (5 min)   Profile info, post details
120s (2 min)   Posts, comments, feeds
60s (1 min)    Search results
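
As a concrete illustration (a sketch; it assumes the cache is keyed on the method and its arguments, which the TTL table implies), a repeated call within the TTL window is served from the cache rather than triggering a new request:
# First call hits the platform and populates the cache (profile info is cached for 5 minutes)
profile = scraper.get_profile_info(handle="example_user")

# An identical call within those 5 minutes returns the cached result; no new API call is made
profile_again = scraper.get_profile_info(handle="example_user")

# Pass bypass_cache=True (shown below) if you need fresh data before the TTL expires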

Bypassing Cache

Force fresh data when needed:
result = scraper.get_profile_posts(
    handle="example_user",
    bypass_cache=True  # Force fresh data
)
Use bypass_cache=True when you need real-time data, such as checking for new posts immediately after they’re published.

Batch Operations

Use batch methods when processing multiple items; they are significantly faster than making sequential calls.

Example: Batch Profile Info

# BAD - Sequential (slow)
for handle in ["user1", "user2", "user3"]:
    result = scraper.get_profile_info(handle=handle)

# GOOD - Parallel (10x faster)
results = scraper.get_profile_info_batch(
    handles=["user1", "user2", "user3"],
    max_workers=10
)

Available Batch Methods

Most tools support batch operations:
  • get_profile_info_batch
  • get_profile_posts_batch
  • get_post_details_batch
  • get_post_comments_batch

Pagination

Many methods support pagination for large result sets.

Example: Paginated Posts

# First page
result1 = scraper.get_profile_posts(handle="user", limit=20)
posts = result1["items"]

# Next page if available
if result1.get("has_more") and result1.get("next_max_id"):
    result2 = scraper.get_profile_posts(
        handle="user",
        next_max_id=result1["next_max_id"]
    )
    posts.extend(result2["items"])
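
To walk through all pages, the same pattern can be wrapped in a loop (a sketch; it assumes every page exposes the items, has_more, and next_max_id keys shown above):
# Collect every page by following next_max_id until the platform reports no more items
all_posts = []
params = {"handle": "user", "limit": 20}

while True:
    page = scraper.get_profile_posts(**params)
    all_posts.extend(page["items"])
    if not page.get("has_more") or not page.get("next_max_id"):
        break
    params["next_max_id"] = page["next_max_id"]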

Use Cases

Social Listening

Monitor mentions, hashtags, and keywords across platforms.
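
For example, a simple keyword monitor over public subreddit posts could look like this (a sketch using the documented get_posts_best call; the "items" and "title" fields on the response are assumptions):
# Pull recent high-signal posts and keep only those that mention a tracked keyword
keywords = ["acme", "acme cloud"]
result = scraper.get_posts_best(
    subreddits=["r/python", "r/programming"],
    limit=50,
    min_upvotes=10
)
mentions = [
    post for post in result["items"]  # "items" key assumed from the pagination example
    if any(kw in post.get("title", "").lower() for kw in keywords)  # "title" field is an assumption
]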

Content Research

Research trending content and topics.
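
A minimal sketch using the documented Reddit method: raise min_upvotes so only widely upvoted posts come back, which is a quick way to see what is currently resonating in a topic:
# Surface widely upvoted posts from topic subreddits
trending = scraper.get_posts_best(
    subreddits=["r/MachineLearning", "r/datascience"],
    limit=25,
    min_upvotes=500  # only keep posts with substantial engagement
)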

Competitor Analysis

Track competitor posts and engagement.
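
For instance, recent posts from several competitor accounts can be pulled in one batch call (a sketch; get_profile_posts_batch is listed under batch methods, but its handles and max_workers parameters are assumed from the get_profile_info_batch signature):
# Fetch recent posts for several competitor handles in parallel
competitor_posts = scraper.get_profile_posts_batch(
    handles=["competitor_a", "competitor_b", "competitor_c"],  # parameter names assumed
    max_workers=10
)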

Lead Generation

Find potential leads from public profiles.
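
For example, a short list of public LinkedIn profile URLs can be enriched with the documented get_profile_info call (a plain loop is shown for clarity; prefer the batch variant for larger lists):
# Enrich a short list of public profiles; the URLs are placeholders
lead_urls = [
    "https://linkedin.com/in/example-1",
    "https://linkedin.com/in/example-2",
]
leads = [scraper.get_profile_info(profile_url=url) for url in lead_urls]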

Best Practices

  • Don’t fetch 100 items just to use 5; use reasonable limits based on actual needs.
  • Always use batch methods when processing multiple items; they’re much faster.
  • Don’t bypass the cache unless you need real-time data; caching improves performance.
  • Use pagination for large result sets instead of high limits.

Limitations

Platform-Specific Limits

Each platform has its own limitations:
  • Instagram: Comments endpoint returns max ~300 entries
  • Facebook: Maximum 100 items per request
  • Reddit: Rate limits apply to API calls
  • Twitter: API rate limits and restrictions

General Limitations

  • Public data only: Cannot access private content
  • No authentication: Cannot use logged-in features
  • Rate limits: Platform-specific rate limits apply
  • Terms of service: Must comply with each platform’s ToS
Always comply with each platform’s Terms of Service. Rilo provides tools, but you’re responsible for how you use them.

Troubleshooting

If results are empty or a profile can’t be found:
  • Verify the profile/account is public
  • Check that the URL/username is correct
  • Ensure you’re not hitting rate limits
  • Try bypassing the cache

If you’re hitting rate limits:
  • Reduce request frequency
  • Use caching to avoid repeated calls
  • Implement retry logic with backoff (see the sketch below)

If content appears inaccessible:
  • Verify the content is public
  • Check account privacy settings
  • Use public alternatives if available
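
A minimal retry-with-backoff sketch for rate-limited calls (fetch_with_backoff is a hypothetical helper; it assumes failed calls raise an ordinary exception, so narrow the except clause to whatever your calls actually raise):
import time

def fetch_with_backoff(fetch, attempts=4, base_delay=2.0):
    """Call `fetch`, retrying with exponential backoff if it raises."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:  # narrow this to the library's rate-limit error if one is exposed
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # wait 2s, 4s, 8s, ...

# Usage with a documented call
posts = fetch_with_backoff(lambda: scraper.get_profile_posts(handle="example_user", limit=20))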

Social scraping is a powerful feature for data collection and monitoring. Always respect platform terms of service and user privacy.