The ChatFeatured AWS Data Firehose integration captures CloudFront real-time logs at the edge and streams them to ChatFeatured. It is best suited to sites that already serve traffic through CloudFront and need enterprise-grade log ingestion.
Status: Available | Complexity: Advanced | Setup time: 30-45 minutes | Cost: ~$0.16-$0.50/month for typical sites
Advanced Setup: This integration requires AWS account knowledge and multiple service configurations (Lambda, S3, CloudFront, Data Firehose). We recommend having AWS experience or consulting AWS documentation.

Why AWS Data Firehose?

  • Edge-level tracking — Captures logs at CloudFront edge locations
  • Real-time delivery — Logs stream to ChatFeatured in real-time
  • Built for scale — Handles high-volume traffic efficiently
  • Enterprise-grade — Designed for large organizations
  • Cost-effective — Pay per GB of logs processed

Prerequisites

Before starting, ensure you have:
  • AWS Account with CloudFront and IAM permissions
  • Active CloudFront distribution with content
  • Python 3.12 knowledge (for Lambda understanding)
  • Your ChatFeatured API Key (starts with aa_)
  • S3 bucket access (for log backup)
If you’re new to AWS, consider starting with the Cloudflare integration instead. It’s simpler to set up and covers similar use cases.

Architecture Overview

CloudFront (Content Delivery)

CloudFront Real-time Logs

Data Firehose (Streaming Service)

Lambda (Transform Logs)

ChatFeatured Analytics

S3 (Backup for failures)
The flow works like this: CloudFront captures all requests at edge locations, sends them to Data Firehose in real-time, Lambda transforms them into ChatFeatured’s format, and Firehose delivers them. Failed deliveries back up to S3 for retry.

Step-by-Step Setup

Step 1: Create Lambda Transformation Function

Why? Data Firehose needs a Lambda to transform CloudFront logs into ChatFeatured format.
  1. Sign in to AWS Lambda Console
  2. Click Create function
  3. Select Author from scratch
  4. Configure:
    • Function name: chatfeatured-cloudfront-log-processor
    • Runtime: Python 3.12
    • Architecture: x86_64
  5. Click Create function
Replace the function code with:
import json
import base64
import urllib.request
import urllib.error
import os

def lambda_handler(event, context):
    """Transform CloudFront logs to ChatFeatured format"""
    
    output = []
    
    for record in event['records']:
        # Decode the log data
        payload = base64.b64decode(record['data']).decode('utf-8')
        
        # Parse CloudFront log line
        fields = payload.strip().split('\t')
        
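        if len(fields) < 12:
            # Firehose expects every recordId back, so mark short lines
            # as dropped instead of silently skipping them
            output.append({
                'recordId': record['recordId'],
                'result': 'Dropped',
                'data': record['data'],
            })
            continue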
        
        try:
            # Extract CloudFront fields
            log_entry = {
                'timestamp': f"{fields[0]}T{fields[1]}Z",  # date + time
                'host': fields[4],  # cs(Host)
                'method': fields[5],  # cs-method
                'pathname': fields[7],  # cs-uri-stem
                'query_params': fields[11] if len(fields) > 11 else '',
                'status': int(fields[8]),  # sc-status
                'ip': fields[2],  # c-ip
                'userAgent': fields[10] if len(fields) > 10 else '',
                'referer': fields[9] if len(fields) > 9 else '',
                'bytes_sent': int(fields[3]) if len(fields) > 3 else 0,
            }
            
            # Only send to ChatFeatured if the response succeeded (200-299 status)
            if 200 <= log_entry['status'] < 300:
                send_to_chatfeatured(log_entry)
            
            # Prepare output for Firehose
            output_record = {
                'recordId': record['recordId'],
                'result': 'Ok',
                'data': base64.b64encode(
                    (json.dumps(log_entry) + '\n').encode('utf-8')
                ).decode('utf-8')
            }
        except Exception as e:
            print(f"Error processing record: {e}")
            output_record = {
                'recordId': record['recordId'],
                'result': 'ProcessingFailed',
                'data': record['data'],  # return the original payload unchanged
            }
        
        output.append(output_record)
    
    return {'records': output}

def send_to_chatfeatured(log_entry):
    """Send log directly to ChatFeatured"""
    
    api_key = os.environ.get('CHATFEATURED_API_KEY')
    endpoint = 'https://ingest.chatfeatured.com/v1/logs/aws_data_firehose_cloudfront'
    
    try:
        req = urllib.request.Request(
            endpoint,
            data=json.dumps(log_entry).encode('utf-8'),
            headers={
                'Content-Type': 'application/json',
                'X-API-Key': api_key,
            }
        )
        urllib.request.urlopen(req, timeout=5)
    except urllib.error.URLError as e:
        print(f"Failed to send log: {e}")
  1. Click Deploy
  2. Note your Lambda function ARN (shown at top-right)
Copy the Lambda ARN somewhere safe—you’ll need it when creating the Data Firehose stream.
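Before wiring the function into Firehose, it can help to exercise it with a hand-built event. The following is a minimal sketch, assuming the event shape the handler above reads (`records`, each with `recordId` and base64-encoded `data`). The helper names, record IDs, and sample values are hypothetical and exist only for local testing.

```python
import base64
import json


def make_firehose_event(log_lines):
    """Wrap raw log lines in the record envelope the transform handler reads."""
    records = []
    for index, line in enumerate(log_lines):
        records.append({
            "recordId": f"test-record-{index}",  # hypothetical IDs for local testing
            "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
        })
    return {"records": records}


def decode_transformed(response):
    """Decode a handler response into (recordId, result, parsed JSON or None) tuples."""
    decoded = []
    for record in response["records"]:
        body = None
        if record["result"] == "Ok":
            body = json.loads(base64.b64decode(record["data"]).decode("utf-8"))
        decoded.append((record["recordId"], record["result"], body))
    return decoded


# Twelve tab-separated fields in the positions the handler above indexes.
# Every value here is made up for illustration.
SAMPLE_LINE = "\t".join([
    "2024-01-15", "12:30:00", "203.0.113.10", "5120", "example.com", "GET",
    "-", "/blog/post", "404", "-", "Mozilla/5.0", "utm_source=test",
])

event = make_firehose_event([SAMPLE_LINE])
print(json.dumps(event, indent=2))  # paste this JSON into a Lambda console test event
```

The sample uses a 404 status on purpose: with a 2xx status the handler would also call `send_to_chatfeatured` and make a real HTTP request. If the handler is importable locally, `decode_transformed(lambda_handler(event, None))` shows what each record was turned into.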

Step 2: Add Environment Variable to Lambda

  1. In Lambda settings, go to Configuration → Environment variables
  2. Click Edit
  3. Add:
    • Key: CHATFEATURED_API_KEY
    • Value: Your ChatFeatured API Key (starts with aa_)
  4. Click Save
Treat your API Key like a password. Never hardcode it in Lambda code or commit it to version control. Always use environment variables.
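One optional hardening step is to fail fast when the variable is missing, rather than sending requests with an empty key. The sketch below assumes the `aa_` prefix mentioned in the prerequisites is a reliable format check; `load_api_key` is a hypothetical helper, not part of the function above.

```python
import os


def load_api_key(env=os.environ):
    """Return the ChatFeatured API key, raising early if it is missing or malformed."""
    api_key = env.get("CHATFEATURED_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("CHATFEATURED_API_KEY is not set")
    if not api_key.startswith("aa_"):
        # Prefix check based on the key format described in the prerequisites
        raise RuntimeError("CHATFEATURED_API_KEY does not start with 'aa_'")
    return api_key
```

Calling this once at module import time would surface a misconfigured key in the first invocation's logs instead of as repeated delivery failures.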

Step 3: Create S3 Backup Bucket

  1. Open S3 Console
  2. Click Create bucket
  3. Bucket name: chatfeatured-cloudfront-logs-backup-[your-account-id]
  4. Region: Same as your CloudFront distribution
  5. Click Create bucket
This bucket stores logs that fail to deliver to ChatFeatured. You can replay them later or investigate issues.

Step 4: Create Data Firehose Delivery Stream

  1. Go to Amazon Data Firehose Console
  2. Click Create Firehose stream
  3. Configure source and destination:
    • Source: Direct PUT
    • Destination: HTTP Endpoint
  4. Stream name: chatfeatured-cloudfront-analytics
  5. Processing section:
    • Enable: “Transform source records”
    • Select your Lambda from Step 1
  6. HTTP endpoint destination:
    • URL: https://ingest.chatfeatured.com/v1/logs/aws_data_firehose_cloudfront
    • Access key: Your ChatFeatured API Key
    • Enable: GZIP compression
  7. S3 backup:
    • Backup mode: “Failed data only”
    • Select bucket from Step 3
  8. Click Create Firehose stream
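For teams that script their infrastructure, the console settings above map roughly onto a single API request. This is a sketch under assumptions: it builds the arguments as a plain dictionary intended for boto3's Firehose `create_delivery_stream` call, and it requires an IAM role ARN that the console normally creates for you (the `role_arn` parameter here is hypothetical input). Check the key names against the current boto3 reference before relying on it.

```python
def build_firehose_stream_config(lambda_arn, bucket_arn, role_arn, api_key):
    """Build keyword arguments mirroring the console settings from Step 4."""
    return {
        "DeliveryStreamName": "chatfeatured-cloudfront-analytics",
        "DeliveryStreamType": "DirectPut",
        "HttpEndpointDestinationConfiguration": {
            "EndpointConfiguration": {
                "Url": "https://ingest.chatfeatured.com/v1/logs/aws_data_firehose_cloudfront",
                "Name": "ChatFeatured",  # display name only, chosen for illustration
                "AccessKey": api_key,
            },
            "RequestConfiguration": {"ContentEncoding": "GZIP"},
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [{
                    "Type": "Lambda",
                    "Parameters": [
                        {"ParameterName": "LambdaArn", "ParameterValue": lambda_arn},
                    ],
                }],
            },
            "RoleARN": role_arn,  # assumed: a role Firehose can use for Lambda and S3
            "S3BackupMode": "FailedDataOnly",
            "S3Configuration": {"RoleARN": role_arn, "BucketARN": bucket_arn},
        },
    }


# Intended use (not executed here, needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("firehose").create_delivery_stream(**config)
```

Keeping the configuration in a pure function makes it easy to review or diff before anything is created in the account.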

Step 5: Configure CloudFront Real-time Logs

  1. Go to CloudFront Console
  2. Select your distribution
  3. Click Monitoring → Logs → Real-time logs
  4. Click Create configuration
  5. Configure:
    • Name: chatfeatured-agent-analytics
    • Sampling rate: 100 (capture all requests)
    • Log fields: Select these:
      • date, time, c-ip, cs-method, cs(Host), cs-uri-stem
      • cs-uri-query, cs(User-Agent), cs(Referer), sc-status
      • sc-bytes, time-taken
    • Log recipients: Select your Firehose stream from Step 4
    • Cache behaviors: “All cache behaviors”
  6. Click Create configuration
CloudFront will now stream real-time logs to ChatFeatured!

Verifying the Setup

Check Data Flow

  1. Generate traffic to your CloudFront distribution (visit your site)
  2. Wait 2-3 minutes for logs to stream
  3. Go to ChatFeatured Analytics Dashboard
  4. You should see incoming traffic
Open your site in multiple browsers or use curl to generate test traffic if your site doesn’t get organic visits quickly.
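If scripting is more convenient than opening browsers, a few requests from Python work as well. This is a minimal sketch using only the standard library; the function name, the User-Agent string, and the injectable `fetch` parameter are illustrative choices, and the URL should be replaced with one served by your distribution.

```python
import urllib.error
import urllib.request


def generate_test_traffic(url, count=10, fetch=None):
    """Request `url` `count` times and return the HTTP status codes observed."""
    if fetch is None:
        def fetch(target):
            request = urllib.request.Request(
                target, headers={"User-Agent": "setup-check/1.0"}  # hypothetical UA
            )
            try:
                with urllib.request.urlopen(request, timeout=10) as response:
                    return response.status
            except urllib.error.HTTPError as error:
                return error.code  # 4xx/5xx are returned rather than raised
    return [fetch(url) for _ in range(count)]


# Example (performs real requests):
#   print(generate_test_traffic("https://your-distribution.example.com/", count=5))
```

Because the Lambda above forwards only 2xx responses, point the script at a page that returns 200.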

Monitor CloudFront

  1. In CloudFront Monitoring tab, check Real-time logs status
  2. Should show Successful deliveries counter incrementing

Check Lambda Logs

  1. In Lambda Monitor tab, check Recent invocations
  2. Click invocation to see logs
  3. Verify no errors in execution

Check Firehose

  1. In Data Firehose console, select your stream
  2. Check Delivery stream metrics:
    • Incoming records should be > 0
    • Delivered bytes should be increasing

Advanced Configuration

Filtering by CloudFront Behavior

To track only certain paths (e.g., /blog/* but not /api/*):
  1. In CloudFront, create cache behaviors for different path patterns
  2. Add separate Real-time log configurations per behavior
  3. Route to different Firehose streams if desired

Cost Optimization

Current: Logs every request
Option 1: Sampling
  • In CloudFront Real-time logs, reduce sampling rate to 10%
  • Reduces cost by 90% but loses granularity
Start with 100% sampling to understand your traffic, then reduce if costs are high.
Option 2: Filtered Logs
  • Modify Lambda to filter certain paths (skip /health, /metrics)
  • Keeps all traffic but filters noise
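Option 2 could be implemented as a small predicate that the handler checks before forwarding a log entry. The sketch below is an assumption about how you might structure it: `should_forward` is a hypothetical helper, and the static-asset extension list is an example to adapt rather than a recommended set.

```python
EXCLUDED_PREFIXES = ("/health", "/metrics")  # paths named in Option 2
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".svg", ".ico", ".woff2")  # assumed list


def should_forward(pathname):
    """Return True if a request path is worth sending to ChatFeatured."""
    path = pathname.lower()
    # Match the excluded path itself or anything nested beneath it
    if any(path == prefix or path.startswith(prefix + "/") for prefix in EXCLUDED_PREFIXES):
        return False
    return not path.endswith(STATIC_EXTENSIONS)
```

In the handler, the existing status check could then become `if 200 <= log_entry['status'] < 300 and should_forward(log_entry['pathname']):`. Matching on `prefix + "/"` avoids accidentally excluding unrelated pages such as `/healthy-recipes`.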

Lambda Error Handling

The Lambda catches per-record exceptions and, by default, forwards only successful responses:
# Only forwards successful responses (200-299)
if 200 <= log_entry['status'] < 300:
    send_to_chatfeatured(log_entry)
For more granular control, modify the filter:
# Track any errors (400+)
if 400 <= log_entry['status'] < 600:
    log_entry['type'] = 'error'
    send_to_chatfeatured(log_entry)

Troubleshooting

Cause: Data Firehose not delivering or Lambda failing
Solution:
  • Check CloudFront Real-time logs configuration shows “Enabled”
  • Verify Firehose has recent deliveries in console
  • Check Lambda logs for errors: Lambda → Monitor → Logs
  • Verify ChatFeatured API Key is correct in Lambda env vars
  • Wait 2-3 minutes before checking (there’s natural delay)
Cause: Syntax error or missing environment variable
Solution:
  • Check Lambda code for syntax errors
  • Verify CHATFEATURED_API_KEY environment variable is set
  • Check CloudWatch logs for specific error messages
  • Test Lambda with sample CloudFront log data
  • Ensure Python 3.12 runtime is selected
Cause: API Key is invalid or expired
Solution:
  • Generate new API Key in ChatFeatured
  • Update CHATFEATURED_API_KEY in Lambda environment variables
  • Click Deploy on Lambda function
  • Wait a few minutes for changes to propagate
Cause: ChatFeatured endpoint not responding or rate-limited
Solution:
  • Check endpoint URL is correct: https://ingest.chatfeatured.com/v1/logs/aws_data_firehose_cloudfront
  • Verify ChatFeatured account isn’t rate-limited
  • Check S3 bucket to see failed logs: s3://bucket-name/failed_data/
  • Review CloudWatch Logs for Firehose errors
  • Contact ChatFeatured support if rate limiting persists
Cause: High CloudFront traffic or overly verbose logging
Solution:
  • Reduce CloudFront Real-time log sampling rate
  • Filter paths in Lambda to exclude static assets
  • Implement cost budget alerts in AWS
  • Review cost estimation below and adjust sampling accordingly
Cause: Lambda resources insufficient for log volume
Solution:
  • Increase the Lambda timeout (the default is 3 seconds, which is shorter than the 5-second HTTP timeout used in send_to_chatfeatured)
  • Increase Lambda memory allocation (default 128 MB)
  • Reduce CloudFront sampling rate to decrease volume
  • Check CloudWatch metrics for performance bottlenecks

Cost Estimation

Typical costs for 10M requests/month:
Service                   | Rate             | Estimate
CloudFront Real-time Logs | $0.01/1M lines   | $0.10
Data Firehose             | $0.029/GB        | $0.05
Lambda                    | Free tier covers | $0.00
S3 Backup                 | Standard pricing | $0.01
Total                     |                  | ~$0.16/month
For 100M requests/month: ~$1.50/month
For 1B requests/month: ~$15/month
Most of the cost comes from CloudFront real-time logs and Data Firehose. Lambda and S3 are typically free tier or minimal. Use sampling to reduce CloudFront logs cost if needed.
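To project costs for your own traffic, the two dominant line items can be scaled from the rates quoted above. This is a rough sketch, not a pricing calculator: the 175 bytes per log line is an assumption back-derived from the table (about $0.05 at $0.029/GB for 10M lines), Lambda and S3 are left out as minimal, and actual AWS pricing may differ by region.

```python
CLOUDFRONT_USD_PER_MILLION_LINES = 0.01  # rate quoted in this section
FIREHOSE_USD_PER_GB = 0.029              # rate quoted in this section
ASSUMED_BYTES_PER_LINE = 175             # assumption, back-derived from the estimate


def estimate_monthly_cost(requests, sampling_rate=1.0, bytes_per_line=ASSUMED_BYTES_PER_LINE):
    """Estimate monthly CloudFront real-time log and Firehose cost in USD."""
    logged_lines = requests * sampling_rate
    cloudfront = logged_lines / 1_000_000 * CLOUDFRONT_USD_PER_MILLION_LINES
    firehose = logged_lines * bytes_per_line / 1e9 * FIREHOSE_USD_PER_GB
    return {"cloudfront": cloudfront, "firehose": firehose, "total": cloudfront + firehose}


for monthly_requests in (10_000_000, 100_000_000, 1_000_000_000):
    estimate = estimate_monthly_cost(monthly_requests)
    print(f"{monthly_requests:>13,} requests: ~${estimate['total']:.2f}/month")
```

Passing `sampling_rate=0.1` models the 10% sampling option: both line items scale with the number of logged lines, so the estimate drops by 90%.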

Performance Notes

  • Latency: Minimal, handled asynchronously by Firehose
  • Log delay: 1-3 minute delay from request to ChatFeatured
  • Throughput: Can handle millions of requests/hour
  • Reliability: Automatic retry with S3 backup

Security Best Practices

  • Store API Key in Lambda environment variables (encrypted at rest)
  • Use IAM roles with minimal permissions
  • Enable S3 encryption for backup bucket
  • Monitor CloudFront logs for suspicious activity
Don’t hardcode API Key in Lambda code
Don’t make S3 bucket publicly accessible
Don’t grant Lambda unnecessary permissions

See Also