Why AWS Data Firehose?
- Edge-level tracking — Captures logs at CloudFront edge locations
- Real-time delivery — Logs stream to ChatFeatured in real-time
- Built for scale — Handles high-volume traffic efficiently
- Enterprise-grade — Designed for large organizations
- Cost-effective — Pay per GB of logs processed
Prerequisites
Before starting, ensure you have:
- AWS Account with CloudFront and IAM permissions
- Active CloudFront distribution with content
- Basic Python 3.12 knowledge (to follow the Lambda function)
- Your ChatFeatured API Key (starts with `aa_`)
- S3 bucket access (for log backup)
Architecture Overview
The flow works like this: CloudFront captures all requests at edge locations, sends them to Data Firehose in real-time, Lambda transforms them into ChatFeatured’s format, and Firehose delivers them. Failed deliveries back up to S3 for retry.
Step-by-Step Setup
Step 1: Create Lambda Transformation Function
Why? Data Firehose needs a Lambda function to transform CloudFront logs into ChatFeatured’s format.
- Sign in to AWS Lambda Console
- Click Create function
- Select Author from scratch
- Configure:
  - Function name: `chatfeatured-cloudfront-log-processor`
  - Runtime: Python 3.12
  - Architecture: x86_64
- Click Create function
- Add the transformation code in the code editor, then click Deploy
- Note your Lambda function ARN (shown at top-right)
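A minimal sketch of what a Firehose transformation handler can look like. It follows the record contract Firehose uses for transformation Lambdas: each input record carries a `recordId` and base64-encoded `data`, and each output record returns the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64-encoded `data`. The field order in `FIELDS` is an assumption copied from the Step 5 field list, and the JSON output shape is hypothetical rather than ChatFeatured's documented format.

```python
import base64
import json

# Assumed field order, copied from the Step 5 field list. Adjust it to match
# the order in which your records actually arrive.
FIELDS = [
    "date", "time", "c-ip", "cs-method", "cs(Host)", "cs-uri-stem",
    "cs-uri-query", "cs(User-Agent)", "cs(Referer)", "sc-status",
    "sc-bytes", "time-taken",
]


def transform_line(line):
    """Turn one tab-delimited log line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def lambda_handler(event, context):
    """Firehose transformation entry point: one output record per input record."""
    output = []
    for record in event["records"]:
        try:
            line = base64.b64decode(record["data"]).decode("utf-8")
            # Hypothetical output format: one JSON object per line.
            payload = json.dumps(transform_line(line)) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
            })
        except ValueError:
            # Covers bad base64, bad UTF-8, and a wrong field count.
            # Firehose treats these records as failed and applies its backup settings.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Marking a bad record as `ProcessingFailed` instead of raising keeps one malformed line from failing the whole batch.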
Step 2: Add Environment Variable to Lambda
- In Lambda settings, go to Configuration → Environment variables
- Click Edit
- Add:
  - Key: `CHATFEATURED_API_KEY`
  - Value: Your ChatFeatured API Key (starts with `aa_`)
- Click Save
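As an illustration of how the function might read this variable, here is a small sketch that fails fast when the key is missing or does not carry the `aa_` prefix mentioned above. The helper name `load_api_key` is hypothetical; only the variable name and the prefix come from this guide.

```python
import os

API_KEY_ENV_VAR = "CHATFEATURED_API_KEY"
API_KEY_PREFIX = "aa_"  # prefix stated in the prerequisites


def load_api_key(env=None):
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{API_KEY_ENV_VAR} is not set")
    if not key.startswith(API_KEY_PREFIX):
        raise RuntimeError(f"{API_KEY_ENV_VAR} should start with {API_KEY_PREFIX!r}")
    return key
```

Calling `load_api_key()` once at module import time would surface a misconfiguration on the first invocation instead of partway through a batch.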
Step 3: Create S3 Backup Bucket
- Open S3 Console
- Click Create bucket
- Bucket name: `chatfeatured-cloudfront-logs-backup-[your-account-id]`
- Region: Same as your CloudFront distribution
- Click Create bucket
This bucket stores logs that fail to deliver to ChatFeatured. You can replay them later or investigate issues.
Step 4: Create Data Firehose Delivery Stream
- Go to Amazon Data Firehose Console
- Click Create Firehose stream
- Configure source and destination:
  - Source: Direct PUT
  - Destination: HTTP Endpoint
  - Stream name: `chatfeatured-cloudfront-analytics`
- Processing section:
  - Enable: “Transform source records” ✓
  - Select your Lambda from Step 1
- HTTP endpoint destination:
  - URL: `https://ingest.chatfeatured.com/v1/logs/aws_data_firehose_cloudfront`
  - Access key: Your ChatFeatured API Key
  - Enable: GZIP compression ✓
- S3 backup:
  - Backup mode: “Failed data only”
  - Select bucket from Step 3
- Click Create Firehose stream
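For readers who script their infrastructure, the console settings above can be sketched as a request dictionary. The key names follow my understanding of the boto3 Firehose `create_delivery_stream` request shape and should be checked against the current AWS documentation. The `role_arn` argument (an IAM role Firehose can assume) and the endpoint display name are assumptions that the console steps do not mention. The function only builds the dictionary and makes no AWS call.

```python
ENDPOINT_URL = "https://ingest.chatfeatured.com/v1/logs/aws_data_firehose_cloudfront"
STREAM_NAME = "chatfeatured-cloudfront-analytics"


def build_stream_params(lambda_arn, bucket_arn, role_arn, api_key):
    """Mirror the Step 4 console choices as a create_delivery_stream request."""
    return {
        "DeliveryStreamName": STREAM_NAME,
        "DeliveryStreamType": "DirectPut",  # Source: Direct PUT
        "HttpEndpointDestinationConfiguration": {
            "EndpointConfiguration": {
                "Url": ENDPOINT_URL,
                "Name": "ChatFeatured",  # display name, assumed
                "AccessKey": api_key,
            },
            "RequestConfiguration": {"ContentEncoding": "GZIP"},
            "ProcessingConfiguration": {  # "Transform source records"
                "Enabled": True,
                "Processors": [{
                    "Type": "Lambda",
                    "Parameters": [
                        {"ParameterName": "LambdaArn", "ParameterValue": lambda_arn},
                    ],
                }],
            },
            "RoleARN": role_arn,
            "S3BackupMode": "FailedDataOnly",  # "Failed data only"
            "S3Configuration": {"RoleARN": role_arn, "BucketARN": bucket_arn},
        },
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("firehose").create_delivery_stream(**params)`.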
Step 5: Configure CloudFront Real-time Logs
- Go to CloudFront Console
- Select your distribution
- Click Monitoring → Logs → Real-time logs
- Click Create configuration
- Configure:
  - Name: `chatfeatured-agent-analytics`
  - Sampling rate: 100 (capture all requests)
  - Log fields: Select these:
    - date, time, c-ip, cs-method, cs(Host), cs-uri-stem
    - cs-uri-query, cs(User-Agent), cs(Referer), sc-status
    - sc-bytes, time-taken
  - Log recipients: Select your Firehose stream from Step 4
  - Cache behaviors: “All cache behaviors”
- Click Create configuration
Verifying the Setup
Check Data Flow
- Generate traffic to your CloudFront distribution (visit your site)
- Wait 2-3 minutes for logs to stream
- Go to ChatFeatured Analytics Dashboard
- You should see incoming traffic
Monitor CloudFront
- In CloudFront Monitoring tab, check Real-time logs status
- Should show Successful deliveries counter incrementing
Check Lambda Logs
- In Lambda Monitor tab, check Recent invocations
- Click invocation to see logs
- Verify no errors in execution
Check Firehose
- In Data Firehose console, select your stream
- Check Delivery stream metrics:
- Incoming records should be > 0
- Delivered bytes should be increasing
Advanced Configuration
Filtering by CloudFront Behavior
To track only certain paths (e.g., `/blog/*` but not `/api/*`):
- In CloudFront, create cache behaviors for different path patterns
- Add separate Real-time log configurations per behavior
- Route to different Firehose streams if desired
Cost Optimization
Current: Logs every request
Option 1: Sampling
- In CloudFront Real-time logs, reduce sampling rate to 10%
- Reduces cost by 90% but loses granularity
Option 2: Filtering
- Modify Lambda to filter certain paths (skip `/health`, `/metrics`)
- Keeps coverage of real traffic but filters out noise
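The path filter can be sketched as a per-record check inside the transformation Lambda. Firehose accepts `Dropped` as a record result, which discards the record without counting it as a failure. The position of `cs-uri-stem` (`URI_STEM_INDEX`) is an assumption based on the field list in Step 5, and `filter_record` is a hypothetical helper name.

```python
import base64

SKIPPED_PATHS = {"/health", "/metrics"}
URI_STEM_INDEX = 5  # assumed position of cs-uri-stem in the tab-delimited line


def filter_record(record):
    """Mark a Firehose record as Dropped when its path is in SKIPPED_PATHS."""
    line = base64.b64decode(record["data"]).decode("utf-8")
    fields = line.rstrip("\n").split("\t")
    path = fields[URI_STEM_INDEX] if len(fields) > URI_STEM_INDEX else ""
    result = "Dropped" if path in SKIPPED_PATHS else "Ok"
    # The data is passed through unchanged; only the result flag differs.
    return {"recordId": record["recordId"], "result": result, "data": record["data"]}
```

An exact-match set keeps the check cheap. Prefix or pattern matching (for example, all static assets) would need a different test than `in SKIPPED_PATHS`.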
Lambda Error Handling
The Lambda already includes basic error handling.
Troubleshooting
No Data Appearing in ChatFeatured
Cause: Data Firehose not delivering or Lambda failing
Solution:
- Check CloudFront Real-time logs configuration shows “Enabled”
- Verify Firehose has recent deliveries in console
- Check Lambda logs for errors: Lambda → Monitor → Logs
- Verify ChatFeatured API Key is correct in Lambda env vars
- Wait 2-3 minutes before checking (there’s natural delay)
Lambda Function Failing
Cause: Syntax error or missing environment variable
Solution:
- Check Lambda code for syntax errors
- Verify `CHATFEATURED_API_KEY` environment variable is set
- Check CloudWatch logs for specific error messages
- Test Lambda with sample CloudFront log data
- Ensure Python 3.12 runtime is selected
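To test the Lambda with sample data, a test event can be generated locally and pasted into the Lambda console's test dialog. The sketch below builds only the `records` part of a Firehose transformation event; real events carry additional metadata that is omitted here. The sample line is made up, and its field order is an assumption based on the Step 5 field list.

```python
import base64
import json


def build_test_event(lines):
    """Wrap raw log lines as base64-encoded Firehose-style records."""
    return {
        "records": [
            {
                "recordId": f"sample-{i}",
                "data": base64.b64encode(line.encode("utf-8")).decode("ascii"),
            }
            for i, line in enumerate(lines)
        ]
    }


# Made-up sample line: tab-delimited, fields in the assumed Step 5 order.
SAMPLE_LINE = "\t".join([
    "2024-01-01", "12:00:00", "203.0.113.10", "GET", "example.com",
    "/blog/hello", "-", "ExampleBot/1.0", "-", "200", "1024", "0.012",
])

if __name__ == "__main__":
    print(json.dumps(build_test_event([SAMPLE_LINE]), indent=2))
```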
'HTTP 401 Unauthorized' Error
Firehose Backing Up to S3
Cause: ChatFeatured endpoint not responding or rate-limited
Solution:
- Check endpoint URL is correct: `https://ingest.chatfeatured.com/v1/logs/aws_data_firehose_cloudfront`
- Verify ChatFeatured account isn’t rate-limited
- Check S3 bucket to see failed logs: `s3://bucket-name/failed_data/`
- Review CloudWatch Logs for Firehose errors
- Contact ChatFeatured support if rate limiting persists
High AWS Costs
Cause: High CloudFront traffic or overly verbose logging
Solution:
- Reduce CloudFront Real-time log sampling rate
- Filter paths in Lambda to exclude static assets
- Implement cost budget alerts in AWS
- Review cost estimation below and adjust sampling accordingly
Lambda Timeout or Memory Issues
Cause: Lambda resources insufficient for log volume
Solution:
- Increase Lambda timeout (currently 5 seconds)
- Increase Lambda memory allocation (default 128 MB)
- Reduce CloudFront sampling rate to decrease volume
- Check CloudWatch metrics for performance bottlenecks
Cost Estimation
Typical costs for 10M requests/month:
| Service | Rate | Estimate |
|---|---|---|
| CloudFront Real-time Logs | $0.01/1M lines | $0.10 |
| Data Firehose | $0.029/GB | $0.05 |
| Lambda | Free tier covers | $0.00 |
| S3 Backup | Standard pricing | $0.01 |
| Total | | ~$0.16/month |
Most of the cost comes from CloudFront real-time logs and Data Firehose. Lambda and S3 are typically free tier or minimal. Use sampling to reduce CloudFront logs cost if needed.
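The arithmetic behind the table can be sketched as a small estimator, which also shows how sampling changes the total. The rates come from the table. The average line size (`ASSUMED_BYTES_PER_LINE`) is an assumption back-derived from the table's $0.05 Firehose estimate at $0.029/GB, not a measured value, and the S3 figure is treated as a flat amount.

```python
CLOUDFRONT_PER_MILLION_LINES = 0.01  # USD, from the table
FIREHOSE_PER_GB = 0.029              # USD, from the table
S3_BACKUP_FLAT = 0.01                # USD, table estimate treated as flat
ASSUMED_BYTES_PER_LINE = 172         # assumption: roughly 1.72 GB per 10M lines


def estimate_monthly_cost(requests, sampling_rate=1.0,
                          bytes_per_line=ASSUMED_BYTES_PER_LINE):
    """Rough monthly cost in USD for a given request volume and sampling rate."""
    lines = requests * sampling_rate
    cloudfront = lines / 1_000_000 * CLOUDFRONT_PER_MILLION_LINES
    firehose = lines * bytes_per_line / 1e9 * FIREHOSE_PER_GB
    return cloudfront + firehose + S3_BACKUP_FLAT


if __name__ == "__main__":
    print(f"100% sampling: ${estimate_monthly_cost(10_000_000):.2f}")
    print(f" 10% sampling: ${estimate_monthly_cost(10_000_000, 0.10):.2f}")
```

Lambda is left out because the table lists it as covered by the free tier. Actual line sizes depend on the selected fields and on URL and User-Agent lengths.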
Performance Notes
- Latency: Minimal, handled asynchronously by Firehose
- Log delay: 1-3 minute delay from request to ChatFeatured
- Throughput: Can handle millions of requests/hour
- Reliability: Automatic retry with S3 backup
Security Best Practices
- Store API Key in Lambda environment variables (encrypted at rest)
- Use IAM roles with minimal permissions
- Enable S3 encryption for backup bucket
- Monitor CloudFront logs for suspicious activity
See Also
- Cloudflare Workers — Simpler Cloudflare setup
- Vercel Log Drains — For Vercel hosting