Data Provenance and Set Sources
Learn how to track the origin of your trading card data using the Set Sources API to maintain data quality, provide proper attribution, and verify information accuracy.
What are Set Sources?
Set Sources allow you to track where your trading card set data came from. Every piece of information in your database (a checklist, metadata, or images) originated from somewhere. The Set Sources API provides a standardized way to record and manage these data origins.
Why Track Data Sources?
Data Provenance
Understanding where your data came from is essential for:
- Transparency - Users can see where information originated
- Credibility - Verified sources build trust in your data
- Compliance - Some data sources require attribution
- Quality Control - Track which sources provide accurate information
Verification
Source tracking enables you to:
- Mark sources as verified after validation
- Identify which data needs review
- Update information from authoritative sources
- Remove unreliable data sources
Attribution
Properly crediting data sources:
- Respects intellectual property
- Maintains good relationships with data providers
- Meets licensing requirements
- Builds community trust
Source Types
The API supports three distinct types of sources:
Checklist Sources
Track where your card lists came from.
Common checklist sources:
- Trading card databases (TCDB, COMC)
- Price guides (Beckett, PSA)
- Manufacturer checklists
- Community-contributed lists
- Retailer catalogs
Example:
{
"source_type": "checklist",
"source_name": "COMC Database",
"source_url": "https://www.comc.com"
}
Metadata Sources
Track where set information originated.
Metadata includes:
- Set name and year
- Manufacturer details
- Print run information
- Set descriptions
- Release dates
Common metadata sources:
- CardboardConnection
- Manufacturer press releases
- Industry publications
- Collector guides
- Historical archives
Example:
{
"source_type": "metadata",
"source_name": "CardboardConnection",
"source_url": "https://www.cardboardconnection.com"
}
Image Sources
Track where card images were obtained.
Common image sources:
- Trading Card Database
- COMC scans
- Personal collection photos
- Official manufacturer images
- Community contributions
Example:
{
"source_type": "images",
"source_name": "Trading Card Database",
"source_url": "https://www.tradingcarddb.com"
}
Key Fields
source_type
The type of data this source provides: checklist, metadata, or images.
source_name
A human-readable name for the source (e.g., "Beckett Price Guide", "TCDB").
source_url
The URL where this data can be found or verified. This should be as specific as possible.
verified_at
An ISO 8601 timestamp (for example, 2024-01-15T12:00:00Z) indicating when this source was last verified as accurate. The value is null if the source has not been verified.
set_id
The UUID of the set this source applies to. Each source is specific to one set.
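The field descriptions above can be checked locally before a request is sent. The following is a minimal sketch, not part of the API: `validate_source` and `VALID_SOURCE_TYPES` are hypothetical names, and treating every field except verified_at as required is an assumption rather than documented behavior.

```python
from datetime import datetime
from urllib.parse import urlparse
from uuid import UUID

# The three source types named in this guide
VALID_SOURCE_TYPES = {"checklist", "metadata", "images"}


def validate_source(attrs):
    """Return a list of problems found in a source attributes dict (empty if none)."""
    problems = []
    if attrs.get("source_type") not in VALID_SOURCE_TYPES:
        problems.append("source_type must be checklist, metadata, or images")
    # Assumption: source_name and source_url are required
    if not attrs.get("source_name"):
        problems.append("source_name is required")
    parsed = urlparse(attrs.get("source_url") or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("source_url must be an absolute http(s) URL")
    try:
        UUID(str(attrs.get("set_id")))
    except ValueError:
        problems.append("set_id must be a UUID")
    verified_at = attrs.get("verified_at")
    if verified_at is not None:
        try:
            # fromisoformat() on older Python versions does not accept a trailing "Z"
            datetime.fromisoformat(verified_at.replace("Z", "+00:00"))
        except (AttributeError, ValueError):
            problems.append("verified_at must be an ISO 8601 timestamp or null")
    return problems
```

An empty list means the attributes look well-formed. The server remains the authority on what it accepts.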
Common Use Cases
When Importing Data
Record sources immediately when importing new sets:
# Import set data from external source
set_data = import_from_beckett(set_name)
set_id = create_set(set_data)
# Record where this data came from
client.set_sources.create({
    'type': 'set_sources',
    'attributes': {
        'set_id': set_id,
        'source_type': 'checklist',
        'source_name': 'Beckett Online Price Guide',
        'source_url': 'https://www.beckett.com/price-guides'
    }
})
Multiple Sources Per Set
A single set can have multiple sources for different types of data:
# Different sources for different data types
sources = [
    {
        'type': 'checklist',
        'name': 'COMC Database',
        'url': 'https://www.comc.com'
    },
    {
        'type': 'metadata',
        'name': 'CardboardConnection',
        'url': 'https://www.cardboardconnection.com'
    },
    {
        'type': 'images',
        'name': 'Trading Card Database',
        'url': 'https://www.tradingcarddb.com'
    }
]
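The list above uses shorthand keys (type, name, url) rather than the API field names. As a rough sketch, each entry could be mapped to a create payload like this. `build_source_payloads` is a hypothetical helper, and the payload shape is assumed to match the one used in the import example earlier in this guide.

```python
def build_source_payloads(set_id, sources):
    """Map shorthand source entries to set_sources create payloads."""
    payloads = []
    for entry in sources:
        payloads.append({
            "type": "set_sources",
            "attributes": {
                "set_id": set_id,
                "source_type": entry["type"],
                "source_name": entry["name"],
                "source_url": entry["url"],
            },
        })
    return payloads


# Example input in the same shorthand form as the list above
example_sources = [
    {"type": "checklist", "name": "COMC Database", "url": "https://www.comc.com"},
    {"type": "images", "name": "Trading Card Database", "url": "https://www.tradingcarddb.com"},
]
# The set ID here is a placeholder UUID, not a real set
example_payloads = build_source_payloads(
    "123e4567-e89b-12d3-a456-426614174000", example_sources
)
```

Each resulting payload could then be passed to client.set_sources.create, one call per source type.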
Verification Workflow
Track when sources are verified:
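from datetime import datetime, timezone
# After manually verifying the source, record the current UTC time
# (datetime.utcnow() is deprecated as of Python 3.12)
verified_at = datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z')
client.set_sources.update(source_id, {
    'type': 'set_sources',
    'id': source_id,
    'attributes': {
        'verified_at': verified_at
    }
})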
Displaying Attribution
Show users where your data comes from:
# Get set with all sources
response = client.sets.get(set_id, include='sources')
# Display attribution
if 'included' in response:
    sources = [s for s in response['included'] if s['type'] == 'set_sources']
    print("Data Sources:")
    for source in sources:
        attrs = source['attributes']
        verified = " ✓" if attrs.get('verified_at') else ""
        print(f" {attrs['source_type']}: {attrs['source_name']}{verified}")
Best Practices
1. Record Sources Immediately
Add source information when importing data, not later. It's harder to remember where data came from after the fact.
2. Be Specific with URLs
Use the most specific URL possible. Link to the exact page or resource, not just the homepage.
# Good - specific URL
source_url = "https://www.beckett.com/price-guides/basketball/1986-fleer"
# Less useful - generic URL
source_url = "https://www.beckett.com"
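One way to catch generic URLs at import time is to check whether a URL points past the site root. This is a small sketch using only the standard library. `is_specific_url` is a hypothetical helper, and "has a path or query string" is only a rough proxy for specificity.

```python
from urllib.parse import urlparse


def is_specific_url(url):
    """Return True if the URL has a non-empty path or a query string."""
    parsed = urlparse(url)
    return parsed.path.strip("/") != "" or parsed.query != ""
```

A False result is a hint to look for a deeper link, not a reason to reject the source outright.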
3. Verify Periodically
Data sources can change or become unavailable. Periodically check that:
- URLs are still valid
- Data at the source hasn't changed since you imported it
- The source is still considered reliable
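Periodic checks can be driven by the verified_at field. The sketch below flags a source for review when it has never been verified or was last verified too long ago. `needs_reverification` is a hypothetical helper, and the 90-day default is an arbitrary assumption, not a recommendation from the API.

```python
from datetime import datetime, timedelta, timezone


def needs_reverification(attrs, max_age_days=90, now=None):
    """Return True if a source is unverified or its verification is too old."""
    now = now or datetime.now(timezone.utc)
    verified_at = attrs.get("verified_at")
    if not verified_at:
        return True  # never verified
    # Accept a trailing "Z" on Python versions where fromisoformat() rejects it
    verified = datetime.fromisoformat(verified_at.replace("Z", "+00:00"))
    if verified.tzinfo is None:
        # Assumption: timestamps without an offset are UTC
        verified = verified.replace(tzinfo=timezone.utc)
    return now - verified > timedelta(days=max_age_days)
```

Passing `now` explicitly keeps the function deterministic, which is convenient in tests and scheduled jobs.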
4. Track Multiple Source Types
Don't assume one source provides everything. Track separate sources for checklists, metadata, and images.
5. Update When Sources Change
If you get better data from a new source, update or replace the source record to reflect current information.
6. Provide Attribution in Your App
Use source data to:
- Display "Data provided by..." credits
- Link back to source websites
- Show verification status to users
- Build trust in your data quality
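A credit line can be assembled directly from source attributes. The following is a minimal sketch. `format_attribution` is a hypothetical helper that takes a list of attribute dicts with the fields described under Key Fields. The wording of the credit is an assumption, and the check mark follows the convention used in the Displaying Attribution example.

```python
def format_attribution(sources):
    """Build a single 'Data provided by' line from source attribute dicts."""
    if not sources:
        return ""
    parts = []
    for attrs in sources:
        label = f"{attrs['source_name']} ({attrs['source_type']})"
        if attrs.get("verified_at"):
            label += " ✓"  # mark verified sources
        parts.append(label)
    return "Data provided by " + ", ".join(parts)
```

In a web app, each name would typically also link to its source_url.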
Example Workflows
Complete Set Import with Sources
def import_set_with_sources(source_data):
    """Import a set and track all data sources"""
    # 1. Import the set data
    set_response = client.sets.create({
        'type': 'sets',
        'attributes': source_data['set_attributes']
    })
    set_id = set_response['data']['id']
    # 2. Track checklist source
    client.set_sources.create({
        'type': 'set_sources',
        'attributes': {
            'set_id': set_id,
            'source_type': 'checklist',
            'source_name': source_data['checklist_source']['name'],
            'source_url': source_data['checklist_source']['url']
        }
    })
    # 3. Track metadata source if different
    if source_data.get('metadata_source'):
        client.set_sources.create({
            'type': 'set_sources',
            'attributes': {
                'set_id': set_id,
                'source_type': 'metadata',
                'source_name': source_data['metadata_source']['name'],
                'source_url': source_data['metadata_source']['url']
            }
        })
    return set_id
Audit Data Quality
def audit_unverified_sources():
    """Find sets with unverified sources"""
    # Get all sources
    sources = client.set_sources.list()
    unverified = [
        s for s in sources['data']
        if not s['attributes'].get('verified_at')
    ]
    print(f"Found {len(unverified)} unverified sources")
    for source in unverified:
        attrs = source['attributes']
        print(f"\nSet ID: {attrs['set_id']}")
        print(f"Type: {attrs['source_type']}")
        print(f"Source: {attrs['source_name']}")
        print(f"URL: {attrs['source_url']}")
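For larger audits it may help to group unverified sources by set, so each set can be reviewed in one pass. This is a sketch operating on plain resource dicts. `unverified_by_set` is a hypothetical helper, and the input is assumed to have the same shape as the items in the list response above.

```python
from collections import defaultdict


def unverified_by_set(resources):
    """Map each set_id to the source types that still lack a verified_at value."""
    grouped = defaultdict(list)
    for resource in resources:
        attrs = resource["attributes"]
        if not attrs.get("verified_at"):
            grouped[attrs["set_id"]].append(attrs["source_type"])
    return dict(grouped)
```

Sets that do not appear in the result have no unverified sources among the resources passed in.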
Integration with Other Features
Set Completion Tracking
Combine source tracking with collection management to show users both what they own and where the data came from.
See the Collection Management Guide for practical integration examples.
Card Images
When uploading card images, track the image source separately from the checklist source:
# Upload image
image_response = upload_card_image(card_id, image_file)
# Track image source
client.set_sources.create({
    'type': 'set_sources',
    'attributes': {
        'set_id': set_id,
        'source_type': 'images',
        'source_name': 'Personal Collection Scans',
        'source_url': 'https://myapp.com/user/123/scans'
    }
})
Next Steps
- Code Examples - See working code for all Set Sources operations
- Collection Management Guide - Integrate source tracking into your collection app
- API Reference - Complete endpoint documentation
- Data Models - SetSource model schema
Troubleshooting
One Source Per Type Per Set
Each set can only have one source of each type. Attempting to create a second checklist source for the same set will fail with a uniqueness error.
Solution: Update the existing source or delete it first.
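To avoid the uniqueness error, a caller can decide between create and update before writing. The following is a minimal sketch of that decision. `plan_source_write` is a hypothetical helper, and `existing` is assumed to be a list of source resources with id and attributes, as in the list response used in the audit example.

```python
def plan_source_write(existing, set_id, source_type):
    """Return ('update', source_id) if the set already has this source type, else ('create', None)."""
    for resource in existing:
        attrs = resource["attributes"]
        if attrs["set_id"] == set_id and attrs["source_type"] == source_type:
            return ("update", resource["id"])
    return ("create", None)
```

An 'update' result points at the record to pass to client.set_sources.update. A 'create' result means client.set_sources.create should not hit the uniqueness constraint, provided no other writer added the same source type in the meantime.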
Invalid Source Types
The API only accepts three source types: checklist, metadata, and images.
Solution: Ensure you're using one of the valid source type values.
Missing set_id
Every source must be associated with a specific set.
Solution: Include the set_id when creating a source and ensure the set exists.