Data Provenance and Set Sources

Learn how to track the origin of your trading card data using the Set Sources API to maintain data quality, provide proper attribution, and verify information accuracy.

What are Set Sources?

Set Sources let you track where your trading card set data came from. Every piece of information in your database, whether it is a checklist, metadata, or images, originated somewhere. The Set Sources API provides a standardized way to record and manage these data origins.

Why Track Data Sources?

Data Provenance

Understanding where your data came from is essential for:

  • Transparency - Users can see where information originated
  • Credibility - Verified sources build trust in your data
  • Compliance - Some data sources require attribution
  • Quality Control - Track which sources provide accurate information

Verification

Source tracking enables you to:

  • Mark sources as verified after validation
  • Identify which data needs review
  • Update information from authoritative sources
  • Remove unreliable data sources

Attribution

Properly crediting data sources:

  • Respects intellectual property
  • Maintains good relationships with data providers
  • Meets licensing requirements
  • Builds community trust

Source Types

The API supports three distinct types of sources:

Checklist Sources

Track where your card lists came from.

Common checklist sources:

  • Trading card databases (TCDB, COMC)
  • Price guides (Beckett, PSA)
  • Manufacturer checklists
  • Community-contributed lists
  • Retailer catalogs

Example:

{
  "source_type": "checklist",
  "source_name": "COMC Database",
  "source_url": "https://www.comc.com"
}

Metadata Sources

Track where set information originated.

Metadata includes:

  • Set name and year
  • Manufacturer details
  • Print run information
  • Set descriptions
  • Release dates

Common metadata sources:

  • CardboardConnection
  • Manufacturer press releases
  • Industry publications
  • Collector guides
  • Historical archives

Example:

{
  "source_type": "metadata",
  "source_name": "CardboardConnection",
  "source_url": "https://www.cardboardconnection.com"
}

Image Sources

Track where card images were obtained.

Common image sources:

  • Trading Card Database
  • COMC scans
  • Personal collection photos
  • Official manufacturer images
  • Community contributions

Example:

{
  "source_type": "images",
  "source_name": "Trading Card Database",
  "source_url": "https://www.tradingcarddb.com"
}

Key Fields

source_type

The type of data this source provides: checklist, metadata, or images.

source_name

A human-readable name for the source (e.g., "Beckett Price Guide", "TCDB").

source_url

The URL where this data can be found or verified. This should be as specific as possible.

verified_at

An ISO 8601 timestamp indicating when this source was last verified as accurate. The value is null if the source has not been verified.

set_id

The UUID of the set this source applies to. Each source is specific to one set.
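
The field rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: validate_source is a hypothetical helper, and treating source_name and source_url as required is an assumption, since the exact server-side validation is not documented here.

```python
from datetime import datetime
from urllib.parse import urlparse
from uuid import UUID

VALID_SOURCE_TYPES = {"checklist", "metadata", "images"}


def validate_source(attrs):
    """Return a list of problems found in a source attributes dict.

    Hypothetical client-side helper; the server may apply different rules.
    """
    errors = []

    if attrs.get("source_type") not in VALID_SOURCE_TYPES:
        errors.append("source_type must be checklist, metadata, or images")

    # Assumption: a human-readable name is always expected.
    if not attrs.get("source_name"):
        errors.append("source_name is required")

    # Assumption: source_url must be an absolute http(s) URL.
    url = urlparse(attrs.get("source_url") or "")
    if url.scheme not in ("http", "https") or not url.netloc:
        errors.append("source_url must be an absolute http(s) URL")

    try:
        UUID(str(attrs.get("set_id")))
    except ValueError:
        errors.append("set_id must be a UUID")

    verified_at = attrs.get("verified_at")
    if verified_at is not None:
        try:
            # fromisoformat() accepts a trailing "Z" only on Python 3.11+,
            # so normalize it to an explicit UTC offset first.
            datetime.fromisoformat(verified_at.replace("Z", "+00:00"))
        except (AttributeError, ValueError):
            errors.append("verified_at must be an ISO 8601 timestamp or null")

    return errors
```

An empty list means the attributes passed every local check; the API remains the final authority on what it accepts.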

Common Use Cases

When Importing Data

Record sources immediately when importing new sets:

# Import set data from external source
set_data = import_from_beckett(set_name)
set_id = create_set(set_data)

# Record where this data came from
client.set_sources.create({
    'type': 'set_sources',
    'attributes': {
        'set_id': set_id,
        'source_type': 'checklist',
        'source_name': 'Beckett Online Price Guide',
        'source_url': 'https://www.beckett.com/price-guides'
    }
})

Multiple Sources Per Set

A single set can have multiple sources for different types of data:

# Different sources for different data types
sources = [
    {
        'type': 'checklist',
        'name': 'COMC Database',
        'url': 'https://www.comc.com'
    },
    {
        'type': 'metadata',
        'name': 'CardboardConnection',
        'url': 'https://www.cardboardconnection.com'
    },
    {
        'type': 'images',
        'name': 'Trading Card Database',
        'url': 'https://www.tradingcarddb.com'
    }
]
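
The list above uses short local keys (type, name, url), so each entry still has to be turned into its own source record. The following sketch shows one way to do that, assuming the payload shape used in the import example; to_source_payload and build_source_payloads are hypothetical helper names, not part of the API.

```python
def to_source_payload(set_id, source):
    """Map a local {'type', 'name', 'url'} entry to a set_sources payload."""
    return {
        'type': 'set_sources',
        'attributes': {
            'set_id': set_id,
            'source_type': source['type'],
            'source_name': source['name'],
            'source_url': source['url'],
        },
    }


def build_source_payloads(set_id, sources):
    """Build one payload per entry, rejecting duplicate source types.

    The duplicate check mirrors the one-source-per-type-per-set rule
    so the mistake is caught before any request is made.
    """
    seen = set()
    payloads = []
    for source in sources:
        if source['type'] in seen:
            raise ValueError(f"duplicate source type: {source['type']}")
        seen.add(source['type'])
        payloads.append(to_source_payload(set_id, source))
    return payloads
```

Each resulting payload can then be passed to client.set_sources.create, one call per source.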

Verification Workflow

Track when sources are verified:

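from datetime import datetime, timezone

# After manually verifying the source, record the current UTC time.
# datetime.utcnow() is deprecated since Python 3.12, so use an aware datetime.
verified_at = datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z')

client.set_sources.update(source_id, {
    'type': 'set_sources',
    'id': source_id,
    'attributes': {
        'verified_at': verified_at
    }
})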

Displaying Attribution

Show users where your data comes from:

# Get set with all sources
response = client.sets.get(set_id, include='sources')

# Display attribution
if 'included' in response:
    sources = [s for s in response['included'] if s['type'] == 'set_sources']

    print("Data Sources:")
    for source in sources:
        attrs = source['attributes']
        verified = " ✓" if attrs.get('verified_at') else ""
        print(f" {attrs['source_type']}: {attrs['source_name']}{verified}")

Best Practices

1. Record Sources Immediately

Add source information when importing data, not later. It's harder to remember where data came from after the fact.

2. Be Specific with URLs

Use the most specific URL possible. Link to the exact page or resource, not just the homepage.

# Good - specific URL
source_url = "https://www.beckett.com/price-guides/basketball/1986-fleer"

# Less useful - generic URL
source_url = "https://www.beckett.com"
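
A quick way to catch generic URLs during import is to check whether the URL points only at a site root. This is a minimal sketch using the standard library; is_homepage_url is a hypothetical helper, and what counts as "specific enough" is a judgment call for your application.

```python
from urllib.parse import urlparse


def is_homepage_url(url):
    """Return True when the URL has no path beyond the site root and no query."""
    parsed = urlparse(url)
    return parsed.path in ("", "/") and not parsed.query
```

A True result is a hint to look for a deeper link before saving the source, not an error in itself.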

3. Verify Periodically

Data sources can change or become unavailable. Periodically check that:

  • URLs are still valid
  • Your stored data still matches what the source currently publishes
  • Source is still considered reliable
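
One way to decide which sources are due for another check is to compare verified_at against a maximum age. The sketch below assumes a 180-day window, which is an arbitrary illustrative value; needs_reverification is a hypothetical helper and not part of the API.

```python
from datetime import datetime, timedelta, timezone


def needs_reverification(verified_at, max_age_days=180, now=None):
    """Return True if a source is unverified or was verified too long ago."""
    if verified_at is None:
        return True

    now = now or datetime.now(timezone.utc)

    # Normalize a trailing "Z" so the parse also works before Python 3.11.
    checked = datetime.fromisoformat(verified_at.replace("Z", "+00:00"))
    if checked.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        checked = checked.replace(tzinfo=timezone.utc)

    return now - checked > timedelta(days=max_age_days)
```

Sources flagged this way still need a manual or automated check of the URL and the data before verified_at is updated.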

4. Track Multiple Source Types

Don't assume one source provides everything. Track separate sources for checklists, metadata, and images.

5. Update When Sources Change

If you get better data from a new source, update or replace the source record to reflect current information.

6. Provide Attribution in Your App

Use source data to:

  • Display "Data provided by..." credits
  • Link back to source websites
  • Show verification status to users
  • Build trust in your data quality

Example Workflows

Complete Set Import with Sources

def import_set_with_sources(source_data):
    """Import a set and track all data sources"""

    # 1. Import the set data
    set_response = client.sets.create({
        'type': 'sets',
        'attributes': source_data['set_attributes']
    })
    set_id = set_response['data']['id']

    # 2. Track checklist source
    client.set_sources.create({
        'type': 'set_sources',
        'attributes': {
            'set_id': set_id,
            'source_type': 'checklist',
            'source_name': source_data['checklist_source']['name'],
            'source_url': source_data['checklist_source']['url']
        }
    })

    # 3. Track metadata source if different
    if source_data.get('metadata_source'):
        client.set_sources.create({
            'type': 'set_sources',
            'attributes': {
                'set_id': set_id,
                'source_type': 'metadata',
                'source_name': source_data['metadata_source']['name'],
                'source_url': source_data['metadata_source']['url']
            }
        })

    return set_id

Audit Data Quality

def audit_unverified_sources():
    """Find sets with unverified sources"""

    # Get all sources
    sources = client.set_sources.list()

    unverified = [
        s for s in sources['data']
        if not s['attributes'].get('verified_at')
    ]

    print(f"Found {len(unverified)} unverified sources")

    for source in unverified:
        attrs = source['attributes']
        print(f"\nSet ID: {attrs['set_id']}")
        print(f"Type: {attrs['source_type']}")
        print(f"Source: {attrs['source_name']}")
        print(f"URL: {attrs['source_url']}")

Integration with Other Features

Set Completion Tracking

Combine source tracking with collection management to show users both what they own and where the data came from.

See the Collection Management Guide for practical integration examples.

Card Images

When uploading card images, track the image source separately from the checklist source:

# Upload image
image_response = upload_card_image(card_id, image_file)

# Track image source
client.set_sources.create({
    'type': 'set_sources',
    'attributes': {
        'set_id': set_id,
        'source_type': 'images',
        'source_name': 'Personal Collection Scans',
        'source_url': 'https://myapp.com/user/123/scans'
    }
})

Troubleshooting

One Source Per Type Per Set

Each set can only have one source of each type. Attempting to create a second checklist source for the same set will fail with a uniqueness error.

Solution: Update the existing source or delete it first.
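
The update-or-create approach can be sketched as a small helper. This is an illustration under assumptions, not the client library's own API: upsert_source is a hypothetical function, it assumes list() returns every source in a single response with an id on each record, and it reuses the create and update call shapes shown in the earlier examples.

```python
def upsert_source(client, set_id, source_type, source_name, source_url):
    """Update the set's existing source of this type, or create one if none exists."""
    attributes = {
        'set_id': set_id,
        'source_type': source_type,
        'source_name': source_name,
        'source_url': source_url,
    }

    # Assumption: list() returns all sources at once. A real client may
    # paginate or offer server-side filtering instead.
    existing = next(
        (
            s for s in client.set_sources.list()['data']
            if s['attributes']['set_id'] == set_id
            and s['attributes']['source_type'] == source_type
        ),
        None,
    )

    if existing is not None:
        # A source of this type already exists for the set: update it in place.
        return client.set_sources.update(existing['id'], {
            'type': 'set_sources',
            'id': existing['id'],
            'attributes': attributes,
        })

    return client.set_sources.create({
        'type': 'set_sources',
        'attributes': attributes,
    })
```

Updating in place keeps the record's id stable, which may matter if other parts of your application store references to it.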

Invalid Source Types

The API only accepts three source types: checklist, metadata, and images.

Solution: Ensure you're using one of the valid source type values.

Missing set_id

Every source must be associated with a specific set.

Solution: Include the set_id when creating a source and ensure the set exists.