• Product
  • Pricing
  • Docs
  • Using PostHog
  • Community
  • Company
  • Login
  • Docs

  • Overview
    • Quickstart with PostHog Cloud
    • Overview
    • Open-Source
      • Disclaimer
      • Deployment
      • Support
    • Enterprise
      • Overview
      • Support
      • Hosting Costs
        • AWS
        • Azure
        • DigitalOcean
        • Google Cloud Platform
        • EU Hosting Companies
        • Other platforms
      • Instance settings
      • Environment variables
      • Securing PostHog
      • Monitoring with Grafana
      • Running behind a proxy
      • Configuring email
      • Helm chart configuration
      • Deploying ClickHouse using Altinity.Cloud
      • Configuring Slack
      • Overview
        • Overview
        • Upgrade notes
        • Overview
        • 0001-events-sample-by
        • 0002_events_sample_by
        • 0003_fill_person_distinct_id2
        • ClickHouse
          • Backup
          • Debug hanging / freezing process
          • Horizontal scaling (Sharding & replication)
          • Kafka Engine
          • Resize disk
          • Restore
          • Vertical scaling
        • Kafka
          • Resize disk
          • Log retention
        • PostgreSQL
          • Resize disk
          • Troubleshooting long-running migrations
        • Plugin server
          • Overview
          • Ingestion lag
          • Jobs not executing
          • Scheduled tasks not executing
        • MinIO
        • Redis
        • Zookeeper
      • Disaster recovery
    • Troubleshooting and FAQs
    • Overview
    • Ingest live data
    • Ingest historical data
    • Identify users
    • User properties
    • Using a CDP
    • Deploying a reverse proxy
    • Library comparison
    • Badge
    • Browser extensions
      • Snippet installation
      • Android
      • iOS
      • JavaScript
      • Flutter
      • React Native
      • Node.js
      • Go
      • Python
      • Rust
      • Java
      • PHP
      • Ruby
      • Elixir
      • Docusaurus v2
      • Gatsby
      • Google Tag Manager
      • Next.js
      • Nuxt.js
      • Retool
      • RudderStack
      • Segment
      • Sentry
      • Slack
      • Shopify
      • WordPress
      • Message formatting
      • Microsoft Teams
      • Slack
      • Discord
    • Migrate between PostHog instances
    • Migrate from Amplitude
    • Migrate to PostHog Cloud EU
    • To another self-hosted instance
    • Export your events
    • Overview
    • Tutorial
    • Troubleshooting
    • Developer reference
    • Using the PostHog API
    • Jobs
    • Testing
    • TypeScript types
    • Overview
    • POST-only public endpoints
    • Actions
    • Annotations
    • Cohorts
    • Dashboards
    • Event definitions
    • Events
    • Experiments
    • Feature flags
    • Funnels
    • Groups
    • Groups types
    • Insights
    • Invites
    • Members
    • Persons
    • Plugin configs
    • Plugins
    • Projects
    • Property definitions
    • Session recordings
    • Trends
    • Users
    • Data model
    • Overview
    • Data model
    • Ingestion pipeline
    • ClickHouse
    • Querying data
    • Overview
    • GDPR guidance
    • HIPAA guidance
    • CCPA guidance
    • SOC 2
    • Data egress & compliance
    • Data deletion
    • Overview
    • Code of conduct
    • Recognizing contributions
  • Using PostHog

  • Table of contents
      • Dashboards
      • Funnels
      • Group Analytics
      • Insights
      • Lifecycle
      • Path analysis
      • Retention
      • Stickiness
      • Trends
      • Heatmaps
      • Session Recording
      • Correlation Analysis
      • Experimentation
      • Feature Flags
      • Actions
      • Annotations
      • Cohorts
      • Data Management
      • Events
      • Persons
      • Sessions
      • UTM segmentation
      • Team collaboration
      • Organizations & projects
      • Settings
      • SSO & SAML
      • Toolbar
      • Notifications & alerts
    • Overview
      • Amazon Kinesis Import
      • BitBucket Release Tracker
      • Event Replicator
      • GitHub Release Tracker
      • GitHub Star Sync
      • GitLab Release Tracker
      • Heartbeat
      • Ingestion Alert
      • Email Scoring
      • n8n Connector
      • Orbit Connector
      • Redshift Import
      • Rudderstack Import
      • Segment Connector
      • Shopify Connector
      • Stripe Connector
      • Twitter Followers Tracker
      • Zendesk Connector
      • Airbyte Exporter
      • Amazon S3 Export
      • Avo Inspector
      • BigQuery Export
      • Customer.io Connector
      • Databricks Export
      • Engage Connector
      • GCP Pub/Sub Connector
      • Google Cloud Storage Export
      • Hubspot Connector
      • Intercom Connector
      • PagerDuty Connector
      • PostgreSQL Export
      • Redshift Export
      • RudderStack Export
      • Salesforce Connector
      • Sendgrid Connector
      • Sentry Connector
      • Snowflake Export
      • Twilio Connector
      • Variance Connector
      • Pace Integration
      • Zapier Connector
      • Downsampler
      • Event Sequence Timer
      • First Time Event Tracker
      • Property Filter
      • Property Flattener
      • Schema Enforcer
      • Taxonomy Standardizer
      • Unduplicator
      • Advanced GeoIP Enricher
      • Automatic Cohort Creator
      • Currency Normalizer
      • GeoIP Enricher
      • Timestamp Parser
      • URL Normalizer
      • User Agent Populator
      • Pineapple Mode
  • Tutorials
    • Actions
    • Apps
    • Cohorts
    • Configuration
    • Data management
    • Dashboards
    • Experimentation
    • Feature flags
    • Funnels
    • Group analytics
    • Heatmaps
    • Insights
    • Path analysis
    • Retention
    • Session recording
    • Toolbar
    • Trends
  • Support
  • Glossary
  • Docs

  • Overview
    • Quickstart with PostHog Cloud
    • Overview
    • Open-Source
      • Disclaimer
      • Deployment
      • Support
    • Enterprise
      • Overview
      • Support
      • Hosting Costs
        • AWS
        • Azure
        • DigitalOcean
        • Google Cloud Platform
        • EU Hosting Companies
        • Other platforms
      • Instance settings
      • Environment variables
      • Securing PostHog
      • Monitoring with Grafana
      • Running behind a proxy
      • Configuring email
      • Helm chart configuration
      • Deploying ClickHouse using Altinity.Cloud
      • Configuring Slack
      • Overview
        • Overview
        • Upgrade notes
        • Overview
        • 0001-events-sample-by
        • 0002_events_sample_by
        • 0003_fill_person_distinct_id2
        • ClickHouse
          • Backup
          • Debug hanging / freezing process
          • Horizontal scaling (Sharding & replication)
          • Kafka Engine
          • Resize disk
          • Restore
          • Vertical scaling
        • Kafka
          • Resize disk
          • Log retention
        • PostgreSQL
          • Resize disk
          • Troubleshooting long-running migrations
        • Plugin server
          • Overview
          • Ingestion lag
          • Jobs not executing
          • Scheduled tasks not executing
        • MinIO
        • Redis
        • Zookeeper
      • Disaster recovery
    • Troubleshooting and FAQs
    • Overview
    • Ingest live data
    • Ingest historical data
    • Identify users
    • User properties
    • Using a CDP
    • Deploying a reverse proxy
    • Library comparison
    • Badge
    • Browser extensions
      • Snippet installation
      • Android
      • iOS
      • JavaScript
      • Flutter
      • React Native
      • Node.js
      • Go
      • Python
      • Rust
      • Java
      • PHP
      • Ruby
      • Elixir
      • Docusaurus v2
      • Gatsby
      • Google Tag Manager
      • Next.js
      • Nuxt.js
      • Retool
      • RudderStack
      • Segment
      • Sentry
      • Slack
      • Shopify
      • WordPress
      • Message formatting
      • Microsoft Teams
      • Slack
      • Discord
    • Migrate between PostHog instances
    • Migrate from Amplitude
    • Migrate to PostHog Cloud EU
    • To another self-hosted instance
    • Export your events
    • Overview
    • Tutorial
    • Troubleshooting
    • Developer reference
    • Using the PostHog API
    • Jobs
    • Testing
    • TypeScript types
    • Overview
    • POST-only public endpoints
    • Actions
    • Annotations
    • Cohorts
    • Dashboards
    • Event definitions
    • Events
    • Experiments
    • Feature flags
    • Funnels
    • Groups
    • Groups types
    • Insights
    • Invites
    • Members
    • Persons
    • Plugin configs
    • Plugins
    • Projects
    • Property definitions
    • Session recordings
    • Trends
    • Users
    • Data model
    • Overview
    • Data model
    • Ingestion pipeline
    • ClickHouse
    • Querying data
    • Overview
    • GDPR guidance
    • HIPAA guidance
    • CCPA guidance
    • SOC 2
    • Data egress & compliance
    • Data deletion
    • Overview
    • Code of conduct
    • Recognizing contributions
  • Using PostHog

  • Table of contents
      • Dashboards
      • Funnels
      • Group Analytics
      • Insights
      • Lifecycle
      • Path analysis
      • Retention
      • Stickiness
      • Trends
      • Heatmaps
      • Session Recording
      • Correlation Analysis
      • Experimentation
      • Feature Flags
      • Actions
      • Annotations
      • Cohorts
      • Data Management
      • Events
      • Persons
      • Sessions
      • UTM segmentation
      • Team collaboration
      • Organizations & projects
      • Settings
      • SSO & SAML
      • Toolbar
      • Notifications & alerts
    • Overview
      • Amazon Kinesis Import
      • BitBucket Release Tracker
      • Event Replicator
      • GitHub Release Tracker
      • GitHub Star Sync
      • GitLab Release Tracker
      • Heartbeat
      • Ingestion Alert
      • Email Scoring
      • n8n Connector
      • Orbit Connector
      • Redshift Import
      • Rudderstack Import
      • Segment Connector
      • Shopify Connector
      • Stripe Connector
      • Twitter Followers Tracker
      • Zendesk Connector
      • Airbyte Exporter
      • Amazon S3 Export
      • Avo Inspector
      • BigQuery Export
      • Customer.io Connector
      • Databricks Export
      • Engage Connector
      • GCP Pub/Sub Connector
      • Google Cloud Storage Export
      • Hubspot Connector
      • Intercom Connector
      • PagerDuty Connector
      • PostgreSQL Export
      • Redshift Export
      • RudderStack Export
      • Salesforce Connector
      • Sendgrid Connector
      • Sentry Connector
      • Snowflake Export
      • Twilio Connector
      • Variance Connector
      • Pace Integration
      • Zapier Connector
      • Downsampler
      • Event Sequence Timer
      • First Time Event Tracker
      • Property Filter
      • Property Flattener
      • Schema Enforcer
      • Taxonomy Standardizer
      • Unduplicator
      • Advanced GeoIP Enricher
      • Automatic Cohort Creator
      • Currency Normalizer
      • GeoIP Enricher
      • Timestamp Parser
      • URL Normalizer
      • User Agent Populator
      • Pineapple Mode
  • Tutorials
    • Actions
    • Apps
    • Cohorts
    • Configuration
    • Data management
    • Dashboards
    • Experimentation
    • Feature flags
    • Funnels
    • Group analytics
    • Heatmaps
    • Insights
    • Path analysis
    • Retention
    • Session recording
    • Toolbar
    • Trends
  • Support
  • Glossary

Deleting bulk data

Important: Bulk data deletion is done at your own risk. We do not yet provide a standard way to delete data in bulk, so you must be extremely careful when performing such an operation. We do not take responsibility for any loss of data as a result of this process.

To delete bulk data from your database (e.g. all events), the current suggested way is to directly interact with the PostgreSQL or ClickHouse instance and use SQL queries to delete the desired data.

  • PostgreSQL Docs
  • ClickHouse Docs

Alternatively, one of our contributors has suggested the following Python script for deleting data:

Python
#!/usr/bin/python
import logging
import os
from datetime import datetime, timedelta
import django
django.setup()
from posthog.models import Event, ElementGroup
from django.utils import timezone
max_age_days = int(os.getenv("POSTHOG_CLEANUP_OLDER_THAN_DAYS", 30))
step_size = int(os.getenv("POSTHOG_CLEANUP_BATCH_SIZE", 1000))
dry_run = True if os.getenv("POSTHOG_CLEANUP_DRY_RUN", "False").lower() in ["true", "yes", "1"] else False
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %H:%M:%S ')
def get_events_older_than(older_than):
return Event.objects.filter(timestamp__lt=timezone.make_aware(datetime.now() - timedelta(older_than))).values_list(
'pk', flat=True)
def get_non_referenced_event_groups():
event_group_hashes = Event.objects.all().values_list('elements_hash', flat=True)
return ElementGroup.objects.exclude(hash__in=list(event_group_hashes))
def delete_items(item_type, items):
if dry_run:
logging.info("Skipping delete of items in dry run mode...")
return
item_type.objects.filter(id__in=list(items)).delete()
def delete_items_batched(item_type, items, logging_indent=6 * " "):
number_of_items = len(items)
logging.info("%sDeleting %d items of type %s using batches of %d size:", logging_indent, number_of_items,
item_type.__name__, step_size)
last_id = 0
while last_id + step_size <= number_of_items:
delete_items(item_type, items[last_id:last_id + step_size])
logging.info("%s %d%%", logging_indent, int(last_id / number_of_items * 100))
last_id += step_size
delete_items(item_type, items[last_id:])
logging.info("%s 100%%", logging_indent)
if __name__ == "__main__":
logging.info("Running cleanup of PostHog...")
start_time = datetime.now()
logging.info(" - Deleting all events older than %d days:", max_age_days)
delete_items_batched(Event, get_events_older_than(max_age_days))
logging.info(" - Deleting all elements and element groups not referenced by any event anymore:")
delete_items_batched(ElementGroup, get_non_referenced_event_groups())
logging.info("Cleanup finished, total duration: %s", datetime.now() - start_time)
Want more tips? Get them delivered twice a month.

Questions?

Was this page useful?

Contributor

  • Yakko Majuri
    Yakko Majuri

Share

2,218 views

Filed under...

  • configuration
  • data management
  • Product

  • Overview
  • Pricing
  • Product analytics
  • Session recording
  • A/B testing
  • Feature flags
  • Apps
  • Customer stories
  • PostHog vs...
  • Docs

  • Quickstart guide
  • Self-hosting
  • Installing PostHog
  • Building an app
  • API
  • Webhooks
  • How PostHog works
  • Data privacy
  • Using PostHog

  • Product manual
  • Apps manuals
  • Tutorials
  • Community

  • Questions?
  • Product roadmap
  • Contributors
  • Partners
  • Newsletter
  • Merch
  • PostHog FM
  • PostHog on GitHub
  • Handbook

  • Getting started
  • Company
  • Strategy
  • How we work
  • Small teams
  • People & Ops
  • Engineering
  • Product
  • Design
  • Marketing
  • Customer success
  • Company

  • About
  • Team
  • Investors
  • Press
  • Blog
  • FAQ
  • Support
  • Careers
© 2023 PostHog, Inc.
  • Code of conduct
  • Privacy policy
  • Terms