Query time reduced from over 90 seconds with MySQL to less than 0.5 seconds with ClickHouse
DoubleCloud outperformed Druid on the client’s POC criteria, including total cost of ownership, availability zones, and breadth of platform integration
Multiple data sources unified into a single ClickHouse instance for sports analytics, leveraging Data Transfer for seamless integration
LSports is a world-leading provider of real-time sports data, specializing in delivering comprehensive sports data solutions to media companies and sportsbooks. They offer a wide array of data products and engagement tools, including TRADE360, OddService, Scouts Feed, and an LSports API. LSports stands out for its ability to provide tailored, technologically advanced data services, covering a broad range of sports, leagues, and betting markets. Their engagement tools, such as BetBooster and SCOREFRAME, enhance user experience by providing AI-based sports betting tips and detailed live match tracking. Their commitment to high-quality, accurate data collection and analysis empowers sportsbooks to create engaging and effective customer offerings.
LSports' primary challenge was the need for a robust system capable of handling complex and varied data structures in real time. They were specifically looking for a solution that could efficiently process vast amounts of sports data, both historical and current, for their real-time monitoring and analytics services.
One of the key services that required real-time data queries was a solution LSports provides to customers for real-time monitoring of sports events. This service covers LSports' comprehensive inventory of sports events, detailing the supported bets and the markets available in the sports industry. The existing MySQL system was inadequate for this extensive data load, leading to unacceptably long query times. The inefficiency in data processing, coupled with the challenge of integrating different data sources into a single, coherent system, significantly inhibited LSports' ability to offer timely and precise sports data analytics.
After evaluating different solutions including Apache Druid and Firebolt, LSports selected DoubleCloud’s solution for its superior performance in handling large data sets. The primary reason for choosing DoubleCloud was its ability to meet LSports' rigorous success criteria, which other technologies failed to meet. LSports found that while Druid could not fulfill even 50% of their success criteria, the DoubleCloud platform met all of them. These included total cost of ownership, availability zones, DataDog integration, VPC peering, and RabbitMQ support, among others. This was partly due to the in-depth understanding and expertise that DoubleCloud demonstrated regarding ClickHouse’s capabilities and its comparative advantages over other technologies like Druid.
The implementation of ClickHouse by DoubleCloud was notably fast. After the initial setup of the system, which took about two hours, LSports spent an additional three hours integrating and connecting various Kafka topics and other services. As part of this integration, DoubleCloud’s Data Transfer service was used to synchronize metadata related to sports types, leagues, and other relevant data.
LSports’ data architecture ingests betting data across two key streams: In-play, which handles data for sports events currently in progress, and Pre-match, which deals with data for future events. A critical component of the architecture is the unification of these two streams into a single source, ensuring all relevant fixtures and market data are cohesively available.
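As a rough illustration of unifying the two streams, the sketch below tags each event with its origin and merges both into one time-ordered sequence. The event fields (`ts`, `fixture_id`, `market`, `odds`), the sample values, and the helper names `tag` and `unify` are assumptions made for this example; the case study does not describe the actual message schema or merge logic.

```python
import heapq

# Hypothetical sample events; field names and values are illustrative only.
in_play = [
    {"ts": 1, "fixture_id": 101, "market": "1X2", "odds": 1.8},
    {"ts": 4, "fixture_id": 101, "market": "1X2", "odds": 1.7},
]
pre_match = [
    {"ts": 2, "fixture_id": 202, "market": "1X2", "odds": 2.1},
    {"ts": 3, "fixture_id": 203, "market": "Totals", "odds": 1.9},
]

def tag(stream, source):
    """Yield copies of each event labelled with the stream it came from."""
    for event in stream:
        yield {**event, "source": source}

def unify(in_play_events, pre_match_events):
    """Merge two streams, each already ordered by `ts`, into one ordered stream."""
    return heapq.merge(
        tag(in_play_events, "in_play"),
        tag(pre_match_events, "pre_match"),
        key=lambda e: e["ts"],
    )

unified = list(unify(in_play, pre_match))
print([(e["ts"], e["source"]) for e in unified])
```

Keeping the origin as a field means downstream consumers can query a single source while still being able to tell In-play and Pre-match records apart.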
Initially, fixture data is extracted from MySQL, the single source of truth, and imported into ClickHouse in CSV format. This data is represented in the 'Initial Fixture Ingestion Materialized View', which captures a snapshot of relevant fixtures: those currently active or scheduled for the future. Historical data, which is no longer pertinent, is excluded.
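The snapshot rule described here (keep active or future fixtures, drop historical ones) can be sketched in a few lines. This is a minimal illustration, not the actual import job: the CSV columns (`fixture_id`, `sport`, `start_time`, `status`), the status values, and the function name `relevant_fixtures` are all assumptions.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical CSV export of fixtures from MySQL; column names are assumptions.
SAMPLE_CSV = """fixture_id,sport,start_time,status
101,Football,2024-03-01T18:00:00+00:00,finished
102,Football,2024-03-10T18:00:00+00:00,in_progress
103,Tennis,2024-03-12T09:00:00+00:00,scheduled
"""

def relevant_fixtures(csv_text, now):
    """Keep fixtures that are in progress or that start at or after `now`."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.fromisoformat(row["start_time"])
        if row["status"] == "in_progress" or start >= now:
            kept.append(row)
    return kept

now = datetime(2024, 3, 10, 19, 0, tzinfo=timezone.utc)
snapshot = relevant_fixtures(SAMPLE_CSV, now)
print([r["fixture_id"] for r in snapshot])
```

With the sample data, the finished fixture 101 is excluded while the in-progress fixture 102 and the upcoming fixture 103 are kept.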
The 'fixtures_import Merge Tree Engine' then processes this data, feeding it into the 'fixtures_updates' table. This table, powered by a 'Replacing Merge Tree Engine', is dynamically updated with fixture information and designed to self-clean, purging data 10 days after an event’s conclusion to maintain relevance and efficiency. For real-time updates, a Kafka Engine consumes data from the 'DI.Fixture' topic, which provides metadata updates on fixtures such as changes in participants or locations.
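To make the behaviour of the 'fixtures_updates' table more concrete, the sketch below imitates in plain Python what a replacing engine with a retention rule converges to: one latest row per fixture, with rows dropped 10 days after the event ends. It is a conceptual model only. The row fields (`fixture_id`, `version`, `location`, `end_time`) and the function names are assumptions, and in ClickHouse itself both deduplication and expiry happen during background merges rather than immediately on insert.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=10)  # from the text: purge 10 days after the event ends

def collapse_latest(rows):
    """Keep one row per fixture_id: the highest `version`, last insert wins on ties."""
    latest = {}
    for row in rows:
        current = latest.get(row["fixture_id"])
        if current is None or row["version"] >= current["version"]:
            latest[row["fixture_id"]] = row
    return latest

def purge_expired(latest, now):
    """Drop fixtures whose event ended more than RETENTION before `now`."""
    return {
        fid: row
        for fid, row in latest.items()
        if row["end_time"] is None or now - row["end_time"] <= RETENTION
    }

# Hypothetical updates, e.g. as they might arrive from the fixture topic.
utc = timezone.utc
updates = [
    {"fixture_id": 1, "version": 1, "location": "Stadium A", "end_time": None},
    {"fixture_id": 1, "version": 2, "location": "Stadium B", "end_time": None},
    {"fixture_id": 2, "version": 1, "location": "Court 1",
     "end_time": datetime(2024, 3, 1, tzinfo=utc)},
]

table = purge_expired(collapse_latest(updates), now=datetime(2024, 3, 15, tzinfo=utc))
print(sorted(table), table[1]["location"])
```

In the sample, the location change for fixture 1 replaces the earlier row, and fixture 2 is removed because its event ended 14 days before the chosen `now`.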
LSports’ data pipeline
Furthermore, the system employs a 'fixtures_dictionary', which works as a rapid-access data structure to fetch fixture details swiftly. The 'fixtures_dictionary' is essential for connecting market updates to fixtures when market data contains only fixture IDs. It allows for quick retrieval of detailed fixture information to enrich market updates with necessary context. The 'monitoring fixtures Replacing Merge Tree Engine' integrates this fixture metadata, resulting in a comprehensive, flat table for monitoring purposes that obviates the need for complex joins and ensures high performance.
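The enrichment step can be pictured as a key lookup that turns an ID-only market update into a flat, self-contained row. The sketch below uses an ordinary Python dict as a stand-in for the 'fixtures_dictionary'; the attribute names (`sport`, `league`, `participants`), the market update fields, and the `enrich` helper are assumptions for illustration, not the real schema.

```python
# Stand-in for the fixtures dictionary: fixture ID -> fixture metadata.
# Keys and attribute names are hypothetical.
fixtures_dictionary = {
    101: {"sport": "Football", "league": "League A", "participants": "Home vs Away"},
}

def enrich(market_update, dictionary):
    """Return a flat row combining the market update with its fixture metadata.

    Returns None when the fixture ID is unknown, so the caller decides
    whether to skip or retry the update.
    """
    meta = dictionary.get(market_update["fixture_id"])
    if meta is None:
        return None
    return {**market_update, **meta}

update = {"fixture_id": 101, "market": "1X2", "odds": 1.85}
flat_row = enrich(update, fixtures_dictionary)
print(flat_row)

unknown = enrich({"fixture_id": 999, "market": "1X2", "odds": 2.0}, fixtures_dictionary)
print(unknown)
```

Because the metadata is attached at write time, queries against the flat monitoring table read every column they need from a single row instead of joining at query time.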
This architecture is scalable, with a three-node ClickHouse cluster at its core, and it interfaces with a Kubernetes cluster to serve data to the UI. This setup ensures the service can handle the high throughput and dynamic nature of sports betting data, providing clients with real-time, accurate information necessary for betting activities.
Finally, this architectural approach enabled the seamless maintenance of up-to-date and accurate metadata within the ClickHouse system, supporting their real-time data analytics needs. The comprehensive onboarding process, completed in just under a day, marked a significant improvement in setup time and efficiency, enhancing LSports' operational workflow.
The migration to DoubleCloud and ClickHouse significantly improved LSports' data handling capabilities, cutting query times from over 90 seconds to less than 0.5 seconds and enabling efficient real-time data analytics. LSports can now handle complex queries and large data volumes more efficiently, reinforcing their position in the sports data industry. In addition, DoubleCloud proved far more cost-effective than other solutions on the market, offering better performance at a lower cost.
LSports' plans include migrating from MSK to Managed Kafka by DoubleCloud and further integrating ClickHouse to achieve specific technical objectives. LSports intends to use the DoubleCloud platform’s advanced data processing and real-time analytics services to support the development of new sports data products. These products will require high-speed data processing for live event data and complex analytics tasks. LSports is also focusing on ClickHouse’s scalability features to manage increasing data volumes and a growing customer base, ensuring responsiveness and reliability across the whole platform.
ClickHouse® is a trademark of ClickHouse, Inc. https://clickhouse.com