Query acceleration enables applications and analytics frameworks to optimize data processing by retrieving only the data that they require to perform a given operation. This reduces the time and processing power that are required to gain insights into stored data.

Data Flow:

The following diagram illustrates how a typical application uses query acceleration to process data.

[Figure: Query acceleration overview]
  1. The client application requests file data by specifying predicates and column projections.
  2. Query acceleration parses the specified SQL query and distributes work to parse and filter data.
  3. Processors read the data from the disk, parse the data by using the appropriate format, and then filter the data by applying the specified predicates and column projections.
  4. Query acceleration combines the response shards and streams them back to the client application.
  5. The client application receives and parses the streamed response. The application doesn’t need to filter any additional data and can apply the desired calculation or transformation directly.
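
The five steps above can be sketched as a small local simulation. This is not the service's API: the shard contents and the names `SHARDS`, `process_shard`, and `accelerated_query` are hypothetical, and an in-memory CSV stands in for data on disk. It only illustrates where the filtering and projection happen relative to the client.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, Sequence

Row = Dict[str, str]

# Hypothetical stand-in for one stored CSV file, split into two shards.
SHARDS: List[str] = [
    "id,city,amount\n1,Oslo,10\n2,Lima,25\n",
    "id,city,amount\n3,Oslo,40\n4,Pune,5\n",
]


def process_shard(
    shard_text: str, columns: Sequence[str], predicate: Callable[[Row], bool]
) -> Iterator[Row]:
    """Step 3: parse one shard, apply the predicate, keep only projected columns."""
    for row in csv.DictReader(io.StringIO(shard_text)):
        if predicate(row):
            yield {name: row[name] for name in columns}


def accelerated_query(
    shards: Sequence[str], columns: Sequence[str], predicate: Callable[[Row], bool]
) -> Iterator[Row]:
    """Steps 2 and 4: hand each shard to a processor, then stream the combined results."""
    for shard in shards:
        yield from process_shard(shard, columns, predicate)


# Step 1: the client states a predicate and a column projection.
rows = list(
    accelerated_query(SHARDS, ["id", "amount"], lambda r: r["city"] == "Oslo")
)

# Step 5: the client computes directly on the response, with no further filtering.
total = sum(int(r["amount"]) for r in rows)
```

In this sketch only two of the four stored rows, and two of the three columns, ever reach the client, which is the source of the savings described above.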

Applications that can benefit from query acceleration:

Query acceleration is designed for distributed analytics frameworks and data processing applications.

Distributed analytics frameworks, such as Apache Spark and Apache Hive, include a storage abstraction layer within the framework. These engines also include query optimizers that can incorporate knowledge of the underlying I/O service’s capabilities when determining an optimal query plan for user queries. These frameworks are beginning to integrate query acceleration. As a result, users of these frameworks will see improved query latency and a lower total cost of ownership without having to make any changes to their queries.
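
As a rough illustration of how an optimizer might use knowledge of the storage service's capabilities, the toy planner below splits a query's filters into those the storage layer can evaluate and those the framework must still apply itself. The `Filter` class, `plan_filters` function, and the set of supported operators are assumptions made for this sketch. They are not taken from Spark, Hive, or any storage service.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Filter:
    column: str
    op: str        # comparison operator, e.g. "=", ">", "REGEXP"
    value: object


def plan_filters(
    filters: List[Filter], supported_ops: Set[str]
) -> Tuple[List[Filter], List[Filter]]:
    """Split filters into (pushed to storage, evaluated by the framework)."""
    pushed = [f for f in filters if f.op in supported_ops]
    residual = [f for f in filters if f.op not in supported_ops]
    return pushed, residual


# Assumed capability set advertised by a hypothetical storage service.
storage_ops = {"=", "<", ">"}

query_filters = [
    Filter("city", "=", "Oslo"),
    Filter("name", "REGEXP", "^A"),
]

pushed, residual = plan_filters(query_filters, storage_ops)
```

Here the equality filter is pushed down to storage, while the regular-expression filter remains with the framework. The user's query text is the same in both cases.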

Query acceleration is also designed for data processing applications. These applications typically perform large-scale data transformations that might not directly lead to analytics insights, so they don’t always use established distributed analytics frameworks. They often have a more direct relationship with the underlying storage service, so they can benefit directly from features such as query acceleration.
