{"id":48510,"date":"2025-02-12T08:03:12","date_gmt":"2025-02-12T08:03:12","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/?p=48510"},"modified":"2025-02-12T08:03:12","modified_gmt":"2025-02-12T08:03:12","slug":"what-is-presto","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/what-is-presto\/","title":{"rendered":"What is Presto?"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\"><strong>What is Presto in the Context of Amazon Athena?<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Presto<\/strong> is an <strong>open-source distributed SQL query engine<\/strong> designed for <strong>fast and interactive querying of large datasets<\/strong>. In the context of <strong>Amazon Athena<\/strong>, <strong>Presto<\/strong> serves as the <strong>underlying query engine<\/strong> that powers Athena\u2019s ability to run SQL queries on data stored in <strong>Amazon S3<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Amazon Athena uses <strong>Presto<\/strong> under the hood to process SQL queries, enabling <strong>ad-hoc analysis<\/strong> of structured and semi-structured data (like JSON, Parquet, ORC, and Avro) without requiring any data loading or complex ETL processes.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Features of Presto in Amazon Athena<\/strong><\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>SQL Compatibility<\/strong>\n<ul class=\"wp-block-list\">\n<li>Supports <strong>ANSI SQL<\/strong> syntax, allowing users to run standard SQL queries on large datasets stored in S3.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Distributed Architecture<\/strong>\n<ul class=\"wp-block-list\">\n<li>Presto runs queries in parallel across multiple nodes for faster performance and scalability.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Schema-on-Read<\/strong>\n<ul class=\"wp-block-list\">\n<li>Unlike traditional databases that require 
structured schemas, Presto queries data in its <strong>raw format<\/strong> (e.g., CSV, JSON, Parquet) directly from S3.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Supports Multiple Data Formats<\/strong>\n<ul class=\"wp-block-list\">\n<li>Works with various formats such as <strong>Parquet, ORC, JSON, and CSV<\/strong>, as well as loosely structured text such as log files stored in S3.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Low-Latency Queries<\/strong>\n<ul class=\"wp-block-list\">\n<li>Presto is optimized for fast query execution, making it suitable for <strong>interactive analysis<\/strong>.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>How Presto Enhances Athena\u2019s Capabilities<\/strong><\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Serverless and Scalable<\/strong><br>Presto\u2019s distributed architecture allows Athena to scale without infrastructure management.<\/li>\n\n\n\n<li><strong>Ad-hoc Queries on Large Datasets<\/strong><br>Presto can query petabytes of data stored in Amazon S3 without first extracting or transforming it.<\/li>\n\n\n\n<li><strong>High Query Performance<\/strong><br>Presto\u2019s <strong>in-memory execution model<\/strong> helps keep response times low, even for complex queries.<\/li>\n\n\n\n<li><strong>Cross-Source Querying (Beyond S3)<\/strong><br>While Athena focuses on S3, Presto can also connect to <strong>other data sources<\/strong> like MySQL, PostgreSQL, Kafka, and Cassandra in custom environments.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Why Presto for Athena (Compared to Traditional Query Engines)?<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><strong>Parameter<\/strong><\/th><th><strong>Presto (Athena)<\/strong><\/th><th><strong>Traditional SQL Engines (MySQL, 
Postgres)<\/strong><\/th><\/tr><\/thead><tbody><tr><td><strong>Architecture<\/strong><\/td><td>Distributed, in-memory<\/td><td>Single-node or clustered<\/td><\/tr><tr><td><strong>Data Processing<\/strong><\/td><td>Schema-on-read (no data loading)<\/td><td>Requires data ingestion and loading<\/td><\/tr><tr><td><strong>Scalability<\/strong><\/td><td>Scales out across multiple nodes<\/td><td>Limited by database size and cluster capacity<\/td><\/tr><tr><td><strong>Supported Formats<\/strong><\/td><td>CSV, JSON, Parquet, ORC, Avro<\/td><td>Data loaded into structured tables<\/td><\/tr><tr><td><strong>Use Case<\/strong><\/td><td>Ad-hoc analysis of large datasets<\/td><td>Transactional and small-scale analytics<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Common Use Cases of Presto in Athena<\/strong><\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Log Analysis<\/strong>: Analyze large volumes of application logs stored in S3.<\/li>\n\n\n\n<li><strong>Data Lake Querying<\/strong>: Perform SQL queries directly on S3-based data lakes.<\/li>\n\n\n\n<li><strong>Ad-hoc Business Intelligence<\/strong>: Integrate Athena with BI tools like Qlik, Tableau, or Power BI.<\/li>\n\n\n\n<li><strong>ETL and Data Transformation<\/strong>: Pre-process data from S3 for other analytical services.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In Amazon Athena, <strong>Presto is the core engine<\/strong> that enables <strong>high-performance SQL querying on S3 data<\/strong> without managing infrastructure. 
Presto\u2019s <strong>distributed architecture<\/strong> and <strong>schema-on-read<\/strong> capabilities make it a perfect fit for <strong>big data analytics<\/strong>, <strong>data lakes<\/strong>, and <strong>real-time ad-hoc queries<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is Presto in the Context of Amazon Athena? Presto is an open-source distributed SQL query engine designed for fast and interactive querying of large datasets. In&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_joinchat":[],"footnotes":""},"categories":[2],"tags":[],"class_list":["post-48510","post","type-post","status-publish","format-standard","hentry","category-uncategorised"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/48510","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=48510"}],"version-history":[{"count":1,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/48510\/revisions"}],"predecessor-version":[{"id":48511,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/48510\/revisions\/48511"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=48510"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=48510"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=48510"}],"curies
":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}