codingstuff.io

© 2026 codingstuff.io. All rights reserved.

Querying Data with Athena

Updated 2026-04-20

Introduction

If you have terabytes of JSON or CSV logs stored in an Amazon S3 bucket, how do you analyze them? Traditionally, you would have to provision a massive database, write a script to download the files from S3, parse them, and insert them into the database before you could run a single query.

Amazon Athena changes this completely. Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL.

How Athena Works

Athena is serverless: there is no infrastructure to set up or manage, and you pay only for the queries you run. Billing is based on the amount of data each query scans, priced per terabyte.
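To get a feel for per-terabyte billing, here is a minimal sketch in Python. The price of 5 USD per TB and the choice to treat one terabyte as 1024^4 bytes are assumptions for illustration only (check the current AWS pricing page for your region), and `estimate_query_cost` is a hypothetical helper name.

```python
# Assumed values for illustration only; they are not taken from this tutorial.
ASSUMED_USD_PER_TB = 5.00
BYTES_PER_TB = 1024 ** 4  # assumption: one terabyte counted as 1024^4 bytes


def estimate_query_cost(bytes_scanned: int, usd_per_tb: float = ASSUMED_USD_PER_TB) -> float:
    """Estimate the cost of one query from the number of bytes it scanned."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * usd_per_tb


if __name__ == "__main__":
    # Example: a query that scans 200 GiB of log files.
    scanned = 200 * 1024 ** 3
    print(f"Estimated cost: {estimate_query_cost(scanned):.2f} USD")
```

Because the cost follows the bytes scanned rather than the rows returned, a query that reads fewer files or fewer bytes is cheaper under this model.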

  1. You define a schema (a table structure) in Athena that maps to the structure of your CSV or JSON files in S3.
  2. You write standard SQL SELECT statements in the Athena console.
  3. Athena allocates the compute resources behind the scenes, scans the files directly in the S3 bucket, executes the query, and returns the results.
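The steps above describe the console workflow. The same submit, wait, and fetch cycle can also be driven from code; the sketch below is one possible way to do it in Python, not the method this tutorial prescribes. It assumes `client` is a boto3 Athena client (or any object with the same three methods), and the function name, database name, and S3 output location are hypothetical placeholders.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query, wait for it to finish, and return rows as lists of strings."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query leaves the QUEUED/RUNNING states.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {query_id} ended in state {status['State']}: {reason}")

    # Only the first page of results is fetched in this sketch.
    # For SELECT queries, the first returned row holds the column names.
    results = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

With boto3 installed and AWS credentials configured, a call might look like `run_athena_query(boto3.client("athena"), "SELECT 1", "default", "s3://my-query-results/")`, where the database and results bucket are placeholders you would replace with your own.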

Defining a Table

Before you can query, you must tell Athena what your data looks like. You can do this by executing a Data Definition Language (DDL) statement in the Athena console:

CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  `date` string,
  `time` string,
  `request_ip` string,
  `status` int,
  `bytes` int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://my-log-bucket/production-logs/';
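To make the mapping between the table definition and the files concrete, here is a small Python sketch of how one comma-delimited line lines up with the five columns declared above. The sample line is invented for illustration and `parse_log_line` is a hypothetical helper; Athena does this mapping itself at query time, so the code only mirrors the idea.

```python
# Column names and types copied from the web_logs table definition.
COLUMNS = [
    ("date", str),
    ("time", str),
    ("request_ip", str),
    ("status", int),
    ("bytes", int),
]

# Invented sample line, in the column order the table expects.
SAMPLE_LINE = "2026-04-20,13:45:10,203.0.113.7,404,512"


def parse_log_line(line: str) -> dict:
    """Split one comma-delimited line and cast each field to its column type."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return {name: cast(value) for (name, cast), value in zip(COLUMNS, fields)}


if __name__ == "__main__":
    print(parse_log_line(SAMPLE_LINE))
```

The point of the sketch is that the schema is positional: the first field in each line becomes `date`, the second `time`, and so on, so the column order in the table definition has to match the order in the files.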

Running a Query

Once the table is defined, you query it with the same SQL you would use against a relational database:

SELECT request_ip, COUNT(*) as hit_count
FROM web_logs
WHERE status = 404
GROUP BY request_ip
ORDER BY hit_count DESC
LIMIT 10;

This query scans the raw CSV files in S3 and returns the 10 IP addresses that generated the most 404 errors.
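To see what this query returns on a small scale, the same SQL can be run against a few in-memory rows. In the Python sketch below, SQLite is only a local stand-in for Athena, the sample rows are invented, and `top_404_ips` is a hypothetical helper name.

```python
import sqlite3

# The query from this section, unchanged apart from the uppercase AS.
QUERY = """
SELECT request_ip, COUNT(*) AS hit_count
FROM web_logs
WHERE status = 404
GROUP BY request_ip
ORDER BY hit_count DESC
LIMIT 10;
"""

# Invented sample rows: (date, time, request_ip, status, bytes).
SAMPLE_ROWS = [
    ("2026-04-20", "13:45:10", "203.0.113.7", 404, 512),
    ("2026-04-20", "13:45:11", "203.0.113.7", 404, 512),
    ("2026-04-20", "13:45:12", "198.51.100.4", 404, 256),
    ("2026-04-20", "13:45:13", "198.51.100.4", 200, 1024),
    ("2026-04-20", "13:45:14", "192.0.2.9", 200, 2048),
]


def top_404_ips(rows):
    """Load rows into an in-memory SQLite table and run the query against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE web_logs ("date" TEXT, "time" TEXT, '
            "request_ip TEXT, status INTEGER, bytes INTEGER)"
        )
        conn.executemany("INSERT INTO web_logs VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for ip, hits in top_404_ips(SAMPLE_ROWS):
        print(ip, hits)
```

Only rows with status 404 are counted, so `192.0.2.9` does not appear in the result, and the address with two 404 responses is listed before the one with a single 404.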

