Querying Data with Athena

Updated 2026-04-20

Introduction

If you have terabytes of JSON or CSV logs stored in an Amazon S3 bucket, how do you analyze them? Traditionally, you would have to provision a massive database, write a script to download the files from S3, parse them, and insert them into the database before you could run a single query.

Amazon Athena changes this completely. Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL.

How Athena Works

Athena is serverless: there is no infrastructure to set up or manage, and you pay only for the queries you run. Specifically, you are billed per terabyte of data each query scans.
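To make the pay-per-scan model concrete, here is a rough cost estimate in Python. The rate of 5 USD per terabyte and the 10 MB per-query minimum are assumptions used for illustration, not figures from this article; check the current Athena pricing page for your region. The function name `estimate_query_cost` is hypothetical.

```python
# Illustrative cost estimate for a scan-priced query.
# ASSUMPTIONS: 5 USD per TB scanned and a 10 MB minimum per query.
BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned: int,
                        usd_per_tb: float = 5.0,
                        min_billed_mb: int = 10) -> float:
    """Return the estimated cost in USD for one query."""
    # Very small queries are billed as if they scanned the minimum amount.
    billed_bytes = max(bytes_scanned, min_billed_mb * BYTES_PER_MB)
    return billed_bytes / BYTES_PER_TB * usd_per_tb


full_scan = estimate_query_cost(2 * BYTES_PER_TB)    # query reads 2 TB of logs
narrow_scan = estimate_query_cost(50 * 1024 ** 3)    # query reads only 50 GB

print(f"full scan:   ${full_scan:.2f}")
print(f"narrow scan: ${narrow_scan:.2f}")
```

Because the bill follows bytes scanned rather than query runtime, anything that lets a query read less data lowers its cost.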

The workflow has three steps:

1. You define a schema (a table structure) in Athena that maps to the structure of your CSV or JSON files in S3.
2. You write standard SQL SELECT statements in the Athena console.
3. Athena runs the query on managed compute behind the scenes, reading the files directly from the S3 bucket, and returns the results.
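The same flow can also be driven from code instead of the console. The sketch below uses the boto3 Athena client (`start_query_execution`, `get_query_execution`, `get_query_results`). The database name `default`, the results bucket `s3://my-athena-results/queries/`, and the helper names are assumptions made for illustration. It also assumes AWS credentials are already configured.

```python
import time


def build_query_request(sql, database, output_location):
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes result files to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request, poll_seconds=1.0):
    """Submit the query, wait for it to finish, and return the raw result rows."""
    import boto3  # imported lazily so build_query_request works without AWS access

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]

    # Queries run asynchronously, so poll until a terminal state is reached.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]


request = build_query_request(
    "SELECT COUNT(*) FROM web_logs",
    database="default",                                  # assumed database name
    output_location="s3://my-athena-results/queries/",   # hypothetical bucket
)
# rows = run_query(request)  # requires AWS credentials and an existing table
```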

Defining a Table

Before you can query, you must tell Athena what your data looks like. You can do this by executing a Data Definition Language (DDL) statement in the Athena console:

CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  `date` string,
  `time` string,
  `request_ip` string,
  `status` int,
  `bytes` int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://my-log-bucket/production-logs/';
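For delimited text like this, columns are matched to fields by position, so the order in the DDL has to follow the order of fields in each line. The Python sketch below mimics that mapping locally. The two log lines are invented for illustration and the helper `parse_rows` is hypothetical. It only shows how a line lines up with the schema, not how Athena is implemented.

```python
import csv
import io

# Column names and types copied from the web_logs DDL, in the same order.
COLUMNS = [
    ("date", str),
    ("time", str),
    ("request_ip", str),
    ("status", int),
    ("bytes", int),
]

# Hypothetical log lines, invented for illustration.
SAMPLE = (
    "2026-04-01,12:00:03,203.0.113.7,404,512\n"
    "2026-04-01,12:00:04,198.51.100.2,200,10432\n"
)


def parse_rows(text):
    """Split each comma-delimited line and cast fields by position."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        rows.append({name: cast(value)
                     for (name, cast), value in zip(COLUMNS, fields)})
    return rows


rows = parse_rows(SAMPLE)
print(rows[0])
```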

Running a Query

Once the table is defined, you query it with ordinary SQL, much as you would a table in a relational database:

SELECT request_ip, COUNT(*) as hit_count
FROM web_logs
WHERE status = 404
GROUP BY request_ip
ORDER BY hit_count DESC
LIMIT 10;

This query scans the raw CSV files in S3 to find the 10 IP addresses that triggered the most 404 errors.
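To see what the query computes without an AWS account, the sketch below runs the same SELECT against a handful of made-up rows in an in-memory SQLite database. SQLite is only a local stand-in here: the rows are invented, and Athena's SQL dialect differs from SQLite's in other respects.

```python
import sqlite3

# Hypothetical rows (date, time, request_ip, status, bytes), invented for illustration.
ROWS = [
    ("2026-04-01", "12:00:01", "203.0.113.7", 404, 512),
    ("2026-04-01", "12:00:02", "203.0.113.7", 404, 512),
    ("2026-04-01", "12:00:03", "198.51.100.2", 404, 512),
    ("2026-04-01", "12:00:04", "198.51.100.2", 200, 10432),
    ("2026-04-01", "12:00:05", "192.0.2.9", 200, 2048),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE web_logs ("date" TEXT, "time" TEXT, request_ip TEXT, '
    "status INTEGER, bytes INTEGER)"
)
conn.executemany("INSERT INTO web_logs VALUES (?, ?, ?, ?, ?)", ROWS)

# Same query as in the article: count 404 responses per IP, highest first.
top_404 = conn.execute(
    """
    SELECT request_ip, COUNT(*) AS hit_count
    FROM web_logs
    WHERE status = 404
    GROUP BY request_ip
    ORDER BY hit_count DESC
    LIMIT 10
    """
).fetchall()

print(top_404)
```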
