If you have terabytes of JSON or CSV logs stored in an Amazon S3 bucket, how do you analyze them? Traditionally, you would have to provision a massive database, write a script to download the files from S3, parse them, and insert them into the database before you could run a single query.
Amazon Athena changes this completely. Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL.
Athena is serverless. There is no infrastructure to set up or manage, and you pay only for the queries you run (specifically, you pay per terabyte of data scanned by the query).
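To make the pay-per-scan model concrete, here is a minimal sketch that estimates the cost of one query from the number of bytes it scans. The per-terabyte rate and the definition of a terabyte used here are assumptions for illustration only, and the function name `estimate_query_cost` is hypothetical. Check the current Athena pricing page for your region before relying on any figure.

```python
# Assumed values for illustration; they are not taken from the article.
PRICE_PER_TB_USD = 5.00      # assumption: example per-terabyte rate
BYTES_PER_TB = 1024 ** 4     # assumption: one terabyte counted as 2^40 bytes


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return an approximate cost in USD for a query scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Cost scales linearly with the amount of data the query reads.
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # Example: a query that reads 200 GB of log files.
    scanned = 200 * 1024 ** 3
    print(f"Estimated cost: ${estimate_query_cost(scanned):.2f}")
```

Because cost follows bytes scanned rather than rows returned, a query that reads fewer files or fewer columns is cheaper even when its result set is the same size.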
Before you can query, you must tell Athena what your data looks like. You can do this by executing a Data Definition Language (DDL) statement in the Athena console:
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
`date` string,
`time` string,
`request_ip` string,
`status` int,
`bytes` int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://my-log-bucket/production-logs/';
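The table definition does not move or convert any data; it only describes how each raw line should be read. The following sketch imitates that mapping in plain Python: it splits one comma-delimited line and casts each field to the type declared in the DDL. The sample log line and the helper name `parse_log_line` are hypothetical, and this is an illustration of the idea rather than a description of how Athena parses files internally.

```python
# Column names and types mirror the CREATE EXTERNAL TABLE statement.
SCHEMA = [
    ("date", str),
    ("time", str),
    ("request_ip", str),
    ("status", int),
    ("bytes", int),
]


def parse_log_line(line: str) -> dict:
    """Split one comma-delimited line and cast each field to its declared type."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    # Pair each raw value with its column name and type, in declaration order.
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}


if __name__ == "__main__":
    # Hypothetical log line in the layout the table expects.
    sample = "2024-01-15,10:32:07,203.0.113.9,404,512"
    print(parse_log_line(sample))
```

The order of columns in the DDL has to match the order of fields in the files, because a delimited format carries no column names of its own.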
Once the table is defined, querying works much like it does in any relational database:
SELECT request_ip, COUNT(*) as hit_count
FROM web_logs
WHERE status = 404
GROUP BY request_ip
ORDER BY hit_count DESC
LIMIT 10;
This query scans the raw CSV files in S3 and returns the 10 IP addresses that generated the most 404 responses.
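To see what the query returns without an AWS account, the same SELECT can be tried locally. The sketch below loads a few hypothetical log rows into an in-memory SQLite database and runs the query unchanged. SQLite stands in for Athena purely for illustration; the sample rows and the helper name `top_404_ips` are assumptions, and the two engines differ in many other respects.

```python
import sqlite3

# Hypothetical rows in the web_logs layout: date, time, request_ip, status, bytes.
SAMPLE_ROWS = [
    ("2024-01-15", "10:32:07", "203.0.113.9", 404, 512),
    ("2024-01-15", "10:32:09", "203.0.113.9", 404, 512),
    ("2024-01-15", "10:33:41", "203.0.113.9", 404, 498),
    ("2024-01-15", "10:34:02", "198.51.100.4", 404, 512),
    ("2024-01-15", "10:34:15", "198.51.100.4", 200, 20480),
    ("2024-01-15", "10:35:00", "192.0.2.77", 200, 1024),
]

QUERY = """
SELECT request_ip, COUNT(*) AS hit_count
FROM web_logs
WHERE status = 404
GROUP BY request_ip
ORDER BY hit_count DESC
LIMIT 10;
"""


def top_404_ips(rows):
    """Load rows into an in-memory table and run the top-404 query against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE web_logs ("date" TEXT, "time" TEXT, '
            "request_ip TEXT, status INTEGER, bytes INTEGER)"
        )
        conn.executemany("INSERT INTO web_logs VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for ip, hits in top_404_ips(SAMPLE_ROWS):
        print(ip, hits)
```

With these sample rows, only the two addresses that produced 404 responses appear in the result, ordered by how many they produced.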