I'm testing this architecture: Kinesis Firehose → S3 → Glue → Athena. For now I'm using dummy data which is generated by Kinesis, each line looks like this: {"ticker_symbol":"NFLX","sector":"TECHNOLOGY","change":-1.17,"price":97.83}
However, there are two problems. First, a Glue Crawler creates a separate table per file. I've read that if the schema is matching Glue should provide only one table. As you can see in the screenshots below, the schema is identical. In Crawler options, I tried ticking Create a single schema for each S3 path
on and off, but no change.
Files also sit in the same path, which leads me to the second problem: when those tables are queried, Athena doesn't show any data. That's likely because files share a folder - I've read about it here, point 1, and tested several times. If I remove all but one file from S3 folder and crawl, Athena shows data.
Can I force Kinesis to put each file in a separate folder or Glue to record that data in a single table?
File1:
File2:
question from:
https://stackoverflow.com/questions/65937967/glue-crawler-is-not-combining-data-also-no-visible-data-in-tables 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…