Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

aws glue - AWS Crawler S3 Target Path Changes But Old Path Tables Included

I have an AWS Glue crawler whose S3 target path I am switching in order to change the underlying table source. The problem is that tables are being created from both the old and the new target:

configuration:

aws glue get-crawler --name sand-main 
{
    "Crawler": {
        "Name": "sand-main",
        "Role": "Crawler-sand",
        "Targets": {
            "S3Targets": [
                {
                    "Path": "s3://sand-main-green/main",
                    "Exclusions": [
                        "checkpoints/**",
                        "IsActive.txt",
                        "isactive.txt"
                    ]
                }
            ],
            "JdbcTargets": [],
            "MongoDBTargets": [],
            "DynamoDBTargets": [],
            "CatalogTargets": []
        },
        "DatabaseName": "sand_main",
        "Description": "",
        "Classifiers": [],
        "RecrawlPolicy": {
            "RecrawlBehavior": "CRAWL_EVERYTHING"
        },
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "DELETE_FROM_DATABASE"
        },
        "LineageConfiguration": {
            "CrawlerLineageSettings": "DISABLE"
        },
        "State": "READY",
        "CrawlElapsedTime": 0,
        "CreationTime": "2020-09-30T14:07:25-06:00",
        "LastUpdated": "2021-01-28T11:32:15-07:00",
        "LastCrawl": {
            "Status": "SUCCEEDED",
            "LogGroup": "/aws-glue/crawlers",
            "LogStream": "sand-main",
            "MessagePrefix": "5bb1907d-2847-46ef-8712-3a50deb2b7a0",
            "StartTime": "2021-01-28T11:32:35-07:00"
        },
        "Version": 24,
        "Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"}},\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"
    }
}

I have a Lambda that switches the path from "Path": "s3://sand-main-green/main" to "Path": "s3://sand-main-blue/main".
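For reference, the path switch done by the Lambda could look roughly like the sketch below. The Lambda's code is not shown in the question, so this is an assumption: the function names `swap_s3_target` and `switch_crawler_path` are hypothetical, and only the boto3 Glue calls `get_crawler` and `update_crawler` are real APIs. The sketch keeps the existing exclusions and sends back only the S3 targets, which matches the crawler above because its other target lists are empty.

```python
def swap_s3_target(targets, new_path):
    """Return a Targets dict whose S3 targets all point at new_path.

    Other keys of each S3 target (for example "Exclusions") are kept as-is.
    Non-S3 targets are intentionally left out of the result.
    """
    s3_targets = [dict(target, Path=new_path) for target in targets.get("S3Targets", [])]
    return {"S3Targets": s3_targets}


def switch_crawler_path(glue, crawler_name, new_path):
    """Point an existing crawler at new_path.

    `glue` is expected to behave like boto3.client("glue").
    """
    crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
    new_targets = swap_s3_target(crawler["Targets"], new_path)
    glue.update_crawler(Name=crawler_name, Targets=new_targets)
    return new_targets
```

Inside a Lambda handler this would be called as `switch_crawler_path(boto3.client("glue"), "sand-main", "s3://sand-main-blue/main")`. Passing the client in as a parameter keeps the function easy to exercise with a stub.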

But I end up with tables:

Name -> Location
test -> s3://sand-main-blue/main/test
test_2398l50df -> s3://sand-main-green/main/test

I have DeleteBehavior set to DELETE_FROM_DATABASE, so I would expect the tables for the old S3 path to be deleted. It feels like the crawler retains the history of its S3 targets. I do not want this behavior.
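To make the symptom easier to check, here is a minimal sketch that lists the tables in the database whose location is not under the crawler's current target path. It is a diagnostic aid and one possible cleanup workaround, not a confirmed fix and not an explanation of why the crawler behaves this way. The helper names are hypothetical; the boto3 Glue `get_tables` paginator and `delete_table` are real APIs.

```python
def tables_outside_target(tables, active_path):
    """Return names of tables whose Location is not under active_path.

    `tables` is a list of Glue table dicts as returned in "TableList".
    """
    prefix = active_path.rstrip("/") + "/"
    stale = []
    for table in tables:
        location = table.get("StorageDescriptor", {}).get("Location", "")
        # Normalise trailing slashes so "…/main" and "…/main/" compare equal.
        if not (location.rstrip("/") + "/").startswith(prefix):
            stale.append(table["Name"])
    return stale


def list_tables(glue, database):
    """Collect every table of a database; `glue` behaves like boto3.client("glue")."""
    tables = []
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        tables.extend(page["TableList"])
    return tables


def delete_tables(glue, database, names):
    """Delete the named tables one by one (assumed cleanup step, use with care)."""
    for name in names:
        glue.delete_table(DatabaseName=database, Name=name)
```

With the two tables from the listing above, `tables_outside_target(tables, "s3://sand-main-blue/main")` would report only `test_2398l50df`, since its location is under the green bucket.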

Question from: https://stackoverflow.com/questions/65943326/aws-crawler-s3-target-path-changes-but-old-path-tables-included

