I am trying to do a simple inner join between two DataFrames.
My first DataFrame is product data: a subset of a larger product table that covers only some of the products. I am using SKU Barcode as a primary key to uniquely identify each product. This is the output of productDataRows.info():
RangeIndex: 1489 entries, 0 to 1488
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 SKU Barcode 1489 non-null float32
1 Brand 1489 non-null category
2 Title 1489 non-null object
3 Size 1489 non-null category
4 Category 1489 non-null category
5 Image URL 1489 non-null object
6 Cost 1489 non-null float32
dtypes: category(3), float32(2), object(2)
My second DataFrame is market research data, in which each row describes an individual sale of a product. One product will have many records, so SKU Barcode acts as a foreign key in this table. This table is significantly larger than the other one. This is the output of marketResearch.info():
RangeIndex: 28522436 entries, 0 to 28522435
Data columns (total 5 columns):
# Column Dtype
--- ------ -----
0 SKU Barcode float32
1 Platform Code category
2 Price int16
3 Rank int32
4 Epoch Time int64
dtypes: category(1), float32(1), int16(1), int32(1), int64(1)
memory usage: 516.8 MB
Since productDataRows contains only a subset of all the SKU Barcodes, I need to locate all the records in marketResearch that correspond to a SKU Barcode in productDataRows AND have a Platform Code matching a local variable platform. Any market research records whose SKU Barcode does not appear in productDataRows should be filtered out.
I have tried a few things and this is the latest I have come up with:
marketResearchRows = marketResearch[(marketResearch['SKU Barcode'] == productDataRows['SKU Barcode']) & (marketResearch['Platform Code'] == platform)]
This is throwing the error:
ValueError: Can only compare identically-labeled Series objects
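The same message can be reproduced on toy data. This is a minimal sketch, assuming two Series of different lengths (the names a and b are made up for illustration and are not from my real tables); == between two Series requires both sides to carry the same index, which a 28-million-row column and a 1489-row column do not:

```python
import pandas as pd

# Hypothetical toy Series standing in for the two 'SKU Barcode' columns.
a = pd.Series([101.0, 102.0, 103.0], dtype="float32")  # index 0..2
b = pd.Series([101.0, 103.0], dtype="float32")         # index 0..1

try:
    a == b  # element-wise comparison needs identical index labels
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```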
I have read that this may be because the two tables do not have identical columns, but how can I get around this? I have tried merging the tables and then dropping values, but because my market research table is so large, this has thrown a lot of MemoryError errors.
One would think this would be an easy task but I have tried many things and have been having a lot of trouble.
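To make the desired result concrete, here is a minimal sketch on toy data of one possible approach: a membership test with Series.isin combined with the platform condition, followed by a merge on the already-reduced rows. The toy values, the snake_case variable names and the platform value "A" are assumptions for illustration only, and this has not been tried against the full 28-million-row table:

```python
import pandas as pd

# Toy stand-ins for the real tables (values are made up).
product_data_rows = pd.DataFrame({
    "SKU Barcode": pd.Series([101.0, 103.0], dtype="float32"),
    "Title": ["Widget", "Gadget"],
})
market_research = pd.DataFrame({
    "SKU Barcode": pd.Series([101.0, 102.0, 103.0, 101.0], dtype="float32"),
    "Platform Code": pd.Categorical(["A", "A", "B", "B"]),
    "Price": pd.Series([10, 20, 30, 40], dtype="int16"),
})
platform = "A"  # assumed value of the local variable

# isin tests membership by value, so the two columns need not share an index.
mask = (
    market_research["SKU Barcode"].isin(product_data_rows["SKU Barcode"])
    & (market_research["Platform Code"] == platform)
)
market_research_rows = market_research[mask]

# Optional: attach product columns to the filtered rows only.
joined = market_research_rows.merge(product_data_rows, on="SKU Barcode", how="inner")
print(joined)
```

Filtering before merging means the merge only touches the rows that survive the mask, which may help with the MemoryError, although that depends on how many rows match.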