Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

pandas - Slow Python loop to look up data in another data frame

I have two data frames: one with all my data (called 'data') and one with the latitude and longitude of the stations where each observation starts and ends (called 'info'). I am trying to get a data frame with the latitude and longitude next to the station in each observation. My Python code:

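for i in range(len(data)):          # about 15.5 million rows
    for j in range(len(info)):      # about 540 stations
        if data.at[i, 'year'] == '2018' and data.at[i, 'station'] == info.at[j, 'station']:
            # .at writes into data itself; data.latitude[i] = ... is chained
            # assignment and may silently modify a temporary copy instead
            data.at[i, 'latitude'] = info.at[j, 'latitude']
            data.at[i, 'longitude'] = info.at[j, 'longitude']
            break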

Since I have about 15 million observations, this takes a very long time. Is there a faster way to do it?

Thank you very much (I am still new to this).

Edit:

My 'info' file looks like this (about 500 rows, one for each station):

[screenshot of 'info' not available]

My 'data' file looks like this (there are other variables not shown here; about 15 million rows, one for each trip):

[screenshot of 'data' not available]

What I want is that, wherever the station numbers match, the resulting data looks like this:

[screenshot of the desired result not available]

He who fights dragons too long becomes a dragon himself; gaze into the abyss too long, and the abyss gazes back…

1 Answer

Here is one solution, which replaces the nested loop with a vectorized lookup using Series.map. Alternatively, you can use pandas.merge to add the two new columns to 'data' and perform the equivalent mapping.

# create Series mappings from info (station -> coordinate);
# map() needs each station to appear only once in info
s_lat = info.set_index('station')['latitude']
s_lon = info.set_index('station')['longitude']

# calculate Boolean mask on year
mask = data['year'] == '2018'

# apply mappings; where no match is found, fillna keeps the original value
# (the parentheses let the method chain continue on the next line)
data.loc[mask, 'latitude'] = (data.loc[mask, 'station'].map(s_lat)
                                 .fillna(data.loc[mask, 'latitude']))

data.loc[mask, 'longitude'] = (data.loc[mask, 'station'].map(s_lon)
                                  .fillna(data.loc[mask, 'longitude']))
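To see the map approach end to end, here is a minimal runnable sketch on small hypothetical frames. The station numbers and coordinates are made up; the column names and the year stored as the string '2018' follow the question. Station 999 has no entry in info, so its coordinates stay NaN, and the 2017 row is left untouched.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's 'info' and 'data'
info = pd.DataFrame({
    "station": [101, 102, 103],
    "latitude": [45.50, 45.52, 45.48],
    "longitude": [-73.57, -73.60, -73.55],
})
data = pd.DataFrame({
    "year": ["2018", "2018", "2017", "2018"],
    "station": [102, 999, 101, 103],
    "latitude": [float("nan")] * 4,
    "longitude": [float("nan")] * 4,
})

# station -> coordinate lookups (assumes each station appears once in info)
s_lat = info.set_index("station")["latitude"]
s_lon = info.set_index("station")["longitude"]

# only rows from 2018 are updated
mask = data["year"] == "2018"

# map() looks up all selected stations in one vectorized call; fillna keeps
# the existing value wherever a station is missing from info (999 here)
data.loc[mask, "latitude"] = (data.loc[mask, "station"].map(s_lat)
                              .fillna(data.loc[mask, "latitude"]))
data.loc[mask, "longitude"] = (data.loc[mask, "station"].map(s_lon)
                               .fillna(data.loc[mask, "longitude"]))

print(data)
```

The lookup is a hash-based match per row instead of a scan over all ~540 stations, which is where the speedup over the nested loop comes from.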

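The pandas.merge alternative mentioned in the answer could look roughly like this. It is a sketch under two assumptions: 'data' does not yet have latitude/longitude columns (otherwise merge would add suffixed duplicates such as latitude_x and latitude_y), and each station appears once in 'info', which validate="many_to_one" checks. The toy frames are hypothetical. Because a left merge fills every row, the 2018 filter is re-applied afterwards.

```python
import pandas as pd

# Hypothetical toy data; 'data' has no coordinate columns yet (assumption)
info = pd.DataFrame({
    "station": [101, 102, 103],
    "latitude": [45.50, 45.52, 45.48],
    "longitude": [-73.57, -73.60, -73.55],
})
data = pd.DataFrame({
    "year": ["2018", "2018", "2017", "2018"],
    "station": [102, 999, 101, 103],
})

# Left join keeps every row of data in its original order; stations absent
# from info get NaN. validate raises if a station is duplicated in info.
merged = data.merge(info[["station", "latitude", "longitude"]],
                    on="station", how="left", validate="many_to_one")

# Mirror the year condition: clear coordinates for rows outside 2018
merged.loc[merged["year"] != "2018", ["latitude", "longitude"]] = float("nan")

print(merged)
```

Unlike the map version, which updates 'data' in place, merge returns a new data frame, so the result has to be assigned back (for example data = merged).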

...