pandas Airbnb practice

Building data sets from Airbnb data

 

1. DataFrame: list each room_id and host_id with a total score, sorted two ways
   1) index = (room_id, host_id)
   2) column = total_score: overall_satisfaction + reviews * 0.378
   3) output = total_score sorted in ascending order and in descending order, saved as sorted_total_score_ascend.csv and sorted_total_score_descend.csv

 

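# df_air is the Airbnb listings DataFrame loaded earlier.
# Work on a copy so that adding total_score does not also modify df_air.
df_1 = df_air.copy()
df_1['total_score'] = df_1['overall_satisfaction'] + df_1['reviews'] * 0.378
# Use (room_id, host_id) tuples as the index and keep only total_score.
df_1.index = df_1[['room_id', 'host_id']].apply(tuple, axis=1)
df_1 = df_1[['total_score']]
df_1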
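
The task also asks for the scores sorted both ways and written to CSV, which the code above does not do. A minimal sketch of that step follows. The `sample` table and its values are made up to stand in for df_air, and the files go to a temporary directory; `sort_values` and `to_csv` are the standard pandas calls.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for df_air with only the columns this task needs.
sample = pd.DataFrame({
    "room_id": [101, 102, 103],
    "host_id": [1, 2, 1],
    "overall_satisfaction": [4.5, 5.0, 3.0],
    "reviews": [10, 0, 100],
})

scores = sample.copy()
scores["total_score"] = scores["overall_satisfaction"] + scores["reviews"] * 0.378
scores.index = scores[["room_id", "host_id"]].apply(tuple, axis=1)
scores = scores[["total_score"]]

# Sort the same table in both directions.
ascending = scores.sort_values("total_score", ascending=True)
descending = scores.sort_values("total_score", ascending=False)

# Write both results using the file names given in the task.
out_dir = tempfile.mkdtemp()
ascending.to_csv(os.path.join(out_dir, "sorted_total_score_ascend.csv"))
descending.to_csv(os.path.join(out_dir, "sorted_total_score_descend.csv"))
```

With the real data, the same two `sort_values` calls would be applied to df_1.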

 

 

2. DataFrame: list summary statistics of several factors for each neighborhood
   1) index = neighborhood
   2) columns = avg of reviews | avg of overall_satisfaction | avg of price | max of reviews | min of reviews | max of price | min of price
   3) output = rows sorted by neighborhood in ascending order, saved as sorted_neighborhood_factors.csv

 

group

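# Keep only the columns needed for the per-neighborhood summary.
df_2 = df_air[['neighborhood','reviews','overall_satisfaction','price']]

# Aggregate per neighborhood; groupby sorts the neighborhood index in ascending order.
df2 = df_2.groupby(['neighborhood']).agg({'reviews':['mean','max','min'],
                                          'overall_satisfaction':['mean'],
                                          'price':['mean','max','min']
                                         })
# Flatten the two-level column names, e.g. ('reviews', 'mean') -> 'reviews_mean'.
df2.columns = ["_".join(x) for x in df2.columns]
df2.to_csv("sorted_neighborhood_factors.csv")
df2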
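
Passing a dict of lists to `agg` gives two-level column names such as ('reviews', 'mean'). As an alternative, pandas named aggregation produces flat column names directly. The sketch below assumes a small made-up `sample` table in place of df_air; the output column names are chosen for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_air with the four columns used in this task.
sample = pd.DataFrame({
    "neighborhood": ["B", "A", "A", "B"],
    "reviews": [10, 2, 4, 30],
    "overall_satisfaction": [4.0, 5.0, 3.0, 4.5],
    "price": [100, 50, 70, 300],
})

# Each keyword is an output column name; the tuple is (source column, function).
summary = sample.groupby("neighborhood").agg(
    reviews_mean=("reviews", "mean"),
    reviews_max=("reviews", "max"),
    reviews_min=("reviews", "min"),
    overall_satisfaction_mean=("overall_satisfaction", "mean"),
    price_mean=("price", "mean"),
    price_max=("price", "max"),
    price_min=("price", "min"),
)
print(summary)
```

The index comes out sorted by neighborhood because `groupby` sorts group keys by default.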


 

 

 

 

3. cut


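# Task 3
import pandas as pd

df3 = df_air
# Price range edges; the first edge is -1 so that a price of 0 falls into '0-100'.
bins = [-1,100,200,300,400,500,1000,5000]
labels=['0-100','100-200','200-300','300-400','400-500','500-1000','1000-5000']

# Group the rows by price range and summarize each range.
test = df3.groupby(pd.cut(df3['price'], bins=bins, labels=labels)).agg({'accommodates':['mean','median','min'],
                                    'bedrooms':['mean','median'],
                                    'reviews':['mean','median'],
                                    'neighborhood':lambda x: list(x),
                                   })
# Number of listings in each price range (length of the neighborhood list).
test['length'] = test['neighborhood']['<lambda>'].str.len()

# Flatten the two-level column names; rstrip turns ('length', '') into 'length', not 'length_'.
test.columns = ["_".join(x).rstrip("_") for x in test.columns]

file_name = "./sort_ranged_price.csv"
test.to_csv(file_name)
test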
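
To see how `pd.cut` assigns prices to the ranges used above, here is a small sketch with made-up prices (the `prices` values are hypothetical). By default each bin includes its right edge and excludes its left edge, and values outside all bins become NaN.

```python
import pandas as pd

bins = [-1, 100, 200, 300, 400, 500, 1000, 5000]
labels = ["0-100", "100-200", "200-300", "300-400", "400-500", "500-1000", "1000-5000"]

# Hypothetical prices, including edge values and one above the last bin.
prices = pd.Series([0, 100, 101, 450, 5000, 6000])
ranges = pd.cut(prices, bins=bins, labels=labels)

# 0 and 100 land in '0-100', 101 in '100-200', and 6000 has no bin (NaN).
print(pd.DataFrame({"price": prices, "range": ranges}))
```

Rows whose price is NaN after the cut are dropped by `groupby`, so any listing priced above 5000 would not appear in the summary.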




 

 

 

 

 

 

4. plot subplot

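import matplotlib.pyplot as plt

# df2 is the per-neighborhood summary from step 2 with flattened
# column names such as 'reviews_mean'.
plt.figure(figsize=(10,16))
plt.subplot(411)
df2['reviews_mean'].plot(title='reviews_mean')
plt.subplot(412)
df2['overall_satisfaction_mean'].plot(title='overall_satisfaction_mean')

plt.subplot(413)
df2['price_mean'].plot(title='price_mean')
plt.tight_layout()
plt.show()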
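
The same stacked layout can also be built with `plt.subplots`, which returns all the axes at once so each Series can be drawn in a loop. This is only a sketch: the `summary` table and its values are made up to stand in for the per-neighborhood means, and the Agg backend line is there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-neighborhood means standing in for the real summary table.
summary = pd.DataFrame(
    {
        "reviews_mean": [3.0, 20.0],
        "overall_satisfaction_mean": [4.0, 4.25],
        "price_mean": [60.0, 200.0],
    },
    index=pd.Index(["A", "B"], name="neighborhood"),
)

columns = ["reviews_mean", "overall_satisfaction_mean", "price_mean"]
# One row per column, sharing the neighborhood axis.
fig, axes = plt.subplots(nrows=len(columns), ncols=1, figsize=(10, 12), sharex=True)
for ax, col in zip(axes, columns):
    summary[col].plot(ax=ax, title=col)
fig.tight_layout()
```

Passing `ax=ax` tells pandas which subplot to draw into, so there is no need to call `plt.subplot` before each plot.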