Spatial Autocorrelation Estimation Method

Dear Roger,

Thank you for your reply. I disabled HTML; my e-mails should now be in plain text.

Let me give more context on what I am trying to achieve.

I am using Airbnb listings data for New York City, available at: http://insideairbnb.com/get-the-data.html

I save every listings.csv.gz file available for NYC (2015-01 to 2019-09) - in total, 54 files/time periods - as a YYYY-MM-DD.csv file into a Listings/ folder. When importing all these 54 files into one single data set, I create a new "date_compiled" variable/column.

In total, after the data cleansing process, I have a little more than 2 million observations.

I created 54 timedummy variables, one for each available time period.
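To make the construction concrete, here is a toy sketch in base R of how such dummies and the matching formula fragment can be built. The three dates and the `toy` data frame are made up for illustration only; my real data has 54 periods. Leaving one period out as the reference category is an assumption of this sketch, made so that the dummies are not collinear with the intercept.

```r
# Toy data: three hypothetical snapshot dates, two listings each
dates <- c("2015-01-01", "2015-02-01", "2015-03-01")
toy <- data.frame(date_compiled = rep(dates, each = 2), stringsAsFactors = FALSE)

# Dummy column names with "-" replaced by "_" so they are valid in a formula
dummy_names <- paste0("date_compiled_", gsub("-", "_", dates))

# One 0/1 indicator column per period
for (i in seq_along(dates)) {
  toy[[dummy_names[i]]] <- as.integer(toy$date_compiled == dates[i])
}

# Right-hand-side fragment; the first period is left out as the reference
# category so that the dummies are not collinear with the intercept
timedummy_toy <- paste(dummy_names[-1], collapse = " + ")
print(timedummy_toy)
```

Every row of `toy` has exactly one dummy equal to 1, and the resulting string can be pasted into a model formula.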

I want to estimate, using a hedonic spatial timedummy model, how a variety of characteristics that potentially determine the daily rate of Airbnb listings affect prices over time in New York City. Examples include characteristics of the listing (such as the number of bedrooms and whether the host is a professional), proximity to downtown (New York City Hall) and to the nearest subway station, income per capita, etc.

My dependent variable is price (log price, common in the related literature for hedonic prices).

The OLS model is done.

For the spatial model, I am assuming that hosts, when setting the price of their listings, take into account not only the listing's structural and location characteristics but also the prices charged by nearby listings with similar characteristics. Spatial autocorrelation is then present, or at least spatial dependence in the dependent variable.
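Written out, what I have in mind would be roughly a spatial lag specification with period effects. This is only a sketch in my own notation, not a settled model; in particular, how the weights should be defined for listings rather than for zip code polygons is part of what I am asking about.

```latex
\ln p_{it} = \rho \sum_{j} w_{ij} \ln p_{jt} + x_{it}'\beta + \delta_t + \varepsilon_{it}
```

Here $p_{it}$ is the daily rate of listing $i$ in period $t$, $w_{ij}$ is an element of the row-standardised spatial weights matrix, $x_{it}$ collects the structural and location characteristics, $\delta_t$ is the coefficient on the dummy for period $t$, and $\rho$ measures the spatial dependence in the dependent variable.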

As I wrote in my previous post, I was willing to include each unit itself in its own set of neighbors.
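As a small illustration of what including the unit itself does, here is a sketch on a hypothetical 3 x 3 grid of cells (standing in for zip code polygons; it is not my actual data). It uses spdep's `cell2nb()`, `include.self()`, `card()`, `nb2listw()` and `listw2mat()`.

```r
library(spdep)

# Hypothetical 3 x 3 grid of cells standing in for zip code polygons
nb <- cell2nb(3, 3, type = "queen")

# Add each cell to its own neighbour set
nb_self <- include.self(nb)

# Number of neighbours per cell, before and after:
# every cell gains exactly one neighbour (itself)
print(card(nb))
print(card(nb_self))

# Row-standardised weights; with self-neighbours the diagonal of the
# weights matrix is no longer zero
lw_self <- nb2listw(nb_self, style = "W")
W <- listw2mat(lw_self)
print(diag(W))
```

With self-neighbours, the spatially lagged price of a unit includes that unit's own price, which is the consequence I would like to be sure is appropriate.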

Parts of my code can be found below:

########

## packages

# Install any missing packages, then load all of them
packages_install <- function(packages){
  new.packages <- packages[!(packages %in% installed.packages()[, "Package"])]
  if (length(new.packages)) {
    install.packages(new.packages, dependencies = TRUE)
  }
  sapply(packages, require, character.only = TRUE)
}

packages_required <- c("bookdown", "cowplot", "data.table", "dplyr", "e1071", "fastDummies", "ggplot2", "ggrepel", "janitor", "kableExtra", "knitr", "lubridate", "nngeo", "plm", "RColorBrewer", "readxl", "scales", "sf", "spdep", "stargazer", "tidyverse")
packages_install(packages_required)

# Working directory
setwd("C:/Users/User/R")

## shapefile_us

# Shapefile zips import and Coordinate Reference System (CRS) transformation
# Shapefile download: https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_zcta510_500k.zip
shapefile_us <- sf::st_read(dsn = "Shapefile", layer = "cb_2018_us_zcta510_500k")

# Columns removal
shapefile_us <- shapefile_us %>% select(-c(AFFGEOID10, GEOID10, ALAND10, AWATER10))

# Column rename: ZCTA5CE10
setnames(shapefile_us, old=c("ZCTA5CE10"), new=c("zipcode"))

# Column class change: zipcode
shapefile_us$zipcode <- as.character(shapefile_us$zipcode)

## polygon_nyc

# Zip code not available in shapefile: 11695
polygon_nyc <- shapefile_us %>% filter(zipcode %in% zips_nyc)

## weight_matrix

# Neighboring polygons: list of neighbors for each polygon (queen contiguity neighbors)
polygon_nyc_nb <- poly2nb((polygon_nyc %>% select(-borough)), queen=TRUE)

# Include neighbour itself as a neighbour
# for(i in 1:length(polygon_nyc_nb)){polygon_nyc_nb[[i]]=as.integer(c(i,polygon_nyc_nb[[i]]))}
polygon_nyc_nb <- include.self(polygon_nyc_nb)

# Weights to each neighboring polygon
lw <- nb2listw(neighbours = polygon_nyc_nb, style="W", zero.policy=TRUE)

## listings

# Data import
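# pattern is a regular expression: escape the dot and anchor at the end of the name
files <- list.files(path="Listings/", pattern="\\.csv$", full.names=TRUE)
listings <- setNames(lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE, encoding="UTF-8")), files)
# SIMPLIFY = FALSE keeps a list of data frames, one per file, for bind_rows()
listings <- mapply(cbind, listings, date_compiled = names(listings), SIMPLIFY = FALSE)
listings <- listings %>% bind_rows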

# Characters removal
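# Keep only the YYYY-MM-DD part: drop the folder prefix and the ".csv" extension
# (in gsub(), "." is a regex wildcard, so basename()/file_path_sans_ext() is safer)
listings$date_compiled <- tools::file_path_sans_ext(basename(as.character(listings$date_compiled)))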
listings$price <- gsub("\\$", "", listings$price)
listings$price <- gsub(",", "", listings$price)

## timedummy

timedummy <- sapply("date_compiled_", paste, unique(listings$date_compiled), sep="")
timedummy <- paste(timedummy, sep = "", collapse = " + ")
timedummy <- gsub("-", "_", timedummy)

## OLS regression

# Pooled cross-section data - Randomly sampled cross sections of Airbnb listings price at different points in time
regression <- plm(formula=as.formula(paste("log_price ~ #some variables", timedummy, sep = "", collapse = " + ")), data=listings, model="pooling", index="id")

########

Some of my ids repeat across multiple time periods.

I use NYC's zip codes to left join my data with zip-code-level neighborhood characteristics, such as the income per capita of that zip code.
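The join works along the following lines. This is a base R sketch with made-up values; `listings_toy`, `zip_stats_toy` and `income_per_capita` are hypothetical names, and in my script I use a dplyr left join instead of `merge()`. The zip code is stored as character on both sides so that the keys match.

```r
# Hypothetical listings (values are made up for illustration)
listings_toy <- data.frame(
  id = c(1, 2, 3),
  zipcode = c("10001", "10002", "99999"),
  stringsAsFactors = FALSE
)

# Hypothetical zip-code-level characteristics
zip_stats_toy <- data.frame(
  zipcode = c("10001", "10002"),
  income_per_capita = c(50000, 40000),
  stringsAsFactors = FALSE
)

# Left join: keep every listing, fill with NA where the zip code has no match
joined <- merge(listings_toy, zip_stats_toy, by = "zipcode", all.x = TRUE)
print(joined)
```

Listings whose zip code has no match keep their row and get NA for the zip-level variables, so they can be checked before estimation.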

Now I want to apply the hedonic model with the timedummy variables.

Do you know how to proceed? Specifically: 1) Which package should I use (spdep or splm)? 2) Do I have to join polygon_nyc (by zip code) to my listings data set and then calculate the weight matrix "lw"?

Again, thank you very much for the help provided so far.

Best regards,
Robert
