Data analysis assignment: analyzing mobile marketing data.
Introduction
This assignment broadly deals with location-based mobile marketing. You have
data from a location-based marketing agency which handles geo-fencing
campaigns on behalf of advertisers. Due to the very large volume of data, you
are given a random sample for two campaigns of a single advertiser - AMC
Theaters. The advertising impressions are inserted into the mobile app being
used on the device. The data include the following elements: impression size
(e.g., 320x50 pixels), app category (e.g., IAB1), app review volume and
valence, device OS (e.g., iOS), geo-fence lat/long coordinates, mobile device
lat/long coordinates, and click outcome (0 or 1). The column names are self-
explanatory, although we have provided a data dictionary file on Canvas.
Analysis
Data Processing
- a. Create a dummy variable imp_large for the large impression size
- b. Create dummy variables cat_entertainment, cat_social and cat_tech for app categories
- c. Create dummy variable os_ios for iOS devices
- d. Create variable distance using the Haversine formula to calculate the distance for a pair of latitude/longitude coordinates. Distance (in kilometers) = 6371 * acos( cos(radians(LATITUDE1)) * cos(radians(LATITUDE2)) * cos(radians(LONGITUDE1) - radians(LONGITUDE2)) + sin(radians(LATITUDE1)) * sin(radians(LATITUDE2)) )
- e. Create variable distance_squared by squaring variable distance
- f. Create variable ln_app_review_vol by taking natural log of app_review_vol
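The processing steps above could be sketched in R roughly as follows. This is only an illustration: the raw column names (imp_size, app_topcat, device_os, geofence_lat, geofence_lon, device_lat, device_lon), the choice of "728x90" as the large impression, the mapping of IAB codes to the three categories, and the helper distance_km are all assumptions, so check them against the data dictionary. The toy rows are made up. The expression in item d is the spherical law of cosines form of the great-circle distance, so the helper implements it exactly as written.

```r
# Great-circle distance in km, following the expression given in item d.
# Inputs are decimal degrees. pmin/pmax keep the acos() argument inside
# [-1, 1], which rounding can slightly violate when two points coincide.
distance_km <- function(lat1, lon1, lat2, lon2) {
  rad <- function(deg) deg * pi / 180
  x <- cos(rad(lat1)) * cos(rad(lat2)) * cos(rad(lon1) - rad(lon2)) +
    sin(rad(lat1)) * sin(rad(lat2))
  6371 * acos(pmin(1, pmax(-1, x)))
}

# Toy rows for illustration only; raw column names are assumptions.
ads <- data.frame(
  imp_size       = c("728x90", "320x50", "728x90"),
  app_topcat     = c("IAB1", "IAB14", "IAB19"),
  device_os      = c("iOS", "Android", "iOS"),
  geofence_lat   = c(40.7580, 34.0522, 41.8781),
  geofence_lon   = c(-73.9855, -118.2437, -87.6298),
  device_lat     = c(40.7484, 34.0522, 41.8827),
  device_lon     = c(-73.9857, -118.2437, -87.6233),
  app_review_vol = c(1200, 35, 98000),
  stringsAsFactors = FALSE
)

# a. Large-impression dummy (assumes "728x90" is the large size).
ads$imp_large <- as.integer(ads$imp_size == "728x90")

# b. App-category dummies (the IAB code mapping is an assumption).
ads$cat_entertainment <- as.integer(ads$app_topcat == "IAB1")
ads$cat_social        <- as.integer(ads$app_topcat == "IAB14")
ads$cat_tech          <- as.integer(ads$app_topcat == "IAB19")

# c. iOS dummy.
ads$os_ios <- as.integer(ads$device_os == "iOS")

# d. Distance between the geo-fence and the device, in kilometers.
ads$distance <- distance_km(ads$geofence_lat, ads$geofence_lon,
                            ads$device_lat, ads$device_lon)

# e. Squared distance.
ads$distance_squared <- ads$distance^2

# f. Natural log of review volume; log(0) is -Inf, so zero-volume apps
#    would need separate handling.
ads$ln_app_review_vol <- log(ads$app_review_vol)

print(ads[, c("imp_large", "os_ios", "distance", "ln_app_review_vol")])
```

As a quick sanity check, one degree of latitude along a meridian, distance_km(0, 0, 1, 0), should come out near 111.19 km, since 6371 * pi / 180 is about 111.19.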
Descriptive Statistics
- a. Summarize the data by calculating the summary statistics (i.e., mean, median, std. dev., minimum and maximum) for didclick, distance, imp_large, cat_entertainment, cat_social, cat_tech, os_ios, ln_app_review_vol and app_review_val.
- b. Report the correlations among the above variables.
- c. Plot the relationship of distance (x-axis) and click-through-rate (y-axis), and any other pairs of variables of interest.
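One possible way to produce these descriptives in R is sketched below. The data frame here is simulated purely so the snippet runs on its own; with the real sample, the processed data frame would replace it. The 1 km bin width used for the click-through-rate plot is an arbitrary assumption.

```r
# Simulated stand-in for the processed data (illustration only).
set.seed(1)
n <- 500
ads <- data.frame(
  didclick          = rbinom(n, 1, 0.05),
  distance          = runif(n, 0, 10),
  imp_large         = rbinom(n, 1, 0.3),
  cat_entertainment = rbinom(n, 1, 0.3),
  cat_social        = rbinom(n, 1, 0.2),
  cat_tech          = rbinom(n, 1, 0.1),
  os_ios            = rbinom(n, 1, 0.5),
  ln_app_review_vol = rnorm(n, 8, 2),
  app_review_val    = runif(n, 1, 5)
)

vars <- c("didclick", "distance", "imp_large", "cat_entertainment",
          "cat_social", "cat_tech", "os_ios", "ln_app_review_vol",
          "app_review_val")

# a. One row per variable: mean, median, std. dev., minimum, maximum.
summary_stats <- t(sapply(ads[vars], function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x),
    min = min(x), max = max(x))
}))
print(round(summary_stats, 3))

# b. Pairwise Pearson correlations.
cor_matrix <- cor(ads[vars])
print(round(cor_matrix, 3))

# c. Click-through rate by 1 km distance bin. The mean of a 0/1 click
#    indicator within a bin is that bin's CTR.
ads$dist_bin <- cut(ads$distance, breaks = seq(0, 10, by = 1),
                    include.lowest = TRUE)
ctr_by_bin <- aggregate(didclick ~ dist_bin, data = ads, FUN = mean)
bin_mid <- as.integer(ctr_by_bin$dist_bin) - 0.5  # bin midpoints in km
plot(bin_mid, ctr_by_bin$didclick, type = "b",
     xlab = "Distance (km)", ylab = "Click-through rate")
```

Binning is used because didclick is binary, so a raw scatter of clicks against distance shows only two horizontal bands; averaging within bins makes any trend visible.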
Logistic Regression
- a. Specify the following Logistic regression model:
Dependent variable: didclick
Independent variables: distance, distance_squared, imp_large,
cat_entertainment, cat_social, cat_tech, os_ios, ln_app_review_vol and
app_review_val.
- b. Estimate the model in R (using the glm function) and report coefficients and p-value of the estimates.
- c. Discuss your findings and their implications, limiting your answer to a page or so.
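A minimal sketch of the estimation step in R follows. The data are simulated so the snippet is self-contained, and the coefficients used to generate clicks are invented for illustration; they say nothing about the real campaigns. Only glm, summary and coef from base R are used.

```r
# Simulated stand-in for the processed data (illustration only).
set.seed(42)
n <- 2000
category <- sample(c("entertainment", "social", "tech", "other"),
                   n, replace = TRUE)
ads <- data.frame(
  distance          = runif(n, 0, 10),
  imp_large         = rbinom(n, 1, 0.3),
  cat_entertainment = as.integer(category == "entertainment"),
  cat_social        = as.integer(category == "social"),
  cat_tech          = as.integer(category == "tech"),
  os_ios            = rbinom(n, 1, 0.5),
  ln_app_review_vol = rnorm(n, 8, 2),
  app_review_val    = runif(n, 1, 5)
)
ads$distance_squared <- ads$distance^2

# Invented data-generating process: clicks become less likely with distance.
eta <- -1.5 - 0.2 * ads$distance + 0.5 * ads$imp_large
ads$didclick <- rbinom(n, 1, plogis(eta))

# Logistic regression with the specification from item a.
fit <- glm(
  didclick ~ distance + distance_squared + imp_large +
    cat_entertainment + cat_social + cat_tech + os_ios +
    ln_app_review_vol + app_review_val,
  data = ads,
  family = binomial(link = "logit")
)

# Coefficient estimates and their p-values (Wald z-tests).
results <- coef(summary(fit))[, c("Estimate", "Pr(>|z|)")]
print(round(results, 4))
```

The coefficients are on the log-odds scale, so exp(coef(fit)) gives odds ratios, which can be easier to discuss in item c. Because distance enters both linearly and squared, its effect on the log-odds at a given distance d is the distance coefficient plus 2 * d times the distance_squared coefficient.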