Monday, April 18, 2016

GIS 5935 Lab 15: Dasymetric Mapping

Dasymetric mapping is essentially the process of breaking down larger aggregations of data into smaller areas or units, and it is often used in population density estimates. For example, it could mean taking aggregated statewide population data and breaking it down to the county level. This week's assignment was along those lines. For this analysis, I used an imperviousness raster to estimate prospective student populations for eight high schools. This was done by joining the raster's zonal statistics table to census vector data, then running an Ordinary Least Squares analysis on the result to determine the estimated population. From there, I clipped the OLS result to each school boundary and determined its area and new estimated population. The results of my estimated population vs. a reference, "true" population can be seen below:

School            Reference Pop.   Estimated Pop.   Error    Abs(Error)
Hagerty                    4,706            4,214      492          492
Lake Brantley              6,313            6,094      219          219
Seminole                  11,776           10,881      895          895
Winter Springs             5,693            3,863    1,830        1,830
Lyman                      7,853            8,477     -624          624
Oviedo                     4,780            4,750       30           30
Lake Howell                8,585            6,561    2,024        2,024
Lake Mary                  5,014            4,885      129          129
Total:                    54,720           49,725    4,995        6,243

Total absolute error: 11.41% of the reference population
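The error summary in the table can be reproduced with a short script. This is a minimal sketch using the reference and estimated figures from the table above:

```python
# Reference vs. estimated populations for each school, from the table.
results = {
    "Hagerty":        (4706, 4214),
    "Lake Brantley":  (6313, 6094),
    "Seminole":       (11776, 10881),
    "Winter Springs": (5693, 3863),
    "Lyman":          (7853, 8477),
    "Oviedo":         (4780, 4750),
    "Lake Howell":    (8585, 6561),
    "Lake Mary":      (5014, 4885),
}

total_ref = sum(ref for ref, _ in results.values())
total_abs_error = sum(abs(ref - est) for ref, est in results.values())
error_pct = 100 * total_abs_error / total_ref  # total abs. error vs. reference
print(f"Total abs. error: {total_abs_error} ({error_pct:.2f}% of reference)")
```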

Tuesday, April 12, 2016

GIS5935 Lab 14: Modifiable Areal Unit Problem

A prime example of the modifiable areal unit problem is the delineation of political districts. Gerrymandering occurs when the boundaries of political districts are manipulated to gain a particular advantage. The purpose of this analysis was to measure how gerrymandering affected the boundaries of a set of districts. It was measured in two ways: compactness, where the geometric boundaries take on unusual shapes, and community, where counties or other communities are divided into multiple districts.

In this analysis, to measure compactness and find the districts with the oddest geometric properties, I calculated the ratio of each district's perimeter to its area and identified the ten worst districts by that measure. To measure community, I determined which districts broke up the boundaries of the most counties, excluding any district that fell completely within a single county (to account for higher population densities). The first example below shows compactness, and the second shows community:
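The compactness ranking can be sketched in a few lines. The district names and measurements below are hypothetical; the text describes a simple perimeter-to-area ratio, where a higher value indicates a less compact (odder) shape. (A common scale-independent alternative is the Polsby-Popper score, 4*pi*area / perimeter**2, where lower means less compact.)

```python
districts = [
    # (name, perimeter_km, area_km2) -- illustrative values only
    ("District 1", 120.0, 900.0),
    ("District 2", 310.0, 850.0),   # long, contorted boundary
    ("District 3", 95.0, 700.0),
]

# Sort worst (least compact) first by the perimeter-to-area ratio.
ranked = sorted(districts, key=lambda d: d[1] / d[2], reverse=True)

for name, perim, area in ranked:
    print(f"{name}: ratio = {perim / area:.3f}")
```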






Monday, April 4, 2016

GIS5935 Lab 13: Effects of Scale

This analysis consisted of comparing two DEMs, one derived from LIDAR and one from SRTM. Both were set to the same coordinate system and the same cell size of 90 meters, then compared in a number of ways. To compare the two DEMs, I first looked at the differences in elevation and slope. I checked the minimum and maximum elevation for each one, as well as the minimum, maximum, and average slope. I also compared the aspect of each to find any major differences in the direction each cell faced.

The slope and elevation differences can be seen here:

                         LIDAR      SRTM
Maximum Elevation (m)  1063.67      1053
Minimum Elevation (m)     4.31        12
Maximum Slope (deg)      50.46     45.77
Minimum Slope (deg)       0.89      1.53
Average Slope (deg)      29.65     27.49

The differences in aspect can be seen in this comparison:

The comparison of the two DEMs revealed slight but noticeable differences in each area. For example, the maximum elevation for the LIDAR data was 1063.67 meters, while the SRTM value was about ten meters lower at 1053. The minimum elevations were 4.31 and 12 meters, respectively. The image above also shows the differences in aspect throughout each DEM. These can possibly be explained by how each DEM was developed: since the LIDAR data was collected from instruments much closer to the ground than the SRTM data (which was collected from a space shuttle), subtle differences in aspect are bound to arise.

There was also a slight difference in slope between the two, with LIDAR having an average slope of 29.65 degrees and SRTM 27.49 degrees. This again can probably be attributed to how each dataset was developed. Given that LIDAR detects the ground at a much finer level, as opposed to SRTM, which was an earth-encompassing project, there is likely to be a little more generalization throughout the SRTM data. The higher average slope of the LIDAR data suggests slightly less generalization and therefore slightly more accuracy.
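The cell-by-cell comparison itself is straightforward once both rasters share a coordinate system and cell size. This is a minimal sketch using two small hypothetical elevation grids in place of the real LIDAR and SRTM rasters (which would normally be read with a GIS library such as arcpy or rasterio):

```python
lidar = [[10.2, 11.5, 13.0],
         [12.1, 14.8, 16.3],
         [15.0, 17.2, 19.9]]
srtm  = [[12.0, 12.0, 14.0],
         [13.0, 15.0, 16.0],
         [16.0, 17.0, 19.0]]

# Per-cell elevation differences (LIDAR minus SRTM).
diffs = [a - b for row_a, row_b in zip(lidar, srtm)
               for a, b in zip(row_a, row_b)]

print("LIDAR min/max:", min(v for r in lidar for v in r),
                        max(v for r in lidar for v in r))
print("SRTM  min/max:", min(v for r in srtm for v in r),
                        max(v for r in srtm for v in r))
print("Mean difference:", sum(diffs) / len(diffs))
```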


Wednesday, March 30, 2016

GIS5935 Lab 12: Spatial Regression in ArcGIS

Spatial regression using tools in ArcGIS was the topic for this week. The two tools used were Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR). The second part of the lab consisted of carrying out a regression analysis using both tools and comparing the results. Two shapefiles were used as the data for this: one consisting of the locations of all crimes reported in a specific county in one year, and the other consisting of census tracts for the county with different demographic variables. From there, one crime type was selected on which to perform the analysis, along with three explanatory variables. The crime rate for each census tract was then calculated to be used as the dependent variable.

First, the OLS tool was run on the selected variables. Then the GWR tool was run on the same variables. For the GWR, I tried both adaptive and fixed kernel types, then ran the Global Moran's I tool on both results to see which produced the better-performing model. The adaptive kernel type seemed to work better. From there, I read through the statistics of all of the results to see how the model improved.

The GWR improved on the OLS a good bit. The adjusted R-squared improved from about 35% with the OLS to about 40% with the GWR. The AIC was also lowered from 1916 to 1909.
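The two fit statistics compared here can be computed from observed and predicted values. This is a minimal sketch using hypothetical data and a common simplified AIC form, n*ln(SSE/n) + 2k; ArcGIS reports a corrected variant (AICc), so absolute values will differ, but the "lower is better" comparison works the same way.

```python
import math

def fit_stats(observed, predicted, k):
    """k = number of model parameters (intercept + explanatory variables)."""
    n = len(observed)
    mean = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual SS
    sst = sum((o - mean) ** 2 for o in observed)                  # total SS
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)  # penalize extra parameters
    aic = n * math.log(sse / n) + 2 * k        # simplified AIC
    return adj_r2, aic

obs  = [3.0, 5.0, 7.0, 9.0, 11.0, 14.0]   # hypothetical crime rates
pred = [3.5, 4.5, 7.2, 8.8, 11.5, 13.5]   # hypothetical model output
adj_r2, aic = fit_stats(obs, pred, k=2)
print(f"adjusted R^2 = {adj_r2:.3f}, AIC = {aic:.1f}")
```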

Tuesday, March 22, 2016

GIS 5935 Lab 11: Regression in ArcGIS

This week's assignment consisted of using tools in ArcGIS to perform a regression analysis. Similar assessments had previously been carried out in Excel, but ArcGIS takes it a couple of steps further. While in Excel you can determine all of the necessary elements for a regression analysis, such as correlation, adjusted R-squared, and P-values for all of your variables, ArcGIS also helps determine the performance of the analysis and which variables should be included or excluded.

The performance of the model is determined using the Ordinary Least Squares tool, which generates a regression analysis using dependent and independent variables from a feature class. Its results help determine which variables work well, and which could be biased or redundant and should possibly be excluded. One particular strength of the tool is how it analyzes the residuals to detect spatial autocorrelation and indicate whether explanatory variables are missing. If there is an issue, it advises running the Spatial Autocorrelation tool on the residuals, which tells whether they are randomly distributed or clustered. This is especially useful for improving models because it is a simple, straightforward way to check the distribution of the residuals and can help pinpoint potential problems with the regression analysis.
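The statistic behind that Spatial Autocorrelation check is Moran's I. This is a minimal sketch with a hypothetical chain of six areas and their residuals; values near +1 indicate clustering, values near 0 a random pattern.

```python
def morans_i(values, weights):
    """values: list of n residuals; weights: n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Six areas along a line (chain adjacency): high residuals cluster on
# one end, low residuals on the other -- a clearly clustered pattern.
residuals = [5.0, 4.0, 3.0, -3.0, -4.0, -5.0]
weights = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
print(f"Moran's I = {morans_i(residuals, weights):.3f}")
```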

Monday, March 14, 2016

GIS 5935 Lab 10: Bivariate Regression

For this assignment, I used a regression analysis to estimate the missing rainfall data for Station A between 1931 and 1949. To accomplish this, I first had to determine the slope and the intercept coefficient for the relationship between the two stations over the years where data was available for both. Then I multiplied the slope by the Station B value for each year and added the intercept coefficient to estimate the missing rainfall for each year at Station A. While something like rainfall is impossible to predict precisely, the statistics used here could be very useful in similar scenarios. It's not very different from recent assignments, like surface interpolation: it may not be precise, but it gives you a good idea of what the reality probably is or was.

The results can be seen below:


Slope: 0.846171
Intercept: 162.3421

Year    Station B    Station A (estimated)
1931      1005.84      1013.45
1932      1148.08      1133.81
1933       691.39       747.37
1934      1328.25      1286.27
1935      1042.42      1044.40
1936      1502.41      1433.64
1937      1027.18      1031.51
1938       995.93      1005.07
1939      1323.59      1282.33
1940       946.19       962.98
1941       989.58       999.70
1942      1124.60      1113.94
1943       955.04       970.47
1944      1215.64      1190.98
1945      1418.22      1362.40
1946      1323.34      1282.11
1947      1391.75      1340.00
1948      1338.97      1295.34
1949      1204.47      1181.53
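The estimation step itself is a single line of arithmetic: applying the fitted slope and intercept from the table to Station B's rainfall for each missing year.

```python
# Fitted coefficients from the bivariate regression (see table above).
SLOPE = 0.846171
INTERCEPT = 162.3421

def estimate_station_a(station_b_rainfall):
    """Predict Station A rainfall from Station B for the same year."""
    return SLOPE * station_b_rainfall + INTERCEPT

# First two years from the table:
for year, b in [(1931, 1005.84), (1932, 1148.08)]:
    print(year, round(estimate_station_a(b), 2))
```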

Monday, March 7, 2016

GIS5935 Lab 8: Surface Interpolation

In the first part of this lab, I compared the results of using Spline and IDW interpolation methods to create a Digital Elevation Model. This was done by using elevation data points as the input for each technique, then running each tool with the same parameters and comparing the results.

Overall, the difference between the two interpolation methods wasn't substantial, but there were some notable differences. Across most of the data, the difference in elevation between the two surfaces is anywhere from 2 to 12 feet, but there are also several places where the difference is 30 to 40 feet. The areas with these larger differences, however, are mostly the areas without elevation data points, which shows how each interpolation process fills such gaps differently.
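The IDW half of the comparison is simple enough to sketch directly: each unknown cell gets a weighted average of the sample points, with weights of 1/d^p. The sample points below are hypothetical.

```python
import math

def idw(x, y, samples, power=2):
    """Inverse distance weighted estimate at (x, y).
    samples: list of (x, y, elevation) tuples."""
    num = den = 0.0
    for sx, sy, z in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return z              # exactly on a sample point
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den

points = [(0.0, 0.0, 100.0), (10.0, 0.0, 120.0), (0.0, 10.0, 110.0)]
print(round(idw(5.0, 5.0, points), 2))
```

Spline, by contrast, fits a smooth surface through the points rather than averaging them, which is why the two methods diverge most where data points are sparse.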

Below is a map layout that shows the areas of difference between the two methods:






GIS5935: DEM Accuracy

The purpose of this lab was to determine the accuracy of a Digital Elevation Model. To do this, "true" test points first had to be acquired; for this project, these were field data collected using high-accuracy survey methods. The test points were combined with the DEM using the Extract Values to Points tool in ArcGIS, and the elevation of the DEM at each point was compared to the true elevation. In an Excel spreadsheet, the DEM's elevation was subtracted from the field data to find the difference at each point, and from these differences the Root Mean Square Error and the 95th and 68th percentile accuracies were calculated. The results can be compared across land cover classifications to find trends within each type and help detect any bias. Below are the results for this particular analysis:

Land Cover        A        B        C        D        E      Combined
Sample Size      48       55       45       98       41       287
Accuracy 68th     0.001    0.023    0.049    0.051    0.035    0.029
Accuracy 95th     0.027    0.194    0.233    0.214    0.147    0.185
RMSE              0.105    0.181    0.246    0.394    0.199    0.276
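The statistics in the table can be computed directly from the per-point differences. This is a minimal sketch using a hypothetical list of differences (field value minus DEM value); "Accuracy 68th/95th" is taken here to mean the 68th and 95th percentiles of the absolute errors.

```python
import math

def accuracy_stats(differences):
    abs_err = sorted(abs(d) for d in differences)
    n = len(abs_err)
    rmse = math.sqrt(sum(d * d for d in differences) / n)
    def percentile(p):                       # nearest-rank percentile
        return abs_err[min(n - 1, math.ceil(p / 100 * n) - 1)]
    return rmse, percentile(68), percentile(95)

diffs = [0.02, -0.05, 0.10, -0.01, 0.07, -0.12, 0.03, 0.04, -0.06, 0.08]
rmse, p68, p95 = accuracy_stats(diffs)
print(f"RMSE={rmse:.3f}  68th={p68:.3f}  95th={p95:.3f}")
```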

Monday, February 22, 2016

GIS 5935: TINs and DEMs

This week's assignment was working with TINs and DEMs, with emphasis on the differences between the two. The TIN is an interesting data model, being made up of a network of triangles based on elevation points. Each triangle can vary in elevation, but slope and aspect remain the same throughout each one. The number and location of elevation points used to create the TIN are important. In areas with more topographical variance, it's necessary to use more data points, whereas in generally flat areas, not as many are needed. This creates a network of triangles of varying sizes.

The image below is an example of a typical TIN, with symbology shown for the nodes, edges, and contours.


Tuesday, February 16, 2016

Location-Allocation Analysis

This week's lab was a location-allocation analysis. To carry this out, we first ran a location-allocation analysis in the Network Analyst extension, using a given set of distribution centers as the Facilities and customers as the Demand Points. In these results, however, a number of customers were assigned to distribution centers outside their own market areas. To correct this, we were to reassign the market areas.

To carry out this part of the analysis, I used provided data joining the demand points and the market areas, and joined it to the demand points from the network analysis to determine which customers were assigned to a different distribution center. From there, I used the Summary Statistics tool to count each combination of facility and market area and determine the facility with the most customers in each market area. Finally, I created a new feature class for the newly assigned market areas.
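The Summary Statistics step amounts to a group-by count followed by picking the majority facility per market area. This is a minimal sketch with hypothetical customer assignments:

```python
from collections import Counter

assignments = [
    # (market_area, assigned_facility) -- one row per customer
    ("North", "DC-1"), ("North", "DC-1"), ("North", "DC-2"),
    ("South", "DC-2"), ("South", "DC-2"), ("South", "DC-2"),
    ("East",  "DC-3"), ("East",  "DC-1"),
]

# Count each (market area, facility) combination, then keep the
# facility serving the most customers in each market area.
counts = Counter(assignments)
best = {}
for (area, facility), n in counts.items():
    if area not in best or n > counts[(area, best[area])]:
        best[area] = facility

print(best)
```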

Below is an image of the feature class of the new market areas:


Monday, February 8, 2016

GIS5935 Lab 5: Vehicle Routing Problem

For this analysis, we were to run a vehicle routing problem (VRP) for a day's worth of pickups for a distribution center in south Florida. We first used 14 of the 22 routes (trucks), one depot (the distribution center), 14 route zones, and a total of 128 orders (pickups). Once the VRP was solved with this data, the analysis was run again to compare the results after adding two more trucks. This was done by changing the properties of two more routes so they were included in the analysis.

The addition of two more trucks made quite an improvement. First, the original analysis left six orders unassigned, while the second left none. The second analysis also had only one time violation, compared to ten in the first. Revenue increased as well, from 32,000 in the first analysis to 33,625 in the second.

The image below shows the routes following the addition of two more trucks:


Sunday, January 31, 2016

GIS5935 Lab 2: Determining Quality of Road Networks

This lab was a test of the horizontal accuracy of two network datasets for the city of Albuquerque, New Mexico. The first, StreetMap USA, is a network compiled from TIGER 2000 data. The second, ABQ_Streets, is centerline data provided by the City of Albuquerque Planning Department. The independent dataset was a shapefile of test points (27 intersections) digitized from 2006 digital orthophotos. First, 27 well-defined locations were identified that were intersections in both datasets. The "true" position of each intersection was then determined from the orthophotos. Finally, the X and Y coordinates from all three datasets (independent, StreetMap USA, and ABQ_Streets) were recorded and the NSSDA statistics were calculated. Sampling locations can be seen below:



Horizontal Positional Accuracy:
Using the National Standard for Spatial Data Accuracy, the ABQ_Streets dataset tested 22.908 feet horizontal accuracy at the 95% confidence level.

The StreetMap USA dataset tested 147.857 feet horizontal accuracy at the 95% confidence level.
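The NSSDA horizontal statistic can be sketched as follows. Given the (dx, dy) offsets between a dataset's intersections and the independent "true" points, the radial RMSE is computed and, assuming the x and y errors are roughly equal, multiplied by 1.7308 to report accuracy at the 95% confidence level. The offsets below are hypothetical.

```python
import math

def nssda_horizontal(offsets):
    """offsets: list of (dx, dy) test-point errors in feet."""
    n = len(offsets)
    # Radial RMSE over all test points.
    rmse_r = math.sqrt(sum(dx * dx + dy * dy for dx, dy in offsets) / n)
    # NSSDA factor for the 95% confidence level (when RMSE_x ~= RMSE_y).
    return 1.7308 * rmse_r

offsets = [(3.1, -2.4), (-1.8, 4.0), (2.2, 2.9), (-3.5, -1.1)]
print(f"Tested {nssda_horizontal(offsets):.3f} feet horizontal accuracy "
      "at 95% confidence level")
```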

Monday, January 25, 2016

GIS5935: Completeness of Road Networks

This assessment was meant to determine and compare the completeness of two road networks for the same county.
First, the total length of each road network was determined. From there, a grid of polygons was used to assess completeness at a more specific level. This was carried out using a combination of spatial analysis tools in ArcGIS, including Intersect, Dissolve, and Spatial Join. Once the completeness of both networks in each polygon was determined, the difference in length between the networks was found for every polygon; the results can be seen in the choropleth map below.


Monday, January 11, 2016

GIS5935 Lab 1: Accuracy & Precision

This lab dealt with measuring horizontal accuracy and precision. To measure horizontal precision, an "average" location first has to be determined from the given data. The distance from each observation to that average is then measured in order to find the distance that contains a specified percentage of the observations. To measure horizontal accuracy, the "true" location must be determined or given, and the distance from the average to that true location is measured.

An example of precision estimates:
In this case, the horizontal and vertical precision are 4.4 and 5.7 meters, respectively, while the horizontal and vertical accuracy show discrepancies of 3.25 and 5.96 meters, respectively.
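The two measures can be sketched together. This minimal example uses a hypothetical set of GPS observations: precision is taken as the 68th-percentile distance from the average position, and accuracy is the distance from the average to the known "true" point.

```python
import math

def precision_and_accuracy(observations, true_point, pct=68):
    n = len(observations)
    # "Average" location of the observations.
    avg = (sum(x for x, _ in observations) / n,
           sum(y for _, y in observations) / n)
    # Distances from each observation to the average, sorted.
    dists = sorted(math.hypot(x - avg[0], y - avg[1])
                   for x, y in observations)
    precision = dists[min(n - 1, math.ceil(pct / 100 * n) - 1)]
    accuracy = math.hypot(avg[0] - true_point[0], avg[1] - true_point[1])
    return precision, accuracy

obs = [(2.0, 1.0), (3.0, 2.0), (1.0, 2.0), (2.0, 3.0), (2.0, 2.0)]
prec, acc = precision_and_accuracy(obs, true_point=(0.0, 2.0))
print(f"precision: {prec:.2f} m, accuracy: {acc:.2f} m")
```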