DataFrame
Get layer as pandas DataFrame
LayersSet provides the method data_frame(), which creates a pandas DataFrame from an osgeo.ogr.Layer. By default, layer number zero is used.
The DataFrame:
- uses the feature ID as index
- has a special column named _GEOM_, which holds the layer and geometry information for each row.
Note
If you intend to convert the DataFrame back to a layer:
- Do not remove the _GEOM_ column.
- Do not rename the columns if the OGR field types are to be maintained.
from girs.feat.layers import LayersReader
lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp')
df = lrs.data_frame()
print(df)
_GEOM_ ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 \
FID
0 Polygon 86 DEU Germany 1 Baden-Württemberg 1
1 Polygon 86 DEU Germany 1 Baden-Württemberg 1
... ... ... ... ... ... ... ...
11300 Polygon 86 DEU Germany 16 Thüringen 402
11301 Polygon 86 DEU Germany 16 Thüringen 403
[11302 rows x 17 columns]
Show layers as pandas DataFrame
The method show() creates a DataFrame object and sets some display properties:
lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp')
lrs.show(width=300, max_rows=6)
_GEOM_ ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 \
FID
0 Polygon 86 DEU Germany 1 Baden-Württemberg 1
1 Polygon 86 DEU Germany 1 Baden-Württemberg 1
2 Polygon 86 DEU Germany 1 Baden-Württemberg 1
3 Polygon 86 DEU Germany 1 Baden-Württemberg 1
4 Polygon 86 DEU Germany 1 Baden-Württemberg 1
... ... ... ... ... ... ... ...
11297 Polygon 86 DEU Germany 16 Thüringen 402
11298 Polygon 86 DEU Germany 16 Thüringen 402
11299 Polygon 86 DEU Germany 16 Thüringen 402
11300 Polygon 86 DEU Germany 16 Thüringen 402
11301 Polygon 86 DEU Germany 16 Thüringen 403
[11302 rows x 17 columns]
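How show() sets these display properties is not stated on this page. The following is a minimal sketch of one way to get a similar effect with plain pandas, assuming that width and max_rows correspond to the pandas options display.width and display.max_rows (an assumption, not confirmed here). The DataFrame is made-up stand-in data, not the shapefile content.

```python
import pandas as pd

# Made-up stand-in data; a real DataFrame would come from lrs.data_frame().
df = pd.DataFrame({'NAME_1': ['Example'] * 20, 'ID_2': list(range(20))})
df.index.name = 'FID'

# Temporarily widen the output and limit the number of printed rows.
# The options are restored when the with-block ends.
with pd.option_context('display.width', 300, 'display.max_rows', 6):
    text = repr(df)

print(text)
```

Because the options are set inside a context manager, later print calls fall back to the regular pandas display settings.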
pandas methods can be used:
lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp')
df = lrs.data_frame()
# Keep only the rows of the federal state Nordrhein-Westfalen
df_nrw = df[df['NAME_1'] == 'Nordrhein-Westfalen']
# Drop columns that are not needed
df_nrw = df_nrw.drop(['ID_0', 'ISO', 'NAME_0', 'ID_1'], axis=1)
print(df_nrw)
_GEOM_ NAME_1 NAME_3
FID
5946 Polygon Nordrhein-Westfalen Bielefeld
5947 Polygon Nordrhein-Westfalen Bochum
5948 Polygon Nordrhein-Westfalen Bonn
5949 Polygon Nordrhein-Westfalen Ahaus
5950 Polygon Nordrhein-Westfalen Bocholt
... ... ... ...
6337 Polygon Nordrhein-Westfalen Sonsbeck
6338 Polygon Nordrhein-Westfalen Voerde (Niederrhein)
6339 Polygon Nordrhein-Westfalen Wesel
6340 Polygon Nordrhein-Westfalen Xanten
6341 Polygon Nordrhein-Westfalen Wuppertal
[396 rows x 3 columns]
_GEOM_ column
The column _GEOM_ is central to the DataFrame: behind each cell of this column there is a DataFrameFeature object, which holds the osgeo.ogr.Layer and the feature ID of the row. Through it, any attribute or method described in http://gdal.org/python/osgeo.ogr.Geometry-class.html can be used:
AddGeometry, AddGeometryDirectly, AddPoint, AddPointM, AddPointZM, AddPoint_2D, Area,
AssignSpatialReference, Boundary, Buffer, Centroid, Clone, CloseRings, Contains, ConvexHull,
CoordinateDimension, Crosses, DelaunayTriangulation, Destroy, Difference, Disjoint, Distance,
Empty, Equal, Equals, ExportToGML, ExportToIsoWkb, ExportToIsoWkt, ExportToJson, ExportToKML,
ExportToWkb, ExportToWkt, FlattenTo2D, GetArea, GetBoundary, GetCoordinateDimension,
GetCurveGeometry, GetDimension, GetEnvelope, GetEnvelope3D, GetGeometryCount, GetGeometryName,
GetGeometryRef, GetGeometryType, GetLinearGeometry, GetM, GetPoint, GetPointCount, GetPointZM,
GetPoint_2D, GetPoints, GetSpatialReference, GetX, GetY, GetZ, HasCurveGeometry, Intersect,
Intersection, Intersects, Is3D, IsEmpty, IsMeasured, IsRing, IsSimple, IsValid, Length, Overlaps,
PointOnSurface, Segmentize, Set3D, SetCoordinateDimension, SetMeasured, SetPoint, SetPointM,
SetPointZM, SetPoint_2D, Simplify, SimplifyPreserveTopology, SymDifference, SymmetricDifference,
Touches, Transform, TransformTo, Union, UnionCascaded, Value, Within, WkbSize, next
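The method DataFrameFeature.apply(method) calls the named osgeo.ogr.Geometry method on the geometry behind the cell; additional arguments are passed on to that method. Combined with the pandas method Series.apply, it can be applied to every geometry in the column _GEOM_.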
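The internals of DataFrameFeature are not shown on this page. As a rough sketch of the general idea (calling a method by its name on a wrapped object), the following uses two hypothetical stand-in classes, FakeGeometry and FeatureProxy, which are not part of girs or GDAL. The real class presumably fetches the geometry from the layer via the feature ID instead of storing it directly.

```python
class FakeGeometry(object):
    """Hypothetical stand-in for osgeo.ogr.Geometry (axis-aligned rectangle)."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def GetArea(self):
        return self.width * self.height

    def Buffer(self, distance):
        # Grow the rectangle by `distance` on every side.
        return FakeGeometry(self.width + 2 * distance, self.height + 2 * distance)


class FeatureProxy(object):
    """Hypothetical stand-in for DataFrameFeature: wraps one geometry."""

    def __init__(self, geometry):
        self._geometry = geometry

    def apply(self, method, *args, **kwargs):
        # Look the method up by name and call it with any extra arguments.
        return getattr(self._geometry, method)(*args, **kwargs)


cells = [FeatureProxy(FakeGeometry(2.0, 3.0)), FeatureProxy(FakeGeometry(4.0, 5.0))]
areas = [cell.apply('GetArea') for cell in cells]      # no extra arguments
buffered = cells[0].apply('Buffer', 1.0).GetArea()     # one extra argument
print(areas)
print(buffered)
```

Passing the method as a string keeps the call site short inside a lambda, at the cost of errors surfacing only at run time when the name is misspelled.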
Example: area calculation
The example below shows how to calculate areas in km²:
- Read the layers with LayersReader.
- Transform the coordinate system from WGS84 into UTM Zone 32 and return a LayersWriter in memory.
- Convert the transformed LayersWriter instance into a DataFrame.
- Apply the method 'GetArea' to each geometry g, convert the result from m² to km², and save the values in a new DataFrame column 'area_km2'.
- Print only the columns '_GEOM_', 'NAME_4', and 'area_km2'.
lrs = LayersReader('D:/tmp/girs/DEU_adm_shp/DEU_adm4.shp')  # Read
lrs = lrs.transform(epsg=32632)  # Transform to UTM Zone 32 and return in 'Memory' (RAM)
df = lrs.data_frame()  # to DataFrame
df['area_km2'] = df['_GEOM_'].apply(lambda g: g.apply('GetArea') / 1000000.0)  # square metres to km2
print(df[['_GEOM_', 'NAME_4', 'area_km2']])
_GEOM_ NAME_4 area_km2
0 Polygon Allmendingen 46.050899
1 Polygon Altheim 7.711316
2 Polygon Berghülen 26.007805
3 Polygon Blaubeuren 79.237158
4 Polygon Blaustein 54.932116
... ... ... ...
11297 Polygon Sachsenhausen 4.862127
11298 Polygon Schwerstedt 6.889861
11299 Polygon Vippachedelhausen 10.378227
11300 Polygon Wohlsborn 4.077278
11301 Polygon Weimar 84.403305
[11302 rows x 3 columns]
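The division by 1000000.0 works because UTM coordinates are in metres, so GetArea returns square metres; in WGS84 the result would be in square degrees. The conversion can be checked by hand with the shoelace formula on a made-up rectangle. This sketch uses only the standard library; polygon_area_m2 is a hypothetical helper, not part of girs or GDAL.

```python
def polygon_area_m2(ring):
    """Planar area of a simple polygon via the shoelace formula.

    `ring` is a list of (x, y) tuples in metres; the ring need not be closed.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Made-up rectangle of 1000 m x 2000 m in projected (metre) coordinates.
ring = [(500000.0, 5600000.0), (501000.0, 5600000.0),
        (501000.0, 5602000.0), (500000.0, 5602000.0)]
area_km2 = polygon_area_m2(ring) / 1000000.0
print(area_km2)
```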
Example: distance calculation
The example below shows how to calculate the distance in km from each city to the centroid of the city of Wuppertal by applying the method Distance:
# Read, transform to UTM Zone 32, return in 'Memory' (RAM), get DataFrame and filter it
lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp').transform(epsg=32632)
df = lrs.data_frame()[['_GEOM_', 'NAME_4']]  # to DataFrame
# Get the centroid of the city Wuppertal as ``osgeo.ogr.Geometry``
geom_wuppertal = df[df['NAME_4'] == 'Wuppertal']['_GEOM_'].apply(lambda g: g.apply('Centroid')).iloc[0]
# Apply the method ``osgeo.ogr.Geometry.Distance`` and convert the distance from m to km
df['distWupper'] = df['_GEOM_'].apply(lambda g: g.apply('Distance', geom_wuppertal) / 1000.0)
print(df[['_GEOM_', 'NAME_4', 'distWupper']])
_GEOM_ NAME_4 distWupper
0 Polygon Allmendingen 367.195868
1 Polygon Altheim 372.542163
2 Polygon Berghülen 358.381501
3 Polygon Blaubeuren 361.697021
4 Polygon Blaustein 360.106787
... ... ... ...
11297 Polygon Sachsenhausen 293.474305
11298 Polygon Schwerstedt 287.096263
11299 Polygon Vippachedelhausen 281.862430
11300 Polygon Wohlsborn 293.756043
11301 Polygon Weimar 285.625870
[11302 rows x 3 columns]
Save DataFrame as layers
from girs.feat.layers import LayersReader, data_frame_to_layer
lrs = LayersReader('D:/tmp/girs/DEU_adm_shp/DEU_adm4.shp')  # Read
print(lrs.get_field_definitions_data_frame())  # Field definitions of the original layer
lrs = data_frame_to_layer(lrs.data_frame())  # , 'D:/tmp/girs/DEU_adm_shp/DEU_adm4_from_df.shp')
print(lrs.get_field_definitions_data_frame())  # Field definitions after the round trip