Understanding the format behind GML spatial data files
and loading them into Python
Data Cleaning & Preparation
Vocabulary, then a short RCN GML fragment. Syntax and namespaces follow in later sections.
Hierarchical text · elements and attributes
XML is plain-text markup: nested elements, optional attributes, one document tree. You define tags or adopt a standard (e.g. GML); a parser builds the tree.
Beyond this course
.docx / .xlsx), RSS, Android layouts, build files
Declaration (line 1) · comments (2–6, ignored by parsers) · <gml:boundedBy … /> = empty element
One example record (LOK_1) as CSV, JSON, and XML. Later material assumes XML because GML is XML.
Comma-separated values · flat tables
Comma-Separated Values — one row per record, columns for fields. Typical for spreadsheets and SQL exports.
id,pow,cena
LOK_1,60.25,750000
JavaScript Object Notation · nested objects
JSON — objects { }, arrays [ ], key–value pairs. Common in REST APIs and document stores.
{
"id": "LOK_1",
"pow": 60.25,
"uom": "m2",
"cena": 750000
}
eXtensible Markup Language · tagged trees
XML — elements in angle brackets, attributes, a strict document tree. Vocabularies such as GML are defined with schemas (e.g. XSD).
uom="m2" can sit on the same element as the value.
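As a minimal sketch, the same LOK_1 record can be read with Python's standard library in all three formats. The XML string below is an assumed toy encoding of the record: the tag names lokal, id, pow and cena simply mirror the CSV columns and are not taken from RCN.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

csv_text = "id,pow,cena\nLOK_1,60.25,750000\n"
json_text = '{"id": "LOK_1", "pow": 60.25, "uom": "m2", "cena": 750000}'
# Assumed toy XML for the same record; tag names mirror the CSV columns.
xml_text = '<lokal><id>LOK_1</id><pow uom="m2">60.25</pow><cena>750000</cena></lokal>'

# CSV: every field arrives as a string and the unit is not stored anywhere.
csv_row = next(csv.DictReader(io.StringIO(csv_text)))
csv_area = float(csv_row["pow"])

# JSON: numbers are already typed; the unit is a sibling key.
json_row = json.loads(json_text)
json_area = json_row["pow"]

# XML: element text is a string; the unit is an attribute on the same element.
pow_el = ET.fromstring(xml_text).find("pow")
xml_area = float(pow_el.text)
xml_unit = pow_el.get("uom")

print(csv_area, json_area, xml_area, xml_unit)
```

Only the JSON and XML versions carry the unit with the number; the CSV row would need an extra column or external documentation.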
Elements and attributes — toy examples first, then the same ideas in RCN GML.
Opening tag · content · closing tag
<city>Poznań</city>
Same structure — longer names, namespace prefix
rcn: is a namespace prefix — the element is still opening tag · text · closing tag.
Full meaning of prefixes in the namespaces section.
Bare value → unit metadata on the opening tag
<pow>60.25</pow>
60.25 has no unit (m², ha, km²…) — for analysis, units must travel with the number, not only in a separate column or comment.
Put metadata on the same element
<pow uom="m2">60.25</pow>
Value unchanged; uom="m2" is unit of measure (real pattern from RCN_Lokal).
Syntax: name="value" inside <…>; quotes; several attributes = space-separated (order does not matter).
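A short sketch of how the value and its attribute look from Python, using the standard xml.etree.ElementTree module. The attributes src and precision are hypothetical and appear only to illustrate attribute order and default values.

```python
import xml.etree.ElementTree as ET

el = ET.fromstring('<pow uom="m2">60.25</pow>')

value = float(el.text)                # element content is text; convert explicitly
unit = el.get("uom")                  # attribute value, or None if absent
missing = el.get("precision", "n/a")  # hypothetical attribute, shows the default

# Attribute order in the source does not affect the parsed mapping.
a = ET.fromstring('<pow uom="m2" src="x">60.25</pow>').attrib
b = ET.fromstring('<pow src="x" uom="m2">60.25</pow>').attrib
same = a == b

print(value, unit, missing, same)
```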
Fragments of real geometry and property fields
srsName, gml:id = CRS + stable id
count, srsDimension = how many coordinates, 2D vs 3D
uom = units (check before comparing numbers)
xsi:nil = “no value here” (e.g. missing geometry)
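One possible way to act on these attributes before using a value is sketched below. The fragment is a toy: the element names lokal and geometria are illustrative rather than copied from the RCN file, and the helper names is_nil and area_m2 are made up for this example. Only the xsi namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Toy fragment; the element names are illustrative, not copied from the RCN file.
doc = ET.fromstring(
    f'<lokal xmlns:xsi="{XSI}">'
    '<pow uom="m2">60.25</pow>'
    '<geometria xsi:nil="true"/>'
    '</lokal>'
)

def is_nil(el):
    """True when the element is explicitly marked as having no value."""
    return el.get(f"{{{XSI}}}nil") in ("true", "1")

def area_m2(el):
    """Return the area as a float, refusing units other than m2."""
    if el.get("uom") != "m2":
        raise ValueError(f"unexpected unit: {el.get('uom')!r}")
    return float(el.text)

print(is_nil(doc.find("geometria")), area_m2(doc.find("pow")))
```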
One root element, parent–child nesting, and paths to data — then a real RCN_Lokal fragment.
Not a single table — a hierarchy of elements
Every element except the root has exactly one parent; the root has none. Children sit fully inside their parent, and that nesting is how “this address belongs to this flat” is represented.
gml:FeatureCollection).
gml:featureMember and typed features such as rcn:RCN_Lokal.
RCN_Adres → miejscowosc).
Scalars vs nested block (address)
FeatureCollection
└─ featureMember
   └─ RCN_Lokal
      ├─ idLokalu · rooms · floor · area · price …
      └─ adresBudynkuZLokalem
         └─ RCN_Adres → miejscowosc · ulica · numer …
Flat fields sit directly under RCN_Lokal; the address is a subtree (container → RCN_Adres → fields).
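The difference between flat fields and the address subtree can be sketched with path expressions. The document below is a simplified, un-namespaced stand-in for one RCN_Lokal with made-up values; the real file uses namespace prefixes, so these exact paths are an assumption for illustration only.

```python
import xml.etree.ElementTree as ET

# Simplified, un-namespaced stand-in for one RCN_Lokal; values are made up.
xml_text = """
<FeatureCollection>
  <featureMember>
    <RCN_Lokal>
      <idLokalu>LOK_1</idLokalu>
      <adresBudynkuZLokalem>
        <RCN_Adres>
          <miejscowosc>Poznań</miejscowosc>
        </RCN_Adres>
      </adresBudynkuZLokalem>
    </RCN_Lokal>
  </featureMember>
</FeatureCollection>
""".strip()

root = ET.fromstring(xml_text)

rows = []
for lokal in root.findall("featureMember/RCN_Lokal"):
    rows.append({
        # Flat field: one step below RCN_Lokal.
        "idLokalu": lokal.findtext("idLokalu"),
        # Nested field: container, then RCN_Adres, then the field itself.
        "miejscowosc": lokal.findtext("adresBudynkuZLokalem/RCN_Adres/miejscowosc"),
    })

print(rows)
```

Flattening each feature into a dictionary like this is one way to get from the tree to a table row.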
RCN_Lokal (trimmed)
Real excerpt · indentation follows depth
gml:id (stable id in the file).
uom="m2".
adresBudynkuZLokalem → RCN_Adres → fields).
Prefixes (gml:, rcn:, …) point at standard vocabularies; you bind them once, then query with full names in code.
Many standards · same local names · different meaning
GML, RCN, XSD, and others all define tags like id or name. A namespace ties each
prefix to one vocabulary so gml:id (geometry id) and rcn:id (cadastre id) never clash.
prefix:localName is a QName: the colon separates the prefix from the local name, and the prefix is only shorthand for the namespace URI.
FeatureCollection, Polygon, …).
RCN_Lokal, RCN_Transakcja, …).
xsi:nil, schemaLocation, …
xmlns: on the root
Each line binds a prefix to a URI, a stable vocabulary id
gml / rcn / xsi = main geometry, RCN features, schema hints (see previous slide).
xlink: = xlink:href points at another feature's gml:id.
gmd / gco / gts = ISO metadata / time; often declared; fewer direct paths in exercises.
Declared once on FeatureCollection — like imports for the whole document.
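The declarations on the root can also be read programmatically. A minimal sketch, assuming a toy root element with just two bindings: ElementTree's iterparse reports each xmlns declaration as a "start-ns" event.

```python
import io
import xml.etree.ElementTree as ET

# Toy root element with two of the standard bindings.
xml_bytes = (
    b'<gml:FeatureCollection'
    b' xmlns:gml="http://www.opengis.net/gml/3.2"'
    b' xmlns:xlink="http://www.w3.org/1999/xlink"/>'
)

# "start-ns" events yield (prefix, uri) pairs as declarations are read.
declared = {
    prefix: uri
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
}

print(declared)
```

The resulting dictionary has the shape that find and findall accept as their namespaces argument. Run on a real file this way, the whole document is read, so for a large file it may be simpler to copy the URIs from the root element by hand.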
Identifiers — not necessarily pages to open in a browser
http://… and urn:… strings are globally unique vocabulary names. Parsing does not require downloading them.
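import xml.etree.ElementTree as ET

# Prefix-to-URI map: prefixes are local shorthand, the URIs must match the file.
ns = {
    'gml': 'http://www.opengis.net/gml/3.2',
    'rcn': 'urn:gugik:specyfikacje:gmlas:rejestrcennieruchomosci:1.0',
    'xlink': 'http://www.w3.org/1999/xlink',
}
tree = ET.parse('Baza_danych_RCN_Poznan_2021-2025.gml')
# The path is relative to the root gml:FeatureCollection.
flats = tree.findall('gml:featureMember/rcn:RCN_Lokal', ns)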
Elements may appear as {http://www.opengis.net/gml/3.2}Polygon — Clark notation — in errors and repr.
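A small sketch of what Clark notation looks like in practice, on a toy gml:Polygon with an assumed id of p1. It also shows that the prefix used in a query is arbitrary as long as it maps to the same URI.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"
root = ET.fromstring(f'<gml:Polygon xmlns:gml="{GML}" gml:id="p1"/>')

tag = root.tag                     # namespace URI in braces + local name
gid = root.get(f"{{{GML}}}id")     # namespaced attributes use the same notation
local = root.tag.split("}", 1)[1]  # strip the namespace for display

# The prefix in code need not match the file; only the URI matters.
wrapper = ET.fromstring(f'<w xmlns:gml="{GML}"><gml:Polygon/></w>')
found = wrapper.find("g:Polygon", {"g": GML})

print(tag, gid, local, found is not None)
```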
What the Poznań .gml represents in business terms: sales, properties, parcels, buildings, flats, and deeds — as linked records, not one nested story.
Rejestr Cen Nieruchomości — price and transaction facts
The file is a long sequence of gml:featureMember blocks. Each block is one business object
(one transaction, one parcel, one apartment, …). The same real-world deal is split across several such objects,
connected by identifiers — not by putting everything inside one big XML subtree.
RCN_Transakcja, RCN_Lokal, …) and follow links to assemble a full case.
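A first look at such a file is often a count of feature types. The sketch below uses a toy collection with empty features and assumes each gml:featureMember wraps exactly one typed feature as its first child.

```python
from collections import Counter
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"
RCN = "urn:gugik:specyfikacje:gmlas:rejestrcennieruchomosci:1.0"

# Toy collection with empty features; real ones carry many child elements.
root = ET.fromstring(
    f'<gml:FeatureCollection xmlns:gml="{GML}" xmlns:rcn="{RCN}">'
    '<gml:featureMember><rcn:RCN_Transakcja/></gml:featureMember>'
    '<gml:featureMember><rcn:RCN_Lokal/></gml:featureMember>'
    '<gml:featureMember><rcn:RCN_Lokal/></gml:featureMember>'
    '</gml:FeatureCollection>'
)

counts = Counter(
    member[0].tag.split("}", 1)[1]  # first child = the typed feature
    for member in root.findall(f"{{{GML}}}featureMember")
    if len(member)                  # skip empty wrappers
)

print(counts)
```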
Polish tag names — plain-language role
A graph of IDs — not one nested XML document per deal
Each feature has a gml:id. Elsewhere, xlink:href attributes store references to another feature's id
(deed, property bundle, parcel, …). So the business view is: many rows + foreign-key-style links, same idea as joins in a database.
Typical path to “full picture”: Transakcja → nieruchomosc href → RCN_Nieruchomosc → hrefs to Dzialka / Budynek / Lokal; podstawaPrawna → Dokument.
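Following a link can be sketched as a dictionary lookup, much like a join on a key. The data below is a toy: the ids T1 and N1 and the "#id" form of the href are assumptions for illustration, so the href format should be checked against the actual file. The helper name follow is also made up here.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"
RCN = "urn:gugik:specyfikacje:gmlas:rejestrcennieruchomosci:1.0"
XLINK = "http://www.w3.org/1999/xlink"
ns = {"gml": GML, "rcn": RCN}

# Toy data: the ids and the exact href format are assumptions.
root = ET.fromstring(
    f'<gml:FeatureCollection xmlns:gml="{GML}" xmlns:rcn="{RCN}" xmlns:xlink="{XLINK}">'
    '<gml:featureMember><rcn:RCN_Transakcja gml:id="T1">'
    '<rcn:nieruchomosc xlink:href="#N1"/>'
    '</rcn:RCN_Transakcja></gml:featureMember>'
    '<gml:featureMember><rcn:RCN_Nieruchomosc gml:id="N1"/></gml:featureMember>'
    '</gml:FeatureCollection>'
)

# Index every feature by gml:id, like building a primary-key lookup.
by_id = {
    feat.get(f"{{{GML}}}id"): feat
    for feat in root.findall("gml:featureMember/*", ns)
}

def follow(link_el):
    """Resolve an xlink:href to the referenced feature, or None if not found."""
    href = link_el.get(f"{{{XLINK}}}href", "")
    return by_id.get(href.lstrip("#"))

deal = by_id["T1"]
target = follow(deal.find("rcn:nieruchomosc", ns))

print(target.tag)
```

Building the index once and resolving links by lookup avoids rescanning the file for every reference.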
Price on the row · rest via xlink:href
Wrapper + id · scalars (incl. gross price) · link to deed · link to property aggregate — then follow those ids elsewhere in the file.
Open gml_xml_tasks.ipynb and work with
Baza_danych_RCN_Poznan_2021-2025.gml