Parsing VOEvent XML packets¶
Getting started¶
In [1]:
from __future__ import print_function
import voeventparse as vp
IPython Tip #1: In IPython (terminal or notebook) you can quickly check the docstring for something by putting a question mark in front, e.g.
In [2]:
# Uncomment the following and hit enter:
# ?vp.load
Alternatively, you can always read the docs, which include autogenerated API specs.
Ok, let’s load up a voevent (click here to see the raw XML in your browser):
In [3]:
with open('voevent.xml', 'rb') as f:
    v = vp.load(f)
IPython Tip #2: We also get tab-completion. Simply start typing the name of a function (or even just the ‘.’ operator) and hit tab to see valid possible options - this is handy for exploring VOEvent packets:
In [4]:
# Uncomment the following and hit tab:
# v.W
Accessing data¶
Text-values¶
XML Tip #1: An XML packet is a tree-structure composed of elements. We can dig into the tree structure of the VOEvent and inspect values:
In [5]:
# v.Who.Date.text
# vp.pull_isotime(v)
In [6]:
print("Inferred reason is", v.Why.Inference.Name.text)
print("(A string of length {})".format(len(v.Why.Inference.Name.text)))
type(v.Why.Inference.Name.text)
Inferred reason is GRB121212A
(A string of length 10)
Out[6]:
str
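Because text-values always come back as strings, dates and numbers need an explicit conversion before you can compute with them. The following is a minimal sketch using only the standard library. It assumes the timestamp has the plain YYYY-MM-DDTHH:MM:SS form of this packet's Who.Date entry, and the literal below stands in for v.Who.Date.text:

```python
from datetime import datetime

# Stand-in for v.Who.Date.text; the value is this packet's Who.Date entry.
date_text = '1970-01-01T00:00:00'

# Assumption: no fractional seconds or timezone suffix in the timestamp.
timestamp = datetime.strptime(date_text, '%Y-%m-%dT%H:%M:%S')
print(timestamp.year, type(timestamp).__name__)
```

Timestamps with fractional seconds or a timezone suffix would need a different format string.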
Attributes¶
XML Tip #2: Note that there are two ways to store data in an XML packet:
* A single string can be stored as an element’s text-value, like the two we just saw.
* Alternatively, we can attach a number of key-value strings to an element, storing them as attributes.
We can access these via attrib, which behaves like a Python dictionary, e.g.:
In [7]:
print(v.attrib['ivorn'])
print(v.attrib['role'])
ivo://example/exciting_events#123
test
In [8]:
v.Why.Inference.attrib
Out[8]:
{'relation': 'identified', 'probability': '0.1'}
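Attribute values are strings too, so a probability of '0.1' has to be converted before it can be compared or averaged. The sketch below rebuilds the Inference element with the standard library's xml.etree.ElementTree, purely as a stand-in so that it runs without a packet on disk. With a loaded packet you would read v.Why.Inference.attrib instead. The 'confidence' key is made up to illustrate the default:

```python
import xml.etree.ElementTree as ET

# Stand-in for v.Why.Inference, rebuilt from the attributes shown in Out[8].
inference = ET.fromstring('<Inference probability="0.1" relation="identified"/>')

# Attributes are stored as strings, so convert explicitly.
probability = float(inference.attrib['probability'])

# .get() returns a default instead of raising KeyError for a missing key.
# 'confidence' is a hypothetical key, used only to show the default.
missing = inference.attrib.get('confidence', 'not given')
print(probability, missing)
```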
In [9]:
print(vp.prettystr(v.Who))
<Who>
  <AuthorIVORN>ivo://hotwired.org</AuthorIVORN>
  <Date>1970-01-01T00:00:00</Date>
  <Author>
    <title>Hotwired VOEvent Tutorial</title>
  </Author>
</Who>
‘Sibling’ elements and list-style access¶
So far, each of the elements we’ve accessed has been the only one of that name, i.e. our VOEvent has only one Who child-element, and likewise there’s only one Inference under the Why entry in this particular packet. But that’s not always the case; for example, the What section contains a Group with two child-elements called Param:
In [10]:
print(vp.prettystr(v.What.Group))
<Group name="source_flux">
  <Param dataType="float" name="peak_flux" ucd="em.radio.100-200MHz" unit="Janskys" value="0.0015">
    <Description>Peak Flux</Description>
  </Param>
  <Param dataType="float" name="int_flux" ucd="em.radio.100-200MHz" unit="Janskys" value="2.0e-3">
    <Description>Integrated Flux</Description>
  </Param>
</Group>
So how do we access all of these? This is where we start getting into the details of lxml.objectify syntax (which voevent-parse uses under the hood). lxml.objectify uses a neat, but occasionally confusing, trick: when we access a child-element by name, what’s returned behaves like a list:
In [11]:
v.What[0] # v.What behaves like a list!
Out[11]:
<Element What at 0x7f37283b79c8>
However, to save having to type something like v.foo[0].bar[0].baz[0], the first element of the list can also be accessed without the [0] operator (aka ‘syntactic sugar’):
In [12]:
v.What is v.What[0]
Out[12]:
True
Knowing that it’s ‘just a list’, we have a couple of options. We can iterate:
In [13]:
for par in v.What.Group.Param:
    print(par.Description)
Peak Flux
Integrated Flux
Or we can check the length, access elements by index, etc:
In [14]:
len(v.What.Group.Param)
Out[14]:
2
In [15]:
v.What.Group.Param[1].Description
Out[15]:
'Integrated Flux'
Note that another example of this ‘syntactic sugar’ is that we can display the text-value of an element without adding the .text suffix. However, see below for why it’s a good idea to always use .text when you really do want the text-value of an element:
In [16]:
print(v.Why.Inference.Name) # More syntax sugar - if it has a string-value but no children, print the string
print(v.Why.Inference.Name.text) # The safe option
print(v.Why.Inference.Name.text[:3]) # Indexing on the string as you'd expect
print(v.Why.Inference.Name[:3]) # This is indexing on the *list of elements*, not the string!
GRB121212A
GRB121212A
GRB
['GRB121212A']
If that all sounds awfully messy, help is at hand: you’re most likely to encounter sibling elements under the What entry of a VOEvent, and voevent-parse has a pair of functions to convert that to nested dictionary-like structures for you:
In [17]:
# Consult the docstring
# ?vp.get_toplevel_params
# ?vp.get_grouped_params
In [18]:
grouped_params = vp.get_grouped_params(v)
# what_dict
list(grouped_params['source_flux'].items())
Out[18]:
[('peak_flux',
{'value': '0.0015', 'dataType': 'float', 'ucd': 'em.radio.100-200MHz', 'unit': 'Janskys', 'name': 'peak_flux'}),
('int_flux',
{'value': '2.0e-3', 'dataType': 'float', 'ucd': 'em.radio.100-200MHz', 'unit': 'Janskys', 'name': 'int_flux'})]
In [19]:
grouped_params['source_flux']['peak_flux']['value']
Out[19]:
'0.0015'
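Note that the value comes back as the string '0.0015', even though the Param declares dataType="float". One way to get numbers out is to convert according to the declared dataType. The sketch below works on a literal copy of the relevant keys from Out[18], so it runs on its own. param_to_python is a hypothetical helper, not part of voevent-parse, and the set of dataType names it handles is an assumption:

```python
# Literal copy of grouped_params['source_flux'] from Out[18]
# (only the keys needed here are reproduced).
source_flux = {
    'peak_flux': {'value': '0.0015', 'dataType': 'float', 'unit': 'Janskys'},
    'int_flux': {'value': '2.0e-3', 'dataType': 'float', 'unit': 'Janskys'},
}

def param_to_python(param):
    """Convert a Param attribute dict to a Python value (hypothetical helper)."""
    # Assumption: only 'float' and 'int' need converting; anything else stays a string.
    converters = {'float': float, 'int': int}
    convert = converters.get(param.get('dataType'), str)
    return convert(param['value'])

fluxes = {name: param_to_python(param) for name, param in source_flux.items()}
print(fluxes)
```

With a loaded packet, the same helper could be applied to grouped_params['source_flux'].items() directly.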
Advanced usage¶
Since voevent-parse uses lxml.objectify, the full power of the LXML library is available when handling VOEvents loaded with voevent-parse.
Iterating over child-elements¶
We already saw how you can access a group of child-elements by name, in list-like fashion. But you can also iterate over all the children of an element, even if you don’t know the names (‘tags’, in XML-speak) ahead of time:
In [20]:
for child in v.Who.iterchildren():
    print(child.tag, child.text, child.attrib)
AuthorIVORN ivo://hotwired.org {}
Date 1970-01-01T00:00:00 {}
Author None {}
In [21]:
for child in v.WhereWhen.ObsDataLocation.ObservationLocation.iterchildren():
    print(child.tag, child.text, child.attrib)
AstroCoordSystem None {'id': 'UTC-FK5-GEO'}
AstroCoords None {'coord_system_id': 'UTC-FK5-GEO'}
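The same idea extends to walking a whole subtree when you don't know its shape in advance. The sketch below recurses over the Who section shown earlier. It uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without lxml or a packet on disk. walk is a hypothetical helper. With an lxml.objectify element you would loop over element.iterchildren() rather than over the element itself, since iterating an objectified element steps through its same-named siblings:

```python
import xml.etree.ElementTree as ET

# The Who section of the tutorial packet, as printed by vp.prettystr(v.Who).
who_xml = """
<Who>
  <AuthorIVORN>ivo://hotwired.org</AuthorIVORN>
  <Date>1970-01-01T00:00:00</Date>
  <Author>
    <title>Hotwired VOEvent Tutorial</title>
  </Author>
</Who>
"""
who = ET.fromstring(who_xml)

def walk(element, depth=0):
    """Yield (depth, tag, stripped text) for an element and all its descendants."""
    text = (element.text or '').strip()
    yield depth, element.tag, text
    # Stdlib behaviour: iterating an element yields its direct children.
    for child in element:
        yield from walk(child, depth + 1)

rows = list(walk(who))
for depth, tag, text in rows:
    print('  ' * depth + tag, text)
```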
Querying a VOEvent¶
Another powerful technique is to find elements using XPath or ElementPath queries. A full treatment is beyond the scope of this tutorial, so we leave you with just a single example:
In [22]:
v.find(".//Param[@name='int_flux']").attrib['value']
Out[22]:
'2.0e-3'
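find returns only the first match, while findall returns every match as a list. As a rough sketch, the query above can be tried on the source_flux Group shown earlier. This version uses the standard library's xml.etree.ElementTree, which supports the same ElementPath expressions, so it runs without lxml. The variable names are illustrative only:

```python
import xml.etree.ElementTree as ET

# The source_flux Group from the tutorial packet.
group_xml = """
<Group name="source_flux">
  <Param dataType="float" name="peak_flux" ucd="em.radio.100-200MHz" unit="Janskys" value="0.0015">
    <Description>Peak Flux</Description>
  </Param>
  <Param dataType="float" name="int_flux" ucd="em.radio.100-200MHz" unit="Janskys" value="2.0e-3">
    <Description>Integrated Flux</Description>
  </Param>
</Group>
"""
group = ET.fromstring(group_xml)

# './/Param' matches Param elements at any depth below the starting element.
values_by_name = {p.get('name'): p.get('value') for p in group.findall('.//Param')}

# The [@name='...'] predicate narrows the match to a single attribute value.
int_flux = group.find(".//Param[@name='int_flux']").get('value')
print(values_by_name, int_flux)
```

Keep in mind that find returns None when nothing matches, so check the result before reading attributes from it.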
Final words¶
Congratulations! You should now be able to extract data from just about any VOEvent packet. Note that voevent-parse comes with a few convenience routines to help with common, tedious operations, but you can always compose your own.
If you put together something that you think others could use (or find a bug!), pull requests are welcome.
Next stop: authoring your own VOEvent.