xml_miner.selectors package
Submodules
xml_miner.selectors.selector_utils module
Utility functions and constants used by the selector and selectors classes.
xml_miner.selectors.selector_utils.selector_attribute(selectors, attribute_name) → str
Fetch the named attribute from the selectors and check that it is consistent across all of them.
params:
- selectors: a list of selector objects
- attribute_name: name of the attribute
output:
- attribute_value: string
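The consistency check described above could look roughly like the sketch below. It is not the package's implementation: the name `selector_attribute_sketch`, the use of `ValueError`, and the `SimpleNamespace` stand-ins for selector objects are all assumptions made for illustration.

```python
from types import SimpleNamespace


def selector_attribute_sketch(selectors, attribute_name: str) -> str:
    """Return the attribute value shared by all selectors (hypothetical helper)."""
    # Collect the distinct values of the attribute across all selectors.
    values = {getattr(selector, attribute_name) for selector in selectors}
    if len(values) != 1:
        # Assumption: disagreeing values (or an empty input) raise an error.
        raise ValueError(
            f"selectors disagree on '{attribute_name}': {sorted(map(str, values))}"
        )
    return str(values.pop())


# Stand-in selector objects; the real selector classes are not used here.
selectors = [
    SimpleNamespace(itemgroup="experienceitem", field="jobtitle"),
    SimpleNamespace(itemgroup="experienceitem", field="startdate"),
]
print(selector_attribute_sketch(selectors, "itemgroup"))  # experienceitem
```

Asking the same stand-ins for `"field"` would raise, because the two selectors hold different field names.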
xml_miner.selectors.selector_utils.valid_field_name(tag_name: str = '') → bool
Simple validation function for a field name.
params:
- tag_name: string
output:
- True/False
xml_miner.selectors.trxml_selector module
Selector class for trxml
class xml_miner.selectors.trxml_selector.TRXMLSelector(selector: str)
Bases: xml_miner.selectors.xml_selector.XMLSelector
trxml selector:
- subclass of XMLSelector
- method to select values from trxml
field_value_from_item(item) → str
Given an item, get the value of the field named by this selector.
parse_trxml_selector()
Convert the trxml selector to (itemgroup, index, field).
params:
- selector: string
output:
- itemgroup, index, field
conversion rules:
- ig.index.field -> (ig, index, field)
- ig.*.field -> (ig, *, field)
- ig.field -> (ig, *, field)
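The three conversion rules can be sketched as a standalone function. This is an illustration only: the name `parse_trxml_selector_sketch`, the choice to keep the index as a string, and the `ValueError` for other shapes are assumptions, not documented behaviour of `TRXMLSelector`.

```python
from typing import Tuple


def parse_trxml_selector_sketch(selector: str) -> Tuple[str, str, str]:
    """Split a trxml selector string into (itemgroup, index, field)."""
    parts = selector.split(".")
    if len(parts) == 3:
        # ig.index.field and ig.*.field keep their middle part as the index.
        itemgroup, index, field = parts
    elif len(parts) == 2:
        # ig.field is shorthand for every item: the index defaults to "*".
        itemgroup, field = parts
        index = "*"
    else:
        # Assumption: any other shape is rejected.
        raise ValueError(f"unsupported trxml selector: {selector!r}")
    return itemgroup, index, field


print(parse_trxml_selector_sketch("experienceitem.0.jobtitle"))
# ('experienceitem', '0', 'jobtitle')
print(parse_trxml_selector_sketch("experienceitem.*.jobtitle"))
# ('experienceitem', '*', 'jobtitle')
print(parse_trxml_selector_sketch("experienceitem.jobtitle"))
# ('experienceitem', '*', 'jobtitle')
```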
select_field_with_xpath(xml_tree)
Select the field using the selector xpath.
select_value_with_xpath(xml_tree) → str
Get the value of the field where the selector matches.
xml_miner.selectors.trxml_selectors module
TRXML Selectors class
class xml_miner.selectors.trxml_selectors.TRXMLSelectors(selectors: List[str], trxml_selector_type=None, shared_itemgroup_name=None)
Bases: object
TRXMLSelectors:
- array of TRXMLSelector objects
- method to select values at the trxml document level or from each item
classmethod from_itemgroup_and_fields(itemgroup: str, fields: str)
Construct from an itemgroup and fields; only for trxml.
input:
- itemgroup, e.g. experienceitem
- fields, e.g. jobtitle,startdate,enddate
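One plausible reading of these inputs is that each comma-separated field is paired with the itemgroup to form an `itemgroup.field` selector string. The sketch below shows only that expansion. The helper name `selectors_from_itemgroup_and_fields` and the whitespace handling are assumptions, and the real classmethod returns a `TRXMLSelectors` instance rather than a list of strings.

```python
from typing import List


def selectors_from_itemgroup_and_fields(itemgroup: str, fields: str) -> List[str]:
    """Build one 'itemgroup.field' selector string per comma-separated field."""
    # Ignore surrounding whitespace and empty entries such as a trailing comma.
    names = [name.strip() for name in fields.split(",") if name.strip()]
    return [f"{itemgroup}.{name}" for name in names]


print(selectors_from_itemgroup_and_fields("experienceitem", "jobtitle,startdate,enddate"))
# ['experienceitem.jobtitle', 'experienceitem.startdate', 'experienceitem.enddate']
```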
classmethod from_selector_string(selector_string: str)
Construct the selectors from a string.
input:
- selector string
select_trxml_fields(trxml)
Select values from all fields matching the selectors.
xml_miner.selectors.xml_selector module
XML selector
xml_miner.selectors.xml_selectors module
XML Selectors class
class xml_miner.selectors.xml_selectors.XMLSelectors(selectors: List[str])
Bases: object
XMLSelectors:
- array of XMLSelector objects
- method to select values from an xml object
classmethod from_selector_string(selector_string: str)
Construct the xml selectors from an input string.
select_xml_fields(xml_tree)
Select all values that match the selectors.
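As a rough illustration of selecting values from an XML tree with several selectors, the sketch below uses the standard library's `xml.etree.ElementTree`. It makes two assumptions that this page does not confirm: that selectors can be treated as ElementTree path expressions, and that the result maps each selector to a list of matched text values. The function name `select_values` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def select_values(xml_tree: ET.Element, selectors: List[str]) -> Dict[str, List[str]]:
    """Map each selector to the text of every element it matches."""
    result: Dict[str, List[str]] = {}
    for selector in selectors:
        # Element.findall accepts a limited XPath subset, e.g. ".//skill".
        matches = xml_tree.findall(selector)
        result[selector] = [(match.text or "").strip() for match in matches]
    return result


doc = ET.fromstring(
    "<profile><name>Ada</name>"
    "<skills><skill>math</skill><skill>code</skill></skills></profile>"
)
print(select_values(doc, ["name", ".//skill", "missing"]))
# {'name': ['Ada'], './/skill': ['math', 'code'], 'missing': []}
```

A selector that matches nothing yields an empty list here, so callers can tell it apart from an empty text value.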
Module contents
XML selectors and trxml selectors classes
class xml_miner.selectors.XMLSelectors(selectors: List[str])
Bases: object
XMLSelectors:
- array of XMLSelector objects
- method to select values from an xml object
classmethod from_selector_string(selector_string: str)
Construct the xml selectors from an input string.
select_xml_fields(xml_tree)
Select all values that match the selectors.
class xml_miner.selectors.TRXMLSelectors(selectors: List[str], trxml_selector_type=None, shared_itemgroup_name=None)
Bases: object
TRXMLSelectors:
- array of TRXMLSelector objects
- method to select values at the trxml document level or from each item
classmethod from_itemgroup_and_fields(itemgroup: str, fields: str)
Construct from an itemgroup and fields; only for trxml.
input:
- itemgroup, e.g. experienceitem
- fields, e.g. jobtitle,startdate,enddate
classmethod from_selector_string(selector_string: str)
Construct the selectors from a string.
input:
- selector string
select_trxml_fields(trxml)
Select values from all fields matching the selectors.