xml_miner.selectors package

Submodules

xml_miner.selectors.selector_utils module

utility functions and constants used by the selector and selectors classes

xml_miner.selectors.selector_utils.selector_attribute(selectors, attribute_name) → str

fetch the value of an attribute from the selectors, and check that it is consistent across all of them

params:

  • selectors: a list of selector objects
  • attribute_name: name of the attribute

output:

  • attribute_value: string
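A minimal sketch of the consistency check described above, written as a standalone function rather than the package's own code. It assumes each selector exposes the attribute as an ordinary Python attribute and that a mismatch is reported with a ValueError. The `SimpleSelector` stand-in class, the `selector_attribute_sketch` name, and the choice of exception are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SimpleSelector:
    # Hypothetical stand-in for a selector object; only the attributes
    # needed for this sketch are modelled.
    selector: str
    itemgroup: str


def selector_attribute_sketch(selectors: List[Any], attribute_name: str) -> str:
    """Return the attribute value shared by all selectors, or raise if they disagree.

    Assumption: an inconsistency is signalled with ValueError.
    """
    if not selectors:
        raise ValueError("at least one selector is required")
    # Collect the distinct values of the attribute across all selectors.
    values = {getattr(s, attribute_name) for s in selectors}
    if len(values) != 1:
        raise ValueError(
            f"inconsistent {attribute_name!r} across selectors: {sorted(values)}"
        )
    return values.pop()


sels = [
    SimpleSelector("experienceitem.jobtitle", "experienceitem"),
    SimpleSelector("experienceitem.startdate", "experienceitem"),
]
print(selector_attribute_sketch(sels, "itemgroup"))  # experienceitem
```

Adding a selector with a different itemgroup (for example `educationitem`) to the list makes the sketch raise instead of silently picking one value.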

xml_miner.selectors.selector_utils.valid_field_name(tag_name: str = '') → bool

simple validation function: check whether tag_name is a valid field name

params:

  • tag_name: string

output:

  • True if tag_name is valid, False otherwise

xml_miner.selectors.trxml_selector module

selector class for trxml

class xml_miner.selectors.trxml_selector.TRXMLSelector(selector: str)

Bases: xml_miner.selectors.xml_selector.XMLSelector

trxml selector:

  • subclass of XMLSelector
  • methods to select values from trxml

field_value_from_item(item) → str

given an item, get the value of the selector's field (field_name) in that item

parse_trxml_selector()

convert the trxml selector to (itemgroup, index, field):

params:

  • selector: string

output:

  • itemgroup, index, field

conversion rules:

- ig.index.field    ->    (ig, index, field)
- ig.*.field        ->    (ig, *, field)
- ig.field          ->    (ig, *, field)
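The conversion rules above can be sketched as a small standalone parser. This is an illustration of the listed rules, not the package's implementation: the function name is hypothetical, the index is kept as a string, and rejecting any other shape with a ValueError is an assumption.

```python
from typing import Tuple


def parse_trxml_selector_sketch(selector: str) -> Tuple[str, str, str]:
    """Split a trxml selector into (itemgroup, index, field).

    Follows the conversion rules:
      ig.index.field -> (ig, index, field)
      ig.*.field     -> (ig, *, field)
      ig.field       -> (ig, *, field)
    """
    parts = selector.split(".")
    if len(parts) == 3:
        itemgroup, index, field = parts
    elif len(parts) == 2:
        # No index given: the field is selected in every item of the group.
        itemgroup, field = parts
        index = "*"
    else:
        # Assumption: any other shape is treated as an error.
        raise ValueError(f"unsupported trxml selector: {selector!r}")
    return itemgroup, index, field


print(parse_trxml_selector_sketch("experienceitem.0.jobtitle"))
# ('experienceitem', '0', 'jobtitle')
print(parse_trxml_selector_sketch("experienceitem.jobtitle"))
# ('experienceitem', '*', 'jobtitle')
```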

select_field_with_xpath(xml_tree)

select the field using the selector's xpath

select_value_with_xpath(xml_tree) → str

get the value of the field where the selector matches

xml_miner.selectors.trxml_selectors module

TRXML Selectors class

class xml_miner.selectors.trxml_selectors.TRXMLSelectors(selectors: List[str], trxml_selector_type=None, shared_itemgroup_name=None)

Bases: object

TRXMLSelectors:

  • array of TRXMLSelector objects
  • methods to select values at the trxml document level or from each item

classmethod from_itemgroup_and_fields(itemgroup: str, fields: str)

construct the selectors from an itemgroup and a comma-separated string of fields (trxml only)

input:
  • ItemGroup, e.g. experienceitem
  • Fields, e.g. jobtitle,startdate,enddate
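One plausible reading of this constructor, sketched under an assumption: each field in the comma-separated list becomes a selector string of the form `itemgroup.field`, which by the `ig.field -> (ig, *, field)` rule targets that field in every item of the group. The helper name is hypothetical and the real classmethod may build its selectors differently.

```python
from typing import List


def selectors_from_itemgroup_and_fields(itemgroup: str, fields: str) -> List[str]:
    """Hypothetical expansion of an itemgroup and a comma-separated field list
    into one "itemgroup.field" selector string per field."""
    # Split on commas, trimming whitespace and dropping empty entries.
    field_names = [f.strip() for f in fields.split(",") if f.strip()]
    return [f"{itemgroup}.{name}" for name in field_names]


print(selectors_from_itemgroup_and_fields("experienceitem", "jobtitle,startdate,enddate"))
# ['experienceitem.jobtitle', 'experienceitem.startdate', 'experienceitem.enddate']
```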

classmethod from_selector_string(selector_string: str)

construct the selectors from a selector string

input:
  • selector string

select_trxml_fields(trxml)

select values from all fields matching the selectors

xml_miner.selectors.xml_selector module

XML selector

class xml_miner.selectors.xml_selector.XMLSelector(selector: str)

Bases: object

XMLSelector:

  • select all values of nodes matching the selector

select_all_fields(xml_tree)

select all fields (nodes) matching the selector

select_all_values(xml_tree) → List[str]

select the values of all nodes matching the selector
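The difference between selecting fields and selecting values can be sketched with the standard library, assuming the selector is an XPath expression within the subset that `xml.etree.ElementTree` supports and that a node's value is its text content. The package may use another XML library, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def select_all_fields_sketch(xml_tree: ET.Element, selector: str) -> List[ET.Element]:
    # ElementTree supports only a limited XPath subset (e.g. ".//tag").
    return xml_tree.findall(selector)


def select_all_values_sketch(xml_tree: ET.Element, selector: str) -> List[str]:
    # Assumption: a node's "value" is its text; empty nodes yield "".
    return [node.text or "" for node in select_all_fields_sketch(xml_tree, selector)]


doc = ET.fromstring(
    "<cv><job><title>Engineer</title></job><job><title>Analyst</title></job></cv>"
)
print(select_all_values_sketch(doc, ".//title"))  # ['Engineer', 'Analyst']
```

Returning the matching nodes first and extracting text second keeps the two steps separable, which mirrors the split between select_all_fields and select_all_values.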

xml_miner.selectors.xml_selectors module

XML Selectors class

class xml_miner.selectors.xml_selectors.XMLSelectors(selectors: List[str])

Bases: object

XMLSelectors:

  • array of XMLSelector objects
  • method to select values from an xml object

classmethod from_selector_string(selector_string: str)

construct the xml selectors from an input string

select_xml_fields(xml_tree)

select all values matching the selectors
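A sketch of how a collection of selectors might be applied to one XML tree. Three things here are assumptions, not documented behaviour: the selector string is comma-separated, each selector is an ElementTree-compatible XPath expression, and the result maps each selector to the text of its matching nodes. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_selector_string(selector_string: str) -> List[str]:
    # Assumption: selectors are separated by commas (this would break
    # XPath expressions that themselves contain commas).
    return [s.strip() for s in selector_string.split(",") if s.strip()]


def select_xml_fields_sketch(xml_tree: ET.Element, selectors: List[str]) -> Dict[str, List[str]]:
    # Apply every selector and collect the text of the matching nodes.
    return {
        selector: [node.text or "" for node in xml_tree.findall(selector)]
        for selector in selectors
    }


doc = ET.fromstring("<cv><name>Ada</name><skill>XML</skill><skill>Python</skill></cv>")
selectors = parse_selector_string(".//name, .//skill")
print(select_xml_fields_sketch(doc, selectors))
# {'.//name': ['Ada'], './/skill': ['XML', 'Python']}
```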

Module contents

xml selector and trxml selector classes

class xml_miner.selectors.XMLSelectors(selectors: List[str])

Bases: object

XMLSelectors:

  • array of XMLSelector objects
  • method to select values from an xml object

classmethod from_selector_string(selector_string: str)

construct the xml selectors from an input string

select_xml_fields(xml_tree)

select all values matching the selectors

class xml_miner.selectors.TRXMLSelectors(selectors: List[str], trxml_selector_type=None, shared_itemgroup_name=None)

Bases: object

TRXMLSelectors:

  • array of TRXMLSelector objects
  • methods to select values at the trxml document level or from each item

classmethod from_itemgroup_and_fields(itemgroup: str, fields: str)

construct the selectors from an itemgroup and a comma-separated string of fields (trxml only)

input:
  • ItemGroup, e.g. experienceitem
  • Fields, e.g. jobtitle,startdate,enddate

classmethod from_selector_string(selector_string: str)

construct the selectors from a selector string

input:
  • selector string

select_trxml_fields(trxml)

select values from all fields matching the selectors