xml_miner package

Submodules

xml_miner.mine_trxml module

the trxml selector script

xml_miner.mine_trxml.get_args()

get arguments

xml_miner.mine_trxml.main()

apply selectors to trxml files

xml_miner.mine_xml module

the xml selector script

xml_miner.mine_xml.get_args()

get arguments

xml_miner.mine_xml.main()

apply selectors to xml files

xml_miner.miner module

apply selectors to the input data and write the selected values to a csv file

class xml_miner.miner.CommonMiner(selectors)

Bases: object

CommonMiner:

shared class for both xml and trxml

static normalize_string(line: str) → str

normalize selected values:
  • replace newline characters with ‘__NEWLINE__’
  • replace tab characters with 4 spaces
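
The normalization described for normalize_string can be sketched as below. This is a minimal stand-alone illustration, not the package's own code; it assumes the two replaced characters are the newline and the tab, since both were lost in the rendered docstring.

```python
def normalize_string(line: str) -> str:
    """Keep a selected value on a single line (sketch; tab handling is assumed)."""
    # Encode newlines as a visible marker so each value stays on one output row.
    line = line.replace("\n", "__NEWLINE__")
    # Assumption: the second replaced character is a tab, expanded to 4 spaces.
    return line.replace("\t", "    ")


print(normalize_string("first line\nsecond\tline"))
```

Encoding newlines as a marker keeps every selected value on one row of the output file, so the file can still be processed line by line.
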
class xml_miner.miner.TRXMLMiner(selectors, itemgroup=None, fields=None)

Bases: xml_miner.miner.CommonMiner

TRXMLProcessor:
  • iterate over the trxml files and select values
  • output selected values to a file, and print summary

load_data(source)

load the data into a data generator

params:
  • source: data source
output:
  • yield trxml
mine(source)

iterate over the input data (trxml objects), apply the selectors to each trxml, and yield the selected values

params:
  • source: data source
output:
  • generate selected values per doc
mine_and_save(source: str, output_file: str)

iterate over the input data (trxml objects), apply the selectors to each trxml, and write the selected values to a csv file

params:
  • source (string): data source
  • output_file (string): the output filename
read_selectors(selector: str, itemgroup: str = '', fields: str = '')

read selector strings and construct selector object

params:
  • selector: input selector strings
  • itemgroup: input itemgroup strings
  • fields: input fields strings
output:
  • selectors: TRXMLSelectors object
class xml_miner.miner.XMLMiner(selectors, with_field_name=False)

Bases: xml_miner.miner.CommonMiner

XMLProcessor:
  • iterate over the xml files and select values
  • output selected values to a file, and print summary

load_data(source: str, query: str = None, as_user: str = None, as_pass: str = None)

load the data into a data generator

params:
  • source: data source
  • annotation server parameters: query, as_user, as_pass
output:
  • yield xml
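
Loading data "into a data generator" means documents are parsed lazily, one at a time, rather than all at once. The sketch below illustrates the idea with the standard library only. The function name iter_xml_documents is hypothetical, the data source is assumed to be a file or directory path, and the annotation server parameters (query, as_user, as_pass) are ignored.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterator, Tuple


def iter_xml_documents(source: str) -> Iterator[Tuple[str, ET.Element]]:
    """Lazily yield (filename, root element) for each .xml file under source."""
    path = Path(source)
    # Assumption: source is either a single file or a directory of .xml files.
    files = [path] if path.is_file() else sorted(path.rglob("*.xml"))
    for xml_file in files:
        # Parsing happens only when the consumer asks for the next item.
        yield xml_file.name, ET.parse(str(xml_file)).getroot()


# Usage: two small files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.xml").write_text("<doc><name>Ann</name></doc>")
    Path(tmp, "b.xml").write_text("<doc><name>Bob</name></doc>")
    for filename, root in iter_xml_documents(tmp):
        print(filename, root.findtext("name"))
```

Because the function is a generator, memory use stays flat even when the source holds many documents.
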
mine(source: str, query: str = None, as_user: str = None, as_pass: str = None)

iterate over the input data (xml objects), apply the selectors to each xml, and yield the selected values

params:
  • source: data source
  • annotation server parameters: query, as_user, as_pass
output:
  • iterate over selected fields per doc
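
The "apply selectors, yield per document" pattern described for mine can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the selector syntax used by xml_miner is not documented here, so ElementTree path expressions stand in for it, and the function below is a stand-alone stand-in rather than the XMLMiner method.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator, List, Tuple


def mine(docs: Iterable[Tuple[str, ET.Element]],
         selectors: List[str]) -> Iterator[Tuple[str, Dict[str, List[str]]]]:
    """Yield (filename, {selector: [values]}) one document at a time."""
    for filename, root in docs:
        selected = {}
        for selector in selectors:
            # Assumption: a selector is an ElementTree path such as ".//name".
            selected[selector] = [el.text or "" for el in root.findall(selector)]
        yield filename, selected


docs = [("a.xml", ET.fromstring("<doc><name>Ann</name><city>Oslo</city></doc>"))]
for filename, values in mine(docs, [".//name", ".//city"]):
    print(filename, values)
```
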
mine_and_save(source: str, output_file: str, query: str = None, as_user: str = None, as_pass: str = None)

iterate over the selected values and save/print them to the output

params:
  • source: data source
  • output_file (string): the output filename
  • annotation server parameters: query, as_user, as_pass
output file format:
  • no field name: filename value
  • with field name: filename, value, field_name
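
The two output layouts listed above can be sketched with the standard csv module. The helper name write_rows and the (filename, field_name, value) record shape are hypothetical, and a comma delimiter is assumed; the package's actual delimiter and quoting may differ.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_rows(records: Iterable[Tuple[str, str, str]],
               out: TextIO, with_field_name: bool = False) -> None:
    """Write (filename, field_name, value) records as CSV rows."""
    writer = csv.writer(out, lineterminator="\n")
    for filename, field_name, value in records:
        # Default layout: filename, value.
        row = [filename, value]
        if with_field_name:
            # Extended layout: filename, value, field_name.
            row.append(field_name)
        writer.writerow(row)


records = [("a.xml", "name", "Ann"), ("a.xml", "city", "Oslo")]
buffer = io.StringIO()
write_rows(records, buffer, with_field_name=True)
print(buffer.getvalue())
```
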
read_selectors(selector: str)

read selector strings and construct selectors object

params:
  • selector: input selector strings
output:
  • selectors: XMLSelectors object

Module contents

Top-level package for xml-miner

xml_miner.define_logger(mod_name)

Set the default logging configuration

xml_miner.set_logging_level(level=30)

Change logging level
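
The default level=30 corresponds to logging.WARNING in the Python standard library. The sketch below shows one way two such helpers could be built on the logging module. It reuses the documented names for readability, but the bodies, including the log format string, are assumptions and not the package's code.

```python
import logging


def define_logger(mod_name: str) -> logging.Logger:
    """Return a named logger with a basic default configuration (sketch)."""
    # Assumption: a simple format; the package's real format is not documented.
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s: %(message)s")
    return logging.getLogger(mod_name)


def set_logging_level(level: int = logging.WARNING) -> None:
    """Change the level of the root logger; 30 is logging.WARNING."""
    logging.getLogger().setLevel(level)


logger = define_logger("example")
set_logging_level(logging.DEBUG)
logger.debug("debug messages are now emitted")
```
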