xml_miner package¶

Submodules¶

xml_miner.mine_trxml module¶

the trxml selector script

xml_miner.mine_trxml.get_args()¶: get arguments

xml_miner.mine_trxml.main()¶: apply selectors to trxml files

xml_miner.mine_xml module¶

the xml selector script

xml_miner.mine_xml.get_args()¶: get arguments

xml_miner.mine_xml.main()¶: apply selectors to xml files

xml_miner.miner module¶

apply selector on input data, and output it to a csv file

class xml_miner.miner.CommonMiner(selectors)¶

Bases: object

CommonMiner:

shared class for both xml and trxml

static normalize_string(line: str) → str¶

normalize selected values:

- replace newline characters (\n) with ‘__NEWLINE__’
- replace tab characters (\t) with four spaces
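The normalization described above can be sketched as a standalone function. This is an illustration only, not the package's source, and it assumes the two characters being replaced are the newline and the tab.

```python
def normalize_string(line: str) -> str:
    """Flatten a selected value so it fits on a single output row."""
    # Replace line breaks with a visible placeholder token.
    line = line.replace("\n", "__NEWLINE__")
    # Replace tabs with four spaces (assumed to be the second rule).
    return line.replace("\t", "    ")


print(normalize_string("first line\nsecond\tline"))
# prints: first line__NEWLINE__second    line
```

Keeping every value on one physical line means a multi-line field cannot break the row structure of the output file.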

class xml_miner.miner.TRXMLMiner(selectors, itemgroup=None, fields=None)¶

Bases: xml_miner.miner.CommonMiner

TRXMLProcessor:

- iterate over the trxml files and select values
- output selected values to a file, and print summary

load_data(source)¶

load the data into a data generator

params:

source: data source

output:

yield trxml

mine(source)¶

iterate the input data (trxml obj), apply selector on each trxml, and yield the selected values

params:

source: data source

output:

generate selected values per doc

mine_and_save(source: str, output_file: str)¶

iterate the input data (trxml obj), apply selector on each trxml, and output the selected values to a csv file

params:

source (string): data source
output_file (string): the output filename
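The split between mine (a generator of selected values per document) and mine_and_save (which writes those values to a CSV file) can be sketched with the standard library alone. Everything here is hypothetical: the in-memory DOCS source, the function signatures, and the use of plain XML element names as selectors stand in for the package's own TRXML loading and selector objects, which are not shown on this page.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List

# Hypothetical in-memory "source": filename -> XML text.
DOCS = {
    "a.xml": "<doc><name>Ann</name><city>Oslo</city></doc>",
    "b.xml": "<doc><name>Bob</name></doc>",
}


def mine(docs: Dict[str, str], selectors: List[str]) -> Iterator[List[str]]:
    """Yield one row of selected values per document."""
    for filename, text in docs.items():
        root = ET.fromstring(text)
        # findtext returns the default ('') when a selector matches nothing.
        yield [filename] + [root.findtext(sel, default="") for sel in selectors]


def mine_and_save(docs: Dict[str, str], selectors: List[str], out) -> None:
    """Consume the generator and write the rows as CSV."""
    writer = csv.writer(out)
    writer.writerow(["filename"] + selectors)
    writer.writerows(mine(docs, selectors))


buffer = io.StringIO()
mine_and_save(DOCS, ["name", "city"], buffer)
print(buffer.getvalue())
```

Because mine is a generator, documents are processed one at a time and callers that do not need a file can consume the rows directly.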

read_selectors(selector: str, itemgroup: str = '', fields: str = '')¶

read selector strings and construct selector object

params:

selector: input selector strings
itemgroup: input itemgroup strings
fields: input fields strings

output:

selectors: TRXMLSelectors object

class xml_miner.miner.XMLMiner(selectors, with_field_name=False)¶

Bases: xml_miner.miner.CommonMiner

XMLProcessor:

- iterate over the xml files and select values
- output selected values to a file, and print summary

load_data(source: str, query: str = None, as_user: str = None, as_pass: str = None)¶

load the data into a data generator

params:

source: data source
annotation server parameters: query, as_user, as_pass

output:

yield xml

mine(source: str, query: str = None, as_user: str = None, as_pass: str = None)¶

iterate the input data (xml obj), apply selector on each xml, and yield the selected values

params:

source: data source
annotation server parameters: query, as_user, as_pass

output:

iterate over selected fields per doc

mine_and_save(source: str, output_file: str, query: str = None, as_user: str = None, as_pass: str = None)¶

iterate the selected values and save/print to output

params:

source: data source
output_file (string): the output filename
annotation server parameters: query, as_user, as_pass

output file format:

no field name: filename value
with field name: filename, value, field_name
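The two row layouts listed above can be illustrated with a small helper. The name format_rows and its arguments are hypothetical and not part of xml_miner; the sketch also leaves the column separator open, since this page does not specify it, and simply builds each row as a list.

```python
from typing import Dict, Iterator, List


def format_rows(filename: str, selected: Dict[str, str],
                with_field_name: bool = False) -> Iterator[List[str]]:
    """Yield one output row per selected value (hypothetical helper)."""
    for field_name, value in selected.items():
        row = [filename, value]
        if with_field_name:
            # Field name goes last, matching "filename, value, field_name".
            row.append(field_name)
        yield row


selected = {"name": "Ann", "city": "Oslo"}
print(list(format_rows("a.xml", selected)))
print(list(format_rows("a.xml", selected, with_field_name=True)))
```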

read_selectors(selector: str)¶

read selector strings and construct selectors object

params:

selector: input selector strings

output:

selectors: XMLSelectors object

Module contents¶

Top-level package for xml-miner

xml_miner.define_logger(mod_name)¶: Set the default logging configuration

xml_miner.set_logging_level(level=30)¶: Change logging level
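The default level=30 corresponds to logging.WARNING in Python's standard logging module. The sketch below shows how a pair of helpers like these might be built on that module. The names make_logger and set_level, the format string, and the choice to adjust the root logger are assumptions for illustration, not the package's implementation.

```python
import logging


def make_logger(mod_name: str) -> logging.Logger:
    """Return a named logger with a basic default configuration (sketch)."""
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s: %(message)s")
    return logging.getLogger(mod_name)


def set_level(level: int = logging.WARNING) -> None:
    """Change the level of the root logger; 30 is logging.WARNING."""
    logging.getLogger().setLevel(level)


logger = make_logger("example")
set_level(logging.DEBUG)  # lower the threshold so debug messages are emitted
logger.debug("selectors loaded")
```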
