xml_miner package
Submodules
xml_miner.mine_trxml module
the trxml selector script
-
xml_miner.mine_trxml.get_args()
get arguments
-
xml_miner.mine_trxml.main()
apply selectors to trxml files
xml_miner.mine_xml module
the xml selector script
-
xml_miner.mine_xml.get_args()
get arguments
-
xml_miner.mine_xml.main()
apply selectors to xml files
xml_miner.miner module
apply selectors to the input data and write the selected values to a csv file
-
class xml_miner.miner.CommonMiner(selectors)
Bases: object
CommonMiner: shared class for both xml and trxml
-
static normalize_string(line: str) → str
normalize selected values:
- replace newline characters (\n) with '__NEWLINE__'
- replace tab characters (\t) with 4 spaces
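The normalization described above can be sketched as a standalone function. This is an illustrative re-implementation, not the package's own code, and it assumes the two replaced characters are the newline and the tab, which would keep each selected value on a single output line.

```python
def normalize_string(line: str) -> str:
    """Flatten a selected value onto one line (illustrative sketch)."""
    # Assumption: newline becomes the '__NEWLINE__' marker, tab becomes 4 spaces.
    return line.replace("\n", "__NEWLINE__").replace("\t", "    ")


example = normalize_string("first line\nsecond\tline")
print(example)  # first line__NEWLINE__second    line
```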
-
static
-
class xml_miner.miner.TRXMLMiner(selectors, itemgroup=None, fields=None)
Bases: xml_miner.miner.CommonMiner
TRXMLProcessor:
- iterate over the trxml files and select values
- output selected values to a file, and print summary
-
load_data(source)
load the data into a data generator
- params:
- source: data source
- output:
- yield trxml
-
mine(source)
iterate over the input data (trxml objects), apply the selectors to each trxml, and yield the selected values
- params:
- source: data source
- output:
- generate selected values per doc
-
mine_and_save(source: str, output_file: str)
iterate over the input data (trxml objects), apply the selectors to each trxml, and output the selected values to a csv file
- params:
- source (string): data source
- output_file (string): the output filename
-
read_selectors(selector: str, itemgroup: str = '', fields: str = '')
read selector strings and construct selector object
- params:
- selector: input selector strings
- itemgroup: input itemgroup strings
- fields: input fields strings
- output:
- selectors: TRXMLSelectors object
-
-
class xml_miner.miner.XMLMiner(selectors, with_field_name=False)
Bases: xml_miner.miner.CommonMiner
XMLProcessor:
- iterate over the xml files and select values
- output selected values to a file, and print summary
-
load_data(source: str, query: str = None, as_user: str = None, as_pass: str = None)
load the data into a data generator
- params:
- source: data source
- annotation server parameters: query, as_user, as_pass
- output:
- yield xml
-
mine(source: str, query: str = None, as_user: str = None, as_pass: str = None)
iterate the input data (xml obj), apply selector on each xml, and yield the selected values
- params:
- source: data source
- annotation server parameters: query, as_user, as_pass
- output:
- iterate over selected fields per doc
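The load-then-select generator pattern described for `load_data` and `mine` can be sketched with the standard library alone. Everything in this block is hypothetical: the in-memory `docs` mapping stands in for the real data source, the selectors are plain `xml.etree.ElementTree` paths rather than the package's selector syntax, and the annotation server parameters are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Tuple


def load_data(documents: Dict[str, str]) -> Iterator[Tuple[str, ET.Element]]:
    """Yield (filename, parsed root) pairs one document at a time."""
    for filename, text in documents.items():
        yield filename, ET.fromstring(text)


def mine(documents: Dict[str, str],
         selectors: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Apply each selector to each document; yield (filename, value, selector)."""
    for filename, root in load_data(documents):
        for selector in selectors:
            # findall returns every element matching the path (none if absent).
            for node in root.findall(selector):
                yield filename, (node.text or "").strip(), selector


# Hypothetical input: two small documents keyed by filename.
docs = {
    "a.xml": "<doc><name>Ada</name><city>London</city></doc>",
    "b.xml": "<doc><name>Grace</name></doc>",
}
rows = list(mine(docs, ["name", "city"]))
for row in rows:
    print(row)
```

Because both functions are generators, documents are parsed lazily and only one needs to be held in memory at a time, which matters when the source holds many files.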
-
mine_and_save(source: str, output_file: str, query: str = None, as_user: str = None, as_pass: str = None)
iterate over the selected values and save/print them to the output
- params:
- source: data source
- output_file (string): the output filename
- annotation server parameters: query, as_user, as_pass
- output file format:
- no field name: filename value
- with field name: filename, value, field_name
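The two row layouts listed above can be illustrated with Python's `csv` module. This is a sketch under assumptions, not the package's writer: the helper `write_rows` is hypothetical, and the comma delimiter is assumed because the text above does not state the delimiter used when field names are omitted.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_rows(rows: Iterable[Tuple[str, str, str]],
               output: TextIO,
               with_field_name: bool = False) -> None:
    """Write one line per selected value, optionally with its field name."""
    writer = csv.writer(output)  # assumption: comma-separated output
    for filename, value, field_name in rows:
        if with_field_name:
            writer.writerow([filename, value, field_name])
        else:
            writer.writerow([filename, value])


# Hypothetical selected values: (filename, value, field_name).
selected = [("a.xml", "Ada", "name"), ("b.xml", "Grace", "name")]

with_names = io.StringIO()
write_rows(selected, with_names, with_field_name=True)

without_names = io.StringIO()
write_rows(selected, without_names)

print(with_names.getvalue())
print(without_names.getvalue())
```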
-
read_selectors(selector: str)
read selector strings and construct selectors object
- params:
- selector: input selector strings
- output:
- selectors: XMLSelectors object
-