xml_miner.data_utils package
Submodules
xml_miner.data_utils.asclient module
A module to communicate with the TK annotation server.
-
class xml_miner.data_utils.asclient.ASClient(host: str, port: str, as_user: str = '', as_pass: str = '')
Bases: object
Python version of the annotation server client.
-
BUFFER_SIZE = 2048
-
ENCODING = 'utf-8'
-
check_user_password()
Send the username and password to the server and confirm the login.
-
close_socket()
Shut down the connection.
-
get_docs(query='') → Iterator[str]
Get all queried documents.
-
get_ids(query: str = '') → List[str]
Get the document ids of all queried documents.
-
make_connection() → bool
Build a connection to the Annotation Server.
-
prepare_message(message: str) → str
Prepare the query message:
  - add a new line
  - encode with UTF-8
-
send_and_receive(query: str) → str
Send the query to the Annotation Server and decode the received response.
-
socket_output() → str
Receive the response from the Annotation Server and decode it.
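The message handling described by prepare_message, send_and_receive and socket_output can be illustrated with a minimal sketch. This is not the ASClient implementation: the function bodies, the read_line helper, and the assumption that the server answers with a single newline-terminated line are all hypothetical, and only the BUFFER_SIZE and ENCODING values come from the class attributes listed above.

```python
import socket

BUFFER_SIZE = 2048   # same value as ASClient.BUFFER_SIZE
ENCODING = "utf-8"   # same value as ASClient.ENCODING


def prepare_message(message: str) -> bytes:
    """Append a newline and encode the query as UTF-8 bytes."""
    return (message + "\n").encode(ENCODING)


def read_line(sock: socket.socket) -> str:
    """Hypothetical helper: read chunks until one ends with a newline.

    Assumes the peer terminates its response with a newline; the real
    Annotation Server protocol may differ.
    """
    chunks = []
    while True:
        chunk = sock.recv(BUFFER_SIZE)
        if not chunk:  # the peer closed the connection
            break
        chunks.append(chunk)
        if chunk.endswith(b"\n"):
            break
    return b"".join(chunks).decode(ENCODING).rstrip("\n")


def send_and_receive(sock: socket.socket, query: str) -> str:
    """Send one query and return the decoded response."""
    sock.sendall(prepare_message(query))
    return read_line(sock)


if __name__ == "__main__":
    # A connected socket pair stands in for a real server connection.
    client, server = socket.socketpair()
    server.sendall(prepare_message("pong"))  # pre-load a fake response
    print(send_and_receive(client, "ping"))
    client.close()
    server.close()
```

Decoding only after all chunks are joined avoids splitting a multi-byte UTF-8 character across two recv calls.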
-
xml_miner.data_utils.data_loader module
A module to load input XML/TRXML from different sources.
-
class xml_miner.data_utils.data_loader.DataLoader(data_generator=None)
Bases: object
DataLoader:
  - load data from different resources
  - generate XML input for downstream tasks
-
TRXML_HEADER = '<TextractorResult '
-
XML_HEADER = '<begin '
-
classmethod load_from_as(host, port, query, as_user='', as_pass='')
Create the document loader object from the AnnotationServer.
- params:
  - host (string): hostname of the AnnotationServer
  - port (int): port of the AnnotationServer
  - query (string): query to select documents
  - as_user: AnnotationServer username
  - as_pass: AnnotationServer password
- output:
  - DataLoader object: an iterator object that generates XML strings
-
classmethod load_from_dir(input_dir)
Create the document loader object from a directory.
- params:
  - input_dir (string): directory containing the XML files
- output:
  - DataLoader object: an iterator object that generates XML strings
-
classmethod load_from_mtrxml(input_mxml)
Create the document loader object from an mtrxml file.
- params:
  - input_mxml (string): an mtrxml file
- output:
  - DataLoader object: an iterator object that generates XML strings
-
classmethod load_from_mxml(input_mxml)
Create the document loader object from an mxml file.
- params:
  - input_mxml (string): an mxml file
- output:
  - DataLoader object: an iterator object that generates XML strings
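The pattern documented above, a class that wraps a generator and offers classmethod constructors for different sources, can be sketched as follows. SketchLoader and from_strings are hypothetical names chosen for illustration; this is not the DataLoader source, only a minimal example of the same shape.

```python
from typing import Iterable, Iterator, Optional


class SketchLoader:
    """Hypothetical stand-in for DataLoader: wraps an iterable of XML strings."""

    def __init__(self, data_generator: Optional[Iterable[str]] = None):
        # Fall back to an empty source so iteration is always safe.
        self.data_generator = data_generator if data_generator is not None else []

    def __iter__(self) -> Iterator[str]:
        # Delegates to the wrapped source; a generator can be consumed only once.
        return iter(self.data_generator)

    @classmethod
    def from_strings(cls, documents: Iterable[str]) -> "SketchLoader":
        """Hypothetical alternative constructor, analogous to the load_from_* methods."""
        return cls(doc for doc in documents)


if __name__ == "__main__":
    loader = SketchLoader.from_strings(['<begin id="1"/>', '<begin id="2"/>'])
    for xml_string in loader:
        print(xml_string)
```

Keeping each source behind its own classmethod lets downstream code iterate over the loader without knowing where the documents came from.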
-
-
xml_miner.data_utils.data_loader.load_from_file(input_dir, files)
Load documents from a list of files read from a directory.
- params:
  - input_dir (string): directory containing all the files
  - files (list): a list of files
- output:
  - xml string: an iterator object that generates XML strings
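A lazy file reader with the same parameters as load_from_file might look like the sketch below. The name iter_files, the UTF-8 encoding, and the choice to yield each file's full text are assumptions made for illustration; the actual function may behave differently.

```python
import os
import tempfile
from typing import Iterable, Iterator


def iter_files(input_dir: str, files: Iterable[str]) -> Iterator[str]:
    """Yield the text of each named file inside input_dir, one at a time."""
    for name in files:
        path = os.path.join(input_dir, name)
        # Assumes UTF-8 encoded files; adjust if the data uses another encoding.
        with open(path, encoding="utf-8") as handle:
            yield handle.read()


if __name__ == "__main__":
    # Build a throwaway directory with two small files to iterate over.
    with tempfile.TemporaryDirectory() as tmp_dir:
        for name in ("a.xml", "b.xml"):
            with open(os.path.join(tmp_dir, name), "w", encoding="utf-8") as out:
                out.write(f'<begin file="{name}"/>\n')
        for xml_string in iter_files(tmp_dir, sorted(os.listdir(tmp_dir))):
            print(xml_string, end="")
```

Because the function is a generator, only one file is held in memory at a time.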
-
xml_miner.data_utils.data_loader.load_from_string(xml_string, header_line)
Load a document string; the string might contain multiple XML documents.
- params:
  - xml_string (string): input XML string
- output:
  - xml string: an iterator object that generates XML strings
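One plausible way to split a string that holds several documents is to start a new document at every line that begins with the header. The sketch below assumes that header_line is a prefix such as XML_HEADER or TRXML_HEADER; the reference does not document this parameter, so both that interpretation and the split_documents name are hypothetical.

```python
from typing import Iterator

XML_HEADER = "<begin "  # same value as DataLoader.XML_HEADER


def split_documents(xml_string: str, header_line: str) -> Iterator[str]:
    """Yield one document for each line that starts with header_line."""
    current = []
    for line in xml_string.splitlines(keepends=True):
        # A header line closes the document collected so far, if there is one.
        if line.startswith(header_line) and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


if __name__ == "__main__":
    text = '<begin id="1">\n<a/>\n</begin>\n<begin id="2">\n<b/>\n</begin>\n'
    for document in split_documents(text, XML_HEADER):
        print(repr(document))
```

Any lines that precede the first header are kept with the first document rather than dropped.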
xml_miner.data_utils.data_saver module
A module to output the selected values in the chosen format.
Module contents
DataLoader and DataSaver classes
-
class xml_miner.data_utils.DataLoader(data_generator=None)
Bases: object
DataLoader:
  - load data from different resources
  - generate XML input for downstream tasks