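This chapter contains detailed API documentation for HappyBase. Reading the user guide first is recommended, since it gives a general idea of how HappyBase works.
The HappyBase API is organised as follows: Connection (connecting to the HBase Thrift server and managing tables), Table (reading and writing data), Batch (batched mutations), and ConnectionPool (a thread-safe pool of connections), plus the NoConnectionsAvailable exception.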
Connection to an HBase Thrift server.
The host and port parameters specify the host name and TCP port of the HBase Thrift server to connect to. If omitted or None, a connection to the default port on localhost is made. If specified, the timeout parameter specifies the socket timeout in milliseconds.
If autoconnect is True (the default), the connection is opened immediately; otherwise Connection.open() must be called explicitly before first use.
The optional table_prefix and table_prefix_separator arguments specify a prefix and a separator string that are prepended to all table names, e.g. when Connection.table() is invoked. For example, if table_prefix is myproject, all tables will have names like myproject_XYZ.
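The naming rule can be sketched in plain Python. The helper below, prefixed_table_name, is hypothetical (it is not part of HappyBase) and assumes the separator defaults to an underscore, as the myproject_XYZ example suggests.

```python
from typing import Optional


def prefixed_table_name(name: str,
                        table_prefix: Optional[str] = None,
                        table_prefix_separator: str = "_") -> str:
    """Hypothetical helper: build the full HBase table name for `name`.

    Mirrors the rule described above; it is not HappyBase's own code.
    """
    if table_prefix is None:
        # No prefix configured: the name is used unchanged.
        return name
    return table_prefix + table_prefix_separator + name


print(prefixed_table_name("XYZ", "myproject"))       # myproject_XYZ
print(prefixed_table_name("XYZ", "myproject", "-"))  # myproject-XYZ
print(prefixed_table_name("XYZ"))                    # XYZ
```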
The optional compat parameter sets the compatibility level for this connection. Older HBase versions have slightly different Thrift interfaces, and using the wrong protocol can lead to crashes caused by communication errors, so make sure to use the correct one. This value can be either the string 0.92 (the default) for use with HBase 0.92.x and later versions, or 0.90 for use with HBase 0.90.x.
The optional transport parameter specifies the Thrift transport mode to use. Supported values for this parameter are buffered (the default) and framed. Make sure to choose the right one, since otherwise you might see non-obvious connection errors or program hangs when making a connection. HBase versions before 0.94 always use the buffered transport. Starting with HBase 0.94, the Thrift server optionally uses a framed transport, depending on the parameter passed to the hbase-daemon.sh start thrift command. The default -threadpool mode uses the buffered transport; the -hsha, -nonblocking, and -threadedselector modes use the framed transport.
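The mode-to-transport rule above fits in a small lookup. The helper names below (transport_for_thrift_mode, connection_kwargs) are hypothetical and not part of HappyBase; the resulting dictionary is meant to be passed as happybase.Connection(**kwargs). For HBase versions before 0.94 the transport is always buffered, regardless of this mapping.

```python
from typing import Optional

# Modes of `hbase-daemon.sh start thrift` (HBase 0.94+) that use a framed transport.
FRAMED_MODES = {"-hsha", "-nonblocking", "-threadedselector"}


def transport_for_thrift_mode(mode: str = "-threadpool") -> str:
    """Hypothetical helper: return the `transport` value matching a server mode."""
    if mode == "-threadpool":
        return "buffered"  # the default server mode uses the buffered transport
    if mode in FRAMED_MODES:
        return "framed"
    raise ValueError(f"unknown Thrift server mode: {mode!r}")


def connection_kwargs(host: str, mode: str = "-threadpool",
                      timeout_ms: Optional[int] = None) -> dict:
    """Keyword arguments one might pass to happybase.Connection(**kwargs)."""
    kwargs = {"host": host, "transport": transport_for_thrift_mode(mode)}
    if timeout_ms is not None:
        kwargs["timeout"] = timeout_ms  # socket timeout in milliseconds
    return kwargs


print(connection_kwargs("hbase.example.com", "-nonblocking", timeout_ms=5000))
# {'host': 'hbase.example.com', 'transport': 'framed', 'timeout': 5000}
```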
New in version 0.5: timeout parameter
New in version 0.4: table_prefix_separator parameter
New in version 0.4: support for framed Thrift transports
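Parameters: host, port, timeout, autoconnect, table_prefix, table_prefix_separator, compat, transport (all described above)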
Open the underlying transport to the HBase instance.
This method opens the underlying Thrift transport (TCP connection).
Close the underlying transport to the HBase instance.
This method closes the underlying Thrift transport (TCP connection).
Return a table object.
Returns a happybase.Table instance for the table named name. This does not result in a round-trip to the server, and the table is not checked for existence.
The optional use_prefix parameter specifies whether the table prefix (if any) is prepended to the specified name. Set this to False if you want to use a table that resides in another ‘prefix namespace’, e.g. a table from a ‘friendly’ application co-hosted on the same HBase instance. See the table_prefix parameter to the Connection constructor for more information.
Parameters: name (the table name), use_prefix (whether to prepend the table prefix)
Returns: Table instance
Return type: happybase.Table
Return a list of table names available in this HBase instance.
If a table_prefix was set for this Connection, only tables that have the specified prefix will be listed.
Returns: The table names
Return type: List of strings
Create a table.
Parameters: name (the table name), families (the column families and their options, see below)
The families parameter is a dictionary mapping column family names to a dictionary containing the options for this column family, e.g.
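```python
import happybase

connection = happybase.Connection('localhost')  # assumes a Thrift server on localhost

families = {
    'cf1': dict(max_versions=10),
    'cf2': dict(max_versions=1, block_cache_enabled=False),
    'cf3': dict(),  # use defaults
}
connection.create_table('mytable', families)
```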
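These options correspond to the ColumnDescriptor structure in the Thrift API, but note that the names should be provided in Python style, not in camel case notation, e.g. time_to_live, not timeToLive. The following options are supported: max_versions (int), compression (str), in_memory (bool), bloom_filter_type (str), bloom_filter_vector_size (int), bloom_filter_nb_hashes (int), block_cache_enabled (bool), and time_to_live (int).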
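The correspondence between the two naming styles can be illustrated with a small hypothetical helper, to_thrift_field. It only shows how a Python-style option name relates to its camel-case ColumnDescriptor field; it is not how HappyBase performs the conversion internally.

```python
def to_thrift_field(option: str) -> str:
    """Hypothetical helper: convert a Python-style option name to the
    camel-case ColumnDescriptor field name it corresponds to."""
    first, *rest = option.split("_")
    return first + "".join(part.capitalize() for part in rest)


families = {
    "cf1": dict(max_versions=10),
    "cf2": dict(max_versions=1, block_cache_enabled=False),
}
for family, options in families.items():
    # Show each family's options as they would be named in the Thrift struct.
    print(family, {to_thrift_field(k): v for k, v in options.items()})
# cf1 {'maxVersions': 10}
# cf2 {'maxVersions': 1, 'blockCacheEnabled': False}
```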
Delete the specified table.
New in version 0.5: the disable parameter
In HBase, a table always needs to be disabled before it can be deleted. If the disable parameter is True, this method first disables the table if it wasn’t already and then deletes it.
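The disable-then-delete sequence can be written out explicitly with the Connection methods documented in this chapter. The function drop_table and the FakeConnection class are hypothetical; the fake records calls instead of talking to HBase, so the sketch runs without a server. With a real connection, delete_table(name, disable=True) performs the same sequence in one call.

```python
def drop_table(connection, name: str) -> None:
    """Delete `name`, disabling it first if it is still enabled.

    `connection` is anything providing is_table_enabled(), disable_table()
    and delete_table(), such as a happybase.Connection.
    """
    if connection.is_table_enabled(name):
        connection.disable_table(name)
    connection.delete_table(name)


class FakeConnection:
    """Hypothetical stand-in that records calls instead of talking to HBase."""

    def __init__(self, enabled):
        self.enabled = set(enabled)
        self.calls = []

    def is_table_enabled(self, name):
        return name in self.enabled

    def disable_table(self, name):
        self.calls.append(("disable_table", name))
        self.enabled.discard(name)

    def delete_table(self, name):
        self.calls.append(("delete_table", name))


conn = FakeConnection(enabled={"mytable"})
drop_table(conn, "mytable")
print(conn.calls)  # [('disable_table', 'mytable'), ('delete_table', 'mytable')]
```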
Parameters: name (the table name), disable (whether to disable the table first, if needed)
Enable the specified table.
Parameters: name (str) – The table name
Disable the specified table.
Parameters: name (str) – The table name
Return whether the specified table is enabled.
Parameters: name (str) – The table name
Returns: whether the table is enabled
Return type: bool
Compact the specified table.
HBase table abstraction class.
This class cannot be instantiated directly; use Connection.table() instead.
Retrieve the column families for this table.
Returns: Mapping from column family name to settings dict
Return type: dict
Retrieve the regions for this table.
Returns: regions for this table
Return type: list of dicts
Retrieve a single row of data.
This method retrieves the row with the row key specified in the row argument and returns the columns and values for this row as a dictionary.
The row argument is the row key of the row. If the columns argument is specified, only the values for these columns will be returned instead of all available columns. The columns argument should be a list or tuple containing strings. Each name can be a column family, such as cf1 or cf1: (the trailing colon is not required), or a column family with a qualifier, such as cf1:col1.
If specified, the timestamp argument specifies the maximum version that results may have. The include_timestamp argument specifies whether cells are returned as single values or as (value, timestamp) tuples.
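The column-selection rules above can be emulated client-side to show which columns a given columns list matches. The function select_columns is hypothetical and operates on a plain dict, not on HBase; with include_timestamp=True the real values would be (value, timestamp) tuples instead of plain values.

```python
from typing import Dict, Iterable, Optional


def select_columns(row: Dict[str, bytes],
                   columns: Optional[Iterable[str]] = None) -> Dict[str, bytes]:
    """Hypothetical emulation of the `columns` argument of row().

    `row` maps 'family:qualifier' names to values. A spec without a
    qualifier ('cf1' or 'cf1:') selects the whole family; 'cf1:col1'
    selects a single column. None selects everything.
    """
    if columns is None:
        return dict(row)
    families, exact = set(), set()
    for spec in columns:
        family, _, qualifier = spec.partition(":")
        if qualifier:
            exact.add(spec)
        else:
            families.add(family)
    return {name: value for name, value in row.items()
            if name in exact or name.partition(":")[0] in families}


row = {"cf1:a": b"1", "cf1:b": b"2", "cf2:c": b"3"}
print(select_columns(row, ["cf1:"]))          # {'cf1:a': b'1', 'cf1:b': b'2'}
print(select_columns(row, ["cf1:b", "cf2"]))  # {'cf1:b': b'2', 'cf2:c': b'3'}
```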
Parameters: row, columns, timestamp, include_timestamp (all described above)
Returns: Mapping of columns (both family and qualifier) to values
Return type: dict
Retrieve multiple rows of data.
This method retrieves the rows with the row keys specified in the rows argument, which should be a list (or tuple) of row keys. The return value is a list of (row_key, row_dict) tuples.
The columns, timestamp and include_timestamp arguments behave exactly the same as for row().
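Because rows() returns (row_key, row_dict) tuples, passing the result to dict() gives a lookup keyed by row key. In the sketch below, rows_by_key and FakeTable are hypothetical; the fake only mimics the documented return shape and simply skips keys it does not know, which makes no claim about how HBase reports missing rows.

```python
def rows_by_key(table, keys):
    """Turn the list of (row_key, row_dict) tuples returned by
    Table.rows() into a dict keyed by row key."""
    return dict(table.rows(keys))


class FakeTable:
    """Hypothetical stand-in for happybase.Table holding rows in memory."""

    def __init__(self, data):
        self.data = data

    def rows(self, keys):
        # Same shape as the documented result: a list of (row_key, row_dict) tuples.
        return [(key, self.data[key]) for key in keys if key in self.data]


table = FakeTable({b"row1": {b"cf:a": b"1"}, b"row2": {b"cf:a": b"2"}})
print(rows_by_key(table, [b"row1", b"row2"]))
# {b'row1': {b'cf:a': b'1'}, b'row2': {b'cf:a': b'2'}}
```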
Parameters: rows, columns, timestamp, include_timestamp (all described above)
Returns: List of (row_key, row_dict) tuples
Return type: list of tuples
Retrieve multiple versions of a single cell from the table.
This method retrieves multiple versions of a cell (if any).
The versions argument defines how many cell versions to retrieve at most.
The timestamp and include_timestamp arguments behave exactly the same as for row().
Parameters: row (the row key), column (the column name), versions, timestamp, include_timestamp
Returns: cell values
Return type: list of values
Create a scanner for data in the table.
This method returns an iterable that can be used for looping over the matching rows. Scanners can be created in two ways:
The row_start and row_stop arguments specify the row keys where the scanner should start and stop. It does not matter whether the table contains any rows with the specified keys: the first row with a key equal to or greater than row_start will be the first result, and the last row with a key less than row_stop will be the last result. In other words, the start of the range is inclusive, while the end is exclusive.
Both row_start and row_stop can be None to specify the start and the end of the table respectively. If both are omitted, a full table scan is done. Note that this usually results in severe performance problems.
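The range semantics can be emulated over a sorted list of row keys, since HBase orders rows by their keys as byte strings. The function scan_keys below is a hypothetical, in-memory illustration, not part of HappyBase: an inclusive start, an exclusive stop, and None meaning open-ended.

```python
import bisect
from typing import List, Optional


def scan_keys(sorted_keys: List[bytes],
              row_start: Optional[bytes] = None,
              row_stop: Optional[bytes] = None) -> List[bytes]:
    """Emulate the scan range over an already-sorted list of row keys:
    row_start is inclusive, row_stop is exclusive, None means open-ended."""
    lo = 0 if row_start is None else bisect.bisect_left(sorted_keys, row_start)
    hi = len(sorted_keys) if row_stop is None else bisect.bisect_left(sorted_keys, row_stop)
    return sorted_keys[lo:hi]


keys = [b"a", b"b", b"c", b"d"]
print(scan_keys(keys, b"b", b"d"))   # [b'b', b'c']
print(scan_keys(keys, b"bb", None))  # [b'c', b'd']  (b"bb" itself need not exist)
print(scan_keys(keys))               # [b'a', b'b', b'c', b'd']  (full scan)
```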
Alternatively, if row_prefix is specified, only rows with row keys matching the prefix will be returned. If given, row_start and row_stop cannot be used.
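A prefix scan is equivalent to a range scan whose stop key is the smallest key greater than every key carrying that prefix. The hypothetical helper prefix_to_range computes it by dropping trailing 0xFF bytes and incrementing the last remaining byte; this is a common technique, shown as an illustration rather than as HappyBase's implementation.

```python
from typing import Optional, Tuple


def prefix_to_range(prefix: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (row_start, row_stop) covering exactly the keys that start
    with `prefix`; row_stop is None when no finite stop key exists."""
    stripped = prefix.rstrip(b"\xff")
    if not stripped:
        # Empty or all-0xFF prefix: scan to the end of the table.
        return prefix, None
    stop = stripped[:-1] + bytes([stripped[-1] + 1])
    return prefix, stop


print(prefix_to_range(b"user123"))  # (b'user123', b'user124')
print(prefix_to_range(b"ab\xff"))   # (b'ab\xff', b'ac')
```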
The columns, timestamp and include_timestamp arguments behave exactly the same as for row().
The filter argument may be a filter string that will be applied at the server by the region servers.
If limit is given, at most limit results will be returned.
The batch_size argument specifies how many results should be retrieved per batch when retrieving results from the scanner. Only set this to a low value (or even 1) if your data is large, since a low batch size results in added round-trips to the server.
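The interplay between batch_size, limit, and round-trips can be modelled with a small client loop. In the sketch, iter_scanner and FakeServer are hypothetical: fetch(n) stands in for one round-trip returning up to n rows, and the loop never requests more rows than the remaining limit.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple[bytes, dict]


def iter_scanner(fetch: Callable[[int], List[Row]],
                 batch_size: int = 1000,
                 limit: Optional[int] = None) -> Iterator[Row]:
    """Request rows in batches of at most `batch_size`, stopping after
    `limit` rows or when `fetch` returns an empty batch."""
    returned = 0
    while limit is None or returned < limit:
        how_many = batch_size if limit is None else min(batch_size, limit - returned)
        batch = fetch(how_many)  # one simulated round-trip
        if not batch:
            return
        for row in batch:
            yield row
            returned += 1


class FakeServer:
    """Hypothetical server holding rows in memory and counting round-trips."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0
        self.round_trips = 0

    def fetch(self, n):
        self.round_trips += 1
        batch = self.rows[self.pos:self.pos + n]
        self.pos += len(batch)
        return batch


server = FakeServer([(b"row%d" % i, {}) for i in range(10)])
result = list(iter_scanner(server.fetch, batch_size=3, limit=7))
print(len(result), server.round_trips)  # 7 3
```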
Compatibility note: The filter argument is only available when using HBase 0.92 (or up). In HBase 0.90 compatibility mode, specifying a filter raises an exception.
Parameters: row_start, row_stop, row_prefix, columns, filter, timestamp, include_timestamp, batch_size, limit (all described above)
Returns: generator yielding the rows matching the scan
Return type: iterable of (row_key, row_data) tuples
Store data in the table.
This method stores the data in the data argument for the row specified by row. The data argument is a dictionary that maps columns to values. Column names must include a family and qualifier part, e.g. cf:col, though the qualifier part may be the empty string, e.g. cf:.
Note that, in many situations, batch() is a more appropriate method to manipulate data.
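The column-name rule for put() can be checked before sending data. The function check_put_data is a hypothetical pre-flight check, not part of HappyBase, and uses str column names for simplicity: each name needs a non-empty family and a colon, while the qualifier may be empty.

```python
def check_put_data(data: dict) -> None:
    """Hypothetical check: every column name must look like 'family:qualifier';
    the qualifier may be empty ('cf:'), but the family and colon may not."""
    for column in data:
        family, sep, _qualifier = column.partition(":")
        if not sep or not family:
            raise ValueError(f"column {column!r} must look like 'family:qualifier'")


check_put_data({"cf:col": "v", "cf:": "w"})  # passes silently
try:
    check_put_data({"cf": "v"})  # missing the colon and qualifier part
except ValueError as exc:
    print(exc)  # column 'cf' must look like 'family:qualifier'
```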
New in version 0.7: wal parameter
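Parameters: row (the row key), data (mapping of columns to values), timestamp (optional), wal (whether to write to the WAL)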
Delete data from the table.
This method deletes all columns for the row specified by row, or only some columns if the columns argument is specified.
Note that, in many situations, batch() is a more appropriate method to manipulate data.
New in version 0.7: wal parameter
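Parameters: row (the row key), columns (optional list of columns to delete), timestamp (optional), wal (whether to write to the WAL)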
Create a new batch operation for this table.
This method returns a new Batch instance that can be used for mass data manipulation. The timestamp argument applies to all puts and deletes on the batch.
If given, the batch_size argument specifies the maximum batch size after which the batch should send the mutations to the server. By default this is unbounded.
The transaction argument specifies whether the returned Batch instance should act in a transaction-like manner when used as context manager in a with block of code. The transaction flag cannot be used in combination with batch_size.
The wal argument determines whether mutations should be written to the HBase Write Ahead Log (WAL). This flag can only be used with recent HBase versions. If specified, it provides a default for all the put and delete operations on this batch. This default value can be overridden for individual operations using the wal argument to Batch.put() and Batch.delete().
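The buffering behaviour described above can be modelled in memory. SketchBatch is hypothetical and only models the behaviour; in application code the equivalent is a with table.batch(batch_size=1000) as b: block calling b.put() and b.delete(). The sketch assumes that, in transaction mode, an exception inside the with block discards pending mutations, while otherwise pending mutations are sent when the block exits.

```python
class SketchBatch:
    """Hypothetical in-memory model of Batch; `send` receives each list of
    mutations and stands in for the round-trip to the Thrift server."""

    def __init__(self, send, batch_size=None, transaction=False):
        if batch_size is not None and transaction:
            raise TypeError("'transaction' cannot be combined with 'batch_size'")
        self._send = send
        self._batch_size = batch_size
        self._transaction = transaction
        self._pending = []

    def put(self, row, data):
        self._pending.append(("put", row, data))
        self._maybe_flush()

    def delete(self, row, columns=None):
        self._pending.append(("delete", row, columns))
        self._maybe_flush()

    def _maybe_flush(self):
        # Send automatically once the configured batch size is reached.
        if self._batch_size is not None and len(self._pending) >= self._batch_size:
            self.send()

    def send(self):
        if self._pending:
            self._send(list(self._pending))
            self._pending.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._transaction and exc_type is not None:
            self._pending.clear()  # assumption: discard on error in transaction mode
            return False
        self.send()
        return False


sent = []
with SketchBatch(sent.append, batch_size=2) as b:
    b.put(b"r1", {b"cf:a": b"1"})
    b.put(b"r2", {b"cf:a": b"2"})  # reaches batch_size: sent immediately
    b.delete(b"r3")                # still pending; sent when the block exits
print([len(m) for m in sent])  # [2, 1]
```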
New in version 0.7: wal parameter
Parameters: timestamp, batch_size, transaction, wal (all described above)
Returns: Batch instance
Return type: happybase.Batch
Retrieve the current value of a counter column.
This method retrieves the current value of a counter column. If the counter column does not exist, this function initialises it to 0.
Note that application code should never store an incremented or decremented counter value directly; use the atomic Table.counter_inc() and Table.counter_dec() methods for that.
Parameters: row (the row key), column (the counter column)
Returns: counter value
Return type: int
Set a counter column to a specific value.
This method stores a 64-bit signed integer value in the specified column.
Note that application code should never store an incremented or decremented counter value directly; use the atomic Table.counter_inc() and Table.counter_dec() methods for that.
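A 64-bit signed counter value occupies eight bytes. The sketch below assumes the big-endian layout HBase uses for counter cells and uses Python's struct module with the ">q" format (big-endian signed 64-bit); encode_counter and decode_counter are hypothetical names. Values outside the signed 64-bit range make struct.pack raise struct.error.

```python
import struct


def encode_counter(value: int) -> bytes:
    """Encode a counter as 8 big-endian bytes (signed 64-bit)."""
    return struct.pack(">q", value)


def decode_counter(raw: bytes) -> int:
    """Decode 8 big-endian bytes back into a signed 64-bit integer."""
    return struct.unpack(">q", raw)[0]


print(encode_counter(1))                   # b'\x00\x00\x00\x00\x00\x00\x00\x01'
print(decode_counter(encode_counter(-2)))  # -2
```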
Parameters: row (the row key), column (the counter column), value (the counter value to store)
Atomically increment (or decrement) a counter column.
This method atomically increments or decrements a counter column in the row specified by row. The value argument specifies how much the counter should be incremented (for positive values) or decremented (for negative values). If the counter column did not exist, it is automatically initialised to 0 before incrementing it.
Parameters: row, column, value (all described above)
Returns: counter value after incrementing
Return type: int
Atomically decrement (or increment) a counter column.
This method is a shortcut for calling Table.counter_inc() with the value negated.
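The counter semantics can be summarised with a single-process, in-memory model. CounterModel is hypothetical, and unlike the real methods it is not atomic across clients. Missing counters start at 0, and counter_dec is counter_inc with the value negated; the default step of 1 is an assumption of this sketch.

```python
from collections import defaultdict


class CounterModel:
    """Hypothetical in-memory model of the counter methods (not atomic)."""

    def __init__(self):
        self._cells = defaultdict(int)  # missing counters read as 0

    def counter_get(self, row, column):
        return self._cells[(row, column)]

    def counter_set(self, row, column, value=0):
        self._cells[(row, column)] = value

    def counter_inc(self, row, column, value=1):
        self._cells[(row, column)] += value
        return self._cells[(row, column)]

    def counter_dec(self, row, column, value=1):
        # Shortcut: increment by the negated value.
        return self.counter_inc(row, column, -value)


counters = CounterModel()
print(counters.counter_inc(b"row1", b"cf:hits"))     # 1
print(counters.counter_inc(b"row1", b"cf:hits", 5))  # 6
print(counters.counter_dec(b"row1", b"cf:hits", 2))  # 4
```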
Returns: counter value after decrementing
Return type: int
Batch mutation class.
This class cannot be instantiated directly; use Table.batch() instead.
Send the batch to the server.
Store data in the table.
See Table.put() for a description of the row, data, and wal arguments. The wal argument should normally not be used; its only use is to override the batch-wide value passed to Table.batch().
Delete data from the table.
See Table.delete() for a description of the row, columns, and wal arguments. The wal argument should normally not be used; its only use is to override the batch-wide value passed to Table.batch().
Thread-safe connection pool.
New in version 0.5.
The size parameter specifies how many connections this pool manages. Additional keyword arguments are passed unmodified to the happybase.Connection constructor, with the exception of the autoconnect argument, since maintaining connections is the task of the pool.
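Parameters: size (the number of connections managed by the pool), plus any keyword arguments accepted by happybase.Connection (except autoconnect)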
Obtain a connection from the pool.
This method must be used as a context manager, i.e. with Python’s with block. Example:
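```python
import happybase
pool = happybase.ConnectionPool(size=3, host='localhost')
with pool.connection() as connection:
    print(connection.tables())  # do something with the connection
```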
If timeout is specified, this is the number of seconds to wait for a connection to become available before NoConnectionsAvailable is raised. If omitted, this method waits forever for a connection to become available.
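The timeout behaviour follows a common pattern: a thread-safe queue of connections whose get() either blocks indefinitely or gives up after a timeout. SketchPool and NoSlotsAvailable are hypothetical stand-ins for ConnectionPool and NoConnectionsAvailable, built on the standard queue module, whose get(timeout=...) raises queue.Empty on expiry. It illustrates the pattern and is not HappyBase's implementation.

```python
import contextlib
import queue


class NoSlotsAvailable(Exception):
    """Hypothetical stand-in for happybase.NoConnectionsAvailable."""


class SketchPool:
    """Fixed-size, thread-safe pool with an optional acquisition timeout (seconds)."""

    def __init__(self, factory, size):
        self._queue = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._queue.put(factory())

    @contextlib.contextmanager
    def connection(self, timeout=None):
        try:
            # timeout=None blocks until another user returns a connection.
            conn = self._queue.get(block=True, timeout=timeout)
        except queue.Empty:
            raise NoSlotsAvailable(
                f"no connection available within {timeout} seconds") from None
        try:
            yield conn
        finally:
            self._queue.put(conn)  # always hand the connection back


pool = SketchPool(factory=object, size=1)
with pool.connection() as c1:
    try:
        with pool.connection(timeout=0.05):
            pass
    except NoSlotsAvailable as exc:
        print("pool exhausted:", exc)
# pool exhausted: no connection available within 0.05 seconds
```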
Parameters: timeout (int) – number of seconds to wait (optional)
Returns: active connection from the pool
Return type: happybase.Connection
Exception raised when no connections are available.
This happens if a timeout was specified when obtaining a connection, and no connection became available within the specified timeout.
New in version 0.5.