HBase Data Types

HBase Data Types Nick Dimiduk, Hortonworks @xefyr n10k.com

description

Describes the motivations behind the new DataType interface, considerations when adding types to a schemaless database, and example usage.

Transcript of HBase Data Types

Page 1: HBase Data Types

HBase Data Types Nick Dimiduk, Hortonworks @xefyr n10k.com

Page 2: HBase Data Types

Agenda

•  Motivations
•  Progress thus far
•  Future work
•  Examples
•  More Examples

2014-11-18. Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Page 3: HBase Data Types

Why introduce types?

•  Δ(SQL, byte[]): (╯°□°)╯︵ ┻━┻
•  Rule of least surprise
•  Interoperability across tools
•  Distill best practices

Page 4: HBase Data Types

Considerations

•  Opt-in for current users
•  Easy transition for existing applications
•  Client-side only, mostly
   –  Filters, Split policies, Coprocessors, Block encoding
•  Avoid POJO constraints
   –  No required base-class/interface
   –  No magic (avoid ASM, ORM)
•  Non-Java clients
•  HBASE-8089

Page 5: HBase Data Types

Page 6: HBase Data Types

Inspiration

•  Orderly
•  PostgreSQL / PostGIS
•  HBASE-7221
•  HBASE-7692

Page 7: HBase Data Types

Features: Encoding

•  Order preservation
•  Override direction (ASC/DESC)
•  Fixed- and variable-width
•  Nullable
•  Self-identifying
•  Efficient
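To make "order preservation" and "override direction" concrete, here is a minimal plain-Java sketch. It is not the OrderedBytes implementation; the class name `OrderedIntSketch` and its methods are hypothetical. It flips the sign bit of a big-endian int so that unsigned byte-wise comparison (the way HBase compares row keys) agrees with numeric order, and inverts every byte to get descending order.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of HBase.
final class OrderedIntSketch {

    /** Encode so that unsigned lexicographic byte order matches numeric order. */
    static byte[] encode(int value, boolean descending) {
        int flipped = value ^ 0x80000000; // flip the sign bit: negatives now sort first
        byte[] out = {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16),
            (byte) (flipped >>> 8), (byte) flipped
        };
        if (descending) {
            // inverting every byte reverses the sort order
            for (int i = 0; i < out.length; i++) {
                out[i] = (byte) ~out[i];
            }
        }
        return out;
    }

    /** Reverse of encode(); the caller must pass the same direction. */
    static int decode(byte[] encoded, boolean descending) {
        int v = 0;
        for (byte b : encoded) {
            int unsigned = (descending ? ~b : b) & 0xFF;
            v = (v << 8) | unsigned;
        }
        return v ^ 0x80000000;
    }

    public static void main(String[] args) {
        int cmpAsc = Arrays.compareUnsigned(encode(-5, false), encode(3, false));
        int cmpDesc = Arrays.compareUnsigned(encode(-5, true), encode(3, true));
        System.out.println("ascending:  -5 sorts before 3? " + (cmpAsc < 0));
        System.out.println("descending: -5 sorts after 3?  " + (cmpDesc > 0));
    }
}
```

A plain two's-complement big-endian encoding would fail the first check, because the high bit of a negative number makes its first byte compare as large.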

Page 8: HBase Data Types

Features: API

•  Complex type encoding
   –  Compound rowkey pattern
   –  Order preservation
   –  Nullable fields
•  Runtime metadata
•  User-extensible
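The compound rowkey pattern can be sketched without any HBase classes. The example below is an assumption-laden illustration, not the Struct encoding: `CompoundKeySketch` and `rowKey` are hypothetical names, and the text field is assumed never to contain U+0000. It concatenates a terminated string and a sign-flipped long so that byte order of the whole key matches the order of the (user, timestamp) pair.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of HBase.
final class CompoundKeySketch {

    /** Row key = UTF-8 user id, a 0x00 terminator, then an order-preserving timestamp. */
    static byte[] rowKey(String user, long timestamp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] text = user.getBytes(StandardCharsets.UTF_8);
        out.write(text, 0, text.length);
        out.write(0x00); // terminator keeps a short id from bleeding into the next field
        long flipped = timestamp ^ Long.MIN_VALUE; // sign flip, as for fixed-width ints
        for (int shift = 56; shift >= 0; shift -= 8) {
            out.write((int) (flipped >>> shift)); // write(int) keeps only the low 8 bits
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] k1 = rowKey("al", 9L);
        byte[] k2 = rowKey("alice", -1L);
        // "al" sorts before "alice" regardless of the timestamps that follow
        System.out.println(Arrays.compareUnsigned(k1, k2) < 0);
    }
}
```

Without the terminator, the first timestamp byte of the "al" key would be compared against the "i" of "alice", and the ordering of the first field would depend on the second.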

Page 9: HBase Data Types

Implementation (HBASE-8089)

Page 10: HBase Data Types

Implementation: Encoding

o.a.h.h.util.Bytes

•  numeric
•  boolean
•  int16, int32, int64
•  float32, float64
•  variable-length text

o.a.h.h.util.OrderedBytes

•  null
•  numeric, +/-Inf, NaN
•  int8, int16, int32, int64
•  float32, float64
•  variable-length text
•  variable-length blob

Page 11: HBase Data Types

Implementation: API

interface DataType<T>

•  decode()
•  encode()
•  encodedClass()
•  encodedLength()
•  getOrder()
•  isNullable()
•  isOrderPreserving()
•  isSkippable()
•  skip()

implements DataType

•  OrderedXXX
•  RawXXX
•  Struct
   –  StructBuilder
   –  StructIterator
   –  TerminatedWrapper
   –  FixedLengthWrapper
•  Union{2,3,4}
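To show how encode(), decode(), skip(), and isNullable() fit together, here is a simplified sketch. It is not the HBase interface: `MiniType` and `TerminatedText` are hypothetical, they use `java.nio.ByteBuffer` where HBase uses PositionedByteRange, and the marker byte values are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, cut-down stand-in for the DataType<T> contract; not the HBase API.
interface MiniType<T> {
    int encode(ByteBuffer dst, T val); // returns the number of bytes written
    T decode(ByteBuffer src);          // reads one value and advances src
    int skip(ByteBuffer src);          // advances past one value without building it
    boolean isNullable();
}

// Nullable, variable-width text: marker byte, UTF-8 bytes, 0x00 terminator.
// Assumes the text never contains U+0000 and that buffers are heap-backed.
final class TerminatedText implements MiniType<String> {
    private static final byte NULL_MARKER = 0x01; // made-up value
    private static final byte TEXT_MARKER = 0x02; // made-up value
    private static final byte TERMINATOR = 0x00;

    @Override
    public int encode(ByteBuffer dst, String val) {
        int start = dst.position();
        if (val == null) {
            dst.put(NULL_MARKER);
        } else {
            dst.put(TEXT_MARKER).put(val.getBytes(StandardCharsets.UTF_8)).put(TERMINATOR);
        }
        return dst.position() - start;
    }

    @Override
    public int skip(ByteBuffer src) {
        int start = src.position();
        if (src.get() == TEXT_MARKER) {
            while (src.get() != TERMINATOR) {
                // scan forward to the terminator
            }
        }
        return src.position() - start;
    }

    @Override
    public String decode(ByteBuffer src) {
        int start = src.position();
        int len = skip(src);
        if (len == 1) {
            return null; // only the null marker was present
        }
        // drop the leading marker and the trailing terminator
        return new String(src.array(), src.arrayOffset() + start + 1, len - 2,
                StandardCharsets.UTF_8);
    }

    @Override
    public boolean isNullable() {
        return true;
    }
}
```

Because the first byte identifies what follows, a reader can skip a field it does not care about, which is the property the isSkippable()/skip() pair describes.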

Page 12: HBase Data Types

Up Next

•  “Default” types
•  More complex types
   –  Arrays/Lists
   –  Maps/Dicts
•  Tool integration
   –  Apache Phoenix
   –  Cloudera Kite
•  Performance audit, HBASE-8694
•  Improved metadata, HBASE-8863
   –  isCastableTo
   –  isCoercableTo
   –  isComparableTo
•  TypedTable, HBASE-7941
•  Beyond Java, HBASE-10091
   –  REST
   –  Thrift
   –  Shell
•  ImportTsv, HBASE-8593
•  User documentation
•  Coprocessors?
•  Filters?
•  CAS?
•  DataBlockEncoders?

Page 13: HBase Data Types

Examples

Page 14: HBase Data Types

A case for TypedTable

Put p = new Put(Bytes.toBytes(u.user));
p.add(INFO_FAM, USER_COL, Bytes.toBytes(u.user));
p.add(INFO_FAM, NAME_COL, Bytes.toBytes(u.name));
p.add(INFO_FAM, EMAIL_COL, Bytes.toBytes(u.email));
p.add(INFO_FAM, PASS_COL, Bytes.toBytes(u.password));

Page 15: HBase Data Types

A case for TypedTable

static final RawString ENC_STR = new RawString();
static final RawLong ENC_LONG = new RawLong();
--
SimplePositionedByteRange pbr =
    new SimplePositionedByteRange(100);
ENC_STR.encode(pbr, u.user);
Put p = new Put(Bytes.copy(pbr.getBytes(), pbr.getOffset(), pbr.getPosition()));
p.add(INFO_FAM, USER_COL, Bytes.copy(pbr.getBytes(), ...));
pbr.setPosition(0);
ENC_STR.encode(pbr, u.name);
p.add(INFO_FAM, NAME_COL, Bytes.copy(pbr.getBytes(), ...));
...

Page 16: HBase Data Types

Structs: writing

Struct struct = new StructBuilder()
    .add(OrderedNumeric.ASCENDING)
    .add(OrderedString.ASCENDING)
    .toStruct();
PositionedByteRange buf1 =
    new SimplePositionedByteRange(7);
struct.encode(buf1,
    new Object[] { BigDecimal.ONE, "foo" });

Page 17: HBase Data Types

Structs: reading

buf1.setPosition(0);
StructIterator it = struct.iterator(buf1);
while (it.hasNext()) {
    System.out.print(it.next() + ", ");
}

> BigDecimal.ONE, foo

Page 18: HBase Data Types

Structs: schema migration

Struct addedFields = new StructBuilder()
    .add(OrderedNumeric.ASCENDING)
    .add(OrderedString.ASCENDING)
    .add(OrderedString.ASCENDING)
    .add(OrderedNumeric.ASCENDING)
    .toStruct();

buf1.setPosition(0);
StructIterator it = addedFields.iterator(buf1);
while (it.hasNext()) {
    System.out.print(it.next() + ", ");
}

> BigDecimal.ONE, foo, null, null
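The migration behaviour shown here (fields appended to the schema read back as null from older, shorter values) can be sketched in plain Java. This is an illustration of the idea under simplifying assumptions, not the Struct or StructIterator code: `TrailingNullsSketch` is hypothetical, every field is treated as terminated text, and buffers are assumed to be heap-backed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of HBase.
final class TrailingNullsSketch {

    /** Write each field as UTF-8 text followed by a 0x00 terminator. */
    static ByteBuffer encode(String... fields) {
        ByteBuffer buf = ByteBuffer.allocate(256);
        for (String f : fields) {
            buf.put(f.getBytes(StandardCharsets.UTF_8)).put((byte) 0x00);
        }
        buf.flip(); // switch from writing to reading
        return buf;
    }

    /** Read fieldCount fields; any field past the end of the buffer decodes as null. */
    static List<String> decode(ByteBuffer src, int fieldCount) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
            if (!src.hasRemaining()) {
                values.add(null); // the older value never wrote this field
                continue;
            }
            int start = src.position();
            while (src.get() != 0x00) {
                // scan forward to the terminator
            }
            int len = src.position() - start - 1;
            values.add(new String(src.array(), src.arrayOffset() + start, len,
                    StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        ByteBuffer old = encode("1", "foo");  // written with a two-field schema
        System.out.println(decode(old, 4));   // read with a four-field schema
    }
}
```

The approach only works when new fields are appended at the end; inserting a field in the middle would shift every later field.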

Page 19: HBase Data Types

Protobuf (HBASE-11161)

class PBKeyValue extends PBType<CellProtos.KeyValue> {

    @Override
    public int encode(PositionedByteRange dst, CellProtos.KeyValue val) {
        CodedOutputStream os = outputStreamFromByteRange(dst);
        try {
            int before = os.spaceLeft(), after, written;
            val.writeTo(os);
            after = os.spaceLeft();
            written = before - after;
            // advance dst past the bytes protobuf just wrote
            dst.setPosition(dst.getPosition() + written);
            return written;
        } catch (IOException e) {
            // writeTo() declares IOException
            throw new RuntimeException("Error while encoding type.", e);
        }
    }
    ...
}

Page 20: HBase Data Types

More Examples

https://gist.github.com/ndimiduk/bcf33f09cc7e4408f684

Page 21: HBase Data Types

Thanks!

HBase in Action (Manning), by Nick Dimiduk and Amandeep Khurana, foreword by Michael Stack

hbaseinaction.com

Nick Dimiduk github.com/ndimiduk

@xefyr

n10k.com

http://s.apache.org/bGN
