Functions

Some commonly used SQL Functions.

Custom Function


source

CustomFunction.over

 CustomFunction.over (partition_by)

source

CustomFunction

 CustomFunction (func_name:str, arg_names:list, window_func=False,
                 distinct_option=False)

A convenient class for creating custom functions.

Type Default Details
func_name str name of the function
arg_names list list of arg names
window_func bool False whether this can be used as a window function
distinct_option bool False whether this function can be used with the distinct option
Returns None

CustomFunction is a convenient class to create a custom SQL function with the name, and positional arguments, if you don’t need the function to be parsed differently for different dialects. If you want to implement a custom function that is parsed differently for different dialects, you can use the custom_function decorator defined in terms.

date_diff = CustomFunction('DATE_DIFF', ['interval', 'start_date', 'end_date'])
test_eq(date_diff('day', 'start_date', 'end_date').get_sql(), 'DATE_DIFF(day, start_date, end_date)')
date_diff = CustomFunction('DATE_DIFF', ['interval', 'start_date', 'end_date'])
test_eq(date_diff('month', Field('date1'), Field('date2')).get_sql(), 'DATE_DIFF(month, date1, date2)')

Commonly Used Functions


MAX

 MAX (field)

MIN

 MIN (field)

SUM

 SUM (field)

AVG

 AVG (field)

COUNT

 COUNT (field)

ABS

 ABS (field)

ROUND

 ROUND (field, decimals)

FIRST

 FIRST (field)

LAST

 LAST (field)

DateTrunc

 DateTrunc (interval, date, dialect='sql')

DateDiff

 DateDiff (interval, start_date, end_date, dialect='sql')

AddMonths

 AddMonths (date, months, dialect='sql')

Date

 Date (expression, format=None, dialect='sql')

source

convert_date_format

 convert_date_format (format, dialect='sql')

Concat

 Concat (*args)

Coalesce

 Coalesce (*args)

Cast

 Cast (field, type)

MONTHS_BETWEEN

 MONTHS_BETWEEN (start_date, end_date)

ROW_NUMBER

 ROW_NUMBER ()

RANK

 RANK ()

DENSE_RANK

 DENSE_RANK ()

PERCENT_RANK

 PERCENT_RANK ()

CUME_DIST

 CUME_DIST ()

NTILE

 NTILE (num_buckets)

LAG

 LAG (field, offset)

LEAD

 LEAD (field, offset)

FIRST_VALUE

 FIRST_VALUE (field)

LAST_VALUE

 LAST_VALUE (field)

NTH_VALUE

 NTH_VALUE (field, n)

Those that can be used as window functions:

test_eq(Max(Field('col1')).over(Field('col2')).orderby(Field('col3')).get_sql(),
        'MAX(col1) OVER (PARTITION BY col2 ORDER BY col3)')
test_eq(Min(Field('col1')).over(Field('col2')).orderby(Field('col3')).get_sql(),
        'MIN(col1) OVER (PARTITION BY col2 ORDER BY col3)')
test_eq(Sum(Field('col1')).over(Field('col2')).orderby(Field('col3')).get_sql(),
        'SUM(col1) OVER (PARTITION BY col2 ORDER BY col3)')
test_eq(Avg(Field('col1')).over(Field('col2')).orderby(Field('col3')).get_sql(),
        'AVG(col1) OVER (PARTITION BY col2 ORDER BY col3)')
test_eq(Count(Field('col1')).over(Field('col2')).orderby(Field('col3')).get_sql(),
        'COUNT(col1) OVER (PARTITION BY col2 ORDER BY col3)')

test_eq(RowNumber().over(Field('col1')).orderby(Field('col2')).get_sql(), 
        'ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2)')
test_eq(Rank().over(Field('col1')).orderby(Field('col2')).get_sql(), 
        'RANK() OVER (PARTITION BY col1 ORDER BY col2)')
test_eq(DenseRank().over(Field('col1')).orderby(Field('col2')).get_sql(),
        'DENSE_RANK() OVER (PARTITION BY col1 ORDER BY col2)')
test_eq(PercentRank().over(Field('col1')).orderby(Field('col2')).get_sql(),
        'PERCENT_RANK() OVER (PARTITION BY col1 ORDER BY col2)')
test_eq(CumeDist().over(Field('col1')).orderby(Field('col2')).get_sql(),
        'CUME_DIST() OVER (PARTITION BY col1 ORDER BY col2)')
test_eq(Ntile(5).over(Field('col1')).orderby(Field('col2')).get_sql(),
        'NTILE(5) OVER (PARTITION BY col1 ORDER BY col2)')
test_eq(Lag(Field('col1'), 1).over(Field('col2')).orderby(Field('col3')).get_sql(),
        'LAG(col1, 1) OVER (PARTITION BY col2 ORDER BY col3)')
test_eq(Lead(Field('col1'), 1).over(Field('col2')).orderby(Field('col3')).get_sql(),
        'LEAD(col1, 1) OVER (PARTITION BY col2 ORDER BY col3)')
test_eq(FirstValue(Field('col1')).over(Field('col2')).orderby(Field('col3')).get_sql(),
        'FIRST_VALUE(col1) OVER (PARTITION BY col2 ORDER BY col3)')
test_eq(LastValue(Field('col1')).over(Field('col2')).orderby(Field('col3')).get_sql(),
        'LAST_VALUE(col1) OVER (PARTITION BY col2 ORDER BY col3)')
test_eq(NthValue(Field('col1'), 2).over(Field('col2')).orderby(Field('col3')).get_sql(),
        'NTH_VALUE(col1, 2) OVER (PARTITION BY col2 ORDER BY col3)')

Count and Sum can have a .distinct method.

test_eq(Count(Field('col')).distinct().get_sql(), 'COUNT(DISTINCT col)')
test_eq(Sum(Field('col')).distinct().get_sql(), 'SUM(DISTINCT col)')