Public Interface
Prototypes
Module
Prototypes.custom_logger Method
custom_logger(filename; kw...)
Arguments
- filename::AbstractString: base name for the log files
- output_dir::AbstractString="./log/": name of the directory where log files are written
- filtered_modules_specific::Vector{Symbol}=nothing: modules to filter out of logging (info and stdout only). Some packages write too much log output; this filters them out of those logs while keeping their messages available in the other logs.
- filtered_modules_all::Vector{Symbol}=nothing: modules to filter out of logging across all logs. One example is TranscodingStreams, which writes so much to the logs that it sometimes slows down I/O.
- log_date_format::AbstractString="yyyy-mm-dd": date format of the time stamp at the beginning of each logged line
- log_time_format::AbstractString="HH:MM:SS": time format of the time stamp at the beginning of each logged line
- displaysize::Tuple{Int,Int}=(50,100): how much to show in the log (the same for all logs for now)
- log_format::Symbol=:log4j: how to format the log files (stdout is always pretty); a pretty option is also available for the files (all or nothing for now)
- overwrite::Bool=false: whether to overwrite previously created log files
The custom_logger function creates four files in output_dir for four different levels of logging, from least to most verbose:
- filename.info.log.jl
- filename.warn.log.jl
- filename.debug.log.jl
- filename.full.log.jl
The debug log offers the option to filter messages from specific packages (some packages are particularly verbose) using the optional filter argument. The full log gets all of the debug messages without any of the filters. The info and warn logs record the standard info and warning level messages.
Note that with the default overwrite=false, old log files are not overwritten (specify overwrite=true to replace them).
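The module filtering described for filtered_modules_all can be illustrated with Julia's standard Logging library. This is only a sketch of the idea, not how custom_logger is implemented: ModuleFilterLogger, Noisy and Quiet are hypothetical names introduced for the example.

```julia
using Logging

# Hypothetical wrapper: forwards records to `inner`, except those emitted
# from a module whose name is listed in `blocked`.
struct ModuleFilterLogger{L<:AbstractLogger} <: AbstractLogger
    inner::L
    blocked::Vector{Symbol}
end

Logging.min_enabled_level(l::ModuleFilterLogger) = Logging.min_enabled_level(l.inner)
Logging.catch_exceptions(l::ModuleFilterLogger) = Logging.catch_exceptions(l.inner)

function Logging.shouldlog(l::ModuleFilterLogger, level, _module, group, id)
    # Drop the record early if it comes from a blocked module.
    _module isa Module && nameof(_module) in l.blocked && return false
    return Logging.shouldlog(l.inner, level, _module, group, id)
end

function Logging.handle_message(l::ModuleFilterLogger, args...; kwargs...)
    return Logging.handle_message(l.inner, args...; kwargs...)
end

# Two toy modules standing in for a verbose package and a regular one.
module Noisy
    emit() = @info "noisy message"
end
module Quiet
    emit() = @info "quiet message"
end

io = IOBuffer()
logger = ModuleFilterLogger(SimpleLogger(io, Logging.Debug), [:Noisy])
with_logger(logger) do
    Noisy.emit()   # filtered out
    Quiet.emit()   # written to `io`
end
out = String(take!(io))
```

Only the message from Quiet ends up in out; the same wrapper could sit in front of any file-backed logger.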
Prototypes.panel_fill! Method
panel_fill!(...)
Same as panel_fill, but modifies the input DataFrame in place.
Prototypes.panel_fill Method
panel_fill(
df::DataFrame,
id_var::Symbol,
time_var::Symbol,
value_var::Union{Symbol, Vector{Symbol}};
gap::Union{Int, DatePeriod} = 1,
method::Symbol = :backwards,
uniquecheck::Bool = true,
flag::Bool = false,
merge::Bool = false
)
Arguments
- df::AbstractDataFrame: a panel dataset
- id_var::Symbol: the individual index dimension of the panel
- time_var::Symbol: the time index dimension of the panel (must be an integer or a date)
- value_var::Union{Symbol, Vector{Symbol}}: the set of columns we would like to fill
Keywords
- gap::Union{Int, DatePeriod} = 1: the interval size for which we want to fill data
- method::Symbol = :backwards: the interpolation method used to fill the data. Options are :backwards (default), :forwards, :linear, :nearest. Email me for other interpolations (anything from Interpolations.jl is possible).
- uniquecheck::Bool = true: check if the panel is clean
- flag::Bool = false: flag the interpolated values
- merge::Bool = false: merge the new values with the input dataset
Returns
- AbstractDataFrame
Examples
- See tests
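To make the gap-filling idea concrete, here is a minimal sketch of linear filling for a single unit's series with an integer time index. It is not the package's implementation (which works on DataFrames and supports several methods); fill_linear is a hypothetical helper, and the Boolean vector it returns only mimics the spirit of the flag keyword.

```julia
# Hypothetical helper: linearly fill integer-time gaps in one unit's series.
function fill_linear(times::Vector{Int}, values::Vector{Float64}; gap::Int = 1)
    p = sortperm(times)
    t, v = times[p], values[p]
    out_t, out_v, filled = Int[], Float64[], Bool[]
    for i in eachindex(t)
        # Keep the observed point as is.
        push!(out_t, t[i]); push!(out_v, v[i]); push!(filled, false)
        i == lastindex(t) && break
        # Insert the missing time steps strictly between two observations.
        for s in (t[i] + gap):gap:(t[i + 1] - 1)
            w = (s - t[i]) / (t[i + 1] - t[i])
            push!(out_t, s)
            push!(out_v, (1 - w) * v[i] + w * v[i + 1])
            push!(filled, true)
        end
    end
    return out_t, out_v, filled
end

# Observations at t = 1, 4, 5; t = 2 and t = 3 are missing from the panel.
t, v, flag = fill_linear([1, 4, 5], [10.0, 40.0, 50.0])
```

Here t covers 1 through 5, v is interpolated along the line between the observed points, and flag marks the two filled rows.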
Prototypes.tabulate Method
tabulate(df::AbstractDataFrame, cols::Union{Symbol, Array{Symbol}};
reorder_cols=true, out::Symbol=:stdout)
This was forked from TexTables.jl and was inspired by https://github.com/matthieugomez/statar
Arguments
- df::AbstractDataFrame: input DataFrame to analyze
- cols::Union{Symbol, Vector{Symbol}}: single column name or vector of column names to tabulate
- group_type::Union{Symbol, Vector{Symbol}}=:value: specifies how to group each column
  - :value: group by the actual values in the column
  - :type: group by the type of values in the column
  - Vector{Symbol}: vector combining :value and :type for different columns
- reorder_cols::Bool=true: whether to sort the output by sortable columns
- format_tbl::Symbol=:long: how to present the results, long or wide (Stata twoway)
- format_stat::Symbol=:freq: which statistic to present, :freq or :pct
- skip_stat::Union{Nothing, Symbol, Vector{Symbol}}=nothing: do not print out all statistics (only for string)
- out::Symbol=:stdout: output format
  - :stdout: print the formatted table to standard output (returns nothing)
  - :df: return the result as a DataFrame
  - :string: return the formatted table as a string
Returns
- Nothing if out=:stdout
- DataFrame if out=:df
- String if out=:string
Output Format
The resulting table contains the following columns:
- the specified grouping columns (from cols)
- freq: frequency count
- pct: percentage of total
- cum: cumulative percentage
TO DO
allow user to specify order of columns (reorder = false flag)
Examples
See the README for more examples
# Simple frequency table for one column
tabulate(df, :country)
# Group by value type
tabulate(df, :age, group_type=:type)
# Multiple columns with mixed grouping
tabulate(df, [:country, :age], group_type=[:value, :type])
# Return as DataFrame instead of printing
result_df = tabulate(df, :country, out=:df)
Prototypes.winsorize Method
winsorize(
x::AbstractVector;
probs::Union{Tuple{Real, Real}, Nothing} = nothing,
cutpoints::Union{Tuple{Real, Real}, Nothing} = nothing,
replace::Symbol = :missing,
verbose::Bool=false
)
Arguments
- x::AbstractVector: a vector of values
Keywords
- probs::Union{Tuple{Real, Real}, Nothing}: a tuple of (lower, upper) probabilities that can be used instead of cutpoints
- cutpoints::Union{Tuple{Real, Real}, Nothing}: cutpoints below and above which values are treated as outliers. The default is (median - five times the interquartile range, median + five times the interquartile range). Compared to using the bottom and top percentiles, this takes the whole distribution of the vector into account.
- replace_value::Tuple: values by which outliers are replaced. Defaults to the cutpoints. A frequent alternative is missing.
- IQR::Real: when inferring cutpoints, the multiplier applied to the interquartile range around the median (median ± IQR * (q75 - q25))
- verbose::Bool: printing level
Returns
- AbstractVector: a vector the size of x with substituted values
Examples
- See tests
This code is based on Matthieu Gomez's winsorize function in the statar R package.
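The default cutpoint rule described above (median ± multiplier × interquartile range) can be sketched with the Statistics standard library. This is an illustration under stated assumptions, not the package code: iqr_cutpoints and winsorize_sketch are hypothetical helpers, and outliers are replaced by the cutpoints themselves, which the keyword description gives as the default.

```julia
using Statistics

# Hypothetical helper: cutpoints at median ± k * (q75 - q25).
function iqr_cutpoints(x::AbstractVector{<:Real}; k::Real = 5)
    q25, q50, q75 = quantile(x, [0.25, 0.5, 0.75])
    spread = k * (q75 - q25)
    return (q50 - spread, q50 + spread)
end

# Hypothetical helper: replace values outside the cutpoints by the cutpoints.
function winsorize_sketch(x::AbstractVector{<:Real}; k::Real = 5)
    lo, hi = iqr_cutpoints(x; k = k)
    return clamp.(x, lo, hi)
end

x = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
w = winsorize_sketch(x)   # only the last value lies outside the cutpoints
```

Replacing outliers with missing instead would amount to swapping clamp for a comparison against lo and hi.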
Prototypes.xtile Method
xtile(data::Vector{T}, n_quantiles::Integer,
weights::Union{Vector{Float64}, Nothing}=nothing)::Vector{Int} where T <: Real
Create quantile groups using Julia's built-in weighted quantile functionality.
Arguments
- data: values to group
- n_quantiles: number of groups
- weights: optional weights, of a StatsBase weight type
Examples
using StatsBase  # provides Weights
sales = rand(10_000);
a = xtile(sales, 10);
b = xtile(sales, 10, weights=Weights(repeat([1], length(sales))) );
@assert a == b
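For intuition, the unweighted case can be sketched with the Statistics standard library alone. xtile_sketch is a hypothetical helper, not the package's method, and sending a value that equals a breakpoint to the lower group is an assumption made for the example.

```julia
using Statistics

# Hypothetical helper: assign each value to one of n groups using
# unweighted quantile breakpoints.
function xtile_sketch(data::AbstractVector{<:Real}, n::Integer)
    cuts = quantile(data, (1:n-1) ./ n)   # n - 1 interior breakpoints
    # Group index = position of the first breakpoint that is >= x,
    # or n when x lies above every breakpoint.
    return [searchsortedfirst(cuts, x) for x in data]
end

data = collect(1.0:100.0)
groups = xtile_sketch(data, 4)
```

With 100 evenly spaced values and four groups, each group receives 25 observations.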