GLM Process

1-Pick distributions

  • Claim sizes and frequencies are generally modelled using Gamma and Poisson distributions respectively.
  • For other response variables, examine the response further (for example its range and how its variance changes with its mean) before choosing a distribution.

2-Pick link function

  • This depends on the nature of the response variable. For example, a response whose mean must be positive (such as claim size or claim frequency) would typically use the log link function, whereas a response between 0 and 1 (such as a probability) would use a logit link function.
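
As a minimal sketch of what the two link functions do, the snippet below maps a linear predictor back to the scale of the mean. The function names are illustrative only and are not taken from any particular GLM package.

```python
import math

def log_link(mu):
    """Log link: maps a positive mean onto the whole real line."""
    return math.log(mu)

def inverse_log_link(eta):
    """Inverse log link: any linear predictor gives a positive mean."""
    return math.exp(eta)

def logit_link(mu):
    """Logit link: maps a mean in (0, 1) onto the whole real line."""
    return math.log(mu / (1.0 - mu))

def inverse_logit_link(eta):
    """Inverse logit link: any linear predictor gives a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Whatever the sign of the linear predictor, the implied mean stays in range.
for eta in (-2.0, 0.0, 2.0):
    print(eta, inverse_log_link(eta), inverse_logit_link(eta))
```

The point of the link is that the linear predictor can take any real value while the implied mean stays within the range that makes sense for the response.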

3-Analyse data

  • What explanatory variables have been provided?
  • What does the response variable look like by each explanatory variable? One-way summaries or pivot tables could be analysed for insight.
  • Consider grouping for categorical variables.
  • Consider transformations for variables with non-linear shapes.
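
A one-way summary of this kind could be sketched as follows. The records, the `age_band` levels and the `one_way_summary` helper are all made up for illustration. In practice this would more likely be a pivot table or a grouped query on the real data.

```python
from collections import defaultdict

# Hypothetical policy records: (age_band, exposure_years, claim_count).
records = [
    ("17-25", 1.0, 1),
    ("17-25", 0.5, 0),
    ("26-40", 1.0, 0),
    ("26-40", 1.0, 1),
    ("26-40", 1.0, 0),
    ("41+", 1.0, 0),
    ("41+", 0.5, 0),
]

def one_way_summary(rows):
    """Total exposure, total claims and claim frequency for each level."""
    exposure = defaultdict(float)
    claims = defaultdict(int)
    for level, exposure_years, claim_count in rows:
        exposure[level] += exposure_years
        claims[level] += claim_count
    return {
        level: (exposure[level], claims[level], claims[level] / exposure[level])
        for level in exposure
    }

summary = one_way_summary(records)
for level, (exposure_years, claim_count, frequency) in sorted(summary.items()):
    print(f"{level:>6}  exposure={exposure_years:.1f}  "
          f"claims={claim_count}  frequency={frequency:.3f}")
```

Levels with little exposure give unstable frequencies, which is one reason to consider grouping them.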

4-Optimise/fit using maximum likelihood estimation (most likely done with a program)
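
To give a feel for what the program is doing, here is a minimal sketch that fits a Poisson GLM with a log link and a single explanatory variable by Newton-Raphson on the log-likelihood. The data and the `fit_poisson_glm` function are hypothetical, and a real fit would normally rely on a statistics package rather than hand-written code.

```python
import math

def fit_poisson_glm(xs, ys, iterations=25):
    """Fit log(mu) = b0 + b1 * x by maximising the Poisson log-likelihood."""
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector: derivative of the log-likelihood w.r.t. (b0, b1).
        u0 = sum(y - mu for y, mu in zip(ys, mus))
        u1 = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))
        # Fisher information matrix (2 x 2, symmetric).
        i00 = sum(mus)
        i01 = sum(mu * x for x, mu in zip(xs, mus))
        i11 = sum(mu * x * x for x, mu in zip(xs, mus))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information matrix times the score.
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1

# Hypothetical data: x is a rating variable, y is an observed claim count.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1, 1, 2, 4, 6]
b0, b1 = fit_poisson_glm(xs, ys)
print(f"intercept={b0:.4f}  slope={b1:.4f}")
```

At the maximum likelihood estimate the score is zero, so the fitted means reproduce the observed total claim count.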

5-Assess output and p-values

  • Some variables may be dropped based on their statistical insignificance.
  • Data mining or decision trees can help to find areas that are not fitting well and refine them.
  • Address any large individual observations or outliers distorting results.
  • Fit curves to continuous variables, rather than a separate parameter for every level, to reduce over-fitting.
  • This may be an iterative process requiring judgment.
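
As a rough illustration of the significance check, the sketch below computes a two-sided Wald p-value from a coefficient estimate and its standard error using a normal approximation. The coefficient names, values and the 5% threshold are assumptions for illustration, not results from a real model.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: coefficient = 0 (normal approximation)."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical fitted coefficients: name -> (estimate, standard error).
coefficients = {
    "age_band_17_25": (0.42, 0.10),
    "vehicle_colour_red": (0.03, 0.08),
}

THRESHOLD = 0.05  # assumed significance level; a matter of judgment

for name, (estimate, std_error) in coefficients.items():
    p = wald_p_value(estimate, std_error)
    decision = "keep" if p < THRESHOLD else "review for dropping"
    print(f"{name}: p={p:.4f} -> {decision}")
```

A high p-value only flags a variable for review. Whether to drop it remains a judgment call.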

6-Testing how well the GLM predicts using a subset of the data
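
One simple way to sketch this step is to hold back part of the data, fit on the rest, and compare actual with expected claims on the held-back part. The records and helper names below are hypothetical. To keep the example short, the model is a one-factor Poisson frequency model, whose maximum likelihood fit is simply claims divided by exposure for each level. A fitted GLM would take its place in practice.

```python
import random
from collections import defaultdict

def split_records(rows, holdout_share=0.3, seed=0):
    """Shuffle reproducibly and split into (training, holdout) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_share)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def fit_frequencies(rows):
    """Fitted claim frequency per level, plus the overall frequency."""
    exposure = defaultdict(float)
    claims = defaultdict(float)
    for level, exposure_years, claim_count in rows:
        exposure[level] += exposure_years
        claims[level] += claim_count
    by_level = {level: claims[level] / exposure[level] for level in exposure}
    overall = sum(claims.values()) / sum(exposure.values())
    return by_level, overall

def actual_vs_expected(rows, by_level, overall):
    """Total actual and expected claims; unseen levels use the overall rate."""
    actual = sum(claim_count for _, _, claim_count in rows)
    expected = sum(
        by_level.get(level, overall) * exposure_years
        for level, exposure_years, _ in rows
    )
    return actual, expected

# Hypothetical records: (age_band, exposure_years, claim_count).
records = (
    [("young", 1.0, 1), ("young", 1.0, 0)] * 10
    + [("old", 1.0, 0), ("old", 1.0, 0), ("old", 1.0, 1), ("old", 1.0, 0)] * 5
)

training, holdout = split_records(records)
by_level, overall = fit_frequencies(training)
actual, expected = actual_vs_expected(holdout, by_level, overall)
print(f"holdout actual={actual}  expected={expected:.2f}")
```

A large gap between actual and expected claims on the holdout data suggests the model is over-fitted or missing an effect.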
