Dan,
I haven't been following this thread, but your comment on patents caught
my eye.
I am not an attorney either, but I think the patent law changed within the
past 10 years to allow ideas to be patented. In fact, Guy Carpenter
recently (within the past few years) filed for and received something like
22 patents on their reserving software. Reading the patent application,
it looks to me like only one of the patented items is an original idea
from Guy Carpenter. It also seemed like everything they received patents
for would already have been covered by copyright law. If you are interested, you
should be able to do an internet search for their patent application or if
you can't find it I'm sure I have a copy someplace.
Mark
Mark Shapland, FCAS, FSA, MAAA | Senior Consulting Actuary |
mark.shapland at milliman.com
Milliman | Liberty House, Unit 809, Level 8 | Dubai International
Financial Centre | Dubai P.O. Box 506784 | United Arab Emirates
Main +971 4 386 6990 | Fax +971 4 386 6950 | Mobile +971 56 179 1532
| milliman.com
-----Original Message-----
From: r-sig-insurance-bounces at r-project.org [mailto:
r-sig-insurance-bounces at r-project.org] On Behalf Of Dan Murphy
Sent: Thursday, October 02, 2014 12:35 AM
To: Brian Fannin
Cc: R-Sig-Insurance at R-Project.Org; Edward Roche
Subject: Re: [R-sig-ins] General Insurance Data
I don't agree, Brian, that that's the question I'm begging.
I don't have all the details, but the story goes that a while back (pre-21st
century?) an actuary tried to patent an actuarial method/calculation that
he/she invented. I can't even remember whether that attempt was successful. But
the actuarial community was against such endeavors. I'm in that camp,
having learned on my daddy's knee that you can't patent an idea. (He was
not a patent attorney!)
OTOH, Apple Computer (and others) made a business out of patenting the
look and feel of their products, including their software. (E.g., I cannot
zoom in on an Instagram photo on my Android phone, but last night I was
shown the three-finger tap that does it on an iPhone!) I don't believe
the tool that creates the look and feel is germane to the patent, but I'm
not a patent attorney either.
On Wed, Oct 1, 2014 at 10:42 AM, Brian Fannin <BFannin at redwoodsgroup.com>
wrote:
That begs the question of which tool you prefer for a GUI? I'll
confess that it's fairly easy to knock something together in Excel.
However, I'd like to move to something that has more natural support for
*From:* Dan Murphy [mailto:chiefmurphy at gmail.com]
*Sent:* Wednesday, October 01, 2014 1:22 PM
*To:* Brian Fannin
*Cc:* Christophe Dutang; Edward Roche; Markus Gesmann;
R-Sig-Insurance at R-Project.Org
*Subject:* Re: [R-sig-ins] General Insurance Data
Thank you for that reference. Looks nice. More than the original
poster was looking for?
IMO, open-source is a great vehicle for distributing the guts: the
engineering that accommodates data exchange and munging. OTOH, I'm not
convinced that open-source is the right vehicle for distributing the
"look and feel" of a GUI. But I'm "open" for persuasion. :)
Dan
On Tue, Sep 30, 2014 at 10:37 AM, Brian Fannin
<BFannin at redwoodsgroup.com>
wrote:
Can't believe I forgot to mention that! I remember the presentation
and it looks to be a very powerful tool. The integration with LaTeX is
particularly nice.
*From:* Christophe Dutang [mailto:dutangc at gmail.com]
*Sent:* Tuesday, September 30, 2014 11:30 AM
*To:* Brian Fannin
*Cc:* Edward Roche; Markus Gesmann; chiefmurphy at gmail.com;
R-Sig-Insurance at R-Project.Org
*Subject:* Re: [R-sig-ins] General Insurance Data
On the GUI part, there is proprietary software, RPGM (
http://www.pgm-solutions.com/), offering what Edward is looking for.
This tool was also presented at the R in Insurance conference this year!
It was partially developed by an actuary.
Christophe
On 30 Sep 2014, at 15:25, Brian Fannin <BFannin at redwoodsgroup.com>
wrote:
Edward,
If I'm reading your message right, the problem seems to break down
into two parts: 1) reshape claims data, either by aggregating
individual claim amounts or reshaping data from wide to long; 2)
provide a GUI to guide the user through the process of munging the data
so that it may be analyzed.
On the first point, it would be helpful to identify a typical
structure which must be modified. MRMR doesn't have a function to go
from wide to long, as that's a very simple melt command. The data I use
in my work lives in an RDBMS in long format, so I'm sorted. I've toyed
with the idea of a "collapse" function that would aggregate data along
temporal dimensions, or summarize across a hierarchical axis (i.e. sum
all of the territories into a single country). Such a function would
be general enough that it could summarize individual claim
transactions, though that's likely better done by a database server.
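For readers who haven't done this in R before, the two operations Brian describes (wide-to-long reshaping and a collapse-style aggregation) can be sketched in base R. The data below is made up for illustration:

```r
# Hypothetical wide claims data: one row per origin year,
# one column per development period.
wide <- data.frame(
  origin = c(2010, 2011, 2012),
  dev1 = c(100, 110, 120),
  dev2 = c(150, 160, NA),
  dev3 = c(170, NA, NA)
)

# Wide -> long (the "melt" step) using base R reshape()
long <- reshape(
  wide,
  direction = "long",
  varying = c("dev1", "dev2", "dev3"),
  v.names = "paid",
  timevar = "dev",
  idvar = "origin"
)
long <- long[!is.na(long$paid), ]  # drop future/missing cells

# A minimal "collapse": sum a hierarchical dimension (territory)
# up to one total per origin/dev cell.
claims <- data.frame(
  origin = c(2010, 2010, 2011),
  dev = c(1, 1, 1),
  territory = c("North", "South", "North"),
  paid = c(60, 40, 110)
)
collapsed <- aggregate(paid ~ origin + dev, data = claims, FUN = sum)
```

With data.table or reshape2 the same melt is a one-liner, which is presumably what "a very simple melt command" refers to.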
On the second point, note that the absence of a GUI is a universal
issue with R. However, it's worth emphasizing that this issue also
exists with Excel. A comfortable user interface which permits users to
flag columns as development, or origin period, or whatever, must be
constructed by hand. Dom Yarnell's EBDEx tool was a great example of
this. Shiny is a powerful option for a user interface to support the
kinds of transformation you describe. It's possible to upload and
download files, and there are a number of user widgets for
identification of the structural elements of data. One may also
display some basic ggplot-type graphics for basic exploratory
analysis. I've started a project called "shout" which will eventually
be a GUI for MRMR. It's been dormant for about three months, but have
a look (or have a look at Shiny apps in general) and see if that's a
direction that makes sense.
Do you have a GitHub repository list where others can contribute? Mine
may be found here: https://github.com/PirateGrunt/.
Also, have a look at XLConnect. RExcel actually requires a commercial
license, whereas XLConnect is GPL 3. XLConnect allows one to read and
write to/from Excel files. This is what I use for the "write.excel"
method in MRMR.
Regards,
Brian
*From:* Edward Roche [mailto:ed.roche at yahoo.co.uk
<ed.roche at yahoo.co.uk>]
*Sent:* Monday, September 29, 2014 7:19 PM
*To:* Brian Fannin; 'Markus Gesmann'; Christophe Dutang;
chiefmurphy at gmail.com
*Cc:* R-Sig-Insurance at R-Project.Org
*Subject:* Re: [R-sig-ins] General Insurance Data
Hi All,
Thank you all for your responses. What I have in mind is a package to
help with the initial stage of getting claims data into R and ready to
use (formatting etc.) before any modeling is done. Effectively, it would
provide a workflow for working with actuarial claims data.
My past experience with GI reserving projection programs has
usually involved a stage to summarise the data in some format and then
later feed this data into a program, be it a proprietary program or one
The key stage I wish to contribute to here is where the data has to be
fed in to R. MRMR and ChainLadder do allow for reshaping of the data
(and loading data) but I would like to build on the functionality in
these packages while not reinventing the wheel. I will take some time
to digest the MRMR package as this does seem to have a lot of
functionality that I could use for manipulating the data.
A package providing a workflow to get data into a basic tabular
structure would act as a stepping stone for getting claims data
available and cleaned for use in packages such as ChainLadder or MRMR.
It would also make other, more basic analysis easier. As I see
it at the moment, when someone gets to the stage where the data is
loaded and formatted in R they are basically ready to go and use MRMR
or ChainLadder, *but* this is a very big hurdle for novice R users who
typically have data stored elsewhere (SAS, SQL, Excel or other flat
files). Even moving from Excel to R, where data can be stored in
objects as opposed to tables, is a giant leap in its own right, and I
fear this is the stage where a lot of novice R actuarial users are
lost or lose confidence in R. I am not a professional programmer and I
am not very knowledgeable about different types of connections or
porting files from one format to another, so I was hoping to leave that
up to packages such as RODBC or xlsx. My idea is to provide a workflow
structure to load claims data into a data.table using a simple GUI.
It's worth mentioning that I have come across RExcel, which
allows you to call R from Excel, but what I am talking about is
somewhat different in that you're using R as the backbone, rather than
Excel, for storing and manipulating the data. This would at the same
time leave the data in a familiar tabular format, but helper functions
could pivot the data and allow novice R users to create triangles
intuitively or extract useful representations such as latest diagonals,
cumulative data, in-year data, etc.
It would still be easy to manipulate the basic underlying data
structure if desired, as it is just a data.table, but helper functions
would give it intuitive actuarial functionality. The goal would be to
reduce, to close to zero, the time for a novice R actuarial user to go
from existing data (e.g. Excel or SAS) to a data object that
could then be passed to ChainLadder, MRMR or ggplot2.
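As an illustration of the kind of helper functions described here (the names and data below are hypothetical, not from any existing package), going from a long-format data object to a triangle, a cumulative view and the latest diagonal takes only a few lines of base R:

```r
# Hypothetical long-format incremental paid claims.
claims <- data.frame(
  origin = c(2011, 2011, 2011, 2012, 2012, 2013),
  dev    = c(1, 2, 3, 1, 2, 1),
  paid   = c(100, 50, 20, 110, 55, 120)
)

# Pivot to an incremental triangle (origin years x development periods).
tri <- tapply(claims$paid, list(claims$origin, claims$dev), sum)

# Incremental -> cumulative along each row.
cum <- t(apply(tri, 1, cumsum))

# Latest diagonal: last non-missing entry in each row.
latest <- apply(cum, 1, function(x) tail(x[!is.na(x)], 1))
```

Wrapping operations like these behind intuitive function names is essentially what the proposed helper layer would do.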
The package Rattle has a lovely GUI for loading data. I
have something like this in mind to get people started. A GUI
could be used to specify key fields and formats such as
origin period, etc. The data, stored in R as a data.table, would still be
in a very familiar tabular format and look similar to what novice R
users are used to when they have worked with data stored elsewhere in
SQL, SAS or Excel files. This would provide a bit of reassurance to
get working in R. It would also make it easy for novice R users to plot
the data using ggplot2 or googleVis, reshape it, and summarise it in
tables for printing; you also get the benefit that the data.table
structure is quite efficient for working with big in-memory data.
My recent background is claims reserving in a GI company and claims
data is what I have initially in mind to tackle. Other data, such as
pricing data and time series, would also certainly lend themselves to
more specialised actuarial data structures, and I think I could work on
these down the road.
From my description above the package might sound like it's just for
loading and formatting the data. This is a non-trivial exercise for
novice R users, and such a package would reduce the barrier to adopting
R, especially at work. At the optimistic end of the scale, I also think
it could make development data analysis accessible to non-actuaries
and could even be useful in other fields. I would love help from
anyone interested in contributing, as I am working on this solo
at the moment. I wish to work a bit on a basic proof of concept first,
and when I get it working I will load it to GitHub and post some links.
I wanted first to check that such a package does not exist, but it will
take a bit of time to get something useful and working. I also
appreciate any comments, contributions or feedback.
Thanks again for the help
Regards
Edward Roche
On 29/09/14 02:03, Brian Fannin wrote:
Edward,
It may not be exactly what you're looking for, but MRMR could have some
things you might like. My primary aim was to develop a package which made
data manipulation for loss reserving a bit more efficient. However, it has
application in a more general context as well. It begins with a clear,
robust yet flexible notion of an origin period
(accident/underwriting/report year). For each origin year, one may store
measured observations, which have an arbitrarily complex set of dimensions.
For example, written premium is allocated to territory, line of business,
currency, etc. For loss reserving, we go a step further and allow those
measures to change at regular evaluation points.
That's that for the data structure. I've tried to build in reasonable
support for simple multidimensional/multivariate visualizations. I've got a
lot more work to do here, but it's a good start. There's also a new generic
method "write.excel" which stores information to a spreadsheet in a
reasonable way. This is a pragmatic feature to facilitate data exchange
with folk who aren't yet using R. The loss reserving methods presume a
linear model framework with chain ladder as a special case. We use mixed
effects regression for "loss reserving with credibility".
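The chain-ladder-as-special-case-of-a-linear-model idea can be sketched in a few lines of base R. This illustrates the general equivalence (due to Mack), not MRMR's actual interface, and the numbers are made up: the volume-weighted development factor equals the slope of a zero-intercept regression with weights 1/x.

```r
# Cumulative paid at dev 1 and dev 2 for three origin years (made up).
x <- c(100, 110, 120)  # cumulative at dev 1
y <- c(150, 160, 175)  # cumulative at dev 2

# Classical volume-weighted chain-ladder development factor:
f_cl <- sum(y) / sum(x)

# The same factor as a regression through the origin with weights 1/x:
fit <- lm(y ~ x + 0, weights = 1 / x)
f_lm <- unname(coef(fit))

all.equal(f_cl, f_lm)  # TRUE
```

Once the method is expressed as a regression, swapping in mixed-effects machinery for a credibility-style treatment is a natural extension.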
Attached is a very short vignette which may help you to understand how
the package is meant to be used. I'm happy to talk more about it, if you're
interested.
Note that I'm describing features on the development version of MRMR.
The CRAN version is more limited. I hope to get the new version submitted
and approved before the end of the year.
Regards,
Brian Fannin
-----Original Message-----
From: Markus Gesmann [mailto:markus.gesmann at googlemail.com
<markus.gesmann at googlemail.com>]
Sent: Sunday, September 28, 2014 6:36 AM
To: Christophe Dutang
Cc: Edward Roche; R-Sig-Insurance at R-Project.Org; Brian Fannin
Subject: Re: [R-sig-ins] General Insurance Data
Hi Edward,
Have you looked at Brian Fannin's MRMR package?
The package provides quite a bit of infrastructure for dealing with
Regards
Markus
On 27 Sep 2014, at 14:08, Christophe Dutang <dutangc at gmail.com> wrote:
Dear Edward,
To my knowledge, there is no single package doing this kind of data
manipulation, e.g. to read SAS or Excel files; typically the case where
there is a thousands separator. I did write some R functions:
- readfunc(filelist, yearlist, sheetname, pattern, nrow, ncol,
colname, col2conv, echo=FALSE, row2rm=NULL): read a list of Excel
files to extract the same sheet and concatenate the whole
- str2num(x): convert a character string to a numeric, dealing with
the thousands separator
- concatmultcol(df, nbinfo, nbblock, col2foot, col2conv, col2rmpre,
cm2rmpost, cname2trunc=NULL, echo=FALSE): concatenate blockwise data
into a rowwise data frame, etc.
In these functions, I read Excel files with the gdata package.
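Christophe's code isn't shown, but a minimal sketch of a str2num-style helper, assuming "," as the thousands separator and "." as the decimal mark, is just:

```r
# Minimal str2num sketch: strip the thousands separator, then coerce.
str2num <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

str2num(c("1,234.5", "12,345,678"))  # 1234.5 12345678
```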
As you say, for manipulating triangles, there are functions in
ChainLadder. By the way, I proposed cum2incr and incr2cum to M. Gesmann.
Another function shrinkTriangle was not kept in this package: this function
computes an annual triangle from a monthly or quarterly basis.
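shrinkTriangle itself isn't available, but for a cumulative triangle the idea can be sketched very simply (assuming annual origin periods and quarterly development columns; the data is made up): the annual view is just the cumulative position at every fourth development quarter.

```r
# Hypothetical cumulative triangle: two annual origin years,
# eight quarterly development columns.
quarterly <- rbind(
  `2012` = c(40, 70, 90, 100, 104, 106, 108, 110),
  `2013` = c(45, 80, 95, 105, NA, NA, NA, NA)
)

# Keep every 4th development quarter (the year-end positions).
annual <- quarterly[, seq(4, ncol(quarterly), by = 4), drop = FALSE]
colnames(annual) <- seq_len(ncol(annual))  # relabel as annual dev periods
```

For an incremental triangle one would instead sum the four quarters within each year; for cumulative data, selecting the year-end columns suffices.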
For manipulating strings, there is a good book by Gaston Sanchez; see
http://gastonsanchez.com/work/
At the last R in Insurance conference, there was a good presentation of data
Do you have a detailed list of what types of data you want to
manipulate, and for which purpose? (Is it only for reserving?) If
you are willing to create a package, I will be happy to contribute.
Regards, Christophe
-
Christophe Dutang
LMM, UdM, Le Mans, France
web: http://dutangc.free.fr
On 27 Sep 2014, at 01:16, Edward Roche <ed.roche at yahoo.co.uk> wrote:
Hi all,
I am interested in a package for working with General Insurance data,
i.e. something that deals with the initial stage of getting GI data into R,
reshaping it and subsetting it before performing projections using packages
such as ChainLadder or claim development visualisations using googlevis.
When working with GI data, before you can do any meaningful analysis or
projections you need to get your data into the right format. Typically you
might start with incremental long format data showing claim transactions
and this needs to be reshaped and summarised into triangle matrices (e.g.
paid and incurreds). You can then perform projections with packages such as
ChainLadder or other simple development methods. This initial manipulation
stage is not always easy to do especially for novice users of R. There are
helper functions in the ChainLadder package such as incr2cum, as.triangle
etc. that let you perform conversions on the fly, but I can't find anything
for the wholesale restructuring or manipulation of GI data, such as
converting it from Annual/Quarterly to Annual/Annual. This type of
manipulation is very easy to do in some proprietary software (not R based)
that I use on a daily basis.
I am considering working on a package that would generally provide
helper functions to load untidy GI data into R and let a novice R user
perform restructuring and manipulation on the fly. I envisage a GUI such
as is available in Rattle to load the data and specify the key variables
and formats. Once the data is loaded intuitive helper functions would let
you manipulate it on the fly. For example you might wish to pick out paid
and incurred triangles, subset the data in some way or convert it to Annual
Annual or from cumulative to incremental.
My question for this mailing list is: are there any such packages out
there, or is anyone working on something like this? I would love to get
involved if so. Any other thoughts?