#StackBounty: #regression #logistic #categorical-data #prediction Categorising US prescription data as 'Loading Dose' or 'M…

Bounty: 100

I have a dataset of patient prescriptions, each of which I need to label as ‘Loading Dose’ [LD] or ‘Maintenance Dose’ [MD]. Only the MD part of treatment is thought to be effective, and I need to run some models on that part.

For each prescription I have the following data:

| Name        | Level         | Type        | Desc                                                          |
|-------------|---------------|-------------|---------------------------------------------------------------|
| Patid       | Patient       | Ordinal     | ID value uniquely identifying patient                         |
| Ther_Grp    | Therapy Group | Categorical | Therapy group name (e.g. Infliximab, Secukinumab). 7 groups   |
| NDC_Code    | Therapy       | Categorical | Code uniquely identifying therapy (approx 47 values)          |
| NDC_Desc    | Therapy       | Categorical | Description of pack. 1:1 with NDC_Code                        |
| NDC_Size    | Therapy       | Ordinal     | # of syringes in a standard pack. 1:1 with NDC_Code           |
| Metsize     | Therapy       | Categorical | Metric size. Often ml of base liquid in syringe. 1:1 with NDC_Code |
| Strength    | Therapy       | Ordinal     | Mg per syringe. 1:1 with NDC_Code                             |
| Days_Supply | Prescription  | Continuous  | Number of days script was prescribed for                      |
| Days_Next   | Prescription  | Continuous  | Days to next script (of same therapy type)                    |
| Daily_Dose  | Prescription  | Continuous  | Day dose (mg/ml/day). Num_Syr * Strength / Days_Supply        |
| Num_Syr     | Prescription  | Ordinal     | Number of syringes prescribed (often 1, 2, 4, 8 or 12)        |
| Pay         | Prescription  | Continuous  | Amount paid for prescription ($)                              |
| Scr_Date    | Prescription  | Ordinal     | Date of prescription                                          |
| Total_Dose  | Prescription  | Continuous  | Total script dose (Num_Syr * Strength)                        |
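To make the two derived columns concrete, here is a minimal Python sketch of how Total_Dose and Daily_Dose follow from the other fields, using the formulas given in the table. The function names and the example values are my own for illustration; they are not taken from the dataset.

```python
def total_dose(num_syr: int, strength_mg: float) -> float:
    """Total_Dose as described in the table: Num_Syr * Strength."""
    return num_syr * strength_mg


def daily_dose(num_syr: int, strength_mg: float, days_supply: float) -> float:
    """Daily_Dose as described in the table: Num_Syr * Strength / Days_Supply."""
    if days_supply <= 0:
        raise ValueError("days_supply must be positive")
    return total_dose(num_syr, strength_mg) / days_supply


# Illustrative script: two 200 mg syringes prescribed to cover 14 days.
example_total = total_dose(2, 200.0)       # 400.0 mg in the script
example_daily = daily_dose(2, 200.0, 14)   # 400 mg spread over 14 days
```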

There is no correctly labeled data that could be used for training.

The therapy manufacturer lists the dosing regime for their product. Here’s one:

[Loading Dose] covers your first month of injections, which are provided at the start of therapy at week 0 (day 0), week 2 (day 14), and week 4 (day 28). Each of these doses is 400 mg, given under the skin as two separate injections of 200 mg each.

…there are two maintenance dosing options available to you:

  • One 200-mg injection taken every two weeks (14 days) or
  • Two 200-mg injections taken every four weeks (28 days)

I could use the date alone: anything after week 6 would be labeled MD. However, it is thought that a large minority of patients start their MD early.
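The date-only rule could be sketched in Python as below. This assumes the patient's first script date is known and that "week 6" means day 42 counted from that first script, with day 42 itself treated as MD; both the cutoff handling and the function name are assumptions for illustration.

```python
from datetime import date

LOADING_DAYS = 42  # week 6; assumed cutoff, counted from the first script


def label_by_date(first_script: date, scr_date: date,
                  cutoff_days: int = LOADING_DAYS) -> str:
    """Label a script 'LD' or 'MD' using only days since the first script."""
    days_since_start = (scr_date - first_script).days
    return "MD" if days_since_start >= cutoff_days else "LD"


# Illustrative dates only.
start = date(2020, 1, 1)
week4_label = label_by_date(start, date(2020, 1, 29))   # day 28
week6_label = label_by_date(start, date(2020, 2, 12))   # day 42
```

A patient who moves to MD early would be mislabeled by this rule, which is the weakness described above.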

Alternatively, I could use cumulative dose: any script after the first 3 x 400mg = 1200mg of medication would be MD. But the gaps between scripts might be too long for the drug to plausibly be at the MD level in the patient’s body, so some type of date restriction is clearly needed.
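A rough Python sketch of the cumulative-dose rule, for one patient and one therapy, is below. It assumes each script is a (Scr_Date, Total_Dose) pair and that a script counts as MD once 1200mg has already been dispensed before it; the function name and the example scripts are invented for illustration. The gap in days since the previous script is returned alongside the label so the problem with long gaps is visible.

```python
from datetime import date

LOADING_TOTAL_MG = 1200.0  # 3 x 400 mg, from the manufacturer's regime


def label_by_cumulative_dose(scripts, threshold_mg=LOADING_TOTAL_MG):
    """Label scripts by the dose dispensed before each one.

    scripts: list of (scr_date, total_dose_mg) for one patient and therapy.
    Returns a list of (scr_date, gap_days_since_previous, label).
    """
    labelled = []
    dispensed = 0.0
    previous = None
    for scr_date, dose_mg in sorted(scripts):
        gap = None if previous is None else (scr_date - previous).days
        label = "MD" if dispensed >= threshold_mg else "LD"
        labelled.append((scr_date, gap, label))
        dispensed += dose_mg
        previous = scr_date
    return labelled


# Invented example: three loading scripts, then a script after a long gap.
example = label_by_cumulative_dose([
    (date(2020, 1, 1), 400.0),
    (date(2020, 1, 15), 400.0),
    (date(2020, 1, 29), 400.0),
    (date(2020, 6, 1), 200.0),
])
```

In this invented example the last script is labeled MD even though it comes months after the previous one, which is exactly the case where a dose-only rule looks unreasonable.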

I guess I need a function with both time and cumulative dose that increases the probability of MD when either increases. This suggests a logistic regression. However, how does one incorporate the target of ‘around’ 6 weeks or ‘at least’ 1200mg into the model?
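To show the kind of function I have in mind, here is a hand-specified logistic score in Python. It is not a fitted model (there are no labels to fit one to): the midpoints are simply set to 42 days and 1200mg, and the two scale parameters (7 days and 200mg) are arbitrary assumptions that control how sharply the score rises. The score is 0.5 when a script sits exactly at both targets and increases when either time or cumulative dose increases.

```python
import math


def md_probability(days_since_start: float, cumulative_mg: float,
                   day_mid: float = 42.0, dose_mid: float = 1200.0,
                   day_scale: float = 7.0, dose_scale: float = 200.0) -> float:
    """Hand-set logistic score for 'this script is MD'.

    All midpoints and scales are assumptions, not estimates from data.
    """
    z = ((days_since_start - day_mid) / day_scale
         + (cumulative_mg - dose_mid) / dose_scale)
    # Numerically stable form of the logistic function 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


at_targets = md_probability(42, 1200)    # 0.5 by construction
later_same_dose = md_probability(56, 1200)
more_dose_same_day = md_probability(42, 1600)
```

Because the two terms are added, a large value of one can compensate for a small value of the other, so a long time with little dose can still score highly. Whether that is acceptable, and how the scales should be chosen without labeled data, is really the substance of my question.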

Thanks.

