#StackBounty: #regression #logistic #categorical-data #prediction Categorising US prescription data as 'Loading Dose' or 'M…

Bounty: 100

I have a dataset of patient prescriptions, each of which I need to label as ‘Loading Dose’ [LD] or ‘Maintenance Dose’ [MD]. Only the MD part of treatment is thought to be effective, and I need to run some models on that part.

For each prescription I have the following data:

| Name        | Level         | Type        | Description                                                    |
|-------------|---------------|-------------|----------------------------------------------------------------|
| Patid       | Patient       | Ordinal     | ID value uniquely identifying patient                          |
| Ther_Grp    | Therapy Group | Categorical | Therapy group name (e.g. Infliximab, Secukinumab). 7 groups    |
| NDC_Code    | Therapy       | Categorical | Code uniquely identifying therapy (approx. 47 values)          |
| NDC_Desc    | Therapy       | Categorical | Description of pack. 1:1 with NDC_Code                         |
| NDC_Size    | Therapy       | Ordinal     | Number of syringes in a standard pack. 1:1 with NDC_Code       |
| Metsize     | Therapy       | Categorical | Metric size. Often mL of base liquid in syringe. 1:1 with NDC_Code |
| Strength    | Therapy       | Ordinal     | mg per syringe. 1:1 with NDC_Code                              |
| Days_Supply | Prescription  | Continuous  | Number of days script was prescribed for                       |
| Days_Next   | Prescription  | Continuous  | Days to next script (of same therapy type)                     |
| Daily_Dose  | Prescription  | Continuous  | Day dose (mg/ml/day). Num_Syr * Strength / Days_Supply         |
| Num_Syr     | Prescription  | Ordinal     | Number of syringes prescribed (often 1, 2, 4, 8 or 12)         |
| Pay         | Prescription  | Continuous  | Amount paid for prescription ($)                               |
| Scr_Date    | Prescription  | Ordinal     | Date of prescription                                           |
| Total_Dose  | Prescription  | Continuous  | Total script dose (Num_Syr * Strength)                         |
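To make the two derived columns concrete, here is a minimal sketch of the arithmetic given in the table for Total_Dose and Daily_Dose. The function names and the example script (two 200 mg syringes with a 14-day supply) are assumptions for illustration only, not values taken from the dataset.

```python
def total_dose(num_syr: int, strength_mg: float) -> float:
    """Total_Dose: number of syringes times mg per syringe."""
    return num_syr * strength_mg


def daily_dose(num_syr: int, strength_mg: float, days_supply: float) -> float:
    """Daily_Dose: total script dose spread over the days the script covers."""
    if days_supply <= 0:
        raise ValueError("days_supply must be positive")
    return total_dose(num_syr, strength_mg) / days_supply


# Hypothetical script: two 200 mg syringes prescribed for 14 days.
example_total = total_dose(2, 200.0)
example_daily = daily_dose(2, 200.0, 14)
print(example_total, round(example_daily, 2))
```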

There is no correctly labeled data that could be used as training data.

Each therapy manufacturer lists the dosing regime for its product. Here’s one:

[Loading Dose] covers your first month of injections, which are provided at the start of therapy at week 0 (day 0), week 2 (day 14), and week 4 (day 28). Each of these doses is 400 mg, given under the skin as two separate injections of 200 mg each.

…there are two maintenance dosing options available to you:

  • One 200-mg injection taken every two weeks (14 days) or
  • Two 200-mg injections taken every four weeks (28 days)

I could use the date alone: anything after week 6 would be labeled MD. However, it is felt that a large minority of patients start their MD early.

Instead, I could sum the dose: anything after 3 x 400mg = 1200mg of medication would be MD. But the gaps between scripts might be too long for the drug to plausibly be at the MD level in the patient’s body, so some kind of date restriction is clearly needed.
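The two candidate rules can be sketched side by side for a single patient. This is only an illustration under stated assumptions: the function name `label_scripts`, the input layout (a list of date and total-dose pairs) and the choice to count day 42 itself as MD are mine, and the cutoffs of 42 days and 1200mg come from the example regime above.

```python
from datetime import date


def label_scripts(scripts, day_cutoff=42, dose_cutoff_mg=1200.0):
    """Apply the date-only and dose-only rules to ONE patient's scripts.

    scripts: list of (scr_date, total_dose_mg) tuples for one therapy.
    Returns rows of (scr_date, days_since_start, prior_dose_mg,
    md_by_date, md_by_dose), sorted by script date.
    """
    ordered = sorted(scripts, key=lambda s: s[0])
    if not ordered:
        return []
    start = ordered[0][0]
    prior = 0.0  # dose dispensed BEFORE the current script
    rows = []
    for scr_date, dose in ordered:
        days = (scr_date - start).days
        # Assumption: a script on the cutoff day itself counts as MD.
        rows.append((scr_date, days, prior,
                     days >= day_cutoff, prior >= dose_cutoff_mg))
        prior += dose
    return rows


# Hypothetical patient who follows the regime exactly.
rows = label_scripts([
    (date(2020, 1, 1), 400.0),
    (date(2020, 1, 15), 400.0),
    (date(2020, 1, 29), 400.0),
    (date(2020, 2, 12), 200.0),
])
for row in rows:
    print(row)
```

For a patient on schedule the two rules agree; they diverge for early MD starters (the dose rule lags) and for patients with long gaps (the dose rule ignores elapsed time), which is the tension described above.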

I guess I need a function of both time and cumulative dose that gives a probability of MD which rises when either increases. This suggests a logistic regression. However, how does one incorporate the target of ‘around’ 6 weeks or ‘at least’ 1200mg into the model?
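One possible shape for such a function, sketched under loud assumptions: with no labeled data nothing can be fitted, so the version below simply centres each predictor on its target (42 days, 1200mg) and uses hand-picked scales. The name `md_probability` and the scale values (7 days, 200mg) are hypothetical choices, not estimates, and the additive form is just one option.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def md_probability(days_since_start, cum_dose_mg,
                   day_target=42.0, dose_target_mg=1200.0,
                   day_scale=7.0, dose_scale_mg=200.0):
    """Logistic-shaped MD score with hand-set (not fitted) parameters.

    Each predictor is centred on its target, so a script at exactly
    42 days and 1200mg scores 0.5; the score rises when either the
    elapsed time or the cumulative dose increases.
    """
    z = ((days_since_start - day_target) / day_scale
         + (cum_dose_mg - dose_target_mg) / dose_scale_mg)
    return _sigmoid(z)


print(md_probability(42, 1200))   # on both targets
print(md_probability(56, 1200))   # two weeks later, same dose
print(md_probability(14, 400))    # early in the loading phase
```

Because the terms are added, being well short on one predictor can offset being past the target on the other. If "either threshold alone should suffice" is the intended meaning, a different combination rule would be needed.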

Thanks.



#StackBounty: #regression #mixed-model #permutation-test Two sample test clustered data, continuous variable with zeros

Bounty: 150

I have a situation where an experiment is being run in the following manner:

One-stage cluster sampling (I think this is an accurate description) is conducted: multiple organizations are selected (10 of them), and within each organization there are sub-organizations whose members are all assigned to either treatment or control. So, within the larger selected organizations, the members of each smaller organization are all treatment or all control. An analogy would be school districts as the top-level selected organizations, where a school is selected within the district and its students are all treatment or all control.

The variable being measured is a continuous variable with a mass at zero.

I need to determine a confidence interval for the treatment effect, taking into account the clustering and the zeros in the target variable.

What are my best options?

I have thought about a permutation test in which the treatment and control assignments are scrambled at the sub-organization level, so that treated sub-organizations become control and vice versa.
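One reading of this idea, sketched with assumptions: summarise each sub-organization by a single value (here its mean outcome, zeros included) and shuffle the treatment labels across sub-organizations, since that is the level at which assignment happened. The function name, the Monte Carlo approach and the example numbers are all hypothetical. The sketch also ignores the top-level organizations; restricting shuffles to within each organization would be a natural variant.

```python
import random
from statistics import mean


def cluster_permutation_test(cluster_means, treated, n_perm=5000, seed=0):
    """Monte Carlo permutation test on sub-organization summaries.

    cluster_means: one summary value per sub-organization.
    treated: parallel list of booleans (True = treatment).
    Returns (observed difference in means, two-sided p-value).
    """
    def diff(labels):
        t = [m for m, lab in zip(cluster_means, labels) if lab]
        c = [m for m, lab in zip(cluster_means, labels) if not lab]
        return mean(t) - mean(c)

    observed = diff(treated)
    rng = random.Random(seed)
    labels = list(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign labels at the cluster level
        if abs(diff(labels)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical sub-organization means: five treated, five control.
means = [5.0, 6.0, 7.0, 8.0, 9.0, 0.0, 0.0, 1.0, 0.0, 2.0]
flags = [True] * 5 + [False] * 5
obs, p = cluster_permutation_test(means, flags)
print(obs, p)
```

This gives a p-value rather than an interval. A confidence interval could in principle be obtained by inverting the test over a grid of hypothesised effects, though that step is not shown here.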

The other thought was a Tweedie regression with a random intercept. The cplm package in R seems to do this, but the confidence interval would then have to be a Wald CI (estimate +/- 1.96 * SE).
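For reference, the Wald interval mentioned here is just the following arithmetic. The sketch assumes you already have a treatment coefficient and its standard error from whatever model was fitted; the numbers are made up, and the exponentiation step only applies if the model uses a log link.

```python
import math


def wald_ci(estimate: float, se: float, z: float = 1.96):
    """Wald interval: estimate +/- z * SE (z = 1.96 for roughly 95%)."""
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return estimate - z * se, estimate + z * se


# Hypothetical coefficient and standard error, for illustration only.
lo, hi = wald_ci(0.5, 0.1)
print(lo, hi)

# Under a log link, exponentiating the endpoints gives an interval
# for the multiplicative treatment effect.
print(math.exp(lo), math.exp(hi))
```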


