*Bounty: 100*

I understand the title is too generic. I looked for similar questions, and although a few seemed to address the same issue, they either answered in the negative, offered no convincing answer, or suggested the use of copulas.

I have no working knowledge of copulas, so if they really are the answer to my problem I will have to invest some time in getting acquainted with them. Before I do, I would like to know whether that investment is worthwhile in the first place. Hence this question.

I have a population of individuals with a certain number of characteristics, e.g. persons unemployed over some period of time. I know how many of them are located in a certain district (characteristic #1), and I know how many of them have achieved a certain education level, e.g. an MSc or equivalent (characteristic #2), but I don't have data on location *and* education for the same individual.

Given that the available info is something like the following table (for simplicity I don't include all the relevant characteristics, just *location* (rows) and *education* (columns)):

```
           | "MSc or higher"   "other edu"    | sum
___________|__________________________________|_______________________________________
"Region A" |       x                a         | n_A      (unemployed in region A)
           |                                  |
"rest regs"|       y                b         | n_U-n_A  (unemployed in other regions)
___________|__________________________________|_______________________________________
sum        |     n_MSc          n_U-n_MSc     | n_U      (unemployed persons)
           |  (unemployed      (unemployed    |
           |   with MSc)        with other    |
           |                    education)    |
```

- Is it warranted to claim that e.g. $\frac{n_{MSc}}{n_{U}}$ is a measure of the risk of unemployment faced by a person with an education level equivalent to or better than an MSc degree? Similarly, is e.g. $\frac{n_{A}}{n_{U}}$ a measure of the risk of unemployment for a person situated in Region A?
- If the table above is reinterpreted as representing the unemployment risk associated with each cell (i.e. if we divide the rightmost column and the bottom row by $n_U$ to obtain marginal probabilities for the corresponding rows/columns, and replace $x, y, a$ and $b$ with $p_x, p_y, p_a$ and $p_b$, the respective, *unknown*, joint probabilities), is there a way to retrieve those joint probabilities using only the information contained in the table above?
- Are there plausible assumptions or restrictions (e.g. some proposed or assumed relation between conditional frequencies) that would make it possible to find the joint probabilities within reasonable bounds? The purpose is a rough estimate of what the actual joint frequencies of the characteristics would be if more refined data sources (e.g. sources detailing those joint frequencies) were consulted.
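
To make the last two points concrete, here is a minimal sketch of what the margins alone seem to imply. It is only an illustration: the function names and the counts are made up, not my data. Using nothing but the row and column sums, the cell $x$ can only be bounded (the Fréchet bounds), and the other three cells follow from $x$ by subtraction. A single point value requires an extra assumption. Here I use independence of location and education as a baseline, which may well not hold in practice.

```python
def joint_cell_bounds(n_A, n_MSc, n_U):
    """Frechet bounds on x (Region A AND MSc) given only the margins."""
    lower = max(0, n_A + n_MSc - n_U)
    upper = min(n_A, n_MSc)
    return lower, upper


def independence_estimate(n_A, n_MSc, n_U):
    """Point estimate of x ASSUMING location and education are independent."""
    return n_A * n_MSc / n_U


def fill_table(x, n_A, n_MSc, n_U):
    """Recover the other cells (a, y, b) of the 2x2 table from x and the margins."""
    a = n_A - x                  # Region A, other education
    y = n_MSc - x                # other regions, MSc or higher
    b = n_U - n_A - n_MSc + x    # other regions, other education
    return x, a, y, b


# Hypothetical counts, for illustration only.
n_U, n_A, n_MSc = 1000, 300, 200

low, high = joint_cell_bounds(n_A, n_MSc, n_U)
x_indep = independence_estimate(n_A, n_MSc, n_U)

print("x must lie between", low, "and", high)
print("x under independence:", x_indep)
print("table under independence (x, a, y, b):", fill_table(x_indep, n_A, n_MSc, n_U))
```

Dividing every cell by `n_U` turns these counts into the joint probabilities $p_x, p_y, p_a, p_b$. The width of the interval returned by `joint_cell_bounds` shows how little the margins pin down on their own, which is why I am asking about additional assumptions.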

( *I apologize for the crude table layout, but I was unable to use LaTeX properly* )