Subgroup analysis

When running a pay gap analysis, it is essential to consider whether performing a subgroup analysis is appropriate for your organization's structure. We recommend starting with the Creating a compensation model article, where you will find detailed guidance on how to run a pay gap analysis. This article builds on that, walking you through why, when, and how to use subgroup analysis as part of your pay equity assessment.

When and why should you run a subgroup analysis?

If your organization has diverse pay structures across different regions or departments, a subgroup analysis can be beneficial. For instance, in a global organization, pay structures may vary significantly between countries. In Country A, salaries might depend heavily on experience, or on education and skills, whereas in Country B, the job family might play a more important role in determining pay.

By running a subgroup analysis, you can generate separate models for each country (or any other subgroup), providing unique analysis results for each one. This approach allows you to understand how different variables impact pay structures in each subgroup, offering unique insights at a local level.

With subgroup analysis, you obtain an individual compensation model for each subgroup, showing how the variables affect salaries differently across subgroups. Each model has its own health indicators, such as the R² value, which reflects the proportion of the variability in pay that the model explains in that specific subgroup.

Instead of evaluating a single, overarching model, you will have multiple models corresponding to each subgroup, enabling a more nuanced and accurate analysis of pay equity within your organization.
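To make the idea of one model per subgroup concrete, here is a minimal sketch in Python. It is not the product's compensation model: it assumes a single predictor and hypothetical field names (`country`, `experience_years`, `salary`), and fits an ordinary least-squares line per subgroup so that each subgroup gets its own coefficient and its own R².

```python
from collections import defaultdict


def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, r_squared). Assumes at least two distinct
    x values and non-constant y values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot


def fit_per_subgroup(rows, group_key, x_key, y_key):
    """Split rows by group_key and fit one independent model per subgroup."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        name: fit_simple_ols([r[x_key] for r in members],
                             [r[y_key] for r in members])
        for name, members in groups.items()
    }


# Hypothetical data (salary in thousands): experience drives pay strongly
# in country A and only weakly in country B.
rows = [
    {"country": "A", "experience_years": 1, "salary": 50},
    {"country": "A", "experience_years": 2, "salary": 60},
    {"country": "A", "experience_years": 3, "salary": 70},
    {"country": "A", "experience_years": 4, "salary": 80},
    {"country": "B", "experience_years": 1, "salary": 70},
    {"country": "B", "experience_years": 2, "salary": 70},
    {"country": "B", "experience_years": 3, "salary": 72},
    {"country": "B", "experience_years": 4, "salary": 72},
]

models = fit_per_subgroup(rows, "country", "experience_years", "salary")
for country, (intercept, slope, r2) in models.items():
    print(f"{country}: slope={slope:.2f}, R²={r2:.2f}")
```

In this toy data the coefficient for experience and the R² differ between the two countries, which is exactly the kind of difference a single whole-population model would average away.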

Choosing the right approach: whole-population vs. subgroup models

Before running your analysis, a key question to consider is: Does grade 3 mean the same thing in Germany as it does in the UK? More broadly, should the analysis be based on the whole dataset, or broken down by subgroups?

The answer depends on how pay is structured in your organization and the purpose of the analysis. Pay logic refers to the rules your organization uses to assign grades or job families and set pay levels. If the same pay logic applies across your dataset, a whole-population model is usually the right choice. If pay logic differs across groups, subgroup analysis may be needed.

Whole-population models

Use a whole-population model if:

The logic of pay is consistent across the organization, meaning grades, job families, and evaluation criteria are applied the same way everywhere.
The goal is strategic oversight, enabling organization-wide reporting, benchmarking, and progress tracking.
You want stable, generalizable results, as larger datasets reduce the risk of overfitting and random noise.
Splitting the data would be artificial, since differences in pay levels alone do not justify subgroups.
You want to minimize instability from very small groups that can distort results.

Subgroup models

Use subgroup models if:

The logic of pay differs across groups. For example, the same grade has different meanings in different countries or entities. If grade 8 includes very different jobs in Austria vs. Germany, a subgroup model captures that difference, while a whole-population model would blur it.
You need local actionability, for instance for country-level compliance checks or direct entity-level diagnostics.
Your HR structure is decentralized, so managers need local models to act on insights about their own workforce.
Groups have enough employees spread across key categories (like grades or job families) so that the model can detect real patterns in pay, rather than simply describing the pay of one or two individuals – and to ensure results do not compromise individual privacy.

Subgrouping should never be used just to produce a "nicer" gap. You need a diagnostic lens, not a cosmetic one.

Checks for model quality

Regardless of which approach you choose, there are a few quality checks worth applying. An R² of approximately 85% or higher typically indicates a strong fit for compensation data, though this should be treated as a guideline rather than a rigid cutoff. Key categories should also be well represented, ideally with 5–10 or more employees per grade, job family, or other grouping, so that the model reflects real patterns across groups rather than the pay of a few individuals. Finally, ensure there is a sufficient gender (or other demographic) mix to allow for meaningful comparisons.
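The representation checks above can be run on your data before you start an analysis. The following is a minimal sketch, assuming each employee is a dictionary with hypothetical `grade` and `gender` fields; the minimum of 5 is the lower end of the 5–10 guideline, not a fixed rule.

```python
from collections import Counter

MIN_PER_CATEGORY = 5  # lower end of the 5–10 guideline; adjust as needed


def underrepresented_categories(rows, category_key, minimum=MIN_PER_CATEGORY):
    """Return {category: headcount} for categories below the minimum size."""
    counts = Counter(row[category_key] for row in rows)
    return {cat: n for cat, n in counts.items() if n < minimum}


def gender_mix(rows, gender_key="gender"):
    """Return the share of employees per gender value."""
    counts = Counter(row[gender_key] for row in rows)
    total = sum(counts.values())
    return {gender: n / total for gender, n in counts.items()}


# Hypothetical dataset: grade G1 has 6 employees, grade G2 only 2.
employees = (
    [{"grade": "G1", "gender": "F"}] * 3
    + [{"grade": "G1", "gender": "M"}] * 3
    + [{"grade": "G2", "gender": "F"}]
    + [{"grade": "G2", "gender": "M"}]
)

print(underrepresented_categories(employees, "grade"))  # flags G2
print(gender_mix(employees))
```

Running the same checks within each candidate subgroup shows early on whether a subgroup is too thin to support its own model.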

Practical guidance

Choose an approach based on the purpose of the analysis. Beginning with a whole-population model usually provides coherence and captures broad patterns.
Test subgroup models only when you have reason to believe pay logic differs. Document those reasons clearly.
Prioritize stability over appearances. A model's role is to reflect reality, not to deliver the most flattering number.
Communicate purposefully. Whole-population models are useful for high-level reporting, while subgroup results should be positioned as diagnostic tools.

Key takeaways

Whole-population models offer robust, strategic, and organization-wide insights. On the other hand, subgroup models are best suited to local analysis, and only when groups are large enough and governed by different pay logic. Above all, don't let outcomes drive model choice. The structure of your pay system and the purpose of the analysis should always determine your approach.

Running a subgroup analysis

The subgroup analysis feature allows you to run the same analysis on subsets of the data, for example for each business unit, each department, or each country.

To run a subgroup analysis, proceed as follows:

Enter the run analysis form.
Check the Run a subgroup analysis checkbox.
Select the group Variable.
Choose the specific Subgroups for which to run the analysis.
Run the analysis.

Once the analysis has finished running, the overview page is shown, with an overview panel at the top of the page. The overview graph has one point for each subgroup:

the size represents the number of employees in the subgroup,
the location indicates the unadjusted and adjusted pay gap,
the color indicates which group is underpaid for the unadjusted and the adjusted pay gap.

This information is also displayed in the table on the left.

We also include an overview summary, which combines the results of all the subgroup analyses. As the point of a subgroup analysis is to run independent models for each subgroup, the overview of the subgroup analyses presents the pay gap as the weighted average of the pay gaps in the individual subgroups, where each subgroup's pay gap is weighted by the number of employees in that subgroup.

Therefore, the pay gaps shown in the overview of a subgroup analysis will generally differ from the results you get if you run one analysis on the entire dataset. In that case, the pay gaps are based on a single regression for all employees.
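As an illustration of the headcount-weighted average described above, here is a short Python sketch. The subgroup sizes and pay gaps are made-up numbers, and the function name is hypothetical.

```python
def weighted_average_gap(subgroups):
    """Average the subgroup pay gaps, weighting each by its employee count.

    subgroups: list of (employee_count, pay_gap) pairs.
    """
    total_employees = sum(count for count, _ in subgroups)
    return sum(count * gap for count, gap in subgroups) / total_employees


# Hypothetical subgroup results: (employees, adjusted pay gap as a fraction)
subgroup_results = [
    (200, 0.02),  # subgroup 1: 2% gap
    (50, 0.06),   # subgroup 2: 6% gap
    (250, 0.01),  # subgroup 3: 1% gap
]

overview_gap = weighted_average_gap(subgroup_results)
print(f"Overview pay gap: {overview_gap:.1%}")
```

Here the overview gap is (200 × 2% + 50 × 6% + 250 × 1%) / 500 = 1.9%. The small subgroup with the 6% gap moves the overview only slightly because it carries only 50 of the 500 employees.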

To view the results of a specific subgroup analysis, click either the dot in the graph or the row in the table. This brings up the results from the corresponding subgroup analysis, displayed in the same manner as a single-level analysis. Clicking the Compensation Model Results tab will show the compensation model for the selected subgroup, and similarly the Reports tab will highlight the selected subgroup.

For more information about the compensation model, see Understanding & analyzing the compensation model.

You can use the drop-down to switch between subgroups or click the x icon next to it to go back to the summary for all groups. If you want to have a quick look at the subgroup overview graph and table, click Display subgroup overview.

Editing a suggested raise in the subgroup will also edit the raise for that employee in the overview analysis for all employees.

When running a subgroup analysis, the pay gap is measured and closed separately in each subgroup, meaning that there may be some groups where raises are suggested for women and others where raises are suggested for men (assuming we run the analysis for gender).

Backwards elimination in subgroup analysis

Given the often small size of the analysis groups, the system automatically applies backwards elimination with a default p-value threshold of 0.25. In other words, in the analysis of each subgroup, variables that are not significant at that threshold are removed from the model. As a result, the variables included in each model may differ from one subgroup to another. Further, the coefficients for the variables in each subgroup will also differ. For example, in some job roles education may play a major part in determining compensation, while in others experience may be more important. To use a different p-value threshold for backwards elimination in subgroups, check the optional backwards elimination checkbox and enter a new threshold. Please note that selecting this option also applies backwards elimination to the overall analysis.
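For readers unfamiliar with the technique, the sketch below shows the common textbook form of backward elimination: fit the model, drop the least significant variable if its p-value exceeds the threshold, refit, and repeat. This is an assumption about the general procedure, not a description of the system's internal implementation. The `fit_pvalues` callable and the `toy_pvalues` stub are hypothetical stand-ins for refitting a regression.

```python
def backward_eliminate(variables, fit_pvalues, threshold=0.25):
    """Repeatedly drop the least significant variable until all pass.

    variables:   candidate variable names.
    fit_pvalues: hypothetical callable that fits a model on the given
                 variables and returns {variable: p_value}.
    threshold:   variables with a p-value above this are removed.
    """
    remaining = list(variables)
    while remaining:
        pvalues = fit_pvalues(remaining)
        worst = max(remaining, key=lambda name: pvalues[name])
        if pvalues[worst] <= threshold:
            break  # every remaining variable is significant enough
        remaining.remove(worst)  # drop one variable, then refit
    return remaining


def toy_pvalues(variables):
    """Stub standing in for a real regression fit (made-up p-values).

    Tenure looks insignificant while education is in the model and
    becomes significant once education is removed, which is why the
    model is refitted after every removal.
    """
    pvalues = {"experience": 0.01, "education": 0.40, "job_family": 0.03}
    pvalues["tenure"] = 0.30 if "education" in variables else 0.10
    return {name: pvalues[name] for name in variables}


kept = backward_eliminate(
    ["experience", "education", "job_family", "tenure"], toy_pvalues
)
print(kept)
```

With these made-up p-values, only education is removed. Because each subgroup is fitted separately, the same procedure can keep a different set of variables in each subgroup.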

If your dataset contains large subgroups and your analysis takes a long time to run, consider manually disabling backwards elimination to speed it up. We do, however, recommend checking the significance of the variables in the Compensation Model Results tab and manually removing any that do not contribute significantly to your model.