How To Apply Filter In Tableau By Removing Duplicates

One of the questions that comes up quite often is "How do I remove duplicate records in Tableau Prep?" or "How do I dedup data in Tableau Prep?" Some data preparation tools have a specific feature to do this. At first glance, the first version of Tableau Prep doesn't seem to have this feature. But it is definitely possible to remove duplicate records in Tableau Prep. And not only that, it's very easy to remove duplicates. (And if you make it to the end, you'll get a bonus: an LOD calculation in Tableau Prep!)

Let's talk about three possible cases of removing duplicate records in Tableau Prep (each gets a bit harder):

1: Exact Duplicate Records in Tableau Prep

Let's say you've got a data set that looks like this:

You can see that Employee ID 3 has two exactly identical records. Everything is the same: the ID, the Name, the Date. It's a true duplicate. No matter how many duplicate records you had, you could do the following in Tableau Prep:

All you have to do is add an aggregate step and add all fields to the Grouped Fields section. Nothing will be aggregated. Only unique rows are retained, while duplicates vanish before your eyes. You can continue the Tableau Prep flow with a nicely deduped data set. (And thank you to Tom Fuller for pointing out this approach!)
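Outside of Tableau Prep, the same idea (group by every field, aggregate nothing) can be sketched in a few lines of Python. This is only an illustration of the logic, not what Tableau Prep runs internally. The function name `dedup_exact` and the sample names and dates are hypothetical; only the duplicated Employee ID 3 comes from the example above.

```python
# Hypothetical sample rows: (employee_id, name, date_hired).
# Only the duplicated Employee ID 3 mirrors the article; other values are made up.
rows = [
    (1, "Ada", "2010-05-01"),
    (2, "Grace", "2012-09-15"),
    (3, "Walvoord", "2014-03-10"),
    (3, "Walvoord", "2014-03-10"),  # exact duplicate of the row above
]


def dedup_exact(records):
    """Group by every field: each distinct row survives exactly once.

    dict.fromkeys keeps the first occurrence of each key and preserves
    insertion order, so the original row order is retained.
    """
    return list(dict.fromkeys(records))


deduped = dedup_exact(rows)
print(len(rows), "rows in,", len(deduped), "rows out")
```

Because every field is part of the grouping key, two rows collapse only when they match in all fields, which is exactly the "true duplicate" case.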

2: Similar, but not Exactly, Duplicate Records

Let's change the data set slightly and consider how we might eliminate the duplicate records in Tableau Prep:

In this example, Walvoord was hired once in 1997 and then re-hired in 2014. (This data set doesn't indicate the reason: did he have an intermediate job, or was it a hire to a new position?) Whatever the cause, let's say we only care to know the most recent hire date.

You've probably already jumped to a possible solution for removing this duplicate data. It might look like this:

It's very similar to the previous solution, but here we're grouping only by Employee ID and Name, while the Date Hired field gets aggregated as a MAX. That means for every unique employee, we'll get the max hire date. We remove the unwanted near-duplicate record and end up with this output:
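In code terms, the same grouping can be sketched as follows. This is a minimal Python illustration of "group by Employee ID and Name, take MAX of Date Hired", assuming rows are simple tuples. The helper name `latest_hire_per_employee`, the exact dates, and the second employee are hypothetical; the article only tells us Walvoord was hired in 1997 and again in 2014.

```python
from datetime import date

# Hypothetical rows: (employee_id, name, date_hired).
# Only the years 1997 and 2014 for Walvoord come from the article.
rows = [
    (1, "Ada", date(2010, 5, 1)),
    (3, "Walvoord", date(1997, 6, 2)),
    (3, "Walvoord", date(2014, 3, 10)),
]


def latest_hire_per_employee(records):
    """Group by (employee_id, name) and keep the maximum hire date."""
    latest = {}
    for emp_id, name, hired in records:
        key = (emp_id, name)  # the grouped fields
        if key not in latest or hired > latest[key]:
            latest[key] = hired  # the MAX aggregation
    return [(emp_id, name, hired) for (emp_id, name), hired in latest.items()]


result = latest_hire_per_employee(rows)
for row in result:
    print(row)
```

Note that only the grouped fields and the aggregated field survive this step, which is why it stops being enough once a record carries additional columns.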

3: Extended Near-Duplicate Records

I've already covered how to get the latest snapshot of records elsewhere, but it fits nicely here because it is a case of duplicate data that needs to be deduped. Let's extend the previous input data set so that there is still a most recent hire date to find, but we can't simply group and aggregate, because there is additional data in the record that we need to retain.

Now we know why Walvoord has two hire dates: they represent two different positions. So we need to keep the record with the latest hire date, but we also need to keep the other values for that record intact. We can't use the previous approach and group by Position, because we'd get two records. And we can't aggregate Position either, because we'd end up with an arbitrary value. (Should it be MIN or MAX? One of those might work in this specific case, but other combinations of positions will give different results whichever we choose.)

Instead, we'll extend our solution just as the data set has been extended. The first step is the same as Case #2: we group by Employee ID and Name and find the maximum hire date. The output shown above is our result. But we need to take that result and match it back to the full data set to identify the entire records with the latest date. It'll look something like this:

The Max Date step is just a renaming of the Dedup step in Case #2. But then we've joined it to the previous step in the flow! Yes, you can join to previous steps in the flow, just by dragging and dropping!

And the join, as you can see above, is done on Employee ID as well as Date Hired, so all dates in the original data set are matched against the latest (MAX) dates from the aggregation. You can even see, in red, the earlier date for Employee 3 that we wanted to exclude.
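The aggregate-then-join pattern can be sketched in Python as two steps that mirror the flow: first compute the latest date per employee, then keep only the full rows that match it. This is an illustrative sketch under assumed data; the function name `keep_latest_records`, the position titles, and the exact dates are hypothetical.

```python
from datetime import date

# Hypothetical rows: (employee_id, name, date_hired, position).
# Position titles and exact dates are invented for illustration.
rows = [
    (1, "Ada", date(2010, 5, 1), "Analyst"),
    (3, "Walvoord", date(1997, 6, 2), "Clerk"),
    (3, "Walvoord", date(2014, 3, 10), "Manager"),
]


def keep_latest_records(records):
    """Aggregate to the max hire date, then join back to the full rows."""
    # Step 1 (the "Max Date" aggregate): latest hire date per employee.
    max_date = {}
    for emp_id, _name, hired, _position in records:
        if emp_id not in max_date or hired > max_date[emp_id]:
            max_date[emp_id] = hired

    # Step 2 (the inner join on Employee ID and Date Hired): a row survives
    # only if its date equals the aggregated maximum for its employee.
    return [r for r in records if r[2] == max_date[r[0]]]


latest_rows = keep_latest_records(rows)
for row in latest_rows:
    print(row)
```

One caveat of this approach, in code or in a flow: if an employee has two records sharing the same latest date, both match the join and both are kept.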

Bonus: LOD Calculation in Tableau Prep

Guess what? The solution to Case #3 is an LOD calculation in Tableau Prep! If you were wondering whether you could write a FIXED LOD, you can! It's just visual!

Imagine you were writing a calculation to find the latest record in Tableau Desktop. You'd write:

{FIXED [Employee ID] : MAX([Date Hired])}

which would give you the latest date per employee. You could extend the calculation to match it to the date of the record and get a boolean that tells you whether the record is the latest or not:

[Date Hired] = {FIXED [Employee ID] : MAX([Date Hired])}

When Date Hired matches the LOD result, you have the most recent record for that employee. Filter to keep only true values and you've deduped your data set.
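To make the boolean explicit, here is a small Python sketch of the same calculation: a per-employee maximum (the FIXED part) compared against each row's own date. It assumes rows are dictionaries; the field name `is_latest`, the helper `flag_latest`, and the sample values are hypothetical.

```python
from datetime import date

# Hypothetical rows; field names mirror the calculation above.
rows = [
    {"Employee ID": 1, "Date Hired": date(2010, 5, 1)},
    {"Employee ID": 3, "Date Hired": date(1997, 6, 2)},
    {"Employee ID": 3, "Date Hired": date(2014, 3, 10)},
]


def flag_latest(records):
    """Add is_latest = ([Date Hired] == max Date Hired for that Employee ID)."""
    # Equivalent of {FIXED [Employee ID] : MAX([Date Hired])}
    max_by_emp = {}
    for r in records:
        emp = r["Employee ID"]
        if emp not in max_by_emp or r["Date Hired"] > max_by_emp[emp]:
            max_by_emp[emp] = r["Date Hired"]

    # Compare each row's own date with the fixed per-employee maximum.
    return [
        dict(r, is_latest=(r["Date Hired"] == max_by_emp[r["Employee ID"]]))
        for r in records
    ]


flagged = flag_latest(rows)
kept = [r for r in flagged if r["is_latest"]]  # "filter to keep only true values"
print([r["is_latest"] for r in flagged])
```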

"But wait!" you say. "Tableau Prep doesn't support LOD syntax (at least in version 1)."

Ah, but we just matched the logic of the calculation with our flow in Tableau Prep. Check it out:

And there you go! You can now remove duplicate rows in Tableau Prep and even express LOD calculations using a Tableau Prep data flow!


Source: https://vizpainter.com/how-to-remove-duplicate-records-in-tableau-prep/

Posted by: kellyeldis1975.blogspot.com
